DESeq2 paired multifactor test
4
2
Entering edit mode
hwillenbrock ▴ 20
@hwillenbrock-6937
Last seen 10.0 years ago
Denmark

Hi,

I am wondering what the correct model formula and analysis approach is with DESeq2 for my multifactor paired design. I have tried to look at previous answers, e.g. DESEq2 comparison with two conditions and paired experiments and DESeq2 multiple factors nested design but I keep getting warnings from DESeq2, so I must be interpreting them wrong, or my design is slightly different.

What I have is 2 different tissue cell cultures, 5 replicates, that have been treated with a chemical. I have RNAseq count data for both the untreated and treated samples, so each treated vs. untreated is a pair:

 

sample   tissue culture    treatment
1   tissueX 1 unstimulated
2   tissueX 1 stimulated
3   tissueX 2 unstimulated
4   tissueX 2 stimulated
5   tissueX 3 unstimulated
6   tissueX 3 stimulated
7   tissueX 4 unstimulated
8   tissueX 4 stimulated
9   tissueX 5 unstimulated
10  tissueX 5 stimulated
11  tissueY 6 unstimulated
12  tissueY 6 stimulated
13  tissueY 7 unstimulated
14  tissueY 7 stimulated
15  tissueY 8 unstimulated
16  tissueY 8 stimulated
17  tissueY 9 unstimulated
18  tissueY 9 stimulated
19  tissueY 10 unstimulated
20  tissueY 10 stimulated

 

 

 

The genes that I'm interested in identifying, are those where the stimulation effect differs most between the two tissues. If not taking the paired design into account, I can use the model: ~tissue+treatment+tissue:treatment that is accepted without errors by DESeq2. However, I don't get very significant interactions and I'd like to take advantage of the paired design.

If I just analyze tissueX unstimulated versus stimulated, I can add the "culture" factor to the model and it improves the pvalues considerably: ~subject+treatment

Simply adding the subject term to the model formula as for the simple one tissue paired case, like this: ~subject + tissue+treatment+tissue:treatment gives me an error.

Trying to interpolate a previous answer to my case: ~ subject + treatment + tissue:treatment gives me the same error: "the model matrix is not full rank, so the model cannot be fit as specified."

How do I incorporate the paired design into the model formula and extract the correct significance values?

 

Thanks in advance,

Hanni

 

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-suse-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] ggplot2_1.0.0             DESeq2_1.6.1
 [3] RcppArmadillo_0.4.450.1.0 Rcpp_0.11.3
 [5] GenomicRanges_1.18.1      GenomeInfoDb_1.2.2
 [7] IRanges_2.0.0             S4Vectors_0.4.0
 [9] Biobase_2.26.0            BiocGenerics_0.12.0

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3      annotate_1.44.0      AnnotationDbi_1.28.0
 [4] base64enc_0.1-2      BatchJobs_1.4        BBmisc_1.7
 [7] BiocParallel_1.0.0   brew_1.0-6           checkmate_1.5.0
[10] cluster_1.15.3       codetools_0.2-9      colorspace_1.2-4
[13] DBI_0.3.1            digest_0.6.4         fail_1.2
[16] foreach_1.4.2        foreign_0.8-61       Formula_1.1-2
[19] genefilter_1.48.1    geneplotter_1.44.0   grid_3.1.0
[22] gtable_0.1.2         Hmisc_3.14-5         iterators_1.0.7
[25] labeling_0.3         lattice_0.20-29      latticeExtra_0.6-26
[28] locfit_1.5-9.1       MASS_7.3-31          munsell_0.4.2
[31] nnet_7.3-8           plyr_1.8.1           proto_0.3-10
[34] RColorBrewer_1.0-5   reshape2_1.4         rpart_4.1-8
[37] RSQLite_1.0.0        scales_0.2.4         sendmailR_1.2-1
[40] splines_3.1.0        stringr_0.6.2        survival_2.37-7
[43] tools_3.1.0          XML_3.98-1.1         xtable_1.7-4
[46] XVector_0.6.0

deseq2 paired • 16k views
ADD COMMENT
5
Entering edit mode
@mikelove
Last seen 1 day ago
United States

hi Hanni,

There is a trick from the edgeR user guide (under the section "Comparisons Both Between and Within Subjects") for building a R formula for such an experiment. For the above column data, add a new column for culture:

dds$culture.nested = factor(rep(rep(1:5,each=2),2))

Make sure that unstimulated is the base level of treatment:

dds$treatment <- relevel(dds$treatment, "unstimulated")

so now the column data looks like:

     tissue culture    treatment culture.nested
 1  tissueX       1 unstimulated              1
 2  tissueX       1   stimulated              1
 3  tissueX       2 unstimulated              2
 4  tissueX       2   stimulated              2
 5  tissueX       3 unstimulated              3
 6  tissueX       3   stimulated              3
 7  tissueX       4 unstimulated              4
 8  tissueX       4   stimulated              4
 9  tissueX       5 unstimulated              5
 10 tissueX       5   stimulated              5
 11 tissueY       6 unstimulated              1
 12 tissueY       6   stimulated              1
 13 tissueY       7 unstimulated              2
 14 tissueY       7   stimulated              2
 15 tissueY       8 unstimulated              3
 16 tissueY       8   stimulated              3
 17 tissueY       9 unstimulated              4
 18 tissueY       9   stimulated              4
 19 tissueY      10 unstimulated              5
 20 tissueY      10   stimulated              5

Then use the design:

design(dds) = ~ tissue + tissue:culture.nested + tissue:treatment

To make the extraction of results simpler, specify a standard model matrix:

dds = DESeq(dds, modelMatrixType="standard")

To test for genes which respond to treatment effect in tissue X:

results(dds, name="tissuetissueX.treatmentstimulated")

To test for genes which respond to treatment effect in tissue Y:

results(dds, name="tissuetissueY.treatmentstimulated")

To test for genes where the treatment effect differs between the two tissues:

results(dds, contrast=list("tissuetissueY.treatmentstimulated", "tissuetissueX.treatmentstimulated"))
ADD COMMENT
0
Entering edit mode

Hi Michael,

Although this is a 5 years old post, it is still a relevant one. I have been reading all your answers regarding paired multifactor design. I hope, I understood most of it. What I am struggling to understand is if our goal is to test for genes which respond to treatment effect in different tissues then (tissue:treatment) interaction term is the one to go for, why will we need (tissue:culture.nested) interaction term?

Thanks in advance for all the help you are providing.

Nurun

ADD REPLY
0
Entering edit mode

The difference between those terms is subtle and I’d recommend working with a statistician that can walk you through it.

ADD REPLY
0
Entering edit mode

Hi Michael,

I have a very similar dataset as described here, but with an additional time point involved. So basically there is another column with timepoint at 4h and 8h and the dataset is twice as big. How would it change the design? Is it still valid if I perform the same procedure as you described here, but separately for the two time points?

Thank you very much for your help!

Best regards drgb

ADD REPLY
0
Entering edit mode
hwillenbrock ▴ 20
@hwillenbrock-6937
Last seen 10.0 years ago
Denmark

Hi Michael,

Thanks!

That works, but won't this make the model assume that culture 1 and 6 are the same, as if the tissue samples where taken from the same person (which they are not)?

ADD COMMENT
0
Entering edit mode

It's fine because this difference is absorbed in the main tissue effect. For the comparisons you want to make, the interactions are all you need.

ADD REPLY
0
Entering edit mode

Hi,

I believe I have an analogous issue, however I am still not sure what the best way to proceed is and would be grateful for your help.

In my multi-factor design  variables are:
treatment (before/after) / patient / disease (A/B) / response (Y/N)

For each patient I have one sample taken before treatment and one after (therefore with the same value for response and for disease).

1) The first question is to compare expression after/before treatment
I have been told by collaborators that the two diseases are expected to behave very differently and it does not make much sense to look at effects considering both together.
Thus, to keep things simple I created two different datasets and used design=~patient+ treatment in each of the them in order to extract the global effect of treatment across all patients, at the same time controlling for patient specific effects.
However, doing so I am ignoring the 'response' variable.

2) The second question is comparing responders (response=Y) vs non-responders, separately before and after treatment
One Naive solution would be to concentrate on only a subportion of the data (e.g. before treatment) and use design=~response
Instead, using all data with a design in which both patient and response variables are present gives problems with matrix rank.

Could you advise on the best design to answer both 1) and 2) ? And further clarify how the 'trick' to avoid problems with matrix rank should be implemented?

Thanks a lot!

 

ADD REPLY
0
Entering edit mode

1) If you want to compare expression before and after treatment, your current approach seems fine.

2) You can use the trick above to see if responders have a different treatment effect, controlling for patient differences. I would continue to split by disease to make things easier. Then follow the steps above, but for your analysis, "culture" is "patient" and "tissue" is "response".

Your design should look like ~ response + response:patient.nested + response:treatment. If you have different number of patients for response = Y and N then you'll have to do some extra work to create your own model matrix using model.matrix() and then to remove columns of the model matrix which have all zeros. See this other thread for how to do that: 

C: DESeq2: Extracting interaction term

Run DESeq(dds, betaPrior=FALSE) or if you needed to create your own model matrix because the numbers of responders and non responders were different, run DESeq(dds, full=mm, betaPrior=FALSE). You will have the treatment effect in nonresponders and the treatment effect responders as the two interaction terms of treatment and response. These can be extracted with 'name' like above and contrasted with contrast=list(...) like above. The contrast tests for which genes the treatment effect was different for responders vs non-responders.

ADD REPLY
0
Entering edit mode

Hi Michael,

Thanks so much for your answer!
I would be extremely grateful if you could verify whether I implemented your suggestion correctly and clarify a few doubts regarding the extraction of results

This is how my projectinfo table looks like

sample treatment patient response patient.nested
       2   control       1        Y             1
       7   treated       1        Y             1
       6   control       2        Y             2
      10   treated      2        Y             2
       5   control       3        Y             3
      11   treated      3        Y             3
       1   control       4        N             1
       8   treated       4        N             1
       4   control       5        N             2
       9   treated       5        N             2
       3   control       6        N             3
      12   treated      6        N             3

Differential expression analysis:

> data <-DESeqDataSetFromMatrix(countData = countsTable,
+                                 colData = projectinfo,
+                                 design = ~response+response:patient.nested+response:treatment)
> diffexpr<-DESeq(data, test="Wald", betaPrior=FALSE)
> resultsNames(diffexpr)
[1] "Intercept"                  "response_Y_vs_N"            "responseN.patient.nested2"
[4] "responseY.patient.nested2"  "responseN.patient.nested3"  "responseY.patient.nested3"
[7] "responseN.treatmenttreated" "responseY.treatmenttreated"

The lists I want to extract

1a) genes with differential expression before/after treatment in responders
i.e. resp=Y&treatment=treated vs resp=Y&treatment=control
Is it correct that this is given by:
results(diffexpr,  name="responseY.treatmenttreated")  ?

1b) and analogously for non-responders (resp=N&treatment=treated vs resp=N&treatment=control) would be

results(diffexpr,  name="responseN.treatmenttreated")

2) genes with differential expression responders/non-responders separately for treated and controls, i.e.
 
2a) resp=Y&treatment=treated vs resp=N&treatment=treated
and
2b) resp=Y&treatment=control vs resp=N&treatment=control

It is not clear to me whether I can get these out of

results(diffexpr, contrast=list("responseY.treatmenttreated", "responseN.treatmenttreated"))

I realize that these are questions related to linear modeling and not specific to your tool, so I need to improve my knowledge on the topic and hope to find a nearby statistician who can help.

Thanks a lot again for your help!

Fulvia

 

ADD REPLY
0
Entering edit mode

1a) and 1b) yes

2a) and 2b) you cannot compare directly across response like this with a model which has fixed effects for controlling patient differences. How can you know if what you see if a change for response=Y and not specific to patient 1,2,3 in response? This you might want to discuss with a statistician if it's hard to see why you can't make this comparison.

The comparison I was referring to, where you contrast the two interaction terms, answers the question: is the treatment effect different for response=Y vs response=N.

ADD REPLY
0
Entering edit mode

Thanks for all the feedback! I have in the meantime been thinking about this and the importance of controlling for patient-to-patient differences.

ADD REPLY
0
Entering edit mode
@alicebdennis-7066
Last seen 8.5 years ago
Switzerland

Hi all,

 

I have a similar problem/nested design, but I think my problem is how to read in colData in the first place! 

 

Is it possible to post your successful code in full here? 

 

Thank you,

Alice

ADD COMMENT
0
Entering edit mode

hi Alice,

You can post new questions with the "Ask question" button at the top. Your question here is posted as an answer to Hanni's question.

See this section of our DESeq2 workflow (the workflow goes at a slower pace than the vignette, explaining some base R functions as well):

"Typically, we have a table with experimental metadata for our samples. For your own project, you might create such a comma-separated value (CSV) file using a text editor or spreadsheet software such as Excel.

We load this file with read.csv. The parentheses around the last line are used to print the result in addition to storing it to the sampleTable object."

http://bioconductor.org/help/workflows/rnaseqGene/

If you didn't use summarizeOverlaps or htseq-count (i.e. you've already constructed a count matrix) you'll have to make sure that the column names of the counts matrix line up with the row names of the imported sample table.

ADD REPLY
0
Entering edit mode
hwillenbrock ▴ 20
@hwillenbrock-6937
Last seen 10.0 years ago
Denmark

Hi Michael,

One of my donor samples might be an outlier and I tried removing the samples from the analysis. This means that I now have the following samples:

 

    tissue culture    treatment culture.nested
 1  tissueX       1 unstimulated              1
 2  tissueX       1   stimulated              1
 3  tissueX       2 unstimulated              2
 4  tissueX       2   stimulated              2
 5  tissueX       3 unstimulated              3
 6  tissueX       3   stimulated              3
 7  tissueX       4 unstimulated              4
 8  tissueX       4   stimulated              4
 9  tissueX       5 unstimulated              5
 10 tissueX       5   stimulated              5
 11 tissueY       6 unstimulated              1
 12 tissueY       6   stimulated              1
 13 tissueY       7 unstimulated              2
 14 tissueY       7   stimulated              2
 15 tissueY       8 unstimulated              3
 16 tissueY       8   stimulated              3
 19 tissueY      10 unstimulated              5
 20 tissueY      10   stimulated              5

Now, when I try to do this:

design(dds) = ~ tissue + tissue:culture.nested + tissue:treatment

I get the error: "the model matrix is not full rank, so the model cannot be fit as specified." 

Is there any way to do the analysis, still exploiting the paired design without having to remove the one of the tissueX cultures as well?

 

Thanks,

Hanni

 

 

ADD COMMENT
0
Entering edit mode

hi Hanni,

Unfortunately, the base R function tries to form an interaction between tissue and all the non-base levels of culture.nested, leaving you with an interaction effect tissueY.culture.nested4 and no samples to fit that effect. I am planning to build more support for custom design matrices this devel cycle, for the cases like these where handing a formula to model.matrix() is not sufficient. You can use the workaround I propose at the link below, where you edit the design matrix to remove the column which cannot be fit:

C: DESeq2 paired and multi factor comparison

ADD REPLY

Login before adding your answer.

Traffic: 799 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6