Question

DESeq2 Factorial Design correcting for Batch + Lane + RIN

0

Entering edit mode

choishingwan • 0

@choishingwan-9327

Last seen 9.2 years ago

Hong Kong

So recently we have got sample back with the following design

Condition	Treatment	Lane	Batch	RIN
Case	Treated	1	1	7.7
Case	Treated	2	1	7.7
Case	Treated	1	4	7.6
Control	Treated	2	4	8.1
Control	Treated	1	1	7.8
Control	Treated	2	1	7.7
Case	Untreated	1	4	7.5
Case	Untreated	2	3	7.8
Case	Untreated	2	2	7.9
Control	Untreated	2	1	7.4
Control	Untreated	1	3	8
Control	Untreated	1	2	7.8

data.frame(Condition=rep(rep(c("Case","Control"), each=3),2), Treatment=rep(c("Treated", "Untreated"),each=6), Lane=c(1,2,1,2,1,2,1,2,2,2,1,1), Batch=c(1,1,4,4,1,1,4,3,2,1,3,2), RIN=c(7.7,7.7,7.6,8.1,7.8,7.7,7.5,7.8,7.9,7.4,8,7.8))

What we would like to do is to:

1. Compare effect of treatment in case

2. Compare effect of treatment in control

3. Compare and contrast the effect of treatment in case when compared to control.

From previous answers, it seems like there are two different model that we can use:

1. ~Batch+Lane+RIN+Condition+Condition:Treatment

2. ~Batch+Lane+RIN+Condition+Group

Where group is <Treatment><Condition>

Take for example, when we try to compare the difference of condition when there is no treatment

e.g. Untreated Case vs Untreated Control

For 1, we use

dds = DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design = ~Batch+RIN+Lane+Condition+Condition:Treatment)
dds$Condition = relevel(dds$Condition, ref="Control")
dds$Treatment = relevel(dds$Treatment, ref="Untreated")
dds = DESeq(dds)
results(dds, contrast=c("Condition", "Control", "Case"))

For 2 we use

coldata$Group = paste0(coldata$Treatment, "-", coldata$Condition)
dds = DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design = ~Batch+RIN+Lane+Group)
dds = DESeq(dds)
results(dds, contrast=c("Group", "Untreated-Control", "Untreated-Case"))

However, I found that the two design actually give very different results.

Using the interaction term model, we will get around 438 genes with padj < 0.05 whereas none of the genes can anywhere close to significance when we use the group model.

Is there something that I did wrongly? Which one is the more appropriate method? And most importantly, why was there such a large difference?

deseq2 • 3.4k views

ADD COMMENT • link updated 9.2 years ago by Michael Love 43k • written 9.2 years ago by choishingwan • 0

score 4 · Answer 1 · 2015-12-06

4

Entering edit mode

Michael Love 43k

@mikelove

Last seen 9 hours ago

United States

It's hard to say what's going on. Can you post the version number as well? If you haven't yet, make sure to update to the latest version 1.10.

I wouldn't ever add RIN to the design. This doesn't make sense in terms of the model and may be causing problems.

ADD COMMENT • link 9.2 years ago Michael Love 43k

0

Entering edit mode

Thank you Michael,

I am currently using DESeq2 version 1.10.0. I was adding the the RIN just to see if the RNA degradation might affect the results. I have tried to get the correlation of the p-value (which might not be correct but was only try to get an idea) of all possible combinations and the results from model 1 and model 2 doesn't agrees (the correlation file is here). So I am uncertain whether if there is some problem with my scripts...

(Forgot to mention, the rows with Interact as prefix are those that uses the interaction model).

Below is my R sessionInfo():

R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_Hong Kong SAR.1252  LC_CTYPE=English_Hong Kong SAR.1252    LC_MONETARY=English_Hong Kong SAR.1252
[4] LC_NUMERIC=C                           LC_TIME=English_Hong Kong SAR.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.10.0              RcppArmadillo_0.6.300.2.0  Rcpp_0.12.2                SummarizedExperiment_1.0.1
 [5] Biobase_2.30.0             GenomicRanges_1.22.1       GenomeInfoDb_1.6.1         IRanges_2.4.4             
 [9] S4Vectors_0.8.3            BiocGenerics_0.16.1       

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-2   futile.logger_1.4.1  plyr_1.8.3           XVector_0.10.0       futile.options_1.0.0
 [6] tools_3.2.2          zlibbioc_1.16.0      rpart_4.1-10         digest_0.6.8         RSQLite_1.0.0       
[11] annotate_1.48.0      gtable_0.1.2         lattice_0.20-33      DBI_0.3.1            proto_0.3-10        
[16] gridExtra_2.0.0      genefilter_1.52.0    cluster_2.0.3        stringr_1.0.0        locfit_1.5-9.1      
[21] nnet_7.3-10          grid_3.2.2           AnnotationDbi_1.32.0 XML_3.98-1.3         survival_2.38-3     
[26] BiocParallel_1.4.0   foreign_0.8-65       latticeExtra_0.6-26  Formula_1.2-1        geneplotter_1.48.0  
[31] ggplot2_1.0.1        reshape2_1.4.1       lambda.r_1.1.7       magrittr_1.5         scales_0.3.0        
[36] Hmisc_3.17-0         MASS_7.3-43          splines_3.2.2        xtable_1.8-0         colorspace_1.2-6    
[41] stringi_1.0-1        acepack_1.3-3.3      munsell_0.4.2

ADD REPLY • link 9.2 years ago choishingwan • 0

0

Entering edit mode

Repeat the analysis but definitely do not include RIN in the design. Then compare 1 and 2.

ADD REPLY • link 9.2 years ago Michael Love 43k

0

Entering edit mode

It is slightly better in that both doesn't return any significance, however, the p-value still differs in that their correlation is only around 0.6827998

Really thank you for your help Michael

ADD REPLY • link 9.2 years ago choishingwan • 0

0

Entering edit mode

Correlation on p-values isn't a great comparison. Better is a cross tabulation:

table(res1$padj < .1, res2$padj < .1)

Remember though, that they are not identical methods so the results are not identical (2 uses LFC shrinkage and the 1 does not). My advice is if you are interested in testing the interaction term use 1, if you are interested in comparing groups use 2.

ADD REPLY • link 9.2 years ago Michael Love 43k

0

Entering edit mode

Thank you Michael. None of them return any significance so I tried to use pvalue < 0.05 instead

		Model 2
		False	True
Model1	False	29994	0
	True	20	51

If that is normal, maybe I will stick with model 2 just because it is much simpler to understand. Thanks again for your timely feedback!

ADD REPLY • link 9.2 years ago choishingwan • 0