DESeq2 multifactorial design with unpaired individuals
fanli.gcb • 0
@fanligcb-10919
Last seen 7.0 years ago
Los Angeles, CA

Hi all,

I have 12 RNA-seq samples that represent 4 different cell lines each done in triplicate, as below:

                           fastq Condition CellLine
64 64-C232_3_S11_R1_001.fastq.gz   Control       C1
73 73-C232_2_S10_R1_001.fastq.gz   Control       C1
84  84-C232_1_S4_R1_001.fastq.gz   Control       C1
65  65-C229_2_S3_R1_001.fastq.gz   Control       C2
76 76-C229_3_S12_R1_001.fastq.gz   Control       C2
81  81-C229_3_S5_R1_001.fastq.gz   Control       C2
62   62-232_3_S1_R1_001.fastq.gz   Disease       D1
63   63-232_3_S2_R1_001.fastq.gz   Disease       D1
89   89-232_1_S7_R1_001.fastq.gz   Disease       D1
71   71-348_4_S8_R1_001.fastq.gz   Disease       D2
86   86-348_3_S6_R1_001.fastq.gz   Disease       D2
88   88-348_4_S9_R1_001.fastq.gz   Disease       D2

So C1, C2, D1, and D2 are all independent cell lines, and what I'd really like to know is what differs between Control and Disease. Of note, C1/D1 and C2/D2 are not paired cell lines (i.e. they are not healthy and disease lines derived from the same individual). As such, the straightforward design of ~ CellLine + Condition gives the "model matrix is not full rank" error. I can, however, use a custom design matrix as suggested in the vignette, as follows:

dds <- DESeqDataSetFromMatrix(countData=counts, colData=mapping, design= ~1)
dds$indn <- factor(c(1,1,1,2,2,2,1,1,1,2,2,2))
mm1 <- model.matrix(~ Condition + Condition:indn, colData(dds))
> mm1
   (Intercept) ConditionDisease ConditionControl:indn2 ConditionDisease:indn2
64           1                0                      0                      0
73           1                0                      0                      0
84           1                0                      0                      0
65           1                0                      1                      0
76           1                0                      1                      0
81           1                0                      1                      0
62           1                1                      0                      0
63           1                1                      0                      0
89           1                1                      0                      0
71           1                1                      0                      1
86           1                1                      0                      1
88           1                1                      0                      1
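
For anyone who wants to see the rank problem directly, here is a minimal base-R sketch. It rebuilds the sample table from the post by hand (the `mapping` data frame below is a reconstruction for illustration, not the poster's actual object) and compares the number of columns with the rank of each design matrix.

```r
# Reconstructed sample table (illustrative; mirrors the layout shown in the post)
mapping <- data.frame(
  Condition = factor(rep(c("Control", "Disease"), each = 6)),
  CellLine  = factor(rep(c("C1", "C2", "D1", "D2"), each = 3))
)

# Naive design: ConditionDisease equals CellLineD1 + CellLineD2,
# so one column is a linear combination of the others
mm_naive <- model.matrix(~ CellLine + Condition, data = mapping)
naive_cols <- ncol(mm_naive)
naive_rank <- qr(mm_naive)$rank

# Nested design from the post: cell lines indexed within each condition
mapping$indn <- factor(rep(c(1, 2, 1, 2), each = 3))
mm_nested <- model.matrix(~ Condition + Condition:indn, data = mapping)
nested_cols <- ncol(mm_nested)
nested_rank <- qr(mm_nested)$rank

cat("naive:  columns =", naive_cols, " rank =", naive_rank, "\n")
cat("nested: columns =", nested_cols, " rank =", nested_rank, "\n")
```

The naive matrix has more columns than its rank, which is what triggers the error; the nested matrix is full rank.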

 

After fitting with the custom matrix (DESeq(dds, full=mm1)), I extract the results as follows:

> resultsNames(dds)
[1] "Intercept"              "ConditionDisease"       "ConditionControl.indn2"
[4] "ConditionDisease.indn2"

> res <- results(dds, contrast=c(0,1,0,0))

Is this the correct way to extract the Condition effect while controlling for the fact that I have triplicate samples from different cell lines? I've looked through the various posts here/Biostars/seqanswers but can't seem to find this exact situation. I suppose we should have used paired cell lines, right? Thanks in advance for any help!

 

deseq2 design matrix • 1.4k views
@mikelove
Last seen 2 days ago
United States

When you include an interaction term, the main effect ConditionDisease becomes the comparison of D1 vs C1. 
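
To make that concrete, here is a small sketch on made-up numbers using an ordinary linear model rather than DESeq2 (the `toy` data frame and its cell-line means are invented for illustration). With the same formula, the ConditionDisease coefficient equals the D1 mean minus the C1 mean, and a numeric contrast of c(0, 1, -0.5, 0.5) gives the average of the disease lines minus the average of the control lines.

```r
set.seed(42)

# Invented data: one response value per sample, laid out like the post's design
toy <- data.frame(
  Condition = factor(rep(c("Control", "Disease"), each = 6)),
  CellLine  = factor(rep(c("C1", "C2", "D1", "D2"), each = 3)),
  indn      = factor(rep(c(1, 2, 1, 2), each = 3))
)
line_means <- c(C1 = 10, C2 = 12, D1 = 15, D2 = 19)  # hypothetical values
toy$y <- as.numeric(line_means[as.character(toy$CellLine)]) + rnorm(12, sd = 0.5)

# Same formula as the custom design matrix in the question
fit <- lm(y ~ Condition + Condition:indn, data = toy)
b <- coef(fit)

# Observed mean of each cell line
obs <- tapply(toy$y, toy$CellLine, mean)

# Coefficient 2 is D1 vs C1 only
d1_vs_c1 <- unname(b["ConditionDisease"])

# Averaging over the two lines within each condition
avg_contrast <- sum(c(0, 1, -0.5, 0.5) * b)

print(c(coef = d1_vs_c1, D1_minus_C1 = unname(obs["D1"] - obs["C1"])))
print(c(contrast = avg_contrast,
        avg_D_minus_avg_C = unname(mean(obs[c("D1", "D2")]) - mean(obs[c("C1", "C2")]))))
```

Note that an averaged contrast like this still only compares these four specific lines; it does not treat the cell lines as a sample from a wider population.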

The problem with the design is that you can't use fixed effects to control for cell line and then also test for condition, because cell line is nested within condition: the two variables are perfectly confounded. So this can't be done with DESeq2 or other packages that offer only fixed-effects modeling.

With such a design, if you want to control for cell line, you have to take an alternative approach: inform the model that samples within a cell line are correlated. A package which lets you do this is limma, with its duplicateCorrelation() function. So you should look up the voom and limma workflow for analyzing RNA-seq data, and then look up the duplicateCorrelation function.
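
A rough sketch of what that workflow might look like, assuming the limma, edgeR, and statmod packages are installed (duplicateCorrelation() needs statmod). The counts here are simulated purely so the example runs; `counts` and `mapping` stand in for the poster's real objects, and the single voom pass is a simplification, so check the limma documentation before relying on it.

```r
library(limma)
library(edgeR)

set.seed(1)

# Stand-in sample table with the same layout as in the question
mapping <- data.frame(
  Condition = factor(rep(c("Control", "Disease"), each = 6)),
  CellLine  = factor(rep(c("C1", "C2", "D1", "D2"), each = 3))
)

# Simulated counts (NOT real data): per-gene baseline times a per-cell-line shift
n_genes <- 500
base <- runif(n_genes, 50, 500)
line_effect <- matrix(exp(rnorm(n_genes * 4, sd = 0.3)), n_genes, 4)
mu <- base * line_effect[, as.integer(mapping$CellLine)]
counts <- matrix(rnbinom(n_genes * 12, mu = mu, size = 10), n_genes, 12,
                 dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# Only Condition goes in the design; cell line is handled as a blocking factor
design <- model.matrix(~ Condition, data = mapping)

dge <- calcNormFactors(DGEList(counts = counts))
v <- voom(dge, design)

# Estimate the consensus within-cell-line correlation across genes
corfit <- duplicateCorrelation(v, design, block = mapping$CellLine)

# Fit the linear model with that correlation, then moderate the statistics
fit <- lmFit(v, design, block = mapping$CellLine,
             correlation = corfit$consensus.correlation)
fit <- eBayes(fit)

res <- topTable(fit, coef = "ConditionDisease", number = Inf)
head(res)
```

The design choice here is that cell line never appears as a fixed effect, so Condition stays estimable, while the block/correlation arguments keep the triplicates from being treated as independent samples.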


Thanks for the answer Michael. I suppose in the future I should ask this group to use paired cell lines if possible. 
