i all,
I have RNAseq data for treated and untreated samples (in triplicates) in three different cell lines: KO1, KO2, WT. The goal is to do three comparisons:
(1) KO1 treated vs. untreated (2) KO2 treated vs. untreated and (3) WT treated vs untreated
Here is the group information in a data.frame called "sampleInfo" :
> sampleInfo
Group | |
sample1 | KO_2.Treated |
sample2 | KO_2.Treated |
sample3 | KO_2.Treated |
sample4 | KO_2.Untreated |
sample5 | KO_2.Untreated |
sample6 | KO_2.Untreated |
sample7 | KO_1.Treated |
sample8 | KO_1.Treated |
sample9 | KO_1.Treated |
sample10 | KO_1.Untreated |
sample11 | KO_1.Untreated |
sample12 | KO_1.Untreated |
sample13 | WT.Treated |
sample14 | WT.Treated |
sample15 | WT.Treated |
sample16 | WT.Untreated |
sample17 | WT.Untreated |
sample18 | WT.Untreated |
I have a counts results data frame that has 18 columns corresponding to the gene counts of the above 18 samples.
My question is, does it make sense if I combine all the data together (18 samples total), make constrasts that specifies the three different comparisons I want to make, fit them through glimFit, and then calculate the three different contrasts separately as below:
## Construct DGEList d <- DGEList(counts=counts) ## Make design matrix Group = factor(sampleInfo$Group) design <- model.matrix(~0+Group) colnames(design) <- levels(Group) rownames(design) <- colnames(counts) ## Make contrasts prestr <-"my.contrasts = makeContrasts(" mainStr <- paste("KO_2.Treated_vs_Untreated=KO_2.Treated-KO_2.Untreated,", "KO_1.Treated_vs_Untreated=KO_1.Treated-KO_1.Untreated,", "WT.Treated_vs_Untreated=WT.Treated-WT.Untreated",sep="") poststr <-",levels=design)" commandstr=paste(prestr,mainStr,poststr,sep="") eval(parse(text=commandstr)) # annotationTable = read.csv(annotationsFile, row.names=1) ################# ## Filtering ## ################# d <- calcNormFactors(d) d <- estimateGLMCommonDisp(d, design) d <- estimateGLMTagwiseDisp(d, design) fit <- glmFit(d, design)
Or, do I need to do the glmFit separately?
Thanks very much in advance!
Thank you so much! This is super helpful. Good point about the unnecessary complications there - it was sort of copy paste from my previous project where the contrasts were a lot more complicated & should have changed it. Thanks a lot!!
Indeed, we find that the extra precision from more residual d.f. outweighs any loss of performance due to bias from using a common dispersion for all groups. Also, just to elaborate on
voomWithQualityWeights
; if you want to model group-specific variances, you have to specify the design matrix in thevar.design
argument, as well as indesign
.