edgeR: shall I fit the glm model separately for different cell lines?
1
0
Entering edit mode
@cafelumiere12-7513
Last seen 6.9 years ago
United States

i all,

I have RNAseq data for treated and untreated samples (in triplicates)  in three different cell lines: KO1, KO2, WT. The goal is to do three comparisons:

(1) KO1 treated vs. untreated (2) KO2 treated vs. untreated and (3) WT treated vs untreated

Here is the group information in a data.frame called "sampleInfo" :

> sampleInfo

  Group
sample1 KO_2.Treated
sample2 KO_2.Treated
sample3 KO_2.Treated
sample4 KO_2.Untreated
sample5 KO_2.Untreated
sample6 KO_2.Untreated
sample7 KO_1.Treated
sample8 KO_1.Treated
sample9 KO_1.Treated
sample10 KO_1.Untreated
sample11 KO_1.Untreated
sample12 KO_1.Untreated
sample13 WT.Treated
sample14 WT.Treated
sample15 WT.Treated
sample16 WT.Untreated
sample17 WT.Untreated
sample18 WT.Untreated

 

I have a counts results data frame that has 18 columns corresponding to the gene counts of the above 18 samples.
 

My question is, does it make sense if I combine all the data together (18 samples total), make constrasts that specifies the three different comparisons I want to make, fit them through glimFit, and then calculate the three different contrasts separately as below:

## Construct DGEList
d <- DGEList(counts=counts)

## Make design matrix
Group = factor(sampleInfo$Group)
design <- model.matrix(~0+Group)
colnames(design) <- levels(Group)
rownames(design) <- colnames(counts)

## Make contrasts
prestr <-"my.contrasts = makeContrasts("
mainStr <- paste("KO_2.Treated_vs_Untreated=KO_2.Treated-KO_2.Untreated,",
                 "KO_1.Treated_vs_Untreated=KO_1.Treated-KO_1.Untreated,",
                 "WT.Treated_vs_Untreated=WT.Treated-WT.Untreated",sep="")
poststr <-",levels=design)"
commandstr=paste(prestr,mainStr,poststr,sep="")
eval(parse(text=commandstr))

# annotationTable = read.csv(annotationsFile, row.names=1)

#################
##  Filtering  ##
#################
d <- calcNormFactors(d)
d <- estimateGLMCommonDisp(d, design)
d <- estimateGLMTagwiseDisp(d, design)
fit <- glmFit(d, design)

Or, do I need to do the glmFit separately?

Thanks very much in advance!

edger • 955 views
ADD COMMENT
3
Entering edit mode
@ryan-c-thompson-5618
Last seen 3 months ago
Icahn School of Medicine at Mount Sinai…

You can certainly fit a model to all your samples and test all three contrasts from the common model, and this is generally the recommended way to run edgeR. More samples in a model means more degrees of freedom to estimate the dispersion. The only reason you wouldn't do this is if you knew that the different cell lines had different dispersions, and you needed to estimate a separate dispersion for each cell line. In this case you would probably be best off fitting a single model using voomWithQualityWeights from the limma package, which can fit a single model with groups having different variances.

By the way, there's absolutely no reason to use eval to construct your contrasts. You're just making things extra complicated for yourself. Just write the code:

my.contrasts = makeContrasts(KO_2.Treated_vs_Untreated=KO_2.Treated-KO_2.Untreated,
                             KO_1.Treated_vs_Untreated=KO_1.Treated-KO_1.Untreated,
                             WT.Treated_vs_Untreated=WT.Treated-WT.Untreated,
                             levels=design)

If you have your contrast expressions as a character vector, you can use this instead:

my.contrast.expressions = c(KO_2.Treated_vs_Untreated="KO_2.Treated-KO_2.Untreated",
                            KO_1.Treated_vs_Untreated="KO_1.Treated-KO_1.Untreated",
                            WT.Treated_vs_Untreated="WT.Treated-WT.Untreated")
my.contrasts <- makeContrasts(contrasts=my.contrast.expressions, levels=design)

 

ADD COMMENT
0
Entering edit mode

Thank you so much! This is super helpful. Good point about the unnecessary complications there - it was sort of copy paste from my previous project where the contrasts were a lot more complicated & should have changed it. Thanks a lot!! 

 

ADD REPLY
0
Entering edit mode

Indeed, we find that the extra precision from more residual d.f. outweighs any loss of performance due to bias from using a common dispersion for all groups. Also, just to elaborate on voomWithQualityWeights; if you want to model group-specific variances, you have to specify the design matrix in the var.design argument, as well as in design.

ADD REPLY

Login before adding your answer.

Traffic: 885 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6