Question

edgeR: shall I fit the glm model separately for different cell lines?


cafelumiere12 ▴ 20

@cafelumiere12-7513


Hi all,

I have RNA-seq data for treated and untreated samples (in triplicate) in three different cell lines: KO1, KO2, and WT. The goal is to do three comparisons:

(1) KO1 treated vs. untreated (2) KO2 treated vs. untreated and (3) WT treated vs untreated

Here is the group information in a data.frame called "sampleInfo" :

> sampleInfo

	Group
sample1	KO_2.Treated
sample2	KO_2.Treated
sample3	KO_2.Treated
sample4	KO_2.Untreated
sample5	KO_2.Untreated
sample6	KO_2.Untreated
sample7	KO_1.Treated
sample8	KO_1.Treated
sample9	KO_1.Treated
sample10	KO_1.Untreated
sample11	KO_1.Untreated
sample12	KO_1.Untreated
sample13	WT.Treated
sample14	WT.Treated
sample15	WT.Treated
sample16	WT.Untreated
sample17	WT.Untreated
sample18	WT.Untreated

I have a counts results data frame that has 18 columns corresponding to the gene counts of the above 18 samples.

My question is: does it make sense to combine all the data together (18 samples total), make contrasts that specify the three comparisons I want, fit a single model with glmFit, and then test the three contrasts separately, as below?

## Load edgeR and construct DGEList
library(edgeR)
d <- DGEList(counts=counts)

## Make design matrix
Group = factor(sampleInfo$Group)
design <- model.matrix(~0+Group)
colnames(design) <- levels(Group)
rownames(design) <- colnames(counts)

## Make contrasts
prestr <-"my.contrasts = makeContrasts("
mainStr <- paste("KO_2.Treated_vs_Untreated=KO_2.Treated-KO_2.Untreated,",
                 "KO_1.Treated_vs_Untreated=KO_1.Treated-KO_1.Untreated,",
                 "WT.Treated_vs_Untreated=WT.Treated-WT.Untreated",sep="")
poststr <-",levels=design)"
commandstr=paste(prestr,mainStr,poststr,sep="")
eval(parse(text=commandstr))

# annotationTable = read.csv(annotationsFile, row.names=1)

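###########################################################
##  Normalization, dispersion estimation, and model fit  ##
###########################################################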
d <- calcNormFactors(d)
d <- estimateGLMCommonDisp(d, design)
d <- estimateGLMTagwiseDisp(d, design)
fit <- glmFit(d, design)

Or, do I need to do the glmFit separately?

Thanks very much in advance!

edger • 1.0k views

updated 9.0 years ago by Ryan C. Thompson ★ 7.9k • written 9.0 years ago by cafelumiere12 ▴ 20

score 3 · Accepted Answer · 2016-05-02

You can certainly fit a model to all your samples and test all three contrasts from the common model, and this is generally the recommended way to run edgeR. More samples in a model means more degrees of freedom to estimate the dispersion. The only reason you wouldn't do this is if you knew that the different cell lines had different dispersions, and you needed to estimate a separate dispersion for each cell line. In this case you would probably be best off fitting a single model using voomWithQualityWeights from the limma package, which can fit a single model with groups having different variances.
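For reference, here is a minimal sketch of what the voomWithQualityWeights route might look like. It assumes the same six-level grouping as in the question; the count matrix is simulated purely so that the snippet runs on its own, and the object names (group, counts, v, contr, res) are illustrative rather than taken from the original post. Treat it as one possible setup, not a drop-in replacement for your pipeline.

```r
library(edgeR)   # also attaches limma, which provides voomWithQualityWeights
library(limma)

set.seed(1)

# Grouping as in the question: 6 groups x 3 replicates = 18 samples
group <- factor(rep(c("KO_2.Treated", "KO_2.Untreated",
                      "KO_1.Treated", "KO_1.Untreated",
                      "WT.Treated",   "WT.Untreated"), each = 3))

# ASSUMPTION: simulated counts stand in for the real 18-column count table
counts <- matrix(rnbinom(1000 * 18, mu = 100, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000),
                                 paste0("sample", 1:18)))

d <- DGEList(counts = counts)
d <- calcNormFactors(d)

# One design matrix covering all 18 samples
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)

# voom transformation combined with estimated sample quality weights
v <- voomWithQualityWeights(d, design)

# Single linear model fit; contrasts are then extracted from it
fit <- lmFit(v, design)
contr <- makeContrasts(WT.Treated_vs_Untreated = WT.Treated - WT.Untreated,
                       levels = design)
fit2 <- eBayes(contrasts.fit(fit, contr))

# Full results table for the WT comparison
res <- topTable(fit2, coef = "WT.Treated_vs_Untreated", number = Inf)
```

The other two comparisons can be added as further columns of the contrast matrix and pulled out with the corresponding coef name.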

By the way, there's absolutely no reason to use eval to construct your contrasts. You're just making things extra complicated for yourself. Just write the code:

my.contrasts = makeContrasts(KO_2.Treated_vs_Untreated=KO_2.Treated-KO_2.Untreated,
                             KO_1.Treated_vs_Untreated=KO_1.Treated-KO_1.Untreated,
                             WT.Treated_vs_Untreated=WT.Treated-WT.Untreated,
                             levels=design)

If you have your contrast expressions as a character vector, you can use this instead:

my.contrast.expressions = c(KO_2.Treated_vs_Untreated="KO_2.Treated-KO_2.Untreated",
                            KO_1.Treated_vs_Untreated="KO_1.Treated-KO_1.Untreated",
                            WT.Treated_vs_Untreated="WT.Treated-WT.Untreated")
my.contrasts <- makeContrasts(contrasts=my.contrast.expressions, levels=design)
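Once the contrast matrix exists, each comparison can be tested from the one common fit. The sketch below shows one way to do that with glmLRT, looping over the columns of my.contrasts. The counts are simulated only to make the example self-contained (an assumption, not the poster's data), and the dispersion steps simply mirror the ones in the question.

```r
library(edgeR)

set.seed(1)

# Grouping as in the question: 6 groups x 3 replicates = 18 samples
group <- factor(rep(c("KO_2.Treated", "KO_2.Untreated",
                      "KO_1.Treated", "KO_1.Untreated",
                      "WT.Treated",   "WT.Untreated"), each = 3))

# ASSUMPTION: simulated counts stand in for the real 18-column count table
counts <- matrix(rnbinom(1000 * 18, mu = 100, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000),
                                 paste0("sample", 1:18)))

d <- DGEList(counts = counts)
d <- calcNormFactors(d)

design <- model.matrix(~0 + group)
colnames(design) <- levels(group)

# Dispersions are estimated once, using all 18 samples
d <- estimateGLMCommonDisp(d, design)
d <- estimateGLMTagwiseDisp(d, design)

# One fit for the whole experiment
fit <- glmFit(d, design)

my.contrasts <- makeContrasts(
  KO_2.Treated_vs_Untreated = KO_2.Treated - KO_2.Untreated,
  KO_1.Treated_vs_Untreated = KO_1.Treated - KO_1.Untreated,
  WT.Treated_vs_Untreated   = WT.Treated - WT.Untreated,
  levels = design)

# One likelihood ratio test per contrast column, all from the same fit
results <- lapply(colnames(my.contrasts), function(cn) {
  lrt <- glmLRT(fit, contrast = my.contrasts[, cn])
  topTags(lrt, n = Inf)$table
})
names(results) <- colnames(my.contrasts)
```

Each element of results is then a per-gene table for one comparison, e.g. results[["WT.Treated_vs_Untreated"]].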