Dear all.
I am trying to get the set of differentially expressed genes between several breast cancer subtypes. To this end, I am using an ANOVA-like test from the edgeR package. I have several questions about it. The code looks like that:
# The conditions as a factor
head(groups)
[1] Basal-like Basal-like Basal-like Basal-like Basal-like Basal-like
Levels: Basal-like HER2-enriched Luminal A Luminal B
dge <- DGEList(tcga, group = groups) # tcga is a data.frame containing gene names as rownames and sample labels as colnames.
keep <- filterByExpr(dge)
dge <- dge[keep, , keep.lib.sizes = FALSE]
1) By setting keep.lib.sizes to FALSE in dge <- dge[keep, , keep.lib.sizes = FALSE] the library sizes are recalculated, right?
dge.n <- calcNormFactors(dge) # Calc norm factors
design <- model.matrix( ~ groups)
colnames(design) <- c("basal", "her2", "luma", "lumb") # change colnames of design matrix
head(design)
basal her2 luma lumb
1 1 0 0 0
2 1 0 0 0
3 1 0 0 0
4 1 0 0 0
5 1 0 0 0
6 1 0 0 0
dge.c <- estimateCommonDisp(dge.n) # Common disp
dge.t <- estimateTagwiseDisp(dge.c) # Tagwise disp
fit <- glmQLFit(dge.t, design, robust = TRUE)
qlf <- glmQLFTest(fit, coef = 2:4)
out <- topTags(qlf, n = "Inf")$table
out <- out[out$FDR <= 0.05,]
2) It is correct to give dge.t to glmQLFit() or it is not necessary for this analysis?
3) If I want to filter the genes in out by an additional logFC threshold, do I have to use glmTreat()? In that case, how should I use the function to retrieve all the genes that meet a certain logFC threshold between at least two conditions (for example, 0.5849625 for fold change of 1.5) and are differentially expressed?
Thanks in advance.
Dear Gordon, thank you so much for your response.
I misunderstood a part of the manual and thought that the tagwise dispersion was recommended.
I have made a series of changes to use the trended dispersion and finally, get the set of all genes differentially expressed between the conditions with a fold change greater than 1.5. The code for this purpose is shown below:
With this, the final set of genes differentially expressed at a fold change of 1.5 is given by each contrast. Is there any way to do this with coef while using a subtype as the intercept? Again, thanks for your help.
There is no way to combine fold-change thresholding with an F-test. Fold-changes can only be associated with individual t-tests. So if you want to use fold-changes, there is no alternative but make pairwise comparisons, as you have done.
I see, thank you so much for your help Gordon.