I have several questions about tximport results used for edgeR.
According to the tximport vignette, the recommended method for edgeR is to supply the original estimated counts (the default, countsFromAbundance="no") together with an offset that corrects for changes in the average transcript length across samples. The vignette's example of creating a DGEList for use with edgeR is as follows:
library(edgeR)
cts <- txi$counts
normMat <- txi$length
normMat <- normMat/exp(rowMeans(log(normMat)))
o <- log(calcNormFactors(cts/normMat)) + log(colSums(cts/normMat))
y <- DGEList(cts)
y$offset <- t(t(log(normMat)) + o)
# y is now ready for estimate dispersion functions see edgeR User's Guide
A basic edgeR analysis procedure is listed below:
y <- DGEList(counts=..., gene=..., group=...)
keep <- rowSums(cpm(y)>...) >= ...
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y)
design <- model.matrix(...)
y <- estimateDisp(y, design, robust=TRUE)
fit <- glmQLFit(y, design, robust=TRUE)
# or, for the classic approach:
et <- exactTest(y, pair=...)
Q1: How should y (with the offset) be incorporated into this procedure, that is, at which step does it enter? Can y (with the offset) be passed directly to y <- estimateDisp(y, design, robust=TRUE)?
If so, is it then unnecessary to run y <- calcNormFactors(y) for further normalization of y?
Q2: Which step of the edgeR analysis procedure uses the offset to correct the final results? edgeR's cpm function does not seem to use it.
Q3: If countsFromAbundance="lengthScaledTPM" is used to generate scaled counts, can the step y <- calcNormFactors(y) be omitted, given that tximport has already scaled these counts by the average transcript length (averaged over samples) and to the library size?
Thank you very much! I have two additional questions. 1. If the classic edgeR approach is used to make pairwise comparisons between groups, are the offsets automatically used by exactTest()? 2. If I want to use CPM or log-CPM values for clustering and heatmaps, how can I obtain values corrected by the offsets? Thanks in advance.
No, offsets are not used by exactTest(). Offsets are only used by the glm-based functions.
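As a rough sketch of what the glm-based route could look like, the code below builds the offset as in the vignette snippet quoted above and passes y straight to estimateDisp() and glmQLFit(). The count and length matrices are simulated stand-ins for txi$counts and txi$length, and the group labels and design are assumptions for illustration. calcNormFactors(y) is not called on the DGEList here because the TMM factors are already folded into the offset. The last lines compute offset-adjusted CPM-like values by hand (counts divided by exp(offset)); this is plain arithmetic rather than an edgeR function, and the pseudo-count of 0.5 is an arbitrary choice.

```r
library(edgeR)

set.seed(1)
n_genes <- 500
n_samples <- 6
group <- factor(rep(c("A", "B"), each = 3))  # hypothetical two-group layout

# Simulated stand-ins for txi$counts and txi$length (not real data)
nm <- list(paste0("gene", seq_len(n_genes)), paste0("s", seq_len(n_samples)))
cts <- matrix(rnbinom(n_genes * n_samples, mu = 100, size = 10),
              nrow = n_genes, dimnames = nm)
len <- matrix(runif(n_genes * n_samples, min = 500, max = 3000),
              nrow = n_genes, dimnames = nm)

# Offset construction, following the vignette snippet quoted in the question
normMat <- len / exp(rowMeans(log(len)))
o <- log(calcNormFactors(cts / normMat)) + log(colSums(cts / normMat))
y <- DGEList(cts, group = group)
y$offset <- t(t(log(normMat)) + o)

# Filter low-count genes; subsetting the DGEList subsets the offset rows too
design <- model.matrix(~ group)
keep <- filterByExpr(y, design)
y <- y[keep, ]

# glm-based functions pick up y$offset when it is present
y <- estimateDisp(y, design, robust = TRUE)
fit <- glmQLFit(y, design, robust = TRUE)
res <- glmQLFTest(fit, coef = 2)
topTags(res)

# Manual offset-adjusted values for clustering or heatmaps (assumption:
# exp(offset) plays the role of a per-observation effective library size)
off <- as.matrix(y$offset)
cpm_adj <- y$counts / exp(off) * 1e6
logcpm_adj <- log2((y$counts + 0.5) / exp(off) * 1e6)
```

With real data, cts and len would be replaced by txi$counts and txi$length, and the design by the actual experimental layout.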