Pipeline for an Anova-like analysis in edgeR
1
0
Entering edit mode
Antonio M. • 0
@antonio-m-13072
Last seen 18 months ago
Spain

Dear all.

I am trying to get the set of differentially expressed genes between several breast cancer subtypes. To this end, I am using an ANOVA-like test from the edgeR package. I have several questions about it. The code looks like that:


# The conditions as a factor
head(groups)
[1] Basal-like Basal-like Basal-like Basal-like Basal-like Basal-like
Levels: Basal-like HER2-enriched Luminal A Luminal B

dge <- DGEList(tcga, group = groups) # tcga is a data.frame containing gene names as rownames and sample labels as colnames.

keep <- filterByExpr(dge)
dge <- dge[keep, , keep.lib.sizes = FALSE]

1) By setting keep.lib.sizes to FALSE in dge <- dge[keep, , keep.lib.sizes = FALSE] the library sizes are recalculated, right?

dge.n <- calcNormFactors(dge) # Calc norm factors
design <- model.matrix( ~ groups)
colnames(design) <- c("basal", "her2", "luma", "lumb") # change colnames of design matrix
head(design)
  basal her2 luma lumb
1     1    0    0    0
2     1    0    0    0
3     1    0    0    0
4     1    0    0    0
5     1    0    0    0
6     1    0    0    0

dge.c <- estimateCommonDisp(dge.n) # Common disp
dge.t <- estimateTagwiseDisp(dge.c) # Tagwise disp

fit <- glmQLFit(dge.t, design, robust = TRUE)
qlf <- glmQLFTest(fit, coef = 2:4)

out <- topTags(qlf, n = "Inf")$table
out <- out[out$FDR <= 0.05,]

2) It is correct to give dge.t to glmQLFit() or it is not necessary for this analysis?

3) If I want to filter the genes in out by an additional logFC threshold, do I have to use glmTreat()? In that case, how should I use the function to retrieve all the genes that meet a certain logFC threshold between at least two conditions (for example, 0.5849625 for fold change of 1.5) and are differentially expressed?

Thanks in advance.

edgeR anova rna-seq • 1.7k views
ADD COMMENT
2
Entering edit mode
@gordon-smyth
Last seen 7 hours ago
WEHI, Melbourne, Australia

The edgeR QL workflow illustrates a full pipeline:

https://www.bioconductor.org/packages/release/workflows/vignettes/RnaSeqGeneEdgeRQL/inst/doc/edgeRQL.html

  1. Yes, see ?subsetting.
  2. You will get exactly the same results whether you input dge.c or dge.t to glmQLFit. We do recommend however inputing trended dispersion to glmQLFit. Is there some reason why you have avoided that?
  3. Yes, we recommend glmTreat. glmTreat is not the same as simply applying a logFC cutoff, which is something we strongly discourage, despite it being so popular in the literature.
ADD COMMENT
0
Entering edit mode

Dear Gordon, thank you so much for your response.

I misunderstood a part of the manual and thought that the tagwise dispersion was recommended.

I have made a series of changes to use the trended dispersion and finally, get the set of all genes differentially expressed between the conditions with a fold change greater than 1.5. The code for this purpose is shown below:

design <- model.matrix(~0+groups)
colnames(design) <- c("basal", "her2", "luma", "lumb")
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge.n <- calcNormFactors(dge)
dge.disp <- estimateDisp(dge.n, design = design, robust = TRUE)
fit <- glmQLFit(dge.disp, design, robust = TRUE)


basher2 <- makeContrasts(basal-her2, levels=design)
basluma <- makeContrasts(basal-luma, levels=design)
baslumb <- makeContrasts(basal-lumb, levels=design)
her2luma <- makeContrasts(her2-luma, levels=design)
her2lumb <- makeContrasts(her2-lumb, levels=design)
lumalumb <- makeContrasts(luma-lumb, levels=design)

trbasher2 <- glmTreat(fit, contrast = basher2, lfc = log2(1.5))
trbasluma <- glmTreat(fit, contrast = basluma, lfc = log2(1.5))
trbaslumb <- glmTreat(fit, contrast = baslumb, lfc = log2(1.5))
trher2luma <- glmTreat(fit, contrast = her2luma, lfc = log2(1.5))
trher2lumb <- glmTreat(fit, contrast = her2lumb, lfc = log2(1.5))
trlumalumb <- glmTreat(fit, contrast = lumalumb, lfc = log2(1.5))

totalcontrasts <- list(trbasher2, trbasluma, trbaslumb, trher2luma, trher2lumb, trlumalumb)
genes <- c() # Vector with all genes differentially expressed
for(c in totalcontrasts){
    out <- topTags(c, n ="Inf")$table
    out <- out[which(out$FDR <= 0.05),]
    genes <- c(genes, rownames(out))
    genes <- unique(genes)
}

With this, the final set of genes differentially expressed at a fold change of 1.5 is given by each contrast. Is there any way to do this with coef while using a subtype as the intercept? Again, thanks for your help.

ADD REPLY
1
Entering edit mode

There is no way to combine fold-change thresholding with an F-test. Fold-changes can only be associated with individual t-tests. So if you want to use fold-changes, there is no alternative but make pairwise comparisons, as you have done.

ADD REPLY
0
Entering edit mode

I see, thank you so much for your help Gordon.

ADD REPLY

Login before adding your answer.

Traffic: 889 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6