edgeR: Strange volcano plot
cronanz • 0

Dear all,

I have plate-based scRNA-seq data, and I used edgeR to identify differentially expressed genes between clusters. The input was expected counts from RSEM, and an example of the workflow is as follows:

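library(edgeR)
all_edger <- DGEList(counts = all_expc, group = groups)
# TMM with singleton pairing, intended for counts with many zeros
all_edger <- calcNormFactors(all_edger, method = "TMMwsp")
# No-intercept design: one coefficient per cluster
all_design <- model.matrix(~0 + groups)
all_edger <- estimateDisp(all_edger, design = all_design)
all_fit <- glmFit(all_edger, all_design)
# Compare the 6th cluster against the 1st
all_lrt <- glmLRT(all_fit, contrast = c(-1, 0, 0, 0, 0, 1, 0, 0))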

The resulting volcano plot from this comparison shows a pattern that I'm not familiar with. For certain genes, logFC and -log10(FDR) appear to be tightly correlated, which produces a line of genes on each side of the plot. I'm unable to interpret this pattern. Is this to be expected? Am I doing something out of the ordinary that causes it? Thank you very much.

Volcano plot: https://ibb.co/JcMnK7r


Generally, one plots the negative log10 of the nominal p-value rather than the FDR.
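
A minimal sketch of this in base R, assuming a results table with edgeR-style `logFC` and `PValue` columns (for the workflow in the question, something like `topTags(all_lrt, n = Inf)$table` should provide one). The toy numbers and the `volcano` helper are made up purely for illustration. The sketch also shows one reason the FDR is awkward on the y-axis: the BH adjustment can map distinct p-values to the same adjusted value.

```r
# Hypothetical results table; with edgeR you would probably start from
#   tab <- topTags(all_lrt, n = Inf)$table
tab <- data.frame(
  logFC  = c(-3.0, 2.5, -1.2, 1.1, 0.3, -0.1),
  PValue = c(0.001, 0.01, 0.02, 0.021, 0.5, 0.9)
)

# Benjamini-Hochberg adjustment (the same method edgeR reports as FDR)
tab$FDR <- p.adjust(tab$PValue, method = "BH")

# Distinct p-values can collapse onto the same adjusted value
n_distinct_p   <- length(unique(tab$PValue))
n_distinct_fdr <- length(unique(tab$FDR))

# Volcano plot on the nominal p-value scale (helper name is illustrative)
volcano <- function(tab) {
  plot(tab$logFC, -log10(tab$PValue),
       xlab = "log2 fold change", ylab = "-log10(p-value)", pch = 16)
}
volcano(tab)
```

Significance can still be conveyed on such a plot, for example by colouring the points that pass an FDR threshold.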


Here are a couple of posts explaining why -log10(p) is better than -log10(FDR) for the volcano plot (as noted by Kevin Blighe):

Aaron Lun ★ 28k

It's hard to say for sure, but I would guess that you have a few genes that are all-zero in one group and have some non-zero counts in the other group. If you hold the dispersion constant (e.g., if all of the genes have very similar abundances), the p-value will be a monotonic function of the log-fold change, resulting in the lines that you've observed. It may even be that the non-zero counts in each group come from the same cells - or even just a single cell - which contributes to the clear definition of the pattern on the volcano plot.
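
To illustrate the monotonic relationship, here is a rough toy calculation in base R. It is not what edgeR does internally: it assumes equal library sizes, a single fixed dispersion, and a simple pseudo-count to keep the log-fold change finite, and all the numbers (`n_a`, `n_b`, `phi`, `prior`) are hypothetical. Each "gene" is all-zero in group A and has a total count of `s` in group B.

```r
n_a   <- 50      # cells in group A (all-zero for every gene here)
n_b   <- 50      # cells in group B
phi   <- 0.5     # negative binomial dispersion, held constant
size  <- 1 / phi
prior <- 0.125   # pseudo-count so that the logFC stays finite (assumption)

# NB log-likelihood for n cells with a given total count and mean mu,
# dropping the terms that do not depend on mu (they cancel in the ratio)
nb_loglik <- function(total, n, mu) {
  total * log(mu / (mu + size)) + n * size * log(size / (mu + size))
}

s    <- 1:20             # total count in group B, one value per toy gene
mu_b <- s / n_b          # fitted mean in group B under the full model
mu_0 <- s / (n_a + n_b)  # fitted common mean under the null model

# Group A is all-zero, so its fitted mean is 0 and it adds nothing
# to the full-model log-likelihood
lr <- 2 * (nb_loglik(s, n_b, mu_b) - nb_loglik(s, n_a + n_b, mu_0))

res <- data.frame(
  total   = s,
  logFC   = log2((mu_b + prior) / prior),
  p_value = pchisq(lr, df = 1, lower.tail = FALSE)
)

# Every point falls on one smooth curve, like the lines in the volcano plot
plot(res$logFC, -log10(res$p_value), type = "b",
     xlab = "logFC", ylab = "-log10(p-value)")
```

Under these assumptions, the test statistic depends on the counts only through the total in the non-zero group, and so does the log-fold change, which is why such genes line up.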

I would suggest having a closer look at a few of those genes (in terms of their expression profiles across groups, e.g., with scater::plotExpression) for further diagnostics. Such patterns are not necessarily a problem - the counts are low, after all - though you are correct in that they do warrant some level of concern and investigation.
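
As a quick first pass before plotting, something along these lines could flag the genes in question. This is only a sketch on a made-up count matrix; in the question's workflow, `counts` and `groups` would presumably be `all_expc` and `groups`, and the two column names would be the clusters being compared.

```r
# Toy count matrix: 3 genes x 6 cells, two hypothetical groups A and B
counts <- matrix(
  c(0, 0, 0, 5, 0, 0,    # g1: zero in A, one expressing cell in B
    1, 2, 0, 0, 3, 1,    # g2: expressed in both groups
    0, 0, 0, 0, 0, 0),   # g3: zero everywhere
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), paste0("cell", 1:6))
)
groups <- factor(rep(c("A", "B"), each = 3))

# Number of cells with a non-zero count, per gene and per group
nz <- t(rowsum(t((counts > 0) + 0), groups))

# Genes that are all-zero in exactly one of the two compared groups
one_sided <- xor(nz[, "A"] == 0, nz[, "B"] == 0)
flagged <- names(which(one_sided))

# How many cells drive the signal for each flagged gene
nz[flagged, , drop = FALSE]
```

Flagged genes supported by only one or two cells are the ones worth plotting individually.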
