Question

edgeR: Strange volcano plot

cronanz • 0

Dear all,

I have plate-based scRNA-seq data and used edgeR to identify differentially expressed genes between clusters. The input was expected counts from RSEM, and an example of the workflow is as follows:

library(edgeR)
all_edger <- DGEList(counts=all_expc,group=groups)
all_edger <- calcNormFactors(all_edger,method="TMMwzp")
all_design <- model.matrix(~0+groups)  # one column per cluster, no intercept
all_edger <- estimateDisp(all_edger,design=all_design)
all_fit <- glmFit(all_edger,all_design)
# contrast: sixth group level minus first group level
all_lrt <- glmLRT(all_fit,contrast=c(-1,0,0,0,0,1,0,0))
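
A positional contrast vector like the one above is easy to get wrong when there are eight groups. As a minimal sketch, assuming hypothetical cluster labels `c1` to `c8` (not the actual labels in this data set), the same vector can be built by column name in base R:

```r
# Hypothetical cluster labels: 8 clusters, 3 cells each (not the poster's data).
groups <- factor(rep(paste0("c", 1:8), each = 3))
all_design <- model.matrix(~ 0 + groups)
colnames(all_design) <- levels(groups)   # drop the "groups" prefix

# Start from all zeros, then set the two clusters being compared.
contrast <- setNames(numeric(ncol(all_design)), colnames(all_design))
contrast["c6"] <- 1    # numerator cluster
contrast["c1"] <- -1   # denominator cluster
print(contrast)
```

The named vector could then be passed as `contrast = contrast` to `glmLRT`, which makes it easier to see which two clusters are being compared.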

The volcano plot from the above comparison shows a pattern that I'm not familiar with. For certain genes there appears to be a tight relationship between logFC and -log10(FDR), producing a line of genes on each side of the plot. My understanding is too limited to interpret this pattern. Is this to be expected, or am I doing something out of the norm that causes it? Thank you very much.

Volcano plot: https://ibb.co/JcMnK7r

edger scRNA-seq DGEA • 2.6k views

updated 5.8 years ago by Aaron Lun ★ 28k • written 5.8 years ago by cronanz • 0

Generally, one plots the negative log10 of the nominal (unadjusted) p-value on the y-axis, rather than the FDR.

5.8 years ago, Kevin Blighe ★ 4.0k

Here are a couple of posts explaining why -log10(p) is better than -log10(FDR) for the volcano plot (as noted by Kevin Blighe):

5.8 years ago, Gordon Smyth 52k
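
To illustrate the point made in the two comments above, here is a small base-R sketch on simulated p-values (hypothetical data, not the poster's results). Benjamini-Hochberg adjustment keeps the ordering of the p-values but maps many of them to tied values, so a y-axis of -log10(FDR) tends to stack genes on shared horizontal levels, whereas -log10(p) keeps them spread out.

```r
set.seed(1)
# Simulated p-values for 200 hypothetical genes: mostly null, a few small.
p <- c(runif(180), rbeta(20, 1, 200))
fdr <- p.adjust(p, method = "BH")

# BH never decreases a p-value and preserves the ranking of the genes.
stopifnot(all(fdr >= p), all(diff(fdr[order(p)]) >= 0))

# Compare how many distinct y-values each choice would give on a volcano plot.
n_unique_p <- length(unique(p))
n_unique_fdr <- length(unique(fdr))
cat("distinct p-values:", n_unique_p, " distinct FDR values:", n_unique_fdr, "\n")

# y-axis values for the volcano plot, using the nominal p-values.
y <- -log10(p)
```

With an edgeR result, the corresponding columns in `topTags(all_lrt, n = Inf)$table` are `PValue` and `FDR`.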

score 1 · Answer 1 · 2019-05-03

It's hard to say for sure, but I would guess that you have a few genes that are all-zero in one group and with some non-zero counts in the other group. If you hold the dispersion constant (e.g., if all of the genes have very similar abundances), the p-value will be a monotonic function of the log-fold change, resulting in the lines that you've observed. It may even be that the non-zero counts in each group come from the same cells - or even just a single cell - which contributes to the clear definition of the pattern on the volcano plot.
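
The monotonic relationship described above can be sketched in base R. This is a hand-rolled negative binomial likelihood ratio test under simplifying assumptions (equal library sizes, a fixed dispersion `phi`, and an arbitrary `prior` offset to keep the log-fold change finite); it is not edgeR's own code, and all values are hypothetical. Each simulated gene is all-zero in group A and has `k` reads in a single cell of group B.

```r
phi <- 0.5      # fixed NB dispersion (assumed value)
n <- 10         # cells per group (assumed value)
prior <- 0.125  # small offset so the log-fold change stays finite (assumed)

# NB log-likelihood at the MLE of a common mean. With equal library sizes and
# a fixed dispersion, that MLE is simply the sample mean.
nb_loglik <- function(y) {
  sum(dnbinom(y, size = 1 / phi, mu = mean(y), log = TRUE))
}

res <- t(sapply(1:20, function(k) {
  a <- rep(0, n)             # group A: all-zero
  b <- c(k, rep(0, n - 1))   # group B: k reads in one cell, zero elsewhere
  # Likelihood ratio statistic: separate group means vs. one shared mean.
  lr <- 2 * (nb_loglik(a) + nb_loglik(b) - nb_loglik(c(a, b)))
  c(k = k,
    logFC = log2((mean(b) + prior) / (mean(a) + prior)),
    p = pchisq(lr, df = 1, lower.tail = FALSE))
}))
res <- as.data.frame(res)

# Every increase in logFC comes with a smaller p-value, so the genes fall on a
# single curve when -log10(p) is plotted against logFC.
print(all(diff(res$logFC) > 0) && all(diff(res$p) < 0))
```

If the dispersion varied from gene to gene, the points would scatter off this curve, which is why the line shows up most clearly among genes of similar abundance.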

I would suggest having a closer look at a few of those genes (in terms of their expression profiles across groups, e.g., with scater::plotExpression) for further diagnostics. Such patterns are not necessarily a problem - the counts are low, after all - though you are correct in that they do warrant some level of concern and investigation.
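
Before plotting, a quick tabulation can show which genes fit the scenario described above. The following is a minimal base-R sketch on a toy count matrix (hypothetical genes, cells and cluster labels); it counts, per gene and per cluster, how many cells have a non-zero count.

```r
# Toy count matrix (hypothetical): 3 genes x 6 cells, two clusters of 3 cells.
counts <- rbind(
  geneA = c(0, 0, 0, 0, 7, 0),  # all-zero in c1, a single expressing cell in c2
  geneB = c(0, 0, 0, 2, 3, 1),  # all-zero in c1, expressed across c2
  geneC = c(5, 4, 6, 5, 7, 4)   # expressed in both clusters
)
colnames(counts) <- paste0("cell", 1:6)
groups <- factor(rep(c("c1", "c2"), each = 3))

# Number of cells with a non-zero count, per gene (rows) and cluster (columns).
n_expressing <- t(rowsum(t((counts > 0) * 1), groups))
print(n_expressing)

# Genes that are all-zero in at least one cluster.
all_zero_somewhere <- rownames(n_expressing)[apply(n_expressing == 0, 1, any)]

# Genes whose non-zero counts come from a single cell overall.
single_cell <- rownames(n_expressing)[rowSums(n_expressing) == 1]
```

Genes flagged this way would be natural candidates to inspect visually, for example with `scater::plotExpression` as suggested above.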