Dear all,
I have plate-based scRNA-seq data, and I have used edgeR to identify differentially expressed genes between clusters. The input data are expected counts from RSEM, and the example workflow is as follows:
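library(edgeR)
all_edger <- DGEList(counts=all_expc, group=groups)
# TMMwzp = TMM with zero pairing, intended for data with many zero counts
all_edger <- calcNormFactors(all_edger, method="TMMwzp")
# No intercept: one design column per group
all_design <- model.matrix(~0+groups)
all_edger <- estimateDisp(all_edger, design=all_design)
all_fit <- glmFit(all_edger, all_design)
# Contrast has one entry per design column: group 6 vs group 1
all_lrt <- glmLRT(all_fit, contrast=c(-1,0,0,0,0,1,0,0))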
The volcano plot from the comparison above shows a pattern I'm not familiar with. For certain genes there appears to be a tight correlation between logFC and -log10(FDR), which produces a line of genes on each side of the plot. My understanding is too limited to interpret this pattern. Is this expected, or am I doing something unusual that causes it? Thank you very much.
Volcano plot: https://ibb.co/JcMnK7r
Generally, one plots the negative log10 of the nominal (unadjusted) p-value, not of the FDR.
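A minimal sketch of such a plot, assuming the results are taken from `topTags(all_lrt, n = Inf)$table`, which for a `glmLRT` result has `logFC`, `PValue` and `FDR` columns. `volcano_points` and `plot_volcano` are hypothetical helper names, not edgeR functions. The y-axis uses the nominal p-value; the FDR is used only to colour points.

```r
# Hypothetical helpers (not part of edgeR). `tab` is assumed to be a
# data.frame with edgeR-style columns logFC, PValue and FDR,
# e.g. topTags(all_lrt, n = Inf)$table.
volcano_points <- function(tab, fdr_cutoff = 0.05) {
  stopifnot(all(c("logFC", "PValue", "FDR") %in% colnames(tab)))
  data.frame(
    x = tab$logFC,                       # effect size on the x-axis
    y = -log10(tab$PValue),              # nominal p-value on the y-axis
    significant = tab$FDR < fdr_cutoff   # FDR only decides the colour
  )
}

plot_volcano <- function(tab, fdr_cutoff = 0.05) {
  pts <- volcano_points(tab, fdr_cutoff)
  plot(pts$x, pts$y,
       col = ifelse(pts$significant, "red", "grey50"),
       pch = 16, cex = 0.5,
       xlab = "logFC", ylab = "-log10(p-value)")
  invisible(pts)
}
```

Usage, under the same assumption: `plot_volcano(topTags(all_lrt, n = Inf)$table)`.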
Here are a couple of posts explaining why -log10(p) is better than -log10(FDR) for the volcano plot (as noted by Kevin Blighe):
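A small base-R illustration of one property of the Benjamini-Hochberg (BH) adjustment that matters for the y-axis: because the BH FDR for the i-th smallest p-value is the minimum of p_(j) * n / j over all j >= i, several distinct p-values can be mapped to exactly the same FDR. The p-values below are synthetic and chosen only for illustration; they are not from the poster's data.

```r
# Synthetic p-values (illustration only), n = 5.
p <- c(0.010, 0.011, 0.012, 0.013, 0.500)

# BH: FDR_(i) = min over j >= i of p_(j) * n / j, so a smaller scaled
# value further down the sorted list is carried upwards to earlier ranks.
fdr <- p.adjust(p, method = "BH")

data.frame(p = p,
           neglog10_p = -log10(p),
           FDR = fdr,
           neglog10_FDR = -log10(fdr))
```

Here the first four genes have four different -log10(p) values but share a single FDR (0.013 * 5/4 = 0.01625), so on a -log10(FDR) axis they would sit at the same height, while the ordering by nominal p-value is preserved on a -log10(p) axis.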