I analyzed a single-cell RNA-seq dataset with two groups using Seurat and then DESeq2 for differential expression. The standard workflow ran without problems. However, the resulting log2FoldChanges for treated vs. untreated groups within clusters are almost all negative. This is clear in the corresponding volcano plots, which are not "centered" at a log2FoldChange of ~zero, but rather at about -0.5 (although still with a classic volcano shape). I know I can use "estimateSizeFactors" with "controlGenes" in DESeq2, but what is the most likely upstream explanation for this? It looks like a problem with normalization or scaling.
Agree with ATpoint.
Depending on the "groups" in single-cell data, the question of DE may not be well defined without additional information (controlGenes). E.g., what defines LFC = 0 across disparate cell types with very different distributions across the transcriptome?
Thank you both. The code is below, simplified a bit. ATpoint, your suggestion re: the top 20% of genes by baseMean makes me wonder if the shift away from LFC = 0 is an artifact of the experiment, which should include an enrichment for certain RNA types in the treated group (grey points in the volcano plot). If these are robustly increased in most cells whereas most other genes in the single-cell data are non-uniformly expressed, perhaps they are driving the normalization?
My comment was generic. If your experiment causes drastic shifts then the situation might be different. The advice is the same: try to use genes you think are non-DE as controlGenes. For more guidance I would need to actually see the data and inspect it myself.
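To make the normalization hypothesis concrete, here is a minimal pure-Python toy of median-of-ratios size factors. It is not DESeq2 itself and not the poster's code. The gene names, counts, and the helper functions size_factors and log2_fold_change are all made up for illustration. It assumes, as I understand the default "ratio" method, that genes with a zero in any sample drop out of the size factor estimate, so in sparse data only the dense genes contribute.

```python
import math
from statistics import median


def size_factors(counts, control_genes=None):
    """Median-of-ratios size factors, one per sample (hypothetical helper).

    counts: dict mapping gene name -> list of raw counts, one per sample.
    control_genes: optional list of gene names to restrict the estimate to.
    Genes with a zero in any sample are skipped (geometric mean would be 0).
    """
    genes = list(control_genes) if control_genes is not None else list(counts)
    usable = [g for g in genes if all(c > 0 for c in counts[g])]
    if not usable:
        raise ValueError("no gene is nonzero in every sample")
    n = len(counts[usable[0]])
    # Per-gene log geometric mean across samples.
    log_geo = {g: sum(math.log(c) for c in counts[g]) / n for g in usable}
    # Per-sample median of log ratios to the geometric mean.
    return [
        math.exp(median(math.log(counts[g][j]) - log_geo[g] for g in usable))
        for j in range(n)
    ]


def log2_fold_change(row, sf, treated, untreated):
    """log2 ratio of mean normalized counts, treated over untreated."""
    norm = [c / s for c, s in zip(row, sf)]
    mean_t = sum(norm[j] for j in treated) / len(treated)
    mean_u = sum(norm[j] for j in untreated) / len(untreated)
    return math.log2(mean_t / mean_u)


# Made-up data: samples 0-1 untreated, samples 2-3 treated.
untreated, treated = [0, 1], [2, 3]
counts = {
    # Dense genes that are truly 2x up in treated (the "enriched" RNA types).
    "enriched_1": [100, 100, 200, 200],
    "enriched_2": [80, 80, 160, 160],
    "enriched_3": [120, 120, 240, 240],
    # Dense genes that do not change.
    "stable_1": [50, 50, 50, 50],
    "stable_2": [40, 40, 40, 40],
    # Sparse unchanged genes: zeros exclude them from the default estimate.
    "sparse_1": [10, 0, 10, 0],
    "sparse_2": [0, 8, 0, 8],
}

sf_default = size_factors(counts)
sf_control = size_factors(counts, control_genes=["stable_1", "stable_2"])

lfc_default = log2_fold_change(counts["stable_1"], sf_default, treated, untreated)
lfc_control = log2_fold_change(counts["stable_1"], sf_control, treated, untreated)

print("default size factors:", [round(s, 3) for s in sf_default])
print("control size factors:", [round(s, 3) for s in sf_control])
print("stable_1 LFC, default:", round(lfc_default, 3))
print("stable_1 LFC, control genes:", round(lfc_control, 3))
```

In this toy the enriched genes are the majority of the genes that survive the zero filter, so they set the median: treated samples get inflated size factors, and an unchanged gene ends up with an LFC of about -1. Restricting the estimate to the two stable genes brings it back to about 0. Whether your data behave this way depends on which genes are actually nonzero across your samples, so check that before picking controlGenes.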