Hi,
I am writing a post because I encountered an interesting question when using DESeq2. I want to analyze RNA-seq data gained from samples of two groups: control and treatment (inhibitor of a kinase). I want to perform differential expression analysis and find out what the main role of this kinase is. I know many people would do it based on fold change. However, I noticed that some genes have a very small baseMean, though they have a large fold change. I am more interested in the differential counts of genes than the fold change because some genes that have a large baseMean might have a huge influence on the cell even without a large fold change. Also, in cells, living actually means a lot of chemical reactions going on. Suppose we have a molecule with a very large number in a cell, its number might be prevented from continuing to grow because of the regulatory networks within the cell - or if its number dropped, say four-fold in the cell, the cell would have died, which would make it harder to achieve as large a fold change as a molecule with a very small BaseMean, but this molecule is important. I tried to use baseMean * (2^(log2FoldChange) - 1) as an approximation of the count change, however, the baseMean would be the same for a gene even if the reference group changes, which would make my approximation less accurate when the baseMean becomes larger. But if there is a fold change, this software has estimated the mean of the control group, the mean of the treatment group, and the differential count. So I am writing to ask about the possibility of reporting these numbers in DESeq2. I believe it would help a lot of downstream lab work and benefit future biomedical discoveries. I have also attached key points about my current work in case you are interested and want to understand more about my question.
Best regards,
Yongqing Miao
key points of my work - Understanding the Role of Dual Leucine Zipper Kinase (DLK) in Neurotrophin Deprivation
- Analyzed RNA-seq data (30 million of single end 50 base pair reads per library) of interest (14 samples) from GSE95672 on the GEO webpage (cell: mice dorsal root ganglions, control: nerve growth factor (NGF) + dimethyl sulfoxide (DMSO), treatment 1: anti-NGF + DMSO, treatment 2: anti-NGF + DLK inhibitor (DLKi), culture time: 4 days)
- Used FastQC to perform quality control for fastq files, found the sequence duplication levels were high, and deduplicated fastq files with czid-dedup
- Used HISAT2 for allignment with GRCm38 as the reference genome and used FeatureCounts to summarize counts for genes
- Used Bayesian method based on mRNA counts provided by DESeq2 for differential expression analysis
- Set gene set 1 as genes whose expressions changed in the control group compared to treatment group 1, set gene set 2 as genes whose expressions changed in treatment group 2 compared to treatment group 1, and took the intersection of the two gene sets for pathway analysis with 0.025 as the false discovery rate for both gene sets
- Used baseMean * (2^{log2FoldChange} - 1) as an approximation of expression level change for significant genes and included genes that had more than 4000 approximated differential expression counts between treatment 1 and treatment 2 for annotation and pathway analysis
- Found that in neurotrophin deprivation, DLK may mainly lead to reduced mRNA levels of enzymes in lipid synthesis as well as reduced mRNA levels of tubulin (especially class 3), NGF receptors, and proteins that relate to axon and synapse (all genes mentioned above have more than 10000 approximated count change)
- Found that in neurotrophin deprivation, DLK may mainly lead to increased mRNA levels of stathmin-like 4, collagens, heat shock protein 5, transcription factor Jun, caspase 3, BCL2 binding component 3, and chloride channel voltage-sensitive 3 (all genes mentioned above have more than 4500 approximated count change)
- Performed pathway analysis in DAVID and STRING and indicated a dual role of DLK - it might help the neurons survive in short-term neurotrophin deprivation, but also make some preparation for the neurons' apoptosis
Hi, thank you very much for your reply. I understand that DESeq2 uses proper statistical tests to explore differential expression data. My point is that if DESeq2 can report a "fold change", it should have estimated two values for counts (for example, control: 20 counts, treatment: 80 counts, so that fold change would be: 4, log2FC: 2). I am here to suggest to report these 2 numbers (i.e. 20 and 80 in my example, or it can also report 80-20 = 60, which is what I refer as "differential counts" between the 2 groups).
Adding a link to Gordon's answer (also relevant for the linear model explanation)
suggest to report differential estimate in edgeR
Not all linear models have group means that are used to derive the log fold change.
For example, even
~batch + condition
, you cannot compute the LFC just by looking at two average counts.Hi Mike, thank you very much for your reply. I have one more question: when doing DE with DESeq2, would p-value be preferable to LFC?