The drug effects are so drastic that more than 75% of the genes in the cell lines are differentially expressed (3 control vs. 3 treatment samples). I used TMM normalization and edgeR for the analysis, which assume that at most 60% of genes are differentially expressed. How should I proceed in such a case? Can I trust the results if the assumptions are violated?
Short answer: if the normalization assumptions are violated, you cannot trust the DE analysis results.
Generally, I would not consider the number of DE genes to be indicative of whether TMM normalization is okay or not. If you have a very high-powered experiment, you will detect many DE genes at a given significance threshold, even if most of them have near-zero log-fold changes and do not cause any meaningful violation of TMM's assumption. Conversely, failure to detect many DE genes does not mean that TMM normalization is suitable, given that the DE analysis already assumes that normalization was correct.
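As a rough illustration of why the size of the log-fold changes matters more than the number of DE genes, here is a minimal Python sketch. It assumes hypothetical noise-free counts in which 75% of genes are up-regulated by the same log2 fold change and the remaining 25% are unchanged. It normalizes with a simplified, unweighted trimmed mean of log2 ratios, which stands in for TMM and is not edgeR's code. The names `tmm_like_offset` and `bias_for_unchanged_genes` are hypothetical.

```python
import math


def tmm_like_offset(obs, ref, logratio_trim=0.3):
    """Trimmed mean of per-gene log2 ratios (M-values).

    Simplified stand-in for TMM (assumption): no precision weights and no
    trimming on average abundance, unlike edgeR's calcNormFactors.
    """
    n_obs, n_ref = sum(obs), sum(ref)
    m = sorted(math.log2((o / n_obs) / (r / n_ref)) for o, r in zip(obs, ref))
    k = int(len(m) * logratio_trim)  # genes dropped from each tail
    kept = m[k:len(m) - k]
    return sum(kept) / len(kept)


def bias_for_unchanged_genes(lfc, n_de=750, n_same=250):
    """Apparent log2 fold change of truly unchanged genes after normalization.

    Hypothetical data: n_de genes up by a positive log2 fold change `lfc`,
    n_same genes unchanged, all with a baseline count of 1000.
    """
    ref = [1000.0] * (n_de + n_same)
    obs = [1000.0 * 2 ** lfc] * n_de + [1000.0] * n_same
    n_obs, n_ref = sum(obs), sum(ref)
    m_same = math.log2((obs[-1] / n_obs) / (ref[-1] / n_ref))
    return m_same - tmm_like_offset(obs, ref)


for lfc in (0.1, 0.5, 2.0):
    print(f"75% of genes up by log2FC {lfc}: unchanged genes appear at "
          f"{bias_for_unchanged_genes(lfc):+.3f}")
```

In this toy setup the retained window contains only up-regulated genes, so the error in the offset equals their log-fold change. A shift of 0.1 log2 units is negligible even though 75% of genes are "DE", whereas a shift of 2 makes every unchanged gene look 4-fold down.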
Your case is interesting in that you have detected many DE genes despite not having a particularly high-powered experimental design. This suggests that the log-fold changes of the ~15% of DE genes that survive the default trimming (roughly 75% DE minus the 60% trimmed) are large (at least 0.5, perhaps?), which would affect the accuracy of the computed normalization factors. If you are concerned about this, you could try increasing the trimming proportion to 80% (logratioTrim set to 0.4, i.e., 40% trimmed from each end). However, there is also the possibility that the effect of drug treatment is simply too drastic, and that the entire transcriptome is changing en masse, e.g., due to apoptosis. You would probably need spike-ins to have any chance of normalizing in this situation.
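To make the effect of the trimming proportion concrete, here is a minimal Python sketch of a simplified TMM-style offset: an unweighted trimmed mean of per-gene log2 ratios. It omits edgeR's abundance trimming and precision weights, so it is an illustration rather than edgeR's implementation. In edgeR itself, the corresponding setting is logratioTrim in calcNormFactors (default 0.3). The data are hypothetical and noise-free: 375 genes down 2-fold, 250 unchanged, and 375 up 8-fold. The names `trimmed_mean_m` and `unchanged_gene_lfc` are hypothetical.

```python
import math


def trimmed_mean_m(obs, ref, logratio_trim=0.3):
    """Simplified TMM-style log2 offset of `obs` relative to `ref`.

    Assumptions: unweighted mean, no trimming on average abundance, and
    zero counts are skipped. This is an illustration, not edgeR's code.
    """
    n_obs, n_ref = sum(obs), sum(ref)
    m_values = sorted(
        math.log2((o / n_obs) / (r / n_ref))
        for o, r in zip(obs, ref)
        if o > 0 and r > 0
    )
    k = int(len(m_values) * logratio_trim)  # genes trimmed from each tail
    kept = m_values[k:len(m_values) - k]
    return sum(kept) / len(kept)


# Hypothetical noise-free counts for 1000 genes: 375 down 2-fold,
# 250 unchanged, 375 up 8-fold (75% of genes are DE).
ref = [1000] * 1000
obs = [500] * 375 + [1000] * 250 + [8000] * 375


def unchanged_gene_lfc(logratio_trim):
    """Apparent log2 fold change of an unchanged gene after normalization."""
    n_obs, n_ref = sum(obs), sum(ref)
    raw_m = math.log2((1000 / n_obs) / (1000 / n_ref))
    return raw_m - trimmed_mean_m(obs, ref, logratio_trim)


for trim in (0.3, 0.4):
    print(f"logratio_trim={trim}: unchanged genes at {unchanged_gene_lfc(trim):+.3f}")
```

With a trim of 0.3, the retained window still contains 75 down and 75 up genes, so unchanged genes appear shifted by about -0.375 log2 units. With 0.4, only unchanged genes remain and the shift vanishes. This works here only because the unchanged genes sit in the middle of the log-ratio distribution. If most genes move in one direction, no trimming proportion will rescue the estimate.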
P.S. If this is an edgeR question, put an edgeR tag on your post.
Aaron, is logratioTrim a parameter in edgeR that I can change?
Adding spike-ins is not doable right now. Would it be a reasonable approximation to trust only DE genes with extreme fold changes, i.e., FC > 2 or FC > 4?
1) You can set logratioTrim as an argument in calcNormFactors.
2) The problem is that you don't know how wrong your normalization is. Consider an example where most genes decrease in abundance by 10-fold in your treated cells. After normalization, the majority of genes would appear to be non-DE, and you would instead observe 10-fold "upregulation" for genes that did not change in abundance. So it's hard to say whether an extreme log-fold change is likely to be correct when the normalization cannot be trusted.
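The 10-fold example can be checked numerically. The Python sketch below assumes hypothetical noise-free data: 900 of 1000 genes drop 10-fold in absolute abundance, 100 are unchanged, and both libraries are sequenced to the same depth. It normalizes with the median log2 ratio. This is a simple stand-in for TMM that, like TMM, assumes the bulk of genes are non-DE. `to_counts` is a hypothetical helper.

```python
import math
import statistics

# Hypothetical "true" abundances: 900 genes drop 10-fold, 100 are unchanged.
n_down, n_same = 900, 100
ctrl_abund = [1.0] * (n_down + n_same)
trt_abund = [0.1] * n_down + [1.0] * n_same


def to_counts(abund, depth=1_000_000):
    """Expected counts when every library is sequenced to the same depth."""
    total = sum(abund)
    return [depth * a / total for a in abund]


ctrl, trt = to_counts(ctrl_abund), to_counts(trt_abund)
n_c, n_t = sum(ctrl), sum(trt)

# Per-gene log2 ratios of proportions (M-values), treated vs. control.
m_values = [math.log2((t / n_t) / (c / n_c)) for t, c in zip(trt, ctrl)]

# Median-based offset: a stand-in for TMM, assuming most genes are non-DE.
offset = statistics.median(m_values)
apparent_lfc = [m - offset for m in m_values]
true_lfc = [math.log2(t / c) for t, c in zip(trt_abund, ctrl_abund)]

print(f"down genes:      true {true_lfc[0]:+.2f}, apparent {apparent_lfc[0]:+.2f}")
print(f"unchanged genes: true {true_lfc[-1]:+.2f}, apparent {apparent_lfc[-1]:+.2f}")
```

Normalization absorbs the global 10-fold drop. As a result, the down-regulated majority looks non-DE, while the truly unchanged genes show an apparent log2 fold change of about +3.32 (10-fold up). In this toy case, an FC > 4 cutoff would select exactly the artefactual genes.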