I ran an analysis with DESeq2 and got the warning quoted below.
Notes: The data were quantified with kallisto. I did not aggregate the data to the gene level; I ran the analysis at the transcript level, using a cut-off of qval < 0.01 instead of 0.05.
In this data, for 27.2% of genes with a sum of normalized counts above 100, it was the case that a single sample's normalized count made up more than 90% of the sum over all samples. the threshold for this warning is 10% of genes. See plotSparsity(dds) for a visualization of this. We recommend instead using the varianceStabilizingTransformation or shifted log (see vignette).
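To make the wording of the warning concrete, here is a minimal base R sketch of the statistic it describes, run on a made-up matrix of normalized counts. The matrix, the variable names, and the values are all hypothetical, and the code follows the text of the warning rather than DESeq2's internal implementation.

```r
# Toy matrix of normalized counts: rows are transcripts, columns are samples.
# All values are made up for illustration.
norm_counts <- rbind(
  tx1 = c(950, 10, 20, 20),    # one sample holds 95% of the row sum
  tx2 = c(100, 120, 90, 110),  # evenly spread across samples
  tx3 = c(0, 0, 480, 20),      # one sample holds 96% of the row sum
  tx4 = c(5, 3, 2, 1),         # row sum too low to be considered
  tx5 = c(300, 250, 280, 310)  # evenly spread across samples
)

# Only rows with a sum of normalized counts above 100 are considered.
row_sum <- rowSums(norm_counts)
considered <- row_sum > 100

# Fraction of each row's sum that comes from its single largest sample.
max_share <- apply(norm_counts, 1, max) / row_sum

# Rows where a single sample makes up more than 90% of the sum.
dominated <- considered & max_share > 0.9

# Percentage of considered rows that are dominated by one sample;
# the warning is described as triggering when this exceeds 10%.
pct_dominated <- 100 * sum(dominated) / sum(considered)
pct_dominated
```

In this toy example two of the four considered rows are dominated by one sample. In the real data the reported figure was 27.2%.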
What does it mean? Should I run the differential expression test with the varianceStabilizingTransformation instead of the default DESeq transformation? If so, how do I do that?
I did not change anything and ran the test with the wrapper as usual: ddsMat <- DESeq(ddsMat)
I got thousands of deregulated transcripts in each comparison. Because of the warning described above, I also ran the same comparisons using sleuth and got only a couple of dozen in each case.
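On the "how to do that" part: in DESeq2 the variance stabilizing transformation is meant for visualization and clustering (PCA, heatmaps), while DESeq() itself works on the untransformed counts, so the two are separate steps rather than alternatives. The sketch below assumes the DESeq2 package is installed and uses its simulated example dataset instead of the real ddsMat; the object names dds, vsd and res are placeholders.

```r
library(DESeq2)

# Simulated stand-in for the real dataset (assumption: 2000 rows, 6 samples,
# two conditions). Replace 'dds' with your own object, e.g. ddsMat.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 6)

# Visualize how concentrated the counts are in single samples,
# as suggested in the warning text.
dds <- estimateSizeFactors(dds)
plotSparsity(dds)

# Variance stabilizing transformation, used only for exploratory plots.
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)
plotPCA(vsd, intgroup = "condition")

# The differential expression test still runs on the raw counts.
dds <- DESeq(dds)
res <- results(dds, alpha = 0.01)  # alpha matched to the 0.01 cut-off
summary(res)
```

The transformed values in vsd are not passed to DESeq(); they only feed the PCA and similar plots.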
What should I do?
Thanks for your answer, Mike!
These are cultured heart cells with a specific phenotype, treated with one of two drugs or with the combination of both. Each treatment was compared to untreated cells.
Please see the attached PCA for one of the comparisons that throws this warning. Sample bs13 is clearly different from the rest.
Should I remove that sample and repeat the analyses?
I moved these to comments rather than "answers".
Can you tell me the sample sizes? Is this bulk RNA-seq?
Ok, the images came through in my email, but not here.
I think you can ignore the warning. I'm guessing you have 1 out of 2 samples with much higher counts overall, and this is why 90% of the row sum count is coming from a single sample for a large fraction of genes.
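One quick way to check that guess is to compare the total counts per sample. The sketch below uses a made-up raw count matrix with two hypothetical samples; on a real DESeq2 object the corresponding values would come from colSums(counts(ddsMat)) and sizeFactors(ddsMat).

```r
# Hypothetical raw counts: rows are transcripts, columns are samples.
counts <- cbind(
  sampleA = c(9000, 4000, 12000, 500),
  sampleB = c(300, 150, 420, 20)
)
rownames(counts) <- paste0("tx", 1:4)

# Total counts per sample (library size).
lib_size <- colSums(counts)
lib_size

# Share of all counts contributed by each sample. A share far above
# 1 / ncol(counts) points to one sample with much higher counts overall.
lib_share <- lib_size / sum(lib_size)
round(lib_share, 3)
```

If one sample's share is far larger than the others, a single sample supplying most of each row's count sum is the expected pattern.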
Sorry, the figure did not load in the previous message. Here is a new attempt.
This is a typical example, but all the other comparisons show something similar. What should I do?