DESeq2:varianceStabilizingTransformation suggested. Results suspicious
jovel_juan ▴ 30

I ran some analyses with DESeq2 and got the warning shown below, after these notes on the setup.

NOTES:
- Data was quantified with Kallisto.
- I did not aggregate the data at the gene level. I ran the analysis at the transcript level and used a cut-off of qval < 0.01 instead of 0.05.

In this data, for 27.2% of genes with a sum of normalized counts above 100, it was the case that a single sample's normalized count made up more than 90% of the sum over all samples. the threshold for this warning is 10% of genes. See plotSparsity(dds) for a visualization of this. We recommend instead using the varianceStabilizingTransformation or shifted log (see vignette).

What does it mean? Should I run the differential expression test specifying the varianceStabilizingTransformation instead of the default DESeq transformation? If yes, how do I do that?

I did not change anything and ran the test with the wrapper as always: ddsMat <- DESeq(ddsMat)

I got thousands of deregulated transcripts in each comparison. Because of the warning described above, I also ran the same comparisons using SLEUTH and only got a couple of dozen in each case.

What should I do?


Thanks for your answer, Mike!

Those are cultured heart cells with a specific phenotype subjected to two different drugs or to the combination of both drugs. Each treatment was compared to non-treated cells.

Please see the attached PCA for one of the comparisons that throws the warning mentioned above. It is clear that sample bs13 is different from the rest.

Should I remove that sample and repeat the analyses?

[Image: PCA plot]


I moved these to comments, rather than "answers".

Can you tell me the sample sizes? Is this bulk RNA-seq?


Ok, the images came through in my email, but not here.

I think you can ignore the warning. I'm guessing you have 1 out of 2 samples with much higher counts overall, and this is why 90% of the row sum count is coming from a single sample for a large fraction of genes.


Sorry, the figure did not load in the previous message. Here is a new attempt.

[Image: PCA plot]


This is a typical example, but all the other comparisons show something similar. What should I do?

@mikelove

What kind of samples do you have? How many? That warning is flagged for extremely sparse data. I wrote it to be self-explanatory: you have a lot of genes where most of the row count comes from a single sample.
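
For anyone puzzling over the wording, the rule stated in the warning quoted in the question can be sketched in a few lines. This is a plain-Python illustration of the criterion as the message describes it (rows with a sum of normalized counts above 100, where one sample contributes more than 90% of that sum). It is not DESeq2's own code, and the function name `sparsity_fraction` and the toy matrix are made up for the example.

```python
def sparsity_fraction(norm_counts, min_row_sum=100, max_share=0.9):
    """Fraction of eligible rows dominated by a single sample.

    A row is eligible when its sum exceeds min_row_sum; it is flagged
    when its largest value is more than max_share of that sum.
    Thresholds follow the text of the warning, not DESeq2 source code.
    """
    eligible = 0
    flagged = 0
    for row in norm_counts:
        total = sum(row)
        if total <= min_row_sum:
            continue  # low-count rows are not considered
        eligible += 1
        if max(row) / total > max_share:
            flagged += 1
    return flagged / eligible if eligible else 0.0


# Hypothetical normalized counts: rows are transcripts, columns are samples.
toy_counts = [
    [950, 10, 20, 20],     # one sample holds 95% of the sum: flagged
    [300, 280, 310, 290],  # evenly spread: not flagged
    [5, 0, 2, 1],          # sum not above 100: ignored
    [0, 0, 480, 20],       # one sample holds 96% of the sum: flagged
    [120, 90, 100, 110],   # evenly spread: not flagged
]

fraction = sparsity_fraction(toy_counts)
print(f"{fraction:.1%} of eligible rows are dominated by one sample")
```

The warning in the question reports this fraction as 27.2%, against a threshold of 10% of genes.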
