Variance stabilization transformation (VST), blind=TRUE
@49806f54

I have early embryonic development time-series RNA-seq count data (normal, mutant, and treated) from multiple studies, which I plan to use for clustering genes. To remove study/batch effects, I am using ComBat-seq with study ID as the batch. For normalization and transformation, I then use VST with blind=TRUE. After this, the mean expression of a gene is no longer correlated with its variance, which is good. However, in early embryonic development a lot of genes change their expression levels. Given these large changes in expression, I am worried about using VST with blind=TRUE: I suspect that the gene dispersions are being overestimated.

As a simple check, I counted the genes that are down-regulated from the early to the late time point. With VST I get around 1,600 genes with a log fold change <= -1, whereas with log2(CPM+0.5) normalization I get around 4,000 genes at the same cutoff. I understand that VST penalizes lowly expressed genes more, to reduce noise in general. But I am not sure whether this large reduction in the number of down-regulated genes is expected, or whether I am losing many genes just because they are highly variable across the embryonic development time course. Do you think this is okay? How should I determine whether blind=TRUE is an appropriate option, or should I use VST with blind=FALSE instead?

The only information I have about these samples is study, developmental time point, and treatment. The issue is that I might have only one replicate sample per treatment, so I am not sure how to use these as covariates in the analysis. I would be happy to hear any suggestions or feedback.

Just to mention, my clustering results are generally better with VST normalization than with the other approaches I have tried. But I wanted to be sure that I am not doing something wrong with VST and removing real biological variance from the data.
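For reference, the log2(CPM+0.5) side of the comparison described above can be sketched in base R as follows. This is only an illustration on simulated counts: the toy matrix, the three early and three late samples, and all variable names are assumptions, not the real data. The cutoff of -1 is applied to the difference of group means on the log2(CPM+0.5) scale; the same cutoff applied to VST values is on a different scale, so the two gene counts are not necessarily directly comparable.

```r
set.seed(1)

# Hypothetical toy data: 500 genes, 3 early and 3 late samples
n_genes <- 500
base_mu <- rgamma(n_genes, shape = 1, scale = 200) + 1   # early mean per gene
fold    <- 2^rnorm(n_genes, mean = 0, sd = 1.5)          # simulated late/early fold change
mu      <- cbind(matrix(base_mu, n_genes, 3),
                 matrix(base_mu * fold, n_genes, 3))
counts  <- matrix(rnbinom(length(mu), mu = mu, size = 10), nrow = n_genes)
colnames(counts) <- c(paste0("early", 1:3), paste0("late", 1:3))

# Counts per million, then the log2(CPM + 0.5) transform from the post
lib_size <- colSums(counts)
cpm      <- t(t(counts) / lib_size) * 1e6
log_cpm  <- log2(cpm + 0.5)

# Late-minus-early difference of group means on the log2 scale
lfc    <- rowMeans(log_cpm[, 4:6]) - rowMeans(log_cpm[, 1:3])
n_down <- sum(lfc <= -1)
n_down
```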

DESeq2 Transcriptomics RNASeq • 2.0k views
@mikelove

I recommend blind=FALSE. It does not pass much information about the design to the transformation: the design is only used to learn the _global_ distribution of dispersion values, and it is not used in the transformation itself.
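A minimal sketch of what this could look like, assuming the DESeq2 package is installed. The data here are simulated with `makeExampleDESeqDataSet`, and the column names `study` and `time` and the design `~ study + time` are hypothetical stand-ins for the real sample information; treatment is left out of this sketch. It runs the VST both ways so the early-to-late differences can be compared on the same scale.

```r
library(DESeq2)

set.seed(1)

# Simulated stand-in for the real counts: 2000 genes, 12 samples,
# with a true difference between the two simulated conditions
dds <- makeExampleDESeqDataSet(n = 2000, m = 12, betaSD = 1)

# Hypothetical sample annotations (assumptions, not from the post)
dds$time  <- factor(ifelse(dds$condition == "A", "early", "late"))
dds$study <- factor(rep(c("s1", "s2", "s3"), times = 4))
design(dds) <- ~ study + time

# blind = TRUE ignores the design when estimating dispersions;
# blind = FALSE uses the design for the dispersion estimates only
vsd_blind  <- vst(dds, blind = TRUE)
vsd_design <- vst(dds, blind = FALSE)

# Late-minus-early difference of group means on the VST scale
late       <- dds$time == "late"
lfc_blind  <- rowMeans(assay(vsd_blind)[, late]) -
              rowMeans(assay(vsd_blind)[, !late])
lfc_design <- rowMeans(assay(vsd_design)[, late]) -
              rowMeans(assay(vsd_design)[, !late])

# Number of genes passing the same cutoff under each setting
c(blind = sum(lfc_blind <= -1), design = sum(lfc_design <= -1))
```

On real data, comparing these two counts (and the resulting clusterings) is one way to see how much the blind setting affects the result.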
