I am a bit confused about the rlog() and vst() functions in DESeq2.
With a log2 transformation, it is quite clear how to proceed: normalise the data first, then apply the log2 transformation. It is, however, not possible to normalise the data before applying rlog or vst, and I see from the documentation that normalisation is done behind the scenes in these two cases.
What I see is that the shape of the distribution looks quite different for log2 vs. rlog vs. vst. In the graph above, it seems that if the raw counts are filtered sufficiently, the log2 transformation works quite well. I was wondering what your thoughts are on that, and also why the rlog values start below zero?
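A minimal sketch of the kind of comparison being described, assuming `dds` is an existing DESeqDataSet (the filtering threshold of 10 counts is an arbitrary illustration):

```r
library(DESeq2)

## dds is assumed to be an existing DESeqDataSet
dds <- estimateSizeFactors(dds)

## optionally pre-filter low counts (threshold of 10 is arbitrary)
keep <- rowSums(counts(dds)) >= 10
dds  <- dds[keep, ]

## three transformations of the same data
log2.counts <- log2(counts(dds, normalized = TRUE) + 1)  # manual log2
vsd         <- vst(dds, blind = TRUE)                    # variance stabilising
rld         <- rlog(dds, blind = TRUE)                   # regularised log

## compare the shapes of the distributions
par(mfrow = c(1, 3))
hist(log2.counts, main = "log2(norm + 1)")
hist(assay(vsd),  main = "vst")
hist(assay(rld),  main = "rlog")
```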
I am not sure why one would want to do a blind log2 transformation on count data. There are specific functions in DESeq2 for log-transforming the data and using the result in downstream applications.
The general procedure can be regarded as follows (a code sketch is given after the list):

1. Derive raw or estimated counts (outside DESeq2)
2. Import counts to DESeq2 (tximport and/or DESeq2)
3. Normalise the counts via an estimation of size factors and gene-wide dispersion (DESeq2)
4. Conduct differential expression analysis on the normalised counts (DESeq2)
5. Transform the data for downstream applications (e.g., PCA, clustering, 'machine learning', etc.) via variance stabilisation or regularised log (DESeq2)
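As a minimal sketch of steps 2–5, assuming a raw counts matrix `cts` and a sample table `coldata` with a `condition` column (these names are illustrative):

```r
library(DESeq2)

## cts: matrix of raw counts; coldata: data.frame of sample annotations
## ('condition' is an illustrative design variable)
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData   = coldata,
                              design    = ~ condition)

## size factors, dispersions, and the differential expression test
dds <- DESeq(dds)
res <- results(dds)

## transformed values for PCA, clustering, etc.
vsd <- vst(dds)   # variance stabilising transformation
rld <- rlog(dds)  # regularised log
```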
If, at any point, you wish to obtain 'normalised counts', then you can use:
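```r
## normalised counts (raw counts divided by the estimated size factors)
counts(dds, normalized = TRUE)
```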
Thanks a lot for your reply. I want to understand how the different transformations work and how they differ from one another. I would also like to see the effect of each transformation on the normalised data. What I see in the plot, for example, is that vst seems to be a bad choice for my data. The rlog gives a very strange left tail to the distribution and, oddly enough, the log2 transformation seems to do a better job if low counts are filtered enough. This was quite interesting to me. Yes, of course, for the differential expression analysis I just give the raw read count file to the DESeq function, and that does the job. Thanks again.