I am trying to understand the vst/rlog transformations in DESeq2. In the following vignette, section 4.2, where vst and rlog are explained, there is this paragraph:
Both vst and rlog return a DESeqTransform object which is based on the SummarizedExperiment class. The transformed values are no longer counts, and are stored in the assay slot.
What does it mean that they are no longer counts? Does it just mean that the transformed values are not stored in the "counts" slot, where you would find them with counts(dds, normalized=TRUE), or is it something else?
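To see where the two kinds of values live, here is a minimal sketch on simulated data. It assumes DESeq2 is installed; makeExampleDESeqDataSet() stands in for a real data set, and the object names (norm_counts, vsd, vst_values) are made up for illustration.

```r
library(DESeq2)

set.seed(1)
# Simulated data set (hypothetical stand-in for real data); n = 5000 genes so
# that vst() has enough genes for its default subsample.
dds <- makeExampleDESeqDataSet(n = 5000, m = 6)
dds <- estimateSizeFactors(dds)

# Normalized counts: raw counts divided by size factors, still on the count scale.
norm_counts <- counts(dds, normalized = TRUE)

# vst() returns a DESeqTransform object, not a DESeqDataSet.
vsd <- vst(dds, blind = TRUE)
class(vsd)

# The transformed values are retrieved with assay(), not with counts().
vst_values <- assay(vsd)
dim(vst_values)
```

Both matrices have one row per gene and one column per sample; only the accessor and the scale of the values differ.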
It is clear that the magnitudes you get from vst/rlog and from counts(dds, normalized=TRUE) are not the same, but is that because vst/rlog output is on a log2 scale? (Of course, there is also a variance-stabilizing transformation involved, but the results are on a log2 scale?) So is this output log2-scale, normalized and transformed counts?
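One way to check the scale question is to put the vst values next to a plain log2 of the normalized counts. This is only a sketch on simulated data: the pseudocount of 1 and the cutoff of 100 for "high-count" genes are arbitrary choices for illustration, not anything prescribed by DESeq2.

```r
library(DESeq2)

set.seed(1)
# Simulated data (hypothetical stand-in for a real experiment).
dds <- makeExampleDESeqDataSet(n = 5000, m = 6)
dds <- estimateSizeFactors(dds)

norm_counts <- counts(dds, normalized = TRUE)
log2_norm   <- log2(norm_counts + 1)          # ordinary log2 with a pseudocount
vst_values  <- assay(vst(dds, blind = TRUE))  # variance-stabilized values

# Arbitrary cutoff to separate high-count from low-count genes.
high <- rowMeans(norm_counts) > 100

# For high-count genes the two transformations should track each other closely.
summary(as.vector(vst_values[high, ] - log2_norm[high, ]))

# For low-count genes the ranges differ more, because the VST compresses
# the noisy low end instead of taking a plain logarithm.
range(log2_norm[!high, ])
range(vst_values[!high, ])
```

So the output is on a log2-like scale, but it is not simply log2 of the normalized counts; the two mainly agree for genes with high counts.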
The reason for this question is that I am wondering whether I should save those transformed counts as "normalized_transformed" counts for future use. I used to save counts(dds, normalized=TRUE) and use those for downstream analyses, but now that I have discovered (and read more about) the vst/rlog transformations, I will have to change the way I work and do my analyses. I am quite worried about the paragraph above, which says they are no longer counts, and I don't know whether I understand everything properly.
Thanks in advance
Regards
Many thanks for your detailed answer, I really appreciate it.
Could you give me an example (or more, if possible) of an analysis that uses a count-based method, please? I am not sure I know any. I want to be clear about when I can use this type of data and when I cannot, because the sources I found explain nothing more than "it is okay for downstream analyses and/or plots", which is quite frustrating.
Re this sentence
I don't really understand it, because I found in this paper that vst was the one that has problems with size factors. Did I misunderstand, or have I mixed up different concepts?
Thanks in advance.
The DESeq2 package is meant to analyze count data! Most packages intended for RNA-Seq data expect the raw counts, because they fit negative binomial generalized linear models. That's what DESeq2 does, as does edgeR; these are the two main packages in the Bioconductor world for analyzing RNA-Seq.
I believe you are misinterpreting the first quote in your post (the one about rlog). What that sentence means is that rlog is better if you have large differences in sequencing depth (e.g., size factors vary widely) because it takes sequencing depth into account, whereas vst doesn't account for sequencing depth and can therefore have problems if the sequencing depth of an experiment varies widely.
I understood exactly what you said, but I got confused by the documentation and what it says about rlog, because the statements seem contradictory (rlog is sensitive but also robust, according to this documentation or this one, respectively).
Anyway, many thanks again for the answer and your help! :)
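For anyone landing on this thread later, the distinction between count-based and non-count-based analyses can be sketched roughly as follows. This uses simulated data from makeExampleDESeqDataSet() (whose design is ~ condition) as a hypothetical stand-in for a real experiment, and the choice of PCA and sample distances as the downstream steps is just an example.

```r
library(DESeq2)

set.seed(1)
# Simulated raw counts with a two-level 'condition' factor in the design.
dds <- makeExampleDESeqDataSet(n = 5000, m = 6)

# Count-based analysis: raw counts go in, and the negative binomial model
# is fitted inside DESeq(). Do not feed vst/rlog values into this step.
dds <- DESeq(dds)
res <- results(dds)

# Not count-based: exploratory analyses such as PCA or sample-to-sample
# distances work on the transformed values stored in assay(vsd).
vsd <- vst(dds, blind = FALSE)
pca <- prcomp(t(assay(vsd)))
sample_dist <- dist(t(assay(vsd)))
```

In practice that suggests keeping the raw counts (or the DESeqDataSet itself) for differential expression, and saving the transformed matrix separately for plots and clustering.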