I have a question regarding the rlog normalization.
I have many samples to compare, with only one factor: A vs Treat, B vs Treat, C vs Treat, and so on. Should I put everything in the same DESeqDataSet object, even though the variability between groups is very large (I ran the differential expression analysis with the comparisons separated), and then calculate the rlog of all samples? Or should I put the different comparisons in separate DESeqDataSet objects, extract the rlog for each comparison, and later join the rlogs by Ensembl ID?
Thank you!
Yes, I read that in the vignette, thank you Professor Love!
However, I am still unsure whether to group everything in the same DESeqDataSet or to separate it into different ones. I suppose that for data visualization it is better to put everything together, and for differential analysis to run each comparison separately, right?
Also, I am considering transforming them with blind = F, since I expect great genetic variability between groups (not within the groups themselves). But if I want to do an unsupervised hierarchical clustering on z-scores of very different tumor samples, to cluster them transcriptomically, would you apply blind = T?
Thank you so much!
The question about "whether to group everything in the same DESeqDataSet or to separate it into different ones" is a FAQ in the vignette.
I recommend blind=FALSE generally. The design is not used in performing the transformation, which is fixed for all samples equally. It is only used to understand the global amount of within-group variability. It will still be unsupervised with blind=FALSE.
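As a rough illustration of the advice above, here is a minimal sketch that keeps all samples in one DESeqDataSet, applies rlog with blind=FALSE, and then clusters on row-wise z-scores. The data are simulated with makeExampleDESeqDataSet; the group labels (Treat, A, B, C), the 3 samples per group, and the choice of the 100 most variable genes are assumptions made only for this example, not something taken from the thread.

```r
library(DESeq2)

# Simulated counts stand in for real data (assumption: 4 groups x 3 samples).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 12)
dds$condition <- factor(rep(c("Treat", "A", "B", "C"), each = 3),
                        levels = c("Treat", "A", "B", "C"))
design(dds) <- ~ condition

# blind = FALSE: the design is used when estimating dispersions,
# i.e. to gauge within-group variability; the same transformation
# is then applied to every sample.
rld <- rlog(dds, blind = FALSE)

# Row-wise z-scores on the most variable genes (100 is an arbitrary choice).
mat <- assay(rld)
gene_var <- apply(mat, 1, var)
top <- head(order(gene_var, decreasing = TRUE), 100)
z <- t(scale(t(mat[top, ])))

# Hierarchical clustering of samples; group labels are not used here.
hc <- hclust(dist(t(z)))
```

Calling plot(hc) would draw the sample dendrogram. The clustering step itself never sees the condition labels, so it remains unsupervised even though the transformation was run with blind=FALSE.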