Hi all,
This is more a general (philosophical?) question:
Say I have a dataset analyzed with the DESeq2 defaults, from which a VST-normalized dataset is obtained.
The VST data are then used to train a final predictive ML model (binary classification) that uses only a few genes from the VST dataset.
Now I have a few new samples that I re-run end-to-end through the exact same pipeline as above.
After importing them into DESeq2, what normalization would you use in order to obtain a VST close to that of the original run?
The goal is to classify the new, repeated samples correctly.
I know there are millions of variables in play, but I was curious to see what folks here would answer.
Thank you in advance
Dear Michael,
could you please comment on how to properly apply a VST transformation to a new dataset (in this case, left-out samples)? I have been racking my brain over this and cannot seem to get it right. dds_test currently holds two samples, but this could vary in the future; the ideal use case is a single test sample.
A small example of my current script:
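Something along these lines, where dds_train and dds_test are placeholder names for my fitted training object and the new samples, and counts_test / coldata_test are the corresponding count matrix and sample table:

```r
library(DESeq2)

# Fit the training data and transform it (blind = FALSE uses the design)
dds_train <- DESeq(dds_train)
vsd_train <- vst(dds_train, blind = FALSE)

# Build the test object with a trivial design and estimate its size factors,
# then freeze the dispersion function learned on the training data
dds_test <- DESeqDataSetFromMatrix(countData = counts_test,
                                   colData   = coldata_test,
                                   design    = ~ 1)
dds_test <- estimateSizeFactors(dds_test)
dispersionFunction(dds_test) <- dispersionFunction(dds_train)

# Apply the frozen VST to the left-out samples
vsd_test <- varianceStabilizingTransformation(dds_test, blind = FALSE)
```

The idea is that assigning the training dispersion function before transforming should make the test-sample VST values comparable to the training ones, but I am not sure this is the intended usage.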
I will then perform PCA on the training data and project the VST-transformed left-out samples into the same PCA space.
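For the projection step I have something like this in mind, where vsd_train and vsd_test are placeholder names for the VST-transformed training and left-out objects:

```r
library(DESeq2)  # for assay() on the transformed objects

# PCA on the training samples (samples in rows, genes in columns)
pca <- prcomp(t(assay(vsd_train)), center = TRUE, scale. = FALSE)

# Project the left-out samples into the same PCA space:
# subtract the training-set gene means, then apply the training rotation
test_scores <- scale(t(assay(vsd_test)),
                     center = pca$center, scale = FALSE) %*% pca$rotation
```

That is, the test samples are centered with the training means and multiplied by the training rotation matrix, rather than being run through a new PCA.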
Thank you in advance!