Hi,
I use RUVSeq and I find it extremely helpful. I have a question concerning
the number of covariables to be used under RUVr. I've realized that
increasing the number of covariables makes the groups I want to see on the
PCA more visible and distinct from each other. It follows that the number of
DE genes also increases with k.
In one of my projects I have 72 samples and I ran RUVr with k up to 50. The
number of DE genes in each of my comparisons rises steeply with k and then
plateaus at high k. Likewise, the common dispersion decreases with increasing k.
It looks so good, both in terms of PCA and DE genes, that I wonder whether such
high k values might lead to misleading interpretations or a high number of
false positives.
I am also asking because the example in the RUVSeq manual uses k=1, and I
wonder why that is the case if increasing k improves the results.
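One way to make this concern concrete is to count residual degrees of freedom. If the k estimated factors are added as covariates to the design matrix, each one uses up one degree of freedom. The sketch below is only an illustration in Python (not RUVSeq code): the 72 samples come from this post, while the 4 design columns are a hypothetical value chosen for the example.

```python
def residual_df(n_samples, n_design_cols, k):
    """Residual degrees of freedom left after adding k estimated factors
    as extra covariates to a full-rank design matrix."""
    df = n_samples - n_design_cols - k
    if df <= 0:
        raise ValueError("design is saturated: no residual degrees of freedom left")
    return df


N_SAMPLES = 72      # from the post
N_DESIGN_COLS = 4   # hypothetical: intercept plus three group coefficients

# Print how many degrees of freedom remain for a few values of k.
for k in (1, 5, 10, 25, 50):
    print(f"k={k:2d}  residual df={residual_df(N_SAMPLES, N_DESIGN_COLS, k)}")
```

Under these assumptions, k=50 leaves far fewer residual degrees of freedom for estimating dispersion than k=1. That alone does not prove the extra DE genes are false positives, but it is one reason to treat results at very high k with caution.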
I would be grateful if you could provide me with any feedback on this.
Thanks!
Although RUVr and RUVs are more robust to the choice of negative controls, they still formally require you to choose a set of such genes. Since, as I said, RUVr is robust to some negative controls not being really "negative", I would suggest that you try with a "general" set of genes, such as the list of housekeeping genes that you can find here:
http://www.stat.berkeley.edu/~johann/ruv/resources/hk.txt
We have good experience with using housekeeping genes as negative controls, in general.
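As a rough sketch of how such a list could be turned into a set of negative controls, the Python snippet below keeps only the genes of a dataset that also appear in a housekeeping list. It is an illustration, not part of RUVSeq: the function name `pick_control_genes` is hypothetical, the gene IDs are toy data, and it assumes the list holds one gene symbol per line and that the dataset uses matching symbols.

```python
def pick_control_genes(dataset_genes, housekeeping_genes):
    """Return the dataset genes that also appear in the housekeeping list.

    Matching ignores case and surrounding whitespace; the original dataset
    spelling and order are preserved.
    """
    hk = {g.strip().upper() for g in housekeeping_genes if g.strip()}
    return [g for g in dataset_genes if g.strip().upper() in hk]


# Toy data for illustration only; real input would be the lines of the
# housekeeping file and the row names of the count matrix.
housekeeping_lines = ["ACTB\n", "GAPDH\n", "\n"]
dataset_gene_ids = ["Actb", "Foo1", "GAPDH", "Bar2"]

controls = pick_control_genes(dataset_gene_ids, housekeeping_lines)
print(controls)
```

The resulting list would then be passed to RUVr as the set of control genes. For a species whose gene symbols differ from those in the list, the identifiers would first need to be mapped, which this sketch does not attempt.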
Thanks Davide. I'll have a look at HKG, though it is not that straightforward: the species I am dealing with has not been so thoroughly studied. And I will certainly reduce k!
Best,