Hi,
I have a large RNA-seq data set (146 samples). I ran the voom transformation with a design matrix that contains my factor of interest (4 groups) plus 4 more factors that could also affect expression. I am not interested in those extra factors; I only added them to capture the variation they may introduce.
I did a PCA of the weights that voom returns and saw that my samples cluster into the 4 groups I want to test for DE. So I suddenly wondered: what does this mean? Is it something we should expect, does it point to some bias, or is it not important?
I tried to think it through. I assumed the weights wouldn't be correlated with anything, but some genes are separating the samples through their weight values. Since the weights are used in the linear model fit and they are correlated with my groups, I don't know whether the results will be correct.
Thanks in advance.
On a side note, I am looking at the density plots, and maybe the expression distribution can explain this. But I still don't know whether I need to be more careful in the DE step or try to normalize better.
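At first blush, I wouldn't expect the weights to correlate with anything either, but if you accept that more highly expressed genes also get higher weights, then the result you observe isn't so surprising, no? Genes whose expression differs between your groups will then also differ in weight between those groups.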
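
To make that intuition concrete, here is a toy simulation in Python with NumPy. It is only a sketch and not voom itself: the function `toy_weight`, the shape of its mean-variance trend, the number of genes and samples, and the effect sizes are all made-up assumptions. The only idea it borrows is that each observation's weight is a decreasing-variance function of its fitted expression level. Under that assumption, a PCA of the weights separates the groups as soon as some genes differ in expression between them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 500 genes, 4 groups of 10 samples each.
n_genes, n_groups, n_per_group = 500, 4, 10
group = np.repeat(np.arange(n_groups), n_per_group)  # group label per sample
n_samples = group.size

# Baseline log2 expression per gene; only the first 100 genes differ by group.
baseline = rng.uniform(2, 12, size=n_genes)
effect = np.zeros((n_genes, n_groups))
effect[:100] = rng.normal(0, 2, size=(100, n_groups))

# Small per-sample offset standing in for library-size differences.
lib_offset = rng.normal(0, 0.1, size=n_samples)

# Fitted log2 expression for every gene in every sample (genes x samples).
fitted = baseline[:, None] + effect[:, group] + lib_offset[None, :]


def toy_weight(mu):
    """Assumed trend: sd falls as expression rises; weight = 1 / sd^2."""
    sd = 0.3 + 2.0 * np.exp(-0.4 * mu)
    return 1.0 / sd**2


weights = toy_weight(fitted)  # same shape as `fitted`

# PCA of the samples based on log weights, computed with an SVD.
X = np.log(weights).T          # samples x genes
X = X - X.mean(axis=0)         # center each gene
U, S, Vt = np.linalg.svd(X, full_matrices=False)
pcs = U[:, :2] * S[:2]         # scores on the first two components


def between_group_fraction(scores, labels):
    """Fraction of the variance of `scores` explained by the group means."""
    grand = scores.mean()
    means = np.array([scores[labels == g].mean() for g in np.unique(labels)])
    between = np.sum((means[labels] - grand) ** 2)
    total = np.sum((scores - grand) ** 2)
    return between / total


frac_pc1 = between_group_fraction(pcs[:, 0], group)
print(f"Share of PC1 variance explained by group: {frac_pc1:.3f}")
```

In this toy setting the weights carry no information beyond the fitted expression values, so clustering by group in the weight PCA simply mirrors the expression differences between groups rather than indicating a separate bias. Whether the same holds for the real data would still need to be checked on the actual voom output.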