Hi, I am using RNA-seq to study the RNA expression signature of two patient groups. Initial analyses with limma indicate that some source of variation contributes more to the variance within the sample groups than to the variance between them. In a PCA, the patient groups separate along PC2, while one or more unknown covariates drive PC1. So far we have not been able to identify the source of the variation in PC1. How can I model the unknown covariate so that I isolate the variance attributable to the difference between the patient groups? In general, I guess the question comes down to: how do you model unknown sources of variation? Thank you so much. Andreas
PS: I have some background in statistics and R, though not a very strong one.
Hi Scheran, perhaps you could make a biplot to see which genes drive the variance in your PCA. See this link for a description of the biplot. I hope this helps. One more thing: could part of the variation in your data be technical (e.g. a batch effect) rather than biological?
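To make the biplot idea concrete, here is a minimal sketch of what it reads off: the genes with the largest loadings on PC1. Everything in it is an assumption for illustration. The data are simulated (20 genes, 8 samples, a made-up hidden `batch` factor), the helper `first_pc` is hypothetical, and PC1 is found by plain power iteration so the example runs without extra packages. In R you would more likely use `prcomp` and `biplot` on your normalized expression matrix.

```python
import math
import random

# Simulated data, purely illustrative: 20 genes (rows) x 8 samples (columns).
rng = random.Random(1)
n_genes, n_samples = 20, 8
group = [0, 0, 0, 0, 1, 1, 1, 1]  # known patient group
batch = [0, 1, 0, 1, 0, 1, 0, 1]  # hidden factor (assumed; unknown in practice)

expr = []
for g in range(n_genes):
    row = []
    for s in range(n_samples):
        value = rng.gauss(0, 0.3)       # measurement noise
        if g < 5:
            value += 4.0 * batch[s]     # genes 0-4 respond to the hidden factor
        elif g < 8:
            value += 1.5 * group[s]     # genes 5-7 respond to patient group
        row.append(value)
    expr.append(row)

# Center each gene across samples, as PCA does.
centered = []
for row in expr:
    mean = sum(row) / len(row)
    centered.append([v - mean for v in row])


def first_pc(x, n_iter=100):
    """Power iteration for the first principal component.

    x is a centered genes-by-samples matrix. Returns (scores, loadings):
    a unit-length vector with one PC1 score per sample, and one PC1
    loading per gene (scaled by the first singular value).
    """
    n_cols = len(x[0])
    init = random.Random(0)
    v = [init.gauss(0, 1) for _ in range(n_cols)]
    for _ in range(n_iter):
        # u = X v (one value per gene), then v = X^T u (one value per sample)
        u = [sum(a * w for a, w in zip(row, v)) for row in x]
        v = [sum(x[i][j] * u[i] for i in range(len(x))) for j in range(n_cols)]
        norm = math.sqrt(sum(w * w for w in v))
        v = [w / norm for w in v]
    loadings = [sum(a * w for a, w in zip(row, v)) for row in x]
    return v, loadings


scores, loadings = first_pc(centered)

# Genes with the largest absolute loadings are the ones a biplot
# would draw as the longest arrows along PC1.
top_genes = sorted(range(n_genes), key=lambda g: abs(loadings[g]), reverse=True)[:5]
print("Genes with the largest PC1 loadings:", sorted(top_genes))
print("PC1 scores per sample:", [round(s, 2) for s in scores])
```

If the top-loading genes share an obvious theme, or the PC1 scores split the samples along something you recorded (sequencing run, extraction date, RNA quality), that would point to the source of the variation.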