Hello,
I'm sorry if this is a recurrent question but the more I read the more I get confused!
I have RNA-seq data from patients for which I have cells either "control" (CT) or "stimulated" (S). So for each patient, I have extracted a cell type and I'm comparing 2 treatments. If I want to analyze for deferentially expressed genes between CT and S, accounting for the patient effect, what design should I use ?
I have seen different ways of doing this in Limma, EdgeR and DESeq2 manuals. My opinion was that I should use:
design=model.matrix(~patient+treatment)
but I have been told that another way to go would be:
design=model.matrix(~0+treatment+patient)
which should (I thought) be the same as above, except here I have to use the function makeContrasts
as an additional step. But the 2 models give me different results!
Another issue I have is that I have a huge variability in the read/sample for this data set, i.e. ranging from 6M reads to 30M. Even after normalization for library size and using the function voomWithQualityWeights
(in Limma) I still get a library size effect, since I clearly see that my samples cluster together (in a heatmap for example) depending on their amount of total reads. Plus, if I look at the FPKM values of house keeping genes (using this ref. : http://www.sciencedirect.com/science/article/pii/S0168952513000899) I can see differential expression across samples, correlating with library size, even after filtering and normalization. Any idea if there is a way to correct for this?
Many thanks for your help!
Kind regards,
Christophe
If you want package maintainers to respond to this, you'll need to put the package names as tags.
Thanks Aaron, I'm going to tag some but the thing is (I think) the theory behind the design formula should be independent of the package (same principle whether you use limma or DESeq2 for example).