Hi, I am performing differential expression analysis of RNA-seq data using GLMs in edgeR.
I have an experimental design with 4 different groups of subjects in a 2k-factorial design (6 different subjects per group). In other words, a control group, a group with condition A, a group with condition B, and a group with both condition A and B. Each subject is also treated with a compound and we have samples at baseline and 3 time-points after treatment.
I am using a model like this: ~condA+condB+condA:condB+treatment
Now to my question: In e.g. condition A there will be samples from different subjects, but also samples from the same subject (4 samples due to the treatment). How should this be properly handled when e.g. identifying differential expression for condition A vs control (or vs condition B)?
I cannot include factors for each subject. I have been suggested to use mixed effects models and use random effects to handle subjects. I am not sure how to implement this for RNA-seq data though.
I have run a separate analysis on only the baseline samples (before treatment) and removing the treatment factor from the model. The factor p-values correlate well with the ones I get from an analysis of all samples including treatment according to the model above, but very few (or no) genes can be called significant. Is the reason that I get significant genes using the complete data due to increased sample size, or is it influenced by the fact that I have replicates within subjects (not really replicates, since they are treated) that the model does not handle?
Any thoughts on how to properly analyze this data, or arguments for that it is ok to use my current strategy, would be very welcome!
Thanks
Leif Väremo
Can one of the limma authors comment on whether anything special needs to be done when using
voom
withduplicateCorrelation
?voom
obviously needs to be run first, but after runningduplicateCorrelation
, should one runvoom
a second time and pass it the correlation and blocking structure fromduplicateCorrelation
?Thanks for your fast suggestion! I had totally missed this section in the limma guide.
I have run
voom
only once, since the attempt to run a second pass after theduplicateCorrelation
returned an error:Running
lmFit
with the correlation and blocking structure after this worked fine regardless.You wouldn't run the second
voom
on the result of the firstvoom
. You'd run it on the original counts a second time, but with passing the correlation and block arguments. Again, though, I'm not sure whether this is necessary.Ah, yes of course! For reference, I tried both alternatives (single and double voom) and the results (adjusted p-values of the coefficients) are very consistent (Pearson correlation >0.99).