Hi, suppose I have these 3 groups: a, b and c.
library(limma)
data = c(rep("a", 3), rep("b", 3), rep("c", 3))
mm = model.matrix(~0 + data)
colnames(mm) = c("a", "b", "c")
makeContrasts(avsb = a - b, levels = mm)
a b c
1 1 0 0
2 1 0 0
3 1 0 0
4 0 1 0
5 0 1 0
6 0 1 0
7 0 0 1
8 0 0 1
9 0 0 1
So what I want to do is compare a vs b and a vs c separately. In the above I created a contrast a - b, but I'm not sure whether it is correct to do this, since the design also contains column c with the extra rows 7-9. The contrast matrix seems to suggest it's OK, but I did a comparison against a two-group design:
data2 = c(rep("a", 3), rep("b", 3))
mm2 = model.matrix(~0 + data2)
colnames(mm2) = c("a", "b")
makeContrasts(avsb = a - b, levels = mm2)
When I run these two designs through limma, I get slightly different p-values and, importantly, different logFCs, suggesting that the first model is not completely eliminating rows 7-9 (the c samples)? I'm super curious about this and wondering if anyone has a comment. Thanks.
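For concreteness, the two analyses being compared can be sketched as below. This is a hedged sketch, not the poster's exact code: the counts matrix `counts` (genes in rows, the nine samples in columns, assumed already TMM-normalized with edgeR) is hypothetical.

```r
library(limma)

# Hypothetical input: 'counts' is a TMM-normalized count matrix, 9 samples
group <- factor(c(rep("a", 3), rep("b", 3), rep("c", 3)))
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)

# Full model: all 9 samples, contrast extracts a vs b
v    <- voom(counts, design)
fit  <- lmFit(v, design)
fit  <- contrasts.fit(fit, makeContrasts(avsb = a - b, levels = design))
fit  <- eBayes(fit)

# Subset model: drop the c samples, then rerun voom from scratch
keep    <- group %in% c("a", "b")
design2 <- model.matrix(~0 + factor(group[keep]))
colnames(design2) <- c("a", "b")
v2   <- voom(counts[, keep], design2)   # voom re-estimates weights here
fit2 <- lmFit(v2, design2)
fit2 <- contrasts.fit(fit2, makeContrasts(avsb = a - b, levels = design2))
fit2 <- eBayes(fit2)
```

Because `voom` is rerun on the subset, the two pipelines use different precision weights, which is where the small differences in p-values and logFCs come from.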
My advice is more categorical than James'. Since you want to compare a vs b and a vs c, you are assuming that all the samples are comparable. Given this, I can't think of any circumstances where it would not be best to analyse all the samples together.
Subsetting the samples just removes information for no reason.
If you really did think that one of the groups was more variable than the others, then you could model that using voomWithQualityWeights, but you would still analyse all the samples together. Subsetting would not solve that problem.

Thanks James, I should be more specific. This is for RNA-seq data that has been TMM normalized and run through voom; I'm using the voom output as the input. So in general, what you are saying is that keeping everything in a single model and defining contrasts is "better", but only if the variance between the samples is not too far off?
When you use voom you are using the variance of the data twice - first when you compute the observation-level weights, and then as part of your test statistic. In that situation there may be more reason to keep all samples in your model, in order to get the best weight estimates possible. It is often difficult if not impossible to say whether the variance of one set is 'off' compared to another, so unless I see something crazy on an MDS plot I tend to just use everything I have.
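The MDS-plot check mentioned above can be sketched as follows. Again this assumes a hypothetical counts matrix `counts` with the nine samples discussed in the question; `plotMDS` is the limma function for this kind of diagnostic.

```r
library(limma)

# Hypothetical input: 'counts' is the same 9-sample count matrix
group  <- factor(c(rep("a", 3), rep("b", 3), rep("c", 3)))
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)

v <- voom(counts, design)
plotMDS(v, labels = group)   # look for one group scattering far more widely

# If one group does look much more variable, voomWithQualityWeights is a
# drop-in replacement that down-weights poor samples while keeping them all:
vq <- voomWithQualityWeights(counts, design)
```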
Thanks James and Gordon! This is very reassuring - I have been using single models (that is, analyzing everything together and using contrasts to parse out the different comparisons), but started to freak out, mostly because the logFCs were different. I thought that logFC was simply calculated from the counts of each group and did not take into account the variance between groups. If not, then why is logFC different when combining versus analyzing separately?
The logFCs do not change if you subset to two groups when running lmFit. However, the logFCs will change if you subset to just two groups all the way back at the voom step. This is because the voom weights will change, and the voom weights are used in the logFC calculation. When we compute logFCs we give more weight to log-counts that are more precise and less weight to log-counts that are less precise. That is, after all, the whole purpose of computing the voom precision weights.
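This point can be illustrated with a toy base-R example (not limma itself): with observation weights, the group-difference estimate is a weighted least-squares coefficient, so changing the weights changes the "logFC" even when the data are identical. The values and weights below are made up purely for illustration.

```r
# One gene's log2-expression across 3 "a" and 3 "b" samples (invented data)
y <- c(5.0, 5.2, 6.4, 3.1, 3.0, 3.5)
g <- factor(c("a", "a", "a", "b", "b", "b"))

w1 <- rep(1, 6)                 # equal weights: plain group means
w2 <- c(2, 2, 0.5, 1, 1, 1)     # unequal precision weights, as voom might give

fc1 <- diff(rev(coef(lm(y ~ 0 + g, weights = w1))))  # a - b, unweighted
fc2 <- diff(rev(coef(lm(y ~ 0 + g, weights = w2))))  # a - b, weighted

fc1
fc2
# The two estimates disagree because the third "a" observation is
# down-weighted in the second fit, shifting the weighted group mean.
```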