Limma model design: is it better to just build a separate design for each specific comparison?
1
0
Entering edit mode
Ahdee ▴ 50
@ahdee-8938
Last seen 11 days ago
United States

Hi suppose I have these 3 groups, a b and c.

data= c ( rep("a", 3), rep("b",3),  rep("c",3) )
mm = model.matrix(~0  + data )  
colnames( mm ) = c("a","b","c")

makeContrasts( avsb = a - b, levels=mm)

  a b c
1 1 0 0
2 1 0 0
3 1 0 0
4 0 1 0
5 0 1 0
6 0 1 0
7 0 0 1
8 0 0 1
9 0 0 1

so what I want to do is compare a vs b and a vs c separately. In the above I created a contrast a -b however I'm not sure if it is correct to do this since column a and column for example has extra row of data 7-9. The contrast seem to suggest its ok to do this however I did a comparison with this.

data2= c ( rep("a", 3), rep("b",3) )
mm2 = model.matrix(~0  + data2 )  
colnames( mm2 ) = c("a","b")

makeContrasts( avsb = a - b, levels=mm2)

when I run these two design through limma its giving me slightly different pvalues and importantly logfc suggesting that the first model is not completely eliminating the rows 7-9 the c column? I'm super curious about this and wondering if anyone has a comment for this! thanks.

limma • 854 views
ADD COMMENT
2
Entering edit mode
@james-w-macdonald-5106
Last seen 1 hour ago
United States

The answer is 'it depends'. In general, if the within-group variability for the 'c' samples is comparable to the 'a' and 'b' samples, then you gain by keeping those samples in your model. That's because the variance estimate used for the contrast is based on (sort of) the average within-group variability of all the groups. And since the sample variance isn't a particularly efficient statistic, having more data to estimate the variance is better.

But if the 'c' group has a much higher or lower variability than 'a' and 'b' (not that you could really tell from just three samples anyway), then it might tend to bias your variance estimate somewhat. I tend to keep all data in the model myself, particularly when I have a really underpowered study like this.

Unless you are using array weights, the logFC shouldn't be changing. Are you?

ADD COMMENT
1
Entering edit mode

My advice is more categorical than James'. Since you want to compare a vs b and a vs c, you are assuming that all the samples are comparable. Given this, I can't think of any circumstances where it would not be best to analyse all the sample together.

Subsetting the samples just removes information for no reason.

If you really did think that one of the groups was more variable than the others, then you could model that using voomWithQualityWeights but you would still analyse all the samples together. Subsetting would not solve that problem.

ADD REPLY
0
Entering edit mode

Thanks James, I should be more specific. This is for an RNA-seq that has been TMM normalized and voom. I'm using voom as the input. So in general what you are saying is that leaving everything as a single model and defining the contrast is "better" but only if the variance between the samples are not too off?

ADD REPLY
1
Entering edit mode

When you use voom you are using the variance of the data twice - first when you compute the observation-level weights, and then as part of your statistic. In that situation there may be more reason to keep all samples in your model, in order to get the best weight estimates possible.

It is often difficult if not impossible to say if the variance of one set is 'off' as compared to another, so unless I see something crazy on an MDS plot I tend to just use everything I have.

ADD REPLY
0
Entering edit mode

Thanks James and Gordon! This is very reassuring - I been using single models ( that is analyzing everything together and using contrast to parse out the different comparison ) however started to freak out mostly because the logfc were different. I though that logFC was simply calculated from the just the counts between each group and not take into account the variance between groups? If not then why is logFC different when combining or analyzing separately?

ADD REPLY
0
Entering edit mode

The logFCs do not change if you subset to two groups when running 'lmFit.

However the logFCs will change if you subset to just two groups right back to running voom. This is because the voom weights will change and the voom weights are used in the logFC calculation. When we compute logFCs we give more weight to log-counts that are more precise and less weight to log-counts that are less precise. That is, after all, the whole purpose of computing the voom precision weights.

ADD REPLY

Login before adding your answer.

Traffic: 905 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6