Hi everyone, and Prof. Gordon Smyth,

Please help me decide how best to view the two sets of contrasts used with limma below. The objective was to pool the higher (second-level) groups, as well as to compare the first-level groups of samples within the design, to get DGE.

So, with a design matrix and the logCPM mean-variance output of the voom() function, four people used the logic of ordinary designs and therefore added the 'higher/secondary-level' contrasts as below:

ct <- makeContrasts(g2v1 = (group2_dead + group2_alive) - (group1_dead + group1_alive),
  g2v1dead = group2_dead - group1_dead, g2v1alive = group2_alive - group1_alive,
  status = (group1_dead + group2_dead) - (group1_alive + group2_alive),
  levels = design)

b <- eBayes(contrasts.fit(lmFit(data, design), contrasts = ct))

My question is: do the DE results from this approach have any meaningful interpretation, or should it be discarded completely in favour of dividing by the number of groups being pooled, as below?

ct <- makeContrasts(g2v1 = (group2_dead + group2_alive)/2 - (group1_dead + group1_alive)/2,
  g2v1dead = group2_dead - group1_dead, g2v1alive = group2_alive - group1_alive,
  status = (group1_dead + group2_dead)/2 - (group1_alive + group2_alive)/2,
  levels = design)

b <- eBayes(contrasts.fit(lmFit(data, design), contrasts = ct))

When you fit a linear model and make comparisons, you are always computing the average for each group, and you make comparisons by calculating differences between those averages. In your first set of contrasts you are computing sums of group averages, whereas in the second you are computing averages of them. In other words,

g2v1=(group2_dead+group2_alive) - (group1_dead+group1_alive)

is the sum of the two group 2 averages minus the sum of the two group 1 averages, which isn't something you would normally care to know, whereas

g2v1=(group2_dead+group2_alive)/2 - (group1_dead+group1_alive)/2

is the average of group 2 minus the average of group 1, which is a readily interpretable quantity.
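
To make the difference concrete, here is a minimal base R sketch for a single gene. The four group means are made-up numbers chosen only for illustration, and the variable names (coefs, sum_contrast, avg_contrast) are hypothetical, not part of limma.

```r
# Hypothetical mean log2-CPM values for one gene in each of the four groups
# (illustrative numbers only, not taken from any real data set)
coefs <- c(group1_dead = 5.0, group1_alive = 4.0,
           group2_dead = 7.0, group2_alive = 6.0)

# Contrast weights for the "sum" and "average" versions of g2v1
sum_contrast <- c(group1_dead = -1, group1_alive = -1,
                  group2_dead = 1, group2_alive = 1)
avg_contrast <- sum_contrast / 2

sum_logFC <- sum(sum_contrast * coefs)  # (7 + 6) - (5 + 4)
avg_logFC <- sum(avg_contrast * coefs)  # (7 + 6)/2 - (5 + 4)/2

print(c(sum = sum_logFC, average = avg_logFC))
```

With these numbers the sum version gives 4 and the average version gives 2. Only the second can be read directly as a log2 fold change between group 2 and group 1.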

Many thanks for the reply, @James MacDonald!

Indeed, it is probably unnecessary to use the sum form g2v1=(group2_dead+group2_alive), hence the question about interpretation vis-a-vis the concept of DE. Part of why I asked about interpretability is that a 'non-expert' was querying me about the inputs all being sums of log data.

I guess you are indicating that the sum form is not interpretable.

The two different contrast matrices you give will yield identical lists of DE genes, p-values and FDRs. The only difference will be in the log-fold-changes, which will differ by a factor of 2 for the pooled contrasts (g2v1 and status), because multiplying a contrast by a constant scales the estimate and its standard error by the same amount and so leaves the t-statistic unchanged. As long as you know what the logFCs mean, both choices lead to the same conclusions, but I would always myself use the mean-mean contrast instead of sum-sum.
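
A rough way to check this without limma is to fit an ordinary linear model to one simulated gene and compare the two versions of g2v1. This is only a sketch under stated assumptions: the data are simulated, base R lm() stands in for lmFit(), and contrast_stats() is a hypothetical helper written for this example. The moderated t-statistics from eBayes() are not computed here, but the same scaling argument applies to them.

```r
set.seed(1)

# Simulated log-expression for one gene: 4 groups x 3 replicates (made-up data)
lev <- c("group1_dead", "group1_alive", "group2_dead", "group2_alive")
group <- factor(rep(lev, each = 3), levels = lev)
y <- rnorm(12, mean = rep(c(5, 4, 7, 6), each = 3), sd = 0.5)

# Group-means design (no intercept), one coefficient per group
design <- model.matrix(~ 0 + group)
colnames(design) <- lev
fit <- lm(y ~ 0 + design)

# Estimate and ordinary t-statistic for a contrast vector w
contrast_stats <- function(fit, w) {
  est <- sum(w * coef(fit))
  se  <- sqrt(drop(t(w) %*% vcov(fit) %*% w))
  c(logFC = est, t = est / se)
}

w_sum <- c(-1, -1, 1, 1)  # (group2_dead + group2_alive) - (group1_dead + group1_alive)
w_avg <- w_sum / 2        # the same contrast expressed as a difference of averages

res_sum <- contrast_stats(fit, w_sum)
res_avg <- contrast_stats(fit, w_avg)
print(rbind(sum = res_sum, average = res_avg))
```

The two rows should show the same t-statistic, with the logFC in the sum row exactly twice that in the average row.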


