Question

Differing topTable() results with Limma with and without specifying coefficient

0

Entering edit mode

krr • 0

@krr-9753

Last seen 9.0 years ago

Hi all,

I'm having trouble interpreting the output of topTable() for a gene level copy number analysis among four different groups of cell lines. The problem is as follows:

After creating my design matrix and contrast matrix using what I think is group means parameterization, I use eBayes to create the model.

##Differential CNA Analysis (Pairwise Comparison)
design_CNA <- model.matrix(~0+factor(cell_line_cluster_4))
colnames(design_CNA) <- c("A","B","C","D")   
fit_CNA <- lmFit(copy_number_data,design_CNA)
contrast_matrix_CNA <- makeContrasts("B-A","C-A","C-B","D-A", "D-B", "D-C", levels = design_CNA)
fit_CNA <- contrasts.fit(fit_CNA, contrast_matrix_CNA)
fit_CNA <- eBayes(fit_CNA)

However, when I use topTable() to determine which cell line group comparisons have significantly different gene copy numbers, I get no significant calls with the following line of code (not specifying a coefficient):

DECNA <- topTable(fit_CNA, number = Inf, p.value = adjPvalueCutoff, adjust = "BH")

Yet, when I use topTable() with each comparison called specifically by its coefficient, some of these comparisons call significant genes.

DECNA_1 <- topTable(fit_CNA, number = Inf, p.value = adjPvalueCutoff, adjust = "BH", coef = 1)
DECNA_2 <- topTable(fit_CNA, number = Inf, p.value = adjPvalueCutoff, adjust = "BH", coef = 2)
DECNA_3 <- topTable(fit_CNA, number = Inf, p.value = adjPvalueCutoff, adjust = "BH", coef = 3)
DECNA_4 <- topTable(fit_CNA, number = Inf, p.value = adjPvalueCutoff, adjust = "BH", coef = 4)
DECNA_5 <- topTable(fit_CNA, number = Inf, p.value = adjPvalueCutoff, adjust = "BH", coef = 5)
DECNA_6 <- topTable(fit_CNA, number = Inf, p.value = adjPvalueCutoff, adjust = "BH", coef = 6)

I think the discrepancy may be related to the significance testing adjustment, in that calling topTable() on the entire model without specifying a comparison has to correct for many more hypotheses than for a specific comparison (i.e. B vs A). However, I'm not 100% confident that's correct and I'm concerned there's a programming and/or conceptual error here. But even if I'm right, which method is "better" for determining genes with significantly different copy numbers in each group?

Thanks in advance for any assistance or insight!

Best Regards,
Kevin

limma microarray statistics • 3.8k views

ADD COMMENT • link updated 9.0 years ago by Gordon Smyth 52k • written 9.0 years ago by krr • 0

2

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 8 hours ago

WEHI, Melbourne, Australia

Despite the question and answers so far, no one has suggested reading the help page for topTable. Sigh.

ADD COMMENT • link 9.0 years ago Gordon Smyth 52k

score 2 · Accepted Answer · 2016-04-21

2

Entering edit mode

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 9 days ago

United States

I think you're misdiagnosing the reason why call topTable without a value for coef isn't giving you any "differentially expressed" results.

If you don't specify a value for coef, the topTable function will do it for you (you can see what happens when you look at the source code for the topTable function).

If I were you, I'd stick with the results of the individual coef calls, since each one is testing a very specific question (presumably ones that you want answered).

ADD COMMENT • link 9.0 years ago Steve Lianoglou ★ 13k

4

Entering edit mode

To elaborate further on Steve's answer, if you don't specify coef, then all the (non-intercept) coefficients will be dropped simultaneously by default in topTable, thus performing an ANOVA-like test. The null hypothesis for such a test is that all the individual nulls are true, i.e., A=B=C=D in your cell-means design. This is inherently different to the specific pairwise comparisons - which, by definition, only test between two groups - which is why you get different results.

In your case, the ANOVA-like test needs to account for the extra variability introduced by considering many groups, which is why it has less power to detect DE between two specific groups (compared to a direct pairwise comparison). However, that's not to say that pairwise comparisons are more powerful. If there were small differences across all groups, the ANOVA-like test would be able to combine information from many groups to reject the null, whereas a pairwise comparison between any two groups would not.

As a general rule, I tend to observe more rejections in my DE analyses when I use an ANOVA-like test, because it can effectively combine evidence for DE across multiple groups. But I wouldn't be surprised or concerned at seeing fewer rejections, for the reasons stated above; especially when you have low numbers of DE in your experiment.

ADD REPLY • link 9.0 years ago Aaron Lun ★ 28k