Question

unexpected voom! / limma contrast inconsistency

0

Entering edit mode

Stephen Hoang ▴ 30

@stephen-hoang-6156

Last seen 7.3 years ago

Hi All,

We have come across somewhat unexpected behavior from voom! / limma that is worth sharing. We found that using the same data, but different design matrices, "equivalent" contrasts can yield different results.

We constructed a design matrix, such that to make our desired contrast (treatment - control), we compared two coefficients of the resulting linear model using contrasts.fit(). We also constructed a separate design matrix in such a way that the control was the baseline of the linear model. In principle, the statistics from the contrast.fit() comparison in the first design should be equivalent to the statistics for the treatment coefficient in the second design. However, we found that this is not the case. Fold changes and average expression values for each gene do not differ between
the two designs; however, the t-statistics, and P-values do.

The reason for this difference is noted in the contrasts.fit() help page in the limma vignette, but this caveat is sufficiently subtle that it may fly under many radars (as it did with ours). From the help page: "...if the design matrix is non-orthogonal and the original fit included quality weights or missing values, then the unscaled standard deviations produced by this function are approximate rather than exact." Since voom() always generates quality weights, the unscaled standard deviations generated by contrasts.fit() will always be approximate for non-orthogonal design matrices (the vast majority of design matrices). Consistently, the P-values are, on balance, worse for contrasts made with contrasts.fit() in our
case study. As a sanity check, we forced all of the quality weights to 1, which indeed caused the two comparisons to generate the same statistics.

Does anyone have any suggestions on how to avoid this problem for comparisons that require the use of contrasts.fit()?

limma • 2.4k views

ADD COMMENT • link updated 9.4 years ago by Gordon Smyth 52k • written 11.6 years ago by Stephen Hoang ▴ 30

Gordon Smyth · Answer 1 · 2013-09-22

1

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 2 hours ago

WEHI, Melbourne, Australia

Dear Stephen,

Just for clarification, the class of orthogonal design matrices includes matrices of the form model.matrix(~0+a). contrasts.fit() gives exact results for these design matrices even with observation weights. So contrasts.fit() do give exact results in conjunction with voom() for some of the most common design matrices.

If you do have a non-orthogonal design matrix then the solution right now is to use 0~a or, as Ryan Thompson points out, to re-parametrize the design matrix so that the contrast becomes a coefficient.

contrasts.fit() was written as it was for speed, ten years ago when computers were much slower than they are now. I have been asked several times to provide an always-exact contrast computation, and voom is the prompt for me to finally do this.

Best wishes
Gordon

ADD COMMENT • link 11.5 years ago • updated 9.4 years ago Gordon Smyth 52k

1

Entering edit mode

Hi,

Sorry for bring back an old thread. I just came across this same problem in analyzing my RNAseq data with voom/limma (I also missed the note in the contrasts.fit help file). I was getting confused when I re-parameterized the design matrix to make the coefficients more interpretable and the F statistics changed, even though I could prove using lm() on single genes that I should be getting the same answers. I don't think that I can make my design matrix orthogonal, and I want to test that a subset of coefficients are simultaneously zero (ex. is there an effect of any of the several genotypes on expression in either of two treatments, after accounting for a few covariates).

I was wondering if anyone had updated the necessary functions to allow exact testing for arbitrary sets of coefficients or contrasts with observation weights. Gordon Smyth mentioned above that such functionality would come with voom, but I haven't found it.

If not, I have hacked the lm.series, lmFit and ebayes functions to take a list of coefficients or a list of contrast matrices to calculate the corresponding contrasts and F statistics during the initial fit. I think that it works - I get the same answers passing a matrix of observation weights that are identical across genes as if I pass only a vector of array-specific weights to the original functions. But I would greatly appreciate it if someone with more in-depth knowledge of these functions could check and make sure that these are done correctly.

Thanks.

Dan Runcie

Assistant Professor

Department of Plant Sciences

University of California, Davis

ADD REPLY • link updated 9.4 years ago by Gordon Smyth 52k • written 9.4 years ago by deruncie ▴ 10

1

Entering edit mode

Your code does look correct to me. I see that you are computing cov.coef separately for each gene, and this does get around the problems.

Making all the limma code work exactly with observation weights has proved a bigger task than I had hoped. Rather than hacking the code, here is another suggestion that is probably better. You could alternatively avoid the problems by using the limma-trend method mentioned in the voom paper. If your library sizes are not too different, say not more than a factor of 2 or 3 between largest and smallest, then you can simply compute log2CPM:

logCPM <- cpm(y, log=TRUE, prior.count=1)

with y your DGEList object or matrix of counts. Then you can run the usual limma pipeline except that you specify trend=TRUE in the call to eBayes(). This approach will be essentially as good as voom, but all the contrasts and F-tests will be exact.

If you want to modify the code for a Bioconductor package, and to distribute the modification, the usual way to proceed is to mail the author directly (me in this case). I appreciate your efforts with the code here. But I should gently point out that it isn't correct to post modified code to a public web site without preserving the original GPL copyright statement including the names of the original authors. The code is open source and free to use, but the GPL license puts some conditions on how you are allowed to redistribute it to others.

Edit: When I first replied, I thought there was an error in the your F-statistic computation, but I now see that it is correct.

ADD REPLY • link 9.4 years ago Gordon Smyth 52k

0

Entering edit mode

Hi Gordon,

Thank you for your reply. I apologize for not approaching the modified code correctly and will follow these directions in the future.

With the limma-trend method, is it reasonable to include the sample-specific weights calculated with voomWithQualityWeights? I had been using that function in my pipeline because I have a few samples that look like outliers. Or would it be better to simply drop these samples?

Thanks.

Dan

ADD REPLY • link updated 9.4 years ago by Gordon Smyth 52k • written 9.4 years ago by deruncie ▴ 10

0

Entering edit mode

Just use arrayWeights() instead. You don't need voomWithQualityWeights() with the limma-trend pipeline.

ADD REPLY • link 9.4 years ago Gordon Smyth 52k

0

Entering edit mode

Dear Gordon,

Thank you very much for the helpful clarification.

In the case that a blocking variable is given to lmFit (in my case, to specify duplicate correlation), along with a design of the form model.matrix(~0+a), will contrasts.fit() produce approximate or exact results? Intuitively, it seems that part of the design is captured in the blocking variable, possibly meaning that the overall design is not of the orthogonal class that ensures exact results.

Thanks,
Steve

ADD REPLY • link updated 9.4 years ago by Gordon Smyth 52k • written 11.5 years ago by Stephen Hoang ▴ 30

0

Entering edit mode

Dear Steve,

You are right. contrast.fit() is not exact for contrasts in the context of an additive model with observational weights.

Gordon

ADD REPLY • link 11.5 years ago • updated 9.4 years ago Gordon Smyth 52k

Gordon Smyth · Answer 2 · 2013-09-20

Hi,

Gordon recently answered a request of mine to split out limma's design-matrix-reparametrizing logic into a separate function, which he added under the name "contrastAsCoef":

https://stat.ethz.ch/pipermail/bioc-devel/2013-September/004652.html

This function takes a design matrix and a set of N contrasts and returns an equivalent design matrix where those contrasts are now the first N coefficients in the design matrix.

Maybe you could use this to reparametrize your design matrix prior to lmFit so that you never have to use contrasts.fit.

-Ryan