Limma: correct calculation of B statistics (log odds)


J.delasHeras@ed.ac.uk ★ 1.9k

@jdelasherasedacuk-1189

Last seen 9.6 years ago

United Kingdom

I have been using B values to rank genes from most to least likely to be differentially expressed in LimmaGUI.

Now that I am using Limma directly, I noticed that the default value of the "proportion" parameter of the eBayes function is 0.01 (that is, 1% of genes are expected to be differentially expressed). I didn't pay much attention to this parameter before, because in LimmaGUI you cannot specify it.

However, now that I use "straight" Limma more, I have been experimenting with the proportion parameter, and it affects the B statistics a lot. This brings me to the question of how best to estimate this parameter.

My first guess is to take the P values (FDR-adjusted by BH), apply a cut-off (usually 0.05), count how many genes are differentially expressed according to that rule, and use this observed proportion of differentially expressed genes as my proportion parameter.

Is this the correct way to do it?

Jose

--
Dr. Jose I. de las Heras
The Wellcome Trust Centre for Cell Biology
Institute for Cell & Molecular Biology
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK
Email: J.delasHeras at ed.ac.uk
Phone: +44 (0)131 6513374
Fax: +44 (0)131 6507360
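The rule described in the question can be sketched in base R as follows. This is only an illustration of the proposed procedure, not a recommendation: the p-values are simulated, and the names `p_raw`, `p_bh` and `observed_prop` are hypothetical. Whether the resulting fraction is a sound value for `proportion` is exactly what the question asks.

```r
set.seed(42)

# Hypothetical raw p-values: 950 null genes (uniform) plus 50 genes
# with small p-values standing in for differentially expressed genes
p_raw <- c(runif(950), rbeta(50, 0.5, 50))

# Benjamini-Hochberg adjustment
p_bh <- p.adjust(p_raw, method = "BH")

# Fraction of genes passing the FDR < 0.05 cut-off
observed_prop <- mean(p_bh < 0.05)
print(observed_prop)
```

With real data, `p_raw` would be replaced by the p-values from the fitted model.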

limma limmaGUI • 2.3k views

Written 19.0 years ago by J.delasHeras@ed.ac.uk ★ 1.9k • updated 19.0 years ago by Wittner, Ben ▴ 290


Wittner, Ben ▴ 290

@wittner-ben-1031

Last seen 9.3 years ago

USA/Boston/Mass General Hospital

Jose,

I'm very glad you asked this question. One of the things that has made me wary of using limma is that the proportion of differentially expressed genes is often one of the primary things I'm trying to discover from the data, so I feel uneasy making an assumption as to what that proportion is. In your email, you say that the output of limma is sensitive to the assumption, which, of course, makes me feel even more uneasy about it.

I've not noticed any responses on the BioC list. Has anyone commented on this issue to you?

-Ben

Posted 19.0 years ago by Wittner, Ben ▴ 290


Hi Ben,

the only thing that changes is the B value. Everything else (I think!) stays unaffected. If you don't want to use the B value, then I think you can ignore that parameter (proportion), because I haven't noticed any differences in the P values obtained, either adjusted or unadjusted for multiple testing.

Jose
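A quick way to check this observation is to fit the same model twice with different `proportion` values and compare the outputs. The sketch below assumes the limma package is installed and uses a small simulated data set; the design, the data and the names `fit_01`, `fit_10`, `same_t`, `same_p` and `same_rank` are all illustrative. The ranking comparison relies on every gene having the same residual degrees of freedom (no missing values), which holds for this simulation but may not for real data.

```r
library(limma)

set.seed(1)
n_genes <- 1000
group <- factor(rep(c("A", "B"), each = 4))
design <- model.matrix(~ group)

# Simulated log-expression matrix; the first 50 genes are shifted in group B
y <- matrix(rnorm(n_genes * 8), nrow = n_genes)
y[1:50, 5:8] <- y[1:50, 5:8] + 2

fit <- lmFit(y, design)
fit_01 <- eBayes(fit, proportion = 0.01)
fit_10 <- eBayes(fit, proportion = 0.10)

# Moderated t-statistics and p-values under the two settings
same_t <- isTRUE(all.equal(fit_01$t, fit_10$t))
same_p <- isTRUE(all.equal(fit_01$p.value, fit_10$p.value))

# B-statistics (log-odds) for the group coefficient, and the ranking they give
b_01 <- fit_01$lods[, 2]
b_10 <- fit_10$lods[, 2]
same_rank <- identical(order(b_01, decreasing = TRUE),
                       order(b_10, decreasing = TRUE))

print(c(same_t = same_t, same_p = same_p, same_rank = same_rank))
print(summary(b_10 - b_01))
```

The last line summarises how far the B-statistics moved between the two settings, which is the effect Jose noticed.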

Reply posted 19.0 years ago by J.delasHeras@ed.ac.uk ★ 1.9k


Wittner, Ben ▴ 290

@wittner-ben-1031

Last seen 9.3 years ago

USA/Boston/Mass General Hospital

Dear Gordon,

I apologize for not thanking you more quickly for your detailed and thoughtful response. I think I agree with everything you've said below, but now I have another concern on which I would like your opinion.

For many of the data sets I've dealt with, the variances of the two classes do not seem to be equal for many genes. For example, the code below uses R's var.test() to produce a p-value for each gene and then plots a histogram of the p-values. The histogram can be viewed at http://tinyurl.com/epdn7

The model implemented in limma seems to assume a single variance for each gene. Do you think this is a problem?

Thanks again,
-Ben

    library(ALL)   # also loads Biobase, which provides pData() and exprs()
    data(ALL)
    pdat <- pData(ALL)
    # Keep B-cell samples whose molecular type is BCR/ABL or NEG
    subset <- intersect(grep('^B', as.character(pdat$BT)),
                        which(pdat$mol.biol %in% c('BCR/ABL', 'NEG')))
    eset <- ALL[, subset]
    i1 <- which(eset$mol.biol == 'BCR/ABL')
    i2 <- which(eset$mol.biol == 'NEG')
    # F test for equal variances between the two classes, one p-value per gene
    pvals <- apply(exprs(eset), 1,
                   function(v) var.test(v[i1], v[i2])$p.value)
    jpeg(filename='ALL.jpeg', width=240, height=240)
    hist(pvals, col='green',
         main='Histogram of var.test() pvals for ALL BCR/ABL vs NEG')
    dev.off()

> -----Original Message-----
> From: Gordon Smyth [mailto:smyth at wehi.EDU.AU]
> Sent: Thursday, April 20, 2006 8:02 PM
> To: Wittner, Ben, Ph.D.
> Cc: bioconductor at stat.math.ethz.ch; J.delasHeras at ed.ac.uk
> Subject: [BioC] Limma: correct calculation of B statistics (log odds)
>
> Dear Ben,
>
> Please see also my longer reply to Jose in a separate email.
>
> The t-statistics, p-values and gene rankings provided by limma do not depend on the assumed proportion. In fact part of the motivation for developing the moderated t-statistics was to obtain a statistic with the same power as the posterior odds without needing this difficult-to-estimate quantity.
>
> While the B-statistic does depend on the prior assumed proportion, this dependence is very straightforward, well understood and explicit. The prior log-odds simply adds a constant to all the genewise B-statistics. It doesn't change the ordering.
>
> I agree with your desire to avoid dependence on unjustified assumptions. My approach in limma has been to minimise assumptions where possible but otherwise to make the assumptions very explicit.
>
> What I personally feel uneasy about are statistical methods which propose to estimate quantities about which the data contains very little information. The dependence on assumptions may be hard to see. It seems to me that the proportion of DE genes is just such a quantity, because its estimation must be highly sensitive to model assumptions in small microarray experiments. I could easily provide an automatic estimate of this quantity as part of the eBayes() computations in limma, but I deliberately chose not to do this.
>
> Expanding a little further on this topic, it seems to me that a biologically meaningful treatment of the proportion of truly DE genes would require a more careful definition of the concept of differential expression than has so far appeared in the literature. It seems to me that mathematicians and biologists have different things in mind when they think of this quantity. Mathematicians are including many genes with very small fold changes which the biologists would not consider of interest. A biologically meaningful treatment would have to specify how large a fold change needs to be in order to be considered material. I suspect that biologists are going to be surprised by how sensitive the estimated proportion is to this threshold.
>
> Best wishes
> Gordon
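Gordon's remark that the prior log-odds adds a constant to every B-statistic can be illustrated with a few lines of base R. The sketch below computes only the prior term, log(p / (1 - p)), for some assumed proportions; the variable names are hypothetical, and it assumes all other quantities entering the B-statistic are held fixed. limma may also use the proportion elsewhere in its fitting, so the shift seen in practice need not match these offsets exactly.

```r
# Prior log-odds of differential expression for a few assumed proportions
proportion <- c(0.01, 0.05, 0.10)
prior_log_odds <- log(proportion / (1 - proportion))

# Offset relative to the default proportion of 0.01
offset_vs_default <- prior_log_odds - prior_log_odds[1]

print(data.frame(proportion, prior_log_odds, offset_vs_default))
```

Because the same offset is added to every gene, a shift of this kind changes the B values but not the order in which genes are ranked.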

Posted 19.0 years ago by Wittner, Ben ▴ 290
