SAM warning


elliot harrison ▴ 230

@elliot-harrison-2391


Hi,

I'm trying the MBCB correction for Illumina data and then running SAM afterwards. I've run this successfully a few times, but for one experiment I get the message:

"Warning message: There are 1 variables with zero variance. These variables are removed, and their d-values are set to NA."

Is this referring to a single gene's values having no variance (I guess so, since it is only a warning), or to one of the experiment groups? If it is just one gene, do I need to worry? I guess not, as it is probably just a fluke.

Thanks,
Elliott
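For the zero-variance warning in the question above, one way to see what is being dropped is to look for rows whose values are identical across every sample. The sketch below is only an illustration in Python, not the SAM implementation; it assumes that "variables" in the warning means rows of the expression matrix (genes or probes) rather than sample groups, and the probe names and values are made up.

```python
def zero_variance_rows(expr):
    """Return the names of rows whose values are identical in every sample.

    `expr` maps a (hypothetical) probe id to its list of expression values.
    A row has zero variance exactly when its minimum equals its maximum.
    """
    return [name for name, values in expr.items() if min(values) == max(values)]


# Hypothetical toy data: probe id -> values across six arrays.
expr = {
    "probe_A": [7.1, 7.4, 6.9, 8.2, 8.0, 8.3],
    "probe_B": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],  # constant across all arrays
    "probe_C": [9.3, 9.1, 9.6, 9.2, 9.4, 9.0],
}

flagged = zero_variance_rows(expr)
print(flagged)  # ['probe_B']
```

If the same check on the real matrix flags a single probe, that probe is the one whose d-value would be set to NA; it is worth looking at its values to see whether the background correction floored them to a constant.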

Biophysics Alignment Annotation Normalization Cancer hgu133plus2 probe affy vsn MBCB

16.1 years ago elliot harrison ▴ 230
