affyPLM and exon array question

Allen Day ▴ 30

Hi,

I've been trying to get NUSE, RLE, and RMA values for HuEx-1_0-st-v2 (Human "all exon") Affymetrix arrays.

So far I have successfully read the arrays into an AffyBatch object. This required creating the CDF environment, which I have already done with makecdfenv. I'll be submitting that for inclusion shortly, but that's another topic.

After creating the AffyBatch, I try to use affyPLM to do an RMA model fit. R = 2.4.1, affyPLM = 1.12.0, affy = 1.12.2. That's where there's trouble, and it appears to be caused by the lack of mismatch probes on the array. Here's code illustrating the problem:

> library( 'affy' );
> library( 'affyPLM' );
> ab = read.affybatch( filenames='/home/allenday/cel/0001.CEL' );
> ab; # works, output omitted
> pm( ab ); # works, output omitted
> mm( ab ); # fails!
Error in FUN(X[[1411190]], ...) : subscript out of bounds
> plm = fitPLM( ab ); # same failure in fitPLM, caused by a call to mm() on variable ab
Error in FUN(X[[1411190]], ...) : subscript out of bounds

I'm only proficient enough in R and C to track this down; I don't know R or Bioconductor well enough to know how to fix it. If I can get this going I will submit a new package that provides just.nuse() and just.rle() functions. Can someone give me a pointer for how to make this work?

Thanks.

-Allen
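
The error message above can be reproduced in base R without any array data. This is only a sketch: it assumes each probeset is stored as an index matrix with PM locations in column 1 and MM locations in column 2, and the objects `pset_with_mm` and `pset_pm_only` are hypothetical stand-ins, not objects from affy.

```r
# Hypothetical probeset index matrices; real ones would come from the CDF environment.
pset_with_mm <- cbind(c(101L, 102L, 103L), c(201L, 202L, 203L))  # PM and MM columns
pset_pm_only <- cbind(c(301L, 302L, 303L))                       # PM-only: one column

# Taking column 2 works when an MM column exists.
mm_ok <- pset_with_mm[, 2]

# On a PM-only probeset the same subscript does not exist, so R raises an error.
# tryCatch() captures the message instead of stopping the script.
mm_err <- tryCatch(pset_pm_only[, 2], error = function(e) conditionMessage(e))

print(mm_ok)
print(mm_err)
```

Any function that applies this kind of column-2 lookup to every probeset would stop at the first PM-only probeset it meets, which is consistent with the failure being reported from inside FUN(X[[i]], ...).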

cdf affy makecdfenv affyPLM • 2.1k views

updated 18.0 years ago by Ben Bolstad ★ 1.2k • written 18.0 years ago by Allen Day ▴ 30

Crispin Miller ★ 1.1k

Hi Allen,

Does rma() work with your cdf?

We've also produced one that works OK with rma() (see the 'exonmap' package vignette for more details, including how to get it). Don't know if that helps?

Crispin

> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Allen Day
> Sent: 01 May 2007 01:32
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] affyPLM and exon array question

written 18.0 years ago by Crispin Miller ★ 1.1k

I suspect so, although I haven't tried running rma() directly. just.rma() works fine, and fitPLM is able to RMA normalize internally.

I was able to move this a little further along by patching the mm() function to return an empty list in the case of a dimensionless pset variable. Apparently it is usually a two-column matrix with pm in psets[,1] and mm in psets[,2]. Here's the patch.

http://paste.turbogears.org/paste/1253/plain

This allows me to successfully background correct and normalize with RMA through the wrapper function fitPLM from the affyPLM library. It's taking forever though, even running with minimal options. Here's my call:

fitPLM(ab, output.param=list(residuals=FALSE,weights=FALSE,resid.SE=FALSE),verbosity.level=10);

Any advice?

-Allen

On 5/1/07, Crispin Miller <cmiller at picr.man.ac.uk> wrote:
> Hi Allen,
> Does rma() work with your cdf?
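
The kind of guard described for mm() might look roughly like the following in base R. This is not the linked patch (its contents are not reproduced here) and not affy's own code; `safe_mm_index` and the example `psets` list are hypothetical names used only to illustrate the idea of returning an empty result when a probeset has no MM column.

```r
# Return the MM indices for one probeset, or an empty integer vector when the
# probeset has no dimensions at all or has fewer than two columns (PM-only).
safe_mm_index <- function(pset) {
  if (is.null(dim(pset)) || ncol(pset) < 2L) {
    return(integer(0))
  }
  pset[, 2]
}

# Hypothetical probesets covering the three cases.
psets <- list(
  with_mm = cbind(c(101L, 102L), c(201L, 202L)),  # PM in column 1, MM in column 2
  pm_only = cbind(c(301L, 302L)),                 # one-column matrix
  no_dims = c(401L, 402L)                         # plain vector, no dim attribute
)

mm_index <- lapply(psets, safe_mm_index)
str(mm_index)
```

Returning a zero-length vector rather than raising an error lets a caller that loops over all probesets continue, at the cost of silently producing no MM values, so downstream code that needs MM intensities would still have to check for the empty case.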

written 18.0 years ago by Allen Day ▴ 30

Ben Bolstad ★ 1.2k

The slowdown you are observing is due to just a few probesets on the array. These probesets contain many thousands of probes. In the current implementation, when you use the command that you specified (fitting the default model), fitPLM uses a procedure optimized for probesets with relatively few probes across many arrays, and so it is pretty quick most of the time (my experience is that it is not completely unacceptable even up to about 1000 probes across a large number of arrays, at least on my machine).

E.g., both of the following contain the same number of data points:

Case I: 11 probes and 1000 arrays
Case II: 1000 probes and 11 arrays

but Case I will be a lot quicker than Case II in the current implementation.

Demonstration code

> library(affyPLM)
### note to any developers out there, the following is UNSUPPORTED
### and subject to change. DO NOT USE.
> rlm.default.rma.model <- function(y,PsiCode=0,PsiK=1.345){
+ .Call("R_rlm_rma_default_model",y,PsiCode,PsiK,PACKAGE="affyPLM")
+ }
#Case I
> y <- matrix(rnorm(11*1000),11,1000)
> system.time(test <- rlm.default.rma.model(y))
[1] 0.735 0.032 0.788 0.000 0.000
#Case II
> y <- matrix(rnorm(11*1000),1000,11)
> system.time(test <- rlm.default.rma.model(y))
[1] 19.776 0.508 21.730 0.000 0.000

As for workarounds, I am pretty sure that these extremely large probesets are control probesets of some kind that could be safely ignored, and it is possible to pass a vector of probeset names specifying a subset to use for fitPLM.

Best,

Ben

On Tue, 2007-05-01 at 12:36 -0700, Allen Day wrote:
> It's taking forever though, even running with minimal options. Here's my call:
>
> fitPLM(ab, output.param=list(residuals=FALSE,weights=FALSE,resid.SE=FALSE),verbosity.level=10);
>
> Any advice?
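
One way to build the vector of probeset names mentioned above is to filter probesets by their probe count. The sketch below uses plain base R on a hypothetical named list (`probesets`) standing in for the CDF environment; the cutoff `max_probes` is an assumed value chosen for illustration, not a recommendation from the thread.

```r
# Hypothetical stand-in for a CDF environment: probeset name -> probe indices.
probesets <- list(
  ps_small_a = 1:4,
  ps_small_b = 5:15,
  ps_control = 16:5015   # 5000 probes, like the very large probesets described above
)

max_probes <- 1000L  # assumed cutoff; adjust for the array at hand

# Count probes per probeset and keep the names of those at or under the cutoff.
sizes <- vapply(probesets, length, integer(1))
keep  <- names(sizes)[sizes <= max_probes]

print(sizes)
print(keep)

# `keep` is the kind of name vector that could then be handed to fitPLM as the
# probeset subset (not run here; it needs affyPLM and real CEL data).
```

Inspecting `sizes` before choosing a cutoff is worthwhile, since whether the largest probesets really are controls that can be dropped is something to confirm against the array annotation rather than assume.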

written 18.0 years ago by Ben Bolstad ★ 1.2k

Thanks Ben, this is very helpful. I'll try excluding the background probesets to see if that speeds things up. I'll post my findings here.

-Allen

On 5/1/07, Ben Bolstad <bmb at bmbolstad.com> wrote:
> As for workarounds, I am pretty sure that these extremely large
> probesets are control probesets of some kind that could be safely
> ignored and it is possible to pass a vector of probeset names specifying
> a subset to use for fitPLM.

written 18.0 years ago by Allen Day ▴ 30