Hi,
I would like to follow up with a question to Ben on this (old) thread.
You say below that the time complexity of the fitting algorithm is
*not* symmetric in probes and samples: having many samples is not as
bad as having many probes. I can guess a few reasons why this is,
but that is of less interest. My main question is: because the
log-additive model is symmetric in probes and samples, can you just
swap samples and probes, i.e. transpose the matrix, fit the
model, and then afterwards swap the chip-effect and probe-effect
estimates? Or are there asymmetries in how the model is fitted
that break this idea, e.g. iterative robustification?
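
To make the proposed swap concrete, here is a minimal sketch that uses a plain least-squares fit of the additive model on a complete matrix. The helper fit_additive() is hypothetical and is not part of affyPLM; it only illustrates that, without any robustification, fitting the transposed matrix returns the same estimates with the row and column roles exchanged. Whether this carries over to the robust fit in fitPLM is exactly the open question.

```r
# Least-squares fit of y[i, j] = mu + probe[i] + chip[j] + error
# for a complete matrix (rows = probes, columns = arrays).
# Hypothetical helper for illustration only; not part of affyPLM.
fit_additive <- function(y) {
  mu <- mean(y)
  list(overall = mu,
       row = rowMeans(y) - mu,  # probe effects, sum to zero
       col = colMeans(y) - mu)  # chip effects, sum to zero
}

set.seed(1)
y <- matrix(rnorm(11 * 50), nrow = 11, ncol = 50)  # 11 probes, 50 arrays

fit  <- fit_additive(y)     # fit on the original orientation
fitT <- fit_additive(t(y))  # fit on the transposed matrix

# Row effects of t(y) should equal the chip effects of y, and vice versa.
swap_ok <- isTRUE(all.equal(fit$col, fitT$row)) &&
  isTRUE(all.equal(fit$row, fitT$col))
print(swap_ok)
```

Note that this sketch constrains both sets of effects to sum to zero, so any constraint applied to only one factor in the real fit would need to be re-applied after the swap.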
Thanks in advance
Henrik
On 5/1/07, Ben Bolstad <bmb at bmbolstad.com> wrote:
> The slowdown you are observing is due to just a few probesets on the
> array. These probesets contain many thousands of probes. In the current
> implementation, when you use the command that you specified (fitting the
> default model), fitPLM uses a procedure optimized for probesets with
> relatively few probes across many arrays, and so is pretty quick most of
> the time (my experience is that it is not completely unacceptable even
> up to about 1000 probes across a large number of arrays, at least on my
> machine).
>
> eg both of the following contain same number of datapoints
>
> Case I: 11 probes and 1000 arrays
> Case II: 1000 probes and 11 arrays
>
> but case I will be a lot quicker than case II in the current
> implementation.
>
> Demonstration code
>
> > library(affyPLM)
>
> ### note to any developers out there, the following is UNSUPPORTED
> ### and subject to change. DO NOT USE.
> > rlm.default.rma.model <- function(y,PsiCode=0,PsiK=1.345){
> +   .Call("R_rlm_rma_default_model",y,PsiCode,PsiK,PACKAGE="affyPLM")
> + }
>
> #Case I
> > y <- matrix(rnorm(11*1000),11,1000)
> > system.time(test <- rlm.default.rma.model(y))
> [1] 0.735 0.032 0.788 0.000 0.000
>
> #Case II
> > y <- matrix(rnorm(11*1000),1000,11)
> > system.time(test <- rlm.default.rma.model(y))
> [1] 19.776 0.508 21.730 0.000 0.000
>
> As for workarounds, I am pretty sure that these extremely large
> probesets are control probesets of some kind that could be safely
> ignored, and it is possible to pass a vector of probeset names specifying
> a subset to use for fitPLM.
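
A minimal sketch of that workaround, under stated assumptions: the list probe_index below is a made-up stand-in for a named list holding one vector of probe indices per probeset, and the cutoff of 1000 probes is an arbitrary choice taken from the timing remark above, not a recommended value. The probeset names are hypothetical.

```r
# Hypothetical stand-in for a named list with one vector of probe
# indices per probeset (names and sizes are invented for illustration).
probe_index <- list(ps_small   = 1:4,
                    ps_typical = 5:15,
                    ps_control = 16:2015)

max_probes <- 1000  # assumed cutoff; tune for your own data

# Count probes per probeset and keep the names at or below the cutoff.
n_probes <- vapply(probe_index, length, integer(1))
keep <- names(n_probes)[n_probes <= max_probes]
print(keep)

# With a real AffyBatch `ab`, the retained names would then be passed
# as the probeset subset, e.g.
# plm <- fitPLM(ab, subset = keep)
```

The commented-out call assumes the probeset vector is supplied through the subset argument of fitPLM; check the help page of your installed affyPLM version before relying on it.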
>
> Best,
>
> Ben
>
> On Tue, 2007-05-01 at 12:36 -0700, Allen Day wrote:
> > I suspect so, although I haven't tried running rma() directly.
> > just.rma() works fine, and fitPLM is able to RMA normalize internally.
> >
> > I was able to move this a little further along by patching the mm()
> > function to return an empty list in the case of a dimensionless pset
> > variable. Apparently it is usually a two-column matrix with pm in
> > psets[,1] and mm in psets[,2]. Here's the patch:
> > http://paste.turbogears.org/paste/1253/plain
> >
> > This allows me to successfully background correct and normalize with
> > RMA through the wrapper function fitPLM from the affyPLM library. It's
> > taking forever though, even running with minimal options. Here's my
> > call:
> >
> > fitPLM(ab, output.param=list(residuals=FALSE,weights=FALSE,resid.SE=FALSE),verbosity.level=10);
> >
> > Any advice?
> >
> > -Allen
> >
> > On 5/1/07, Crispin Miller <cmiller at picr.man.ac.uk> wrote:
> > > Hi Allen,
> > > Does rma() work with your cdf?
> > >
> > > We've also produced one that works OK with rma() (see the 'exonmap'
> > > package vignette for more details, including how to get it). Don't know
> > > if that helps?
> > >
> > > Crispin
> > >
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: bioconductor-bounces at stat.math.ethz.ch
> > > > [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Allen Day
> > > > Sent: 01 May 2007 01:32
> > > > To: bioconductor at stat.math.ethz.ch
> > > > Subject: [BioC] affyPLM and exon array question
> > > >
> > > > Hi,
> > > >
> > > > I've been trying to get NUSE, RLE, and RMA values for
> > > > HuEx-1_0-st-v2 (Human "all exon") Affymetrix arrays.
> > > >
> > > > So far I have successfully read the arrays into an affybatch object.
> > > > This required creating the CDF environment, which I have
> > > > already done with makecdfenv. I'll be submitting that for
> > > > inclusion shortly, but that's another topic.
> > > >
> > > > After creating the AffyBatch, I try to use affyPLM to do an
> > > > RMA model fit. R = 2.4.1, affyPLM = 1.12.0, affy = 1.12.2.
> > > > That's where there's trouble, and it appears to be caused by
> > > > the lack of mismatch probes on the array. Here's code
> > > > illustrating the problem:
> > > >
> > > > > library( 'affy' );
> > > > > library( 'affyPLM' );
> > > > > ab = read.affybatch( filenames='/home/allenday/cel/0001.CEL' );
> > > > > ab; # works, output omitted
> > > > > pm( ab ); # works, output omitted
> > > > > mm( ab ); # fails!
> > > > Error in FUN(X[[1411190]], ...) : subscript out of bounds
> > > > > plm = fitPLM( ab ); # same failure in fitPLM, caused by a call to mm() on variable ab
> > > > Error in FUN(X[[1411190]], ...) : subscript out of bounds
> > > >
> > > > I'm only proficient enough in R and C to track this down --
> > > > I don't know R or Bioconductor well enough to know how to
> > > > fix it. If I can get this going I will submit a new package
> > > > that provides just.nuse() and just.rle() functions. Can
> > > > someone give me a pointer for how to make this work?
> > > >
> > > > Thanks.
> > > >
> > > > -Allen
> > > >
> > > > _______________________________________________
> > > > Bioconductor mailing list
> > > > Bioconductor at stat.math.ethz.ch
> > > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > > Search the archives:
> > > > http://news.gmane.org/gmane.science.biology.informatics.conductor
> > > >
> > >
> > >
> > >