Masking single Oligos in mas5

0

Entering edit mode

Benjamin Otto ▴ 830

@benjamin-otto-1519

Last seen 10.2 years ago

Hi, I have been searching for nearly two days for a solution to the following problem without finding satisfactory answers: I am working on the analysis of a HG-U133_Plus2 Chip. Can I mask for certain probesets single Oligos such that the expression, p and fold change values are calculated based on the remaining oligos? A better description of my problem and the background. We are handling a cross-species experiment having hybradized rna from tupaia on the human chip. This resulted in fairly low expression signals. If we just forget about all the other putative problems in analysing the result I think it seems reasonable to say, that in many cases only some of the probeset oligos will have hybridized satisfyingly. So the idea is masking some of the oligos by some criteria and calculate the results only based upon subsets of the probesets. The problem is: If I set even only one single oligo to NA, the values calculated for the corresponding probeset won't be calculated but set to NA. Most of the threads I found concerning the masking problem handle the question of an autpmated or corrected form of masking. But there seems to be no available information about our case. Has anyone done something like that before? I'm sure there will have to be some manual programing. But the major question is: Does anybody see a possibility to mask the single oligos on a top level like fixing the affybatch structure? Or do I have to change every single function to treat NA values in the correct form? thanks for your help, Benjamin

oligo oligo • 1.5k views

ADD COMMENT • link updated 19.0 years ago by Jenny Drnevich ★ 2.2k • written 19.0 years ago by Benjamin Otto ▴ 830

0

Entering edit mode

Jenny Drnevich ★ 2.2k

@jenny-drnevich-382

Last seen 10.2 years ago

See the three posts by Ariel Chernomoretz <ariel.chernomoretz at="" crchul.ulaval.ca=""> on 20-21 April, 2005. I've been using these codes to remove entire probe sets. Thanks Ariel!! Jenny At 10:06 AM 12/1/2005, Benjamin Otto wrote: >Hi, > >I have been searching for nearly two days for a solution to the following >problem without finding satisfactory answers: > >I am working on the analysis of a HG-U133_Plus2 Chip. Can I mask for certain >probesets single Oligos such that the expression, p and fold change values >are calculated based on the remaining oligos? > >A better description of my problem and the background. We are handling a >cross-species experiment having hybradized rna from tupaia on the human >chip. This resulted in fairly low expression signals. If we just forget >about all the other putative problems in analysing the result I think it >seems reasonable to say, that in many cases only some of the probeset oligos >will have hybridized satisfyingly. So the idea is masking some of the oligos >by some criteria and calculate the results only based upon subsets of the >probesets. The problem is: If I set even only one single oligo to NA, the >values calculated for the corresponding probeset won't be calculated but set >to NA. Most of the threads I found concerning the masking problem handle the >question of an autpmated or corrected form of masking. But there seems to be >no available information about our case. Has anyone done something like that >before? I'm sure there will have to be some manual programing. But the major >question is: Does anybody see a possibility to mask the single oligos on a >top level like fixing the affybatch structure? Or do I have to change every >single function to treat NA values in the correct form? > >thanks for your help, >Benjamin > >_______________________________________________ >Bioconductor mailing list >Bioconductor at stat.math.ethz.ch >https://stat.ethz.ch/mailman/listinfo/bioconductor Jenny Drnevich, Ph.D. Functional Genomics Bioinformatics Specialist W.M. Keck Center for Comparative and Functional Genomics Roy J. Carver Biotechnology Center University of Illinois, Urbana-Champaign 330 ERML 1201 W. Gregory Dr. Urbana, IL 61801 USA ph: 217-244-7355 fax: 217-265-5066 e-mail: drnevich at uiuc.edu

ADD COMMENT • link 19.0 years ago Jenny Drnevich ★ 2.2k

0

Entering edit mode

Ariel Chernomoretz ▴ 380

@ariel-chernomoretz-885

Last seen 10.2 years ago

Hi Benjamin, Some time ago I wrote a couple of functions that modified the cdf environments in order to remove bad probes from the affy analysis You could check: http://files.protsuggest.org/biocond/html/7350.html http://files.protsuggest.org/biocond/html/7366.html http://files.protsuggest.org/biocond/html/7367.html hope that helps. ariel./ On December 1, 2005 11:06 am, Benjamin Otto wrote: > Hi, > > I have been searching for nearly two days for a solution to the following > problem without finding satisfactory answers: > > I am working on the analysis of a HG-U133_Plus2 Chip. Can I mask for > certain probesets single Oligos such that the expression, p and fold change > values are calculated based on the remaining oligos? > > A better description of my problem and the background. We are handling a > cross-species experiment having hybradized rna from tupaia on the human > chip. This resulted in fairly low expression signals. If we just forget > about all the other putative problems in analysing the result I think it > seems reasonable to say, that in many cases only some of the probeset > oligos will have hybridized satisfyingly. So the idea is masking some of > the oligos by some criteria and calculate the results only based upon > subsets of the probesets. The problem is: If I set even only one single > oligo to NA, the values calculated for the corresponding probeset won't be > calculated but set to NA. Most of the threads I found concerning the > masking problem handle the question of an autpmated or corrected form of > masking. But there seems to be no available information about our case. Has > anyone done something like that before? I'm sure there will have to be some > manual programing. But the major question is: Does anybody see a > possibility to mask the single oligos on a top level like fixing the > affybatch structure? Or do I have to change every single function to treat > NA values in the correct form? > > thanks for your help, > Benjamin > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor -- Ariel Chernomoretz, Ph.D. Centre de recherche du CHUL 2705 Blv Laurier, bloc T-367 Sainte-Foy, Qc G1V 4G2 (418)-525-4444 ext 46339

ADD COMMENT • link 19.0 years ago Ariel Chernomoretz ▴ 380

0

Entering edit mode

Hi Ariel and Jenny, thanks very much for the quick reply!!! I have been experimenting since with the functions and it really seems to be what I'm looking for. A small test with your test_script Ariel returned what I hoped for. However there is one aspect confusing me. After writing a little function myself, which computes the probe and probeset list which should be excluded in my case I applied the removeProbes function on these lists. While I'm writing this message my computer is still calculating and he has started four hours ago! Is it normal for the function to be so time consuming or is there some chance that I just have done something wrong? I suppose the problem is on my side but I just can't imagine what it could be because I followed the "testscript" steps. But as the reading of the CEL files usually needs just a few minutes (maybe five if time consuming) I wouldn't have thought the removing of probesets would take as long as it currently does. I must confess my lists are really long. Still the problem is of big importance for me and so it would be great to get an idea of your experience with the function perfomance on your systems. Here is a little summary of the conditions in my case: Chiptype: HG_U133_Plus2 num. samples: 2 num. Probes in listOutProbes: ~ 500.000 num. Sets in listOutProbeSets: ~ 30.000 System: P4 - 3 GHZ, 500 MB RAM, R-version: RGUI 2.2.0 under WindowsXP. Mainly I have been wondering that the removing of probes from the structure takes so much longer than reading in the CEL files and building the structure. Has anyone of you maybe started a test before trying to remove all probesets from an environment? Or can you at least judge wether the current running time in my case is ok or looks suspicious? In any case just to make it clear again, thanks very much in any case. The functions really seem to be of very, very much help!!! :-) Sincerly, Benjamin -----Original Message----- From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Ariel Chernomoretz Sent: 01 December 2005 17:32 To: bioconductor at stat.math.ethz.ch Subject: Re: [BioC] Masking single Oligos in mas5 Hi Benjamin, Some time ago I wrote a couple of functions that modified the cdf environments in order to remove bad probes from the affy analysis You could check: http://files.protsuggest.org/biocond/html/7350.html http://files.protsuggest.org/biocond/html/7366.html http://files.protsuggest.org/biocond/html/7367.html hope that helps. ariel./ On December 1, 2005 11:06 am, Benjamin Otto wrote: > Hi, > > I have been searching for nearly two days for a solution to the following > problem without finding satisfactory answers: > > I am working on the analysis of a HG-U133_Plus2 Chip. Can I mask for > certain probesets single Oligos such that the expression, p and fold change > values are calculated based on the remaining oligos? > > A better description of my problem and the background. We are handling a > cross-species experiment having hybradized rna from tupaia on the human > chip. This resulted in fairly low expression signals. If we just forget > about all the other putative problems in analysing the result I think it > seems reasonable to say, that in many cases only some of the probeset > oligos will have hybridized satisfyingly. So the idea is masking some of > the oligos by some criteria and calculate the results only based upon > subsets of the probesets. The problem is: If I set even only one single > oligo to NA, the values calculated for the corresponding probeset won't be > calculated but set to NA. Most of the threads I found concerning the > masking problem handle the question of an autpmated or corrected form of > masking. But there seems to be no available information about our case. Has > anyone done something like that before? I'm sure there will have to be some > manual programing. But the major question is: Does anybody see a > possibility to mask the single oligos on a top level like fixing the > affybatch structure? Or do I have to change every single function to treat > NA values in the correct form? > > thanks for your help, > Benjamin > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor -- Ariel Chernomoretz, Ph.D. Centre de recherche du CHUL 2705 Blv Laurier, bloc T-367 Sainte-Foy, Qc G1V 4G2 (418)-525-4444 ext 46339 _______________________________________________ Bioconductor mailing list Bioconductor at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/bioconductor

ADD REPLY • link 19.0 years ago Benjamin Otto ▴ 830

0

Entering edit mode

Hi folks, sorry for taking your time again. I let the computer finish the "removeProbes"-calculation over night, so I don't know how long he took...to finally display the following error message: "Error in ans[[i]][, i.probes] : incorrect number of dimensions" You can imagine how desperately I was looking for a shotgun that moment. To provide maybe more information I forgot to submit in my last thread here come the commands and extracts of my lists / structures I used: $# libraries used $> library(simpleaffy) $> library(affydata) $> require(gcrma) $ $# preliminary commands $> setwd ("....") $> x <- read.affy() $> x at cdfName <- "HGU133Plus2" $> cleancdf <- cleancdfname(x at cdfName,addcdf=FALSE) $> cdfpackagename <- paste(cleancdf,"cdf",sep="") $> probepackagename <- paste(cleancdf,"probe",sep="") $> ResetEnvir(cdfpackagename,probepackagename) $ $# my function for indentification of unwanted oligos $# I will append the function code below $> rmd <- get.foul.oligos(x, ratio=2, diff=100) $ $# the initial rma/mas calculation without changes $> d.rma <- rma(x) $> d.gcrma <- gcrma(x) $> d.mas <- mas5(x) $> d.mas.call <- mas5calls(x) $ $# The command for invoking the removeProbes function $> RemoveProbes(rmd$Probes,rmd$Set,cdfpackagename,probepackagename) The probe list "rmd$Probes" contains 565825 entries, the set list "rmd$Set" 32791 entries. A little structure check: $> rmd$Set[1:5] $[1] "1007_s_at" "1053_at" "1255_g_at" "1405_i_at" "1438_at" $> rmd$Probes[1:5] $[1] "1007_s_at1" "1007_s_at2" "1007_s_at3" "1007_s_at4" "1007_s_at5" Looks fine to me, but is it? What could cause the error message here ? Here comes the code of my function: get.foul.oligos <- function(x,ratio=1,diff=150,verbose=FALSE) { remove <- list() remove$Probes <- list() remove$Set <- list() len.plist <- 0 len.slist <- 0 ids <- geneNames(x) num.ids <- length(ids) num.samples <- length(pm(x)[1,]) msk.table <- matrix(NA,num.ids,2) rm.string <- NA for (i in 1:num.ids) { id <- ids[i] oligos.pm <- indexProbes(x, which="pm", genenames=id)[[id]] oligos.mm <- indexProbes(x, which="mm", genenames=id)[[id]] num.oligos <- lengtholigos.pm) num.rm <- 0 for (j in 1:num.oligos) { cat ("i:",i," j:",j,"\n") rm.flag <- TRUE for (k in 1:num.samples) { val.pm <- intensity(x)[oligos.pm[j],k] val.mm <- intensity(x)[oligos.mm[j],k] if val.pm/val.mm > ratio & val.pm- val.mm > diff) { rm.flag <- FALSE } } if (rm.flag == TRUE) { num.rm <- num.rm + 1 oligo.name <- paste(id,j,sep="") if (len.plist == 1) { remove$Probes <- c(remove$Probes,oligo.name) } else { remove$Probes <- coligo.name) } len.plist <- 1 if is.na(rm.string)) { rm.string <- j } else { rm.string <- paste(rm.string,j,sep=",") } } } if (num.rm == num.oligos) { if (len.slist == 1) { remove$Set <- c(remove$Set,id) } else { remove$Set <- c(id) } len.slist <- 1 rm.string <- "all" } } return(remove) } > -----Original Message----- > From: bioconductor-bounces at stat.math.ethz.ch > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Benjamin > Otto > Sent: 05 December 2005 17:14 > To: bioconductor at stat.math.ethz.ch > Subject: Re: [BioC] Masking single Oligos in mas5 > > > Hi Ariel and Jenny, > > thanks very much for the quick reply!!! I have been experimenting > since with > the functions and it really seems to be what I'm looking for. A small test > with your test_script Ariel returned what I hoped for. However > there is one > aspect confusing me. After writing a little function myself, > which computes > the probe and probeset list which should be excluded in my case I applied > the removeProbes function on these lists. While I'm writing this > message my > computer is still calculating and he has started four hours ago! Is it > normal for the function to be so time consuming or is there some > chance that > I just have done something wrong? I suppose the problem is on my > side but I > just can't imagine what it could be because I followed the "testscript" > steps. But as the reading of the CEL files usually needs just a > few minutes > (maybe five if time consuming) I wouldn't have thought the removing of > probesets would take as long as it currently does. I must confess my lists > are really long. Still the problem is of big importance for me and so it > would be great to get an idea of your experience with the function > perfomance on your systems. Here is a little summary of the > conditions in my > case: > > Chiptype: HG_U133_Plus2 > num. samples: 2 > num. Probes in listOutProbes: ~ 500.000 > num. Sets in listOutProbeSets: ~ 30.000 > System: P4 - 3 GHZ, 500 MB RAM, > R-version: RGUI 2.2.0 under WindowsXP. > > Mainly I have been wondering that the removing of probes from the > structure > takes so much longer than reading in the CEL files and building the > structure. Has anyone of you maybe started a test before trying to remove > all probesets from an environment? Or can you at least judge wether the > current running time in my case is ok or looks suspicious? > > In any case just to make it clear again, thanks very much in any case. The > functions really seem to be of very, very much help!!! :-) > > Sincerly, > Benjamin > > > > -----Original Message----- > From: bioconductor-bounces at stat.math.ethz.ch > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Ariel > Chernomoretz > Sent: 01 December 2005 17:32 > To: bioconductor at stat.math.ethz.ch > Subject: Re: [BioC] Masking single Oligos in mas5 > > > Hi Benjamin, > > Some time ago I wrote a couple of functions that modified > the cdf environments in order to remove bad probes from the affy analysis > > You could check: > > http://files.protsuggest.org/biocond/html/7350.html > http://files.protsuggest.org/biocond/html/7366.html > http://files.protsuggest.org/biocond/html/7367.html > > > hope that helps. > > ariel./ > > > On December 1, 2005 11:06 am, Benjamin Otto wrote: > > Hi, > > > > I have been searching for nearly two days for a solution to the > following > > problem without finding satisfactory answers: > > > > I am working on the analysis of a HG-U133_Plus2 Chip. Can I mask for > > certain probesets single Oligos such that the expression, p and fold > change > > values are calculated based on the remaining oligos? > > > > A better description of my problem and the background. We are handling a > > cross-species experiment having hybradized rna from tupaia on the human > > chip. This resulted in fairly low expression signals. If we just forget > > about all the other putative problems in analysing the result I think it > > seems reasonable to say, that in many cases only some of the probeset > > oligos will have hybridized satisfyingly. So the idea is masking some of > > the oligos by some criteria and calculate the results only based upon > > subsets of the probesets. The problem is: If I set even only one single > > oligo to NA, the values calculated for the corresponding > probeset won't be > > calculated but set to NA. Most of the threads I found concerning the > > masking problem handle the question of an autpmated or corrected form of > > masking. But there seems to be no available information about our case. > Has > > anyone done something like that before? I'm sure there will have to be > some > > manual programing. But the major question is: Does anybody see a > > possibility to mask the single oligos on a top level like fixing the > > affybatch structure? Or do I have to change every single > function to treat > > NA values in the correct form? > > > > thanks for your help, > > Benjamin > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at stat.math.ethz.ch > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > -- > Ariel Chernomoretz, Ph.D. > Centre de recherche du CHUL > 2705 Blv Laurier, bloc T-367 > Sainte-Foy, Qc > G1V 4G2 > (418)-525-4444 ext 46339 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor >

ADD REPLY • link 19.0 years ago Benjamin Otto ▴ 830

0

Entering edit mode

Hi Benjamin I am sorry you have problems with the code. Regarding the computation time, yes. It is slow even with small/mid sized lists, so in your case I would expect a loooot of processing time (more RAM would help also ;-) ). I guess that to improve this, the function should be completely rewritten using perhaps a different approach. However, once you succed in generating a modified cdf environment, and if you are planning to compute expressions many times having the same probes removed, you should keep (save) the modified cdf environment as an R object. I wrote a function (RemoveProbes2, included at the end) that takes the clean.cdf.environment as an input argument. So everything should be faster for subsequent calculations. Unfortunately I am not a lot at the office these days. My postdoc is about to end here, and I am moving in a couple of weeks, so things are a little bit caotic this days! However I will check your code asap. quick observations/questions: 1) are you removing entire probesets specifiyng the whole set of probes via the rmd$Probes list AND the rmd$Set? 2) if you plan to use presence calls, be cautious. After the probe removal step you end up with probeset with different number of probes so be carefull with the significance levels of the Wilcoxon'rank test Regards, Ariel./ ################################################ # clean.cdf: a modified cdf environment. Should be generated # using RemoveProbes and saving the modified # environment afterwards # cdfpackagename: (e.g. "hgu95av2cdf") # probepackagename: (e.g. "hgu95av2probe") # destructive: unimplemented option, see NOTE RemoveProbes2<-function(clean.cdfenv =NULL, cdfpackagename,probepackagename,destructive=TRUE){ #default probe dataset values probe.env.orig <- get(probepackagename) if(is.null(clean.cdfenv)){ cat("A cdfenv should be provided\n")\ return(); }else{ #clean.cdf provided. much, much faster! ipos<-grep(cdfpackagename,search()) rm(list=c(cdfpackagename),pos=ipos) assign(cdfpackagename,clean.cdfenv,pos=ipos) } # setting the PROBE env accordingly (idea from gcrma compute.affinities.R) tmp <- get("xy2i",paste("package:",cdfpackagename,sep="")) newAB <- new("AffyBatch",cdfName=cleancdf) pmIndex <- unlist(indexProbes(newAB,"pm")) subIndex<- match(tmp(probe.env.orig$x,probe.env.orig$y),pmIndex) rm(newAB) iNA <- whichis.na(subIndex)) if(length(iNA)>0){ ipos<-grep(probepackagename,search()) assign(probepackagename,probe.env.orig[-iNA,],pos=ipos) } } ############################################### On December 6, 2005 04:07 am, Benjamin Otto wrote: > Hi folks, > > sorry for taking your time again. I let the computer finish the > "removeProbes"-calculation over night, so I don't know how long he > took...to finally display the following error message: > > "Error in ans[[i]][, i.probes] : incorrect number of dimensions" > > You can imagine how desperately I was looking for a shotgun that moment. To > provide maybe more information I forgot to submit in my last thread here > come the commands and extracts of my lists / structures I used: > > $# libraries used > $> library(simpleaffy) > $> library(affydata) > $> require(gcrma) > $ > $# preliminary commands > $> setwd ("....") > $> x <- read.affy() > $> x at cdfName <- "HGU133Plus2" > $> cleancdf <- cleancdfname(x at cdfName,addcdf=FALSE) > $> cdfpackagename <- paste(cleancdf,"cdf",sep="") > $> probepackagename <- paste(cleancdf,"probe",sep="") > $> ResetEnvir(cdfpackagename,probepackagename) > $ > $# my function for indentification of unwanted oligos > $# I will append the function code below > $> rmd <- get.foul.oligos(x, ratio=2, diff=100) > $ > $# the initial rma/mas calculation without changes > $> d.rma <- rma(x) > $> d.gcrma <- gcrma(x) > $> d.mas <- mas5(x) > $> d.mas.call <- mas5calls(x) > $ > $# The command for invoking the removeProbes function > $> RemoveProbes(rmd$Probes,rmd$Set,cdfpackagename,probepackagename) > > > The probe list "rmd$Probes" contains 565825 entries, the set list "rmd$Set" > 32791 entries. A little structure check: > > $> rmd$Set[1:5] > $[1] "1007_s_at" "1053_at" "1255_g_at" "1405_i_at" "1438_at" > $> rmd$Probes[1:5] > $[1] "1007_s_at1" "1007_s_at2" "1007_s_at3" "1007_s_at4" "1007_s_at5" > > Looks fine to me, but is it? What could cause the error message here ? > > Here comes the code of my function: > > > get.foul.oligos <- function(x,ratio=1,diff=150,verbose=FALSE) { > > remove <- list() > remove$Probes <- list() > remove$Set <- list() > len.plist <- 0 > len.slist <- 0 > > > ids <- geneNames(x) > num.ids <- length(ids) > num.samples <- length(pm(x)[1,]) > msk.table <- matrix(NA,num.ids,2) > rm.string <- NA > > for (i in 1:num.ids) { > id <- ids[i] > oligos.pm <- indexProbes(x, which="pm", genenames=id)[[id]] > oligos.mm <- indexProbes(x, which="mm", genenames=id)[[id]] > num.oligos <- lengtholigos.pm) > num.rm <- 0 > for (j in 1:num.oligos) { > cat ("i:",i," j:",j,"\n") > rm.flag <- TRUE > for (k in 1:num.samples) { > val.pm <- intensity(x)[oligos.pm[j],k] > val.mm <- intensity(x)[oligos.mm[j],k] > if val.pm/val.mm > ratio & val.pm- val.mm > diff) { > rm.flag <- FALSE > } > } > if (rm.flag == TRUE) { > num.rm <- num.rm + 1 > oligo.name <- paste(id,j,sep="") > if (len.plist == 1) { > remove$Probes <- c(remove$Probes,oligo.name) > } > else { remove$Probes <- coligo.name) } > len.plist <- 1 > > if is.na(rm.string)) { rm.string <- j } > else { rm.string <- paste(rm.string,j,sep=",") } > } > } > if (num.rm == num.oligos) { > if (len.slist == 1) { > remove$Set <- c(remove$Set,id) > } > else { remove$Set <- c(id) } > len.slist <- 1 > > rm.string <- "all" > } > } > return(remove) > } > > > -----Original Message----- > > From: bioconductor-bounces at stat.math.ethz.ch > > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Benjamin > > Otto > > Sent: 05 December 2005 17:14 > > To: bioconductor at stat.math.ethz.ch > > Subject: Re: [BioC] Masking single Oligos in mas5 > > > > > > Hi Ariel and Jenny, > > > > thanks very much for the quick reply!!! I have been experimenting > > since with > > the functions and it really seems to be what I'm looking for. A small > > test with your test_script Ariel returned what I hoped for. However > > there is one > > aspect confusing me. After writing a little function myself, > > which computes > > the probe and probeset list which should be excluded in my case I applied > > the removeProbes function on these lists. While I'm writing this > > message my > > computer is still calculating and he has started four hours ago! Is it > > normal for the function to be so time consuming or is there some > > chance that > > I just have done something wrong? I suppose the problem is on my > > side but I > > just can't imagine what it could be because I followed the "testscript" > > steps. But as the reading of the CEL files usually needs just a > > few minutes > > (maybe five if time consuming) I wouldn't have thought the removing of > > probesets would take as long as it currently does. I must confess my > > lists are really long. Still the problem is of big importance for me and > > so it would be great to get an idea of your experience with the function > > perfomance on your systems. Here is a little summary of the > > conditions in my > > case: > > > > Chiptype: HG_U133_Plus2 > > num. samples: 2 > > num. Probes in listOutProbes: ~ 500.000 > > num. Sets in listOutProbeSets: ~ 30.000 > > System: P4 - 3 GHZ, 500 MB RAM, > > R-version: RGUI 2.2.0 under WindowsXP. > > > > Mainly I have been wondering that the removing of probes from the > > structure > > takes so much longer than reading in the CEL files and building the > > structure. Has anyone of you maybe started a test before trying to remove > > all probesets from an environment? Or can you at least judge wether the > > current running time in my case is ok or looks suspicious? > > > > In any case just to make it clear again, thanks very much in any case. > > The functions really seem to be of very, very much help!!! :-) > > > > Sincerly, > > Benjamin > > > > > > > > -----Original Message----- > > From: bioconductor-bounces at stat.math.ethz.ch > > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Ariel > > Chernomoretz > > Sent: 01 December 2005 17:32 > > To: bioconductor at stat.math.ethz.ch > > Subject: Re: [BioC] Masking single Oligos in mas5 > > > > > > Hi Benjamin, > > > > Some time ago I wrote a couple of functions that modified > > the cdf environments in order to remove bad probes from the affy analysis > > > > You could check: > > > > http://files.protsuggest.org/biocond/html/7350.html > > http://files.protsuggest.org/biocond/html/7366.html > > http://files.protsuggest.org/biocond/html/7367.html > > > > > > hope that helps. > > > > ariel./ > > > > On December 1, 2005 11:06 am, Benjamin Otto wrote: > > > Hi, > > > > > > I have been searching for nearly two days for a solution to the > > > > following > > > > > problem without finding satisfactory answers: > > > > > > I am working on the analysis of a HG-U133_Plus2 Chip. Can I mask for > > > certain probesets single Oligos such that the expression, p and fold > > > > change > > > > > values are calculated based on the remaining oligos? > > > > > > A better description of my problem and the background. We are handling > > > a cross-species experiment having hybradized rna from tupaia on the > > > human chip. This resulted in fairly low expression signals. If we just > > > forget about all the other putative problems in analysing the result I > > > think it seems reasonable to say, that in many cases only some of the > > > probeset oligos will have hybridized satisfyingly. So the idea is > > > masking some of the oligos by some criteria and calculate the results > > > only based upon subsets of the probesets. The problem is: If I set even > > > only one single oligo to NA, the values calculated for the > > > corresponding > > > > probeset won't be > > > > > calculated but set to NA. Most of the threads I found concerning the > > > masking problem handle the question of an autpmated or corrected form > > > of masking. But there seems to be no available information about our > > > case. > > > > Has > > > > > anyone done something like that before? I'm sure there will have to be > > > > some > > > > > manual programing. But the major question is: Does anybody see a > > > possibility to mask the single oligos on a top level like fixing the > > > affybatch structure? Or do I have to change every single > > > > function to treat > > > > > NA values in the correct form? > > > > > > thanks for your help, > > > Benjamin > > > > > > _______________________________________________ > > > Bioconductor mailing list > > > Bioconductor at stat.math.ethz.ch > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > -- > > Ariel Chernomoretz, Ph.D. > > Centre de recherche du CHUL > > 2705 Blv Laurier, bloc T-367 > > Sainte-Foy, Qc > > G1V 4G2 > > (418)-525-4444 ext 46339 > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at stat.math.ethz.ch > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at stat.math.ethz.ch > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor -- Ariel Chernomoretz, Ph.D. Centre de recherche du CHUL 2705 Blv Laurier, bloc T-367 Sainte-Foy, Qc G1V 4G2 (418)-525-4444 ext 46339

ADD REPLY • link 19.0 years ago Ariel Chernomoretz ▴ 380

0

Entering edit mode

Hi Ariel, well I wish you a loooot of luck then for the pre-postdoc time. I hope you like your new job. :-) I think I have localized the problem for the function crash although I don't quite understand it yet. I was just writing this text when your answer reached me. Thanks! Back to the crash: I will just give two examples for better explanation. #Example 1: Include all probes of the probeset 117_at # with exception of the first probe # and don't delete the set itself $> listOutProbes <- c("117_at2","117_at3","117_at4", $+ "117_at5","117_at6","117_at7","117_at8","117_at9", $+ "117_at10","117_at11","117_at12","117_at13", $+ "117_at14","117_at15","117_at16") $> sets <- NULL $> ResetEnvir(cdfpackagename,probepackagename) $> RemoveProbes(probes,sets,cdfpackagename,probepackagename) #Example 2: Exapmle 1 again but ... # ... delete the set from the list $> listOutProbes <- c("117_at2","117_at3","117_at4", $+ "117_at5","117_at6","117_at7","117_at8","117_at9", $+ "117_at10","117_at11","117_at12","117_at13", $+ "117_at14","117_at15","117_at16") $> sets <- ("117_at") $> .. $> .. ONLY the construction in example number 1!!! returns the error message: $ Error in ans[[i]][, i.probes] : incorrect number of dimensions and that when it reaches the command: $ pmIndex <- unlist(indexProbes(newAB,"pm")) in function "RemoveProbes". Note that the command "indexProbes(newAB,"pm")" is causing the problem. That's why I snooped into the code of the method "indexProbes" and voila ... the problem source is finally the command: $ tmp <- as.vector(ans[[i]][, i.probes]) where "ans" was created by $> envir <- getCdfInfo(newAB) $> genenames <- ls(envir) $> ans <- mget(genenames, envir, ifnotfound = NA) NOW: If ans[[i]] contains only ONE value for each of "pm" and "mm" then $> ans[[i]][,"pm"], $> ans[[i]][,"mm"] or ... $> ans[[i]][,c(1,2)] causes R to return the error message. I don't know whether the following observation has to do with that or is rather neglectable: $ # for the probeset with only one probe left I get: $ Browse[1]> ans[[i]][1] $ pm $1051223 $ # for a probeset with more probes left I get: $ Browse[1]> ans[[i]][1] $ [1] 1051223 FINALLY: Sounds as if the bells should ring in ecstasy in my head telling me what the problem really is, unfortunately I don't understand it yet. Maybe someone has an idea whether this is some kind of expected problem which could be corrected by the way "ans" is created. I think I will have to experiment a little bit more... regards, Benjamin > -----Original Message----- > From: bioconductor-bounces at stat.math.ethz.ch > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Ariel > Chernomoretz > Sent: 06 December 2005 17:10 > To: bioconductor at stat.math.ethz.ch > Subject: Re: [BioC] [SPAM???] Re: Masking single Oligos in mas5 > > > > > Hi Benjamin > > I am sorry you have problems with the code. > > Regarding the computation time, yes. It is slow even with small/mid sized > lists, so in your case I would expect a loooot of processing time > (more RAM > would help also ;-) ). I guess that to improve this, the > function should be > completely rewritten using perhaps a different approach. > > However, once you succed in generating a modified cdf > environment, and if you > are planning to compute expressions many times having the same probes > removed, you should keep (save) the modified cdf environment as > an R object. > I wrote a function (RemoveProbes2, included at the end) that takes the > clean.cdf.environment as an input argument. So everything should > be faster > for subsequent calculations. > > > Unfortunately I am not a lot at the office these days. My postdoc > is about to > end here, and I am moving in a couple of weeks, so things are a > little bit > caotic this days! However I will check your code asap. > > quick observations/questions: > 1) are you removing entire probesets specifiyng the whole set of > probes via > the rmd$Probes list AND the rmd$Set? > 2) if you plan to use presence calls, be cautious. After the > probe removal > step you end up with probeset with different number of probes so > be carefull > with the significance levels of the Wilcoxon'rank test > > > > Regards, > Ariel./ > > > > > ################################################ > # clean.cdf: a modified cdf environment. Should be generated > # using RemoveProbes and saving the modified > # environment afterwards > # cdfpackagename: (e.g. "hgu95av2cdf") > # probepackagename: (e.g. "hgu95av2probe") > # destructive: unimplemented option, see NOTE > RemoveProbes2<-function(clean.cdfenv =NULL, > cdfpackagename,probepackagename,destructive=TRUE){ > > #default probe dataset values > probe.env.orig <- get(probepackagename) > > > > if(is.null(clean.cdfenv)){ > cat("A cdfenv should be provided\n")\ > return(); > }else{ #clean.cdf provided. much, much faster! > ipos<-grep(cdfpackagename,search()) > rm(list=c(cdfpackagename),pos=ipos) > assign(cdfpackagename,clean.cdfenv,pos=ipos) > } > > > # setting the PROBE env accordingly (idea from gcrma > compute.affinities.R) > tmp <- get("xy2i",paste("package:",cdfpackagename,sep="")) > newAB <- new("AffyBatch",cdfName=cleancdf) > pmIndex <- unlist(indexProbes(newAB,"pm")) > subIndex<- match(tmp(probe.env.orig$x,probe.env.orig$y),pmIndex) > rm(newAB) > iNA <- whichis.na(subIndex)) > > if(length(iNA)>0){ > ipos<-grep(probepackagename,search()) > assign(probepackagename,probe.env.orig[-iNA,],pos=ipos) > } > } > > ############################################### > > > > On December 6, 2005 04:07 am, Benjamin Otto wrote: > > Hi folks, > > > > sorry for taking your time again. I let the computer finish the > > "removeProbes"-calculation over night, so I don't know how long he > > took...to finally display the following error message: > > > > "Error in ans[[i]][, i.probes] : incorrect number of dimensions" > > > > You can imagine how desperately I was looking for a shotgun > that moment. To > > provide maybe more information I forgot to submit in my last thread here > > come the commands and extracts of my lists / structures I used: > > > > $# libraries used > > $> library(simpleaffy) > > $> library(affydata) > > $> require(gcrma) > > $ > > $# preliminary commands > > $> setwd ("....") > > $> x <- read.affy() > > $> x at cdfName <- "HGU133Plus2" > > $> cleancdf <- cleancdfname(x at cdfName,addcdf=FALSE) > > $> cdfpackagename <- paste(cleancdf,"cdf",sep="") > > $> probepackagename <- paste(cleancdf,"probe",sep="") > > $> ResetEnvir(cdfpackagename,probepackagename) > > $ > > $# my function for indentification of unwanted oligos > > $# I will append the function code below > > $> rmd <- get.foul.oligos(x, ratio=2, diff=100) > > $ > > $# the initial rma/mas calculation without changes > > $> d.rma <- rma(x) > > $> d.gcrma <- gcrma(x) > > $> d.mas <- mas5(x) > > $> d.mas.call <- mas5calls(x) > > $ > > $# The command for invoking the removeProbes function > > $> RemoveProbes(rmd$Probes,rmd$Set,cdfpackagename,probepackagename) > > > > > > The probe list "rmd$Probes" contains 565825 entries, the set > list "rmd$Set" > > 32791 entries. A little structure check: > > > > $> rmd$Set[1:5] > > $[1] "1007_s_at" "1053_at" "1255_g_at" "1405_i_at" "1438_at" > > $> rmd$Probes[1:5] > > $[1] "1007_s_at1" "1007_s_at2" "1007_s_at3" "1007_s_at4" "1007_s_at5" > > > > Looks fine to me, but is it? What could cause the error message here ? > > > > Here comes the code of my function: > > > > > > get.foul.oligos <- function(x,ratio=1,diff=150,verbose=FALSE) { > > > > remove <- list() > > remove$Probes <- list() > > remove$Set <- list() > > len.plist <- 0 > > len.slist <- 0 > > > > > > ids <- geneNames(x) > > num.ids <- length(ids) > > num.samples <- length(pm(x)[1,]) > > msk.table <- matrix(NA,num.ids,2) > > rm.string <- NA > > > > for (i in 1:num.ids) { > > id <- ids[i] > > oligos.pm <- indexProbes(x, which="pm", genenames=id)[[id]] > > oligos.mm <- indexProbes(x, which="mm", genenames=id)[[id]] > > num.oligos <- lengtholigos.pm) > > num.rm <- 0 > > for (j in 1:num.oligos) { > > cat ("i:",i," j:",j,"\n") > > rm.flag <- TRUE > > for (k in 1:num.samples) { > > val.pm <- intensity(x)[oligos.pm[j],k] > > val.mm <- intensity(x)[oligos.mm[j],k] > > if val.pm/val.mm > ratio & > val.pm-val.mm > diff) { > > rm.flag <- FALSE > > } > > } > > if (rm.flag == TRUE) { > > num.rm <- num.rm + 1 > > oligo.name <- paste(id,j,sep="") > > if (len.plist == 1) { > > remove$Probes <- > c(remove$Probes,oligo.name) > > } > > else { remove$Probes <- coligo.name) } > > len.plist <- 1 > > > > if is.na(rm.string)) { rm.string <- j } > > else { rm.string <- > paste(rm.string,j,sep=",") } > > } > > } > > if (num.rm == num.oligos) { > > if (len.slist == 1) { > > remove$Set <- c(remove$Set,id) > > } > > else { remove$Set <- c(id) } > > len.slist <- 1 > > > > rm.string <- "all" > > } > > } > > return(remove) > > } > > > > > -----Original Message----- > > > From: bioconductor-bounces at stat.math.ethz.ch > > > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Benjamin > > > Otto > > > Sent: 05 December 2005 17:14 > > > To: bioconductor at stat.math.ethz.ch > > > Subject: Re: [BioC] Masking single Oligos in mas5 > > > > > > > > > Hi Ariel and Jenny, > > > > > > thanks very much for the quick reply!!! I have been experimenting > > > since with > > > the functions and it really seems to be what I'm looking for. A small > > > test with your test_script Ariel returned what I hoped for. However > > > there is one > > > aspect confusing me. After writing a little function myself, > > > which computes > > > the probe and probeset list which should be excluded in my > case I applied > > > the removeProbes function on these lists. While I'm writing this > > > message my > > > computer is still calculating and he has started four hours ago! Is it > > > normal for the function to be so time consuming or is there some > > > chance that > > > I just have done something wrong? I suppose the problem is on my > > > side but I > > > just can't imagine what it could be because I followed the > "testscript" > > > steps. But as the reading of the CEL files usually needs just a > > > few minutes > > > (maybe five if time consuming) I wouldn't have thought the removing of > > > probesets would take as long as it currently does. I must confess my > > > lists are really long. Still the problem is of big importance > for me and > > > so it would be great to get an idea of your experience with > the function > > > perfomance on your systems. Here is a little summary of the > > > conditions in my > > > case: > > > > > > Chiptype: HG_U133_Plus2 > > > num. samples: 2 > > > num. Probes in listOutProbes: ~ 500.000 > > > num. Sets in listOutProbeSets: ~ 30.000 > > > System: P4 - 3 GHZ, 500 MB RAM, > > > R-version: RGUI 2.2.0 under WindowsXP. > > > > > > Mainly I have been wondering that the removing of probes from the > > > structure > > > takes so much longer than reading in the CEL files and building the > > > structure. Has anyone of you maybe started a test before > trying to remove > > > all probesets from an environment? Or can you at least judge > wether the > > > current running time in my case is ok or looks suspicious? > > > > > > In any case just to make it clear again, thanks very much in any case. > > > The functions really seem to be of very, very much help!!! :-) > > > > > > Sincerly, > > > Benjamin > > > > > > > > > > > > -----Original Message----- > > > From: bioconductor-bounces at stat.math.ethz.ch > > > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Ariel > > > Chernomoretz > > > Sent: 01 December 2005 17:32 > > > To: bioconductor at stat.math.ethz.ch > > > Subject: Re: [BioC] Masking single Oligos in mas5 > > > > > > > > > Hi Benjamin, > > > > > > Some time ago I wrote a couple of functions that modified > > > the cdf environments in order to remove bad probes from the > affy analysis > > > > > > You could check: > > > > > > http://files.protsuggest.org/biocond/html/7350.html > > > http://files.protsuggest.org/biocond/html/7366.html > > > http://files.protsuggest.org/biocond/html/7367.html > > > > > > > > > hope that helps. > > > > > > ariel./ > > > > > > On December 1, 2005 11:06 am, Benjamin Otto wrote: > > > > Hi, > > > > > > > > I have been searching for nearly two days for a solution to the > > > > > > following > > > > > > > problem without finding satisfactory answers: > > > > > > > > I am working on the analysis of a HG-U133_Plus2 Chip. Can I mask for > > > > certain probesets single Oligos such that the expression, p and fold > > > > > > change > > > > > > > values are calculated based on the remaining oligos? > > > > > > > > A better description of my problem and the background. We > are handling > > > > a cross-species experiment having hybradized rna from tupaia on the > > > > human chip. This resulted in fairly low expression signals. > If we just > > > > forget about all the other putative problems in analysing > the result I > > > > think it seems reasonable to say, that in many cases only > some of the > > > > probeset oligos will have hybridized satisfyingly. So the idea is > > > > masking some of the oligos by some criteria and calculate > the results > > > > only based upon subsets of the probesets. The problem is: > If I set even > > > > only one single oligo to NA, the values calculated for the > > > > corresponding > > > > > > probeset won't be > > > > > > > calculated but set to NA. Most of the threads I found concerning the > > > > masking problem handle the question of an autpmated or > corrected form > > > > of masking. But there seems to be no available information about our > > > > case. > > > > > > Has > > > > > > > anyone done something like that before? I'm sure there will > have to be > > > > > > some > > > > > > > manual programing. But the major question is: Does anybody see a > > > > possibility to mask the single oligos on a top level like fixing the > > > > affybatch structure? Or do I have to change every single > > > > > > function to treat > > > > > > > NA values in the correct form? > > > > > > > > thanks for your help, > > > > Benjamin > > > > > > > > _______________________________________________ > > > > Bioconductor mailing list > > > > Bioconductor at stat.math.ethz.ch > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > -- > > > Ariel Chernomoretz, Ph.D. > > > Centre de recherche du CHUL > > > 2705 Blv Laurier, bloc T-367 > > > Sainte-Foy, Qc > > > G1V 4G2 > > > (418)-525-4444 ext 46339 > > > > > > _______________________________________________ > > > Bioconductor mailing list > > > Bioconductor at stat.math.ethz.ch > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > _______________________________________________ > > > Bioconductor mailing list > > > Bioconductor at stat.math.ethz.ch > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at stat.math.ethz.ch > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > -- > Ariel Chernomoretz, Ph.D. > Centre de recherche du CHUL > 2705 Blv Laurier, bloc T-367 > Sainte-Foy, Qc > G1V 4G2 > (418)-525-4444 ext 46339 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor >

ADD REPLY • link 19.0 years ago Benjamin Otto ▴ 830

0

Entering edit mode

Hi Benjamin mmm...try replacing the line : $ tmp <- as.vector(ans[[i]][, i.probes]) with $ tmp <- as.vector(as.matrix(ans[[i]])[, i.probes]) here is why i think that may work... > l<-list(1,as.matrix(1)) > l[[1]][,1] Error in l[[1]][, 1] : incorrect number of dimensions > l[[2]][,1] [1] 1 > as.matrix(l[[1]])[,1] [1] 1 regards ariel./ On December 6, 2005 12:54 pm, Benjamin Otto wrote: > Hi Ariel, > > well I wish you a loooot of luck then for the pre-postdoc time. I hope you > like your new job. :-) > > I think I have localized the problem for the function crash although I > don't quite understand it yet. I was just writing this text when your > answer reached me. Thanks! > Back to the crash: I will just give two examples for better explanation. > > #Example 1: Include all probes of the probeset 117_at > # with exception of the first probe > # and don't delete the set itself > $> listOutProbes <- c("117_at2","117_at3","117_at4", > $+ "117_at5","117_at6","117_at7","117_at8","117_at9", > $+ "117_at10","117_at11","117_at12","117_at13", > $+ "117_at14","117_at15","117_at16") > > $> sets <- NULL > $> ResetEnvir(cdfpackagename,probepackagename) > $> RemoveProbes(probes,sets,cdfpackagename,probepackagename) > > > #Example 2: Exapmle 1 again but ... > # ... delete the set from the list > $> listOutProbes <- c("117_at2","117_at3","117_at4", > $+ "117_at5","117_at6","117_at7","117_at8","117_at9", > $+ "117_at10","117_at11","117_at12","117_at13", > $+ "117_at14","117_at15","117_at16") > > $> sets <- ("117_at") > $> .. > $> .. > > > ONLY the construction in example number 1!!! returns the error message: > > $ Error in ans[[i]][, i.probes] : incorrect number of dimensions > > and that when it reaches the command: > > $ pmIndex <- unlist(indexProbes(newAB,"pm")) > > in function "RemoveProbes". Note that the command "indexProbes(newAB,"pm")" > is causing the problem. That's why I snooped into the code of the method > "indexProbes" and voila ... the problem source is finally the command: > > $ tmp <- as.vector(ans[[i]][, i.probes]) > > where "ans" was created by > > $> envir <- getCdfInfo(newAB) > $> genenames <- ls(envir) > $> ans <- mget(genenames, envir, ifnotfound = NA) > > > NOW: If ans[[i]] contains only ONE value for each of "pm" and "mm" then > > $> ans[[i]][,"pm"], > $> ans[[i]][,"mm"] > or ... > $> ans[[i]][,c(1,2)] > > causes R to return the error message. I don't know whether the following > observation has to do with that or is rather neglectable: > > $ # for the probeset with only one probe left I get: > $ Browse[1]> ans[[i]][1] > $ pm > $1051223 > > $ # for a probeset with more probes left I get: > $ Browse[1]> ans[[i]][1] > $ [1] 1051223 > > > FINALLY: Sounds as if the bells should ring in ecstasy in my head telling > me what the problem really is, unfortunately I don't understand it yet. > Maybe someone has an idea whether this is some kind of expected problem > which could be corrected by the way "ans" is created. I think I will have > to experiment a little bit more... > > regards, > Benjamin > > > -----Original Message----- > > From: bioconductor-bounces at stat.math.ethz.ch > > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Ariel > > Chernomoretz > > Sent: 06 December 2005 17:10 > > To: bioconductor at stat.math.ethz.ch > > Subject: Re: [BioC] [SPAM???] Re: Masking single Oligos in mas5 > > > > > > > > > > Hi Benjamin > > > > I am sorry you have problems with the code. > > > > Regarding the computation time, yes. It is slow even with small/mid sized > > lists, so in your case I would expect a loooot of processing time > > (more RAM > > would help also ;-) ). I guess that to improve this, the > > function should be > > completely rewritten using perhaps a different approach. > > > > However, once you succed in generating a modified cdf > > environment, and if you > > are planning to compute expressions many times having the same probes > > removed, you should keep (save) the modified cdf environment as > > an R object. > > I wrote a function (RemoveProbes2, included at the end) that takes the > > clean.cdf.environment as an input argument. So everything should > > be faster > > for subsequent calculations. > > > > > > Unfortunately I am not a lot at the office these days. My postdoc > > is about to > > end here, and I am moving in a couple of weeks, so things are a > > little bit > > caotic this days! However I will check your code asap. > > > > quick observations/questions: > > 1) are you removing entire probesets specifiyng the whole set of > > probes via > > the rmd$Probes list AND the rmd$Set? > > 2) if you plan to use presence calls, be cautious. After the > > probe removal > > step you end up with probeset with different number of probes so > > be carefull > > with the significance levels of the Wilcoxon'rank test > > > > > > > > Regards, > > Ariel./ > > > > > > > > > > ################################################ > > # clean.cdf: a modified cdf environment. Should be generated > > # using RemoveProbes and saving the modified > > # environment afterwards > > # cdfpackagename: (e.g. "hgu95av2cdf") > > # probepackagename: (e.g. "hgu95av2probe") > > # destructive: unimplemented option, see NOTE > > RemoveProbes2<-function(clean.cdfenv =NULL, > > > > cdfpackagename,probepackagename,destructive=TRUE){ > > > > #default probe dataset values > > probe.env.orig <- get(probepackagename) > > > > > > > > if(is.null(clean.cdfenv)){ > > cat("A cdfenv should be provided\n")\ > > return(); > > }else{ #clean.cdf provided. much, much faster! > > ipos<-grep(cdfpackagename,search()) > > rm(list=c(cdfpackagename),pos=ipos) > > assign(cdfpackagename,clean.cdfenv,pos=ipos) > > } > > > > > > # setting the PROBE env accordingly (idea from gcrma > > compute.affinities.R) > > tmp <- get("xy2i",paste("package:",cdfpackagename,sep="")) > > newAB <- new("AffyBatch",cdfName=cleancdf) > > pmIndex <- unlist(indexProbes(newAB,"pm")) > > subIndex<- match(tmp(probe.env.orig$x,probe.env.orig$y),pmIndex) > > rm(newAB) > > iNA <- whichis.na(subIndex)) > > > > if(length(iNA)>0){ > > ipos<-grep(probepackagename,search()) > > assign(probepackagename,probe.env.orig[-iNA,],pos=ipos) > > } > > } > > > > ############################################### > > > > On December 6, 2005 04:07 am, Benjamin Otto wrote: > > > Hi folks, > > > > > > sorry for taking your time again. I let the computer finish the > > > "removeProbes"-calculation over night, so I don't know how long he > > > took...to finally display the following error message: > > > > > > "Error in ans[[i]][, i.probes] : incorrect number of dimensions" > > > > > > You can imagine how desperately I was looking for a shotgun > > > > that moment. To > > > > > provide maybe more information I forgot to submit in my last thread > > > here come the commands and extracts of my lists / structures I used: > > > > > > $# libraries used > > > $> library(simpleaffy) > > > $> library(affydata) > > > $> require(gcrma) > > > $ > > > $# preliminary commands > > > $> setwd ("....") > > > $> x <- read.affy() > > > $> x at cdfName <- "HGU133Plus2" > > > $> cleancdf <- cleancdfname(x at cdfName,addcdf=FALSE) > > > $> cdfpackagename <- paste(cleancdf,"cdf",sep="") > > > $> probepackagename <- paste(cleancdf,"probe",sep="") > > > $> ResetEnvir(cdfpackagename,probepackagename) > > > $ > > > $# my function for indentification of unwanted oligos > > > $# I will append the function code below > > > $> rmd <- get.foul.oligos(x, ratio=2, diff=100) > > > $ > > > $# the initial rma/mas calculation without changes > > > $> d.rma <- rma(x) > > > $> d.gcrma <- gcrma(x) > > > $> d.mas <- mas5(x) > > > $> d.mas.call <- mas5calls(x) > > > $ > > > $# The command for invoking the removeProbes function > > > $> RemoveProbes(rmd$Probes,rmd$Set,cdfpackagename,probepackagename) > > > > > > > > > The probe list "rmd$Probes" contains 565825 entries, the set > > > > list "rmd$Set" > > > > > 32791 entries. A little structure check: > > > > > > $> rmd$Set[1:5] > > > $[1] "1007_s_at" "1053_at" "1255_g_at" "1405_i_at" "1438_at" > > > $> rmd$Probes[1:5] > > > $[1] "1007_s_at1" "1007_s_at2" "1007_s_at3" "1007_s_at4" "1007_s_at5" > > > > > > Looks fine to me, but is it? What could cause the error message here ? > > > > > > Here comes the code of my function: > > > > > > > > > get.foul.oligos <- function(x,ratio=1,diff=150,verbose=FALSE) { > > > > > > remove <- list() > > > remove$Probes <- list() > > > remove$Set <- list() > > > len.plist <- 0 > > > len.slist <- 0 > > > > > > > > > ids <- geneNames(x) > > > num.ids <- length(ids) > > > num.samples <- length(pm(x)[1,]) > > > msk.table <- matrix(NA,num.ids,2) > > > rm.string <- NA > > > > > > for (i in 1:num.ids) { > > > id <- ids[i] > > > oligos.pm <- indexProbes(x, which="pm", genenames=id)[[id]] > > > oligos.mm <- indexProbes(x, which="mm", genenames=id)[[id]] > > > num.oligos <- lengtholigos.pm) > > > num.rm <- 0 > > > for (j in 1:num.oligos) { > > > cat ("i:",i," j:",j,"\n") > > > rm.flag <- TRUE > > > for (k in 1:num.samples) { > > > val.pm <- intensity(x)[oligos.pm[j],k] > > > val.mm <- intensity(x)[oligos.mm[j],k] > > > if val.pm/val.mm > ratio & > > > > val.pm-val.mm > diff) { > > > > > rm.flag <- FALSE > > > } > > > } > > > if (rm.flag == TRUE) { > > > num.rm <- num.rm + 1 > > > oligo.name <- paste(id,j,sep="") > > > if (len.plist == 1) { > > > remove$Probes <- > > > > c(remove$Probes,oligo.name) > > > > > } > > > else { remove$Probes <- coligo.name) } > > > len.plist <- 1 > > > > > > if is.na(rm.string)) { rm.string <- j } > > > else { rm.string <- > > > > paste(rm.string,j,sep=",") } > > > > > } > > > } > > > if (num.rm == num.oligos) { > > > if (len.slist == 1) { > > > remove$Set <- c(remove$Set,id) > > > } > > > else { remove$Set <- c(id) } > > > len.slist <- 1 > > > > > > rm.string <- "all" > > > } > > > } > > > return(remove) > > > } > > > > > > > -----Original Message----- > > > > From: bioconductor-bounces at stat.math.ethz.ch > > > > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Benjamin > > > > Otto > > > > Sent: 05 December 2005 17:14 > > > > To: bioconductor at stat.math.ethz.ch > > > > Subject: Re: [BioC] Masking single Oligos in mas5 > > > > > > > > > > > > Hi Ariel and Jenny, > > > > > > > > thanks very much for the quick reply!!! I have been experimenting > > > > since with > > > > the functions and it really seems to be what I'm looking for. A small > > > > test with your test_script Ariel returned what I hoped for. However > > > > there is one > > > > aspect confusing me. After writing a little function myself, > > > > which computes > > > > the probe and probeset list which should be excluded in my > > > > case I applied > > > > > > the removeProbes function on these lists. While I'm writing this > > > > message my > > > > computer is still calculating and he has started four hours ago! Is > > > > it normal for the function to be so time consuming or is there some > > > > chance that > > > > I just have done something wrong? I suppose the problem is on my > > > > side but I > > > > just can't imagine what it could be because I followed the > > > > "testscript" > > > > > > steps. But as the reading of the CEL files usually needs just a > > > > few minutes > > > > (maybe five if time consuming) I wouldn't have thought the removing > > > > of probesets would take as long as it currently does. I must confess > > > > my lists are really long. Still the problem is of big importance > > > > for me and > > > > > > so it would be great to get an idea of your experience with > > > > the function > > > > > > perfomance on your systems. Here is a little summary of the > > > > conditions in my > > > > case: > > > > > > > > Chiptype: HG_U133_Plus2 > > > > num. samples: 2 > > > > num. Probes in listOutProbes: ~ 500.000 > > > > num. Sets in listOutProbeSets: ~ 30.000 > > > > System: P4 - 3 GHZ, 500 MB RAM, > > > > R-version: RGUI 2.2.0 under WindowsXP. > > > > > > > > Mainly I have been wondering that the removing of probes from the > > > > structure > > > > takes so much longer than reading in the CEL files and building the > > > > structure. Has anyone of you maybe started a test before > > > > trying to remove > > > > > > all probesets from an environment? Or can you at least judge > > > > wether the > > > > > > current running time in my case is ok or looks suspicious? > > > > > > > > In any case just to make it clear again, thanks very much in any > > > > case. The functions really seem to be of very, very much help!!! :-) > > > > > > > > Sincerly, > > > > Benjamin > > > > > > > > > > > > > > > > -----Original Message----- > > > > From: bioconductor-bounces at stat.math.ethz.ch > > > > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Ariel > > > > Chernomoretz > > > > Sent: 01 December 2005 17:32 > > > > To: bioconductor at stat.math.ethz.ch > > > > Subject: Re: [BioC] Masking single Oligos in mas5 > > > > > > > > > > > > Hi Benjamin, > > > > > > > > Some time ago I wrote a couple of functions that modified > > > > the cdf environments in order to remove bad probes from the > > > > affy analysis > > > > > > You could check: > > > > > > > > http://files.protsuggest.org/biocond/html/7350.html > > > > http://files.protsuggest.org/biocond/html/7366.html > > > > http://files.protsuggest.org/biocond/html/7367.html > > > > > > > > > > > > hope that helps. > > > > > > > > ariel./ > > > > > > > > On December 1, 2005 11:06 am, Benjamin Otto wrote: > > > > > Hi, > > > > > > > > > > I have been searching for nearly two days for a solution to the > > > > > > > > following > > > > > > > > > problem without finding satisfactory answers: > > > > > > > > > > I am working on the analysis of a HG-U133_Plus2 Chip. Can I mask > > > > > for certain probesets single Oligos such that the expression, p and > > > > > fold > > > > > > > > change > > > > > > > > > values are calculated based on the remaining oligos? > > > > > > > > > > A better description of my problem and the background. We > > > > are handling > > > > > > > a cross-species experiment having hybradized rna from tupaia on the > > > > > human chip. This resulted in fairly low expression signals. > > > > If we just > > > > > > > forget about all the other putative problems in analysing > > > > the result I > > > > > > > think it seems reasonable to say, that in many cases only > > > > some of the > > > > > > > probeset oligos will have hybridized satisfyingly. So the idea is > > > > > masking some of the oligos by some criteria and calculate > > > > the results > > > > > > > only based upon subsets of the probesets. The problem is: > > > > If I set even > > > > > > > only one single oligo to NA, the values calculated for the > > > > > corresponding > > > > > > > > probeset won't be > > > > > > > > > calculated but set to NA. Most of the threads I found concerning > > > > > the masking problem handle the question of an autpmated or > > > > corrected form > > > > > > > of masking. But there seems to be no available information about > > > > > our case. > > > > > > > > Has > > > > > > > > > anyone done something like that before? I'm sure there will > > > > have to be > > > > > > some > > > > > > > > > manual programing. But the major question is: Does anybody see a > > > > > possibility to mask the single oligos on a top level like fixing > > > > > the affybatch structure? Or do I have to change every single > > > > > > > > function to treat > > > > > > > > > NA values in the correct form? > > > > > > > > > > thanks for your help, > > > > > Benjamin > > > > > > > > > > _______________________________________________ > > > > > Bioconductor mailing list > > > > > Bioconductor at stat.math.ethz.ch > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > > > -- > > > > Ariel Chernomoretz, Ph.D. > > > > Centre de recherche du CHUL > > > > 2705 Blv Laurier, bloc T-367 > > > > Sainte-Foy, Qc > > > > G1V 4G2 > > > > (418)-525-4444 ext 46339 > > > > > > > > _______________________________________________ > > > > Bioconductor mailing list > > > > Bioconductor at stat.math.ethz.ch > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > > > _______________________________________________ > > > > Bioconductor mailing list > > > > Bioconductor at stat.math.ethz.ch > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > _______________________________________________ > > > Bioconductor mailing list > > > Bioconductor at stat.math.ethz.ch > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > -- > > Ariel Chernomoretz, Ph.D. > > Centre de recherche du CHUL > > 2705 Blv Laurier, bloc T-367 > > Sainte-Foy, Qc > > G1V 4G2 > > (418)-525-4444 ext 46339 > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at stat.math.ethz.ch > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor -- Ariel Chernomoretz, Ph.D. Centre de recherche du CHUL 2705 Blv Laurier, bloc T-367 Sainte-Foy, Qc G1V 4G2 (418)-525-4444 ext 46339

ADD REPLY • link 19.0 years ago Ariel Chernomoretz ▴ 380

0

Entering edit mode

Hi Ariel, I have been experimenting with the new function "RemoveProbes2" unfortunately without success. However that might be due to the fact that I'm not sure how to save the environment. For the saving steps I followed the vignette of Laurent Gautier under "www.bioconductor.org/repository/devel/vignette/ngenomeschips.pdf". To be precise I used the following commands > cdfpackagename [1] "hgu133plus2cdf" > probepackagename [1] "hgu133plus2probe" > plcdf <- wrapCdfEnvAffy(hgu133plus2cdf,1164,1164,"hgu133plus2cdf") > envplcdf <- as(plcdf,"environment") > x at cdfName <- envplcdf > save(x,plcdf,envplcdf,file="HGU133P2x200.rda") > which seemed to work fine, but I can't tell for sure. Now when I called the new function it pretends the file doesn't contain any cdf information !? > cdfpackagename [1] "hgu133plus2cdf" > probepackagename [1] "hgu133plus2probe" > RestoreCdfEnv(clean.cdf="HGU133P2x200.rda",cdfpackagename,probepackage name) Warning: package 'hgu133plus2cdf' is in use and will not be installed Error in getCdfInfo(object) : Could not obtain CDF environment, problems encountered: Specified environment does not contain hgu133plus2 HGU133P2x200.rda Data for package affy did not contain hgu133plus2cdf HGU133P2x200.rda I'm aware that most probably the environment doesn't get stored correctly in the file at the first place. Yet, I didn't find any convinient alternative for doing so. Regards, Benjamin P.S.: And a merry 1st, 2nd and 3rd Advent > -----Original Message----- > From: bioconductor-bounces at stat.math.ethz.ch > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Ariel > Chernomoretz > Sent: 06 December 2005 19:55 > To: bioconductor at stat.math.ethz.ch > Subject: Re: [BioC] [SPAM???] Re: Masking single Oligos in mas5 > > > Hi Benjamin > > mmm...try replacing > the line : > > $ tmp <- as.vector(ans[[i]][, i.probes]) > > with > > $ tmp <- as.vector(as.matrix(ans[[i]])[, i.probes]) > > > > > here is why i think that may work... > > > l<-list(1,as.matrix(1)) > > l[[1]][,1] > Error in l[[1]][, 1] : incorrect number of dimensions > > l[[2]][,1] > [1] 1 > > as.matrix(l[[1]])[,1] > [1] 1 > > > regards > ariel./ > > > > On December 6, 2005 12:54 pm, Benjamin Otto wrote: > > Hi Ariel, > > > > well I wish you a loooot of luck then for the pre-postdoc time. > I hope you > > like your new job. :-) > > > > I think I have localized the problem for the function crash although I > > don't quite understand it yet. I was just writing this text when your > > answer reached me. Thanks! > > Back to the crash: I will just give two examples for better explanation. > > > > #Example 1: Include all probes of the probeset 117_at > > # with exception of the first probe > > # and don't delete the set itself > > $> listOutProbes <- c("117_at2","117_at3","117_at4", > > $+ "117_at5","117_at6","117_at7","117_at8","117_at9", > > $+ "117_at10","117_at11","117_at12","117_at13", > > $+ "117_at14","117_at15","117_at16") > > > > $> sets <- NULL > > $> ResetEnvir(cdfpackagename,probepackagename) > > $> RemoveProbes(probes,sets,cdfpackagename,probepackagename) > > > > > > #Example 2: Exapmle 1 again but ... > > # ... delete the set from the list > > $> listOutProbes <- c("117_at2","117_at3","117_at4", > > $+ "117_at5","117_at6","117_at7","117_at8","117_at9", > > $+ "117_at10","117_at11","117_at12","117_at13", > > $+ "117_at14","117_at15","117_at16") > > > > $> sets <- ("117_at") > > $> .. > > $> .. > > > > > > ONLY the construction in example number 1!!! returns the error message: > > > > $ Error in ans[[i]][, i.probes] : incorrect number of dimensions > > > > and that when it reaches the command: > > > > $ pmIndex <- unlist(indexProbes(newAB,"pm")) > > > > in function "RemoveProbes". Note that the command > "indexProbes(newAB,"pm")" > > is causing the problem. That's why I snooped into the code of the method > > "indexProbes" and voila ... the problem source is finally the command: > > > > $ tmp <- as.vector(ans[[i]][, i.probes]) > > > > where "ans" was created by > > > > $> envir <- getCdfInfo(newAB) > > $> genenames <- ls(envir) > > $> ans <- mget(genenames, envir, ifnotfound = NA) > > > > > > NOW: If ans[[i]] contains only ONE value for each of "pm" and "mm" then > > > > $> ans[[i]][,"pm"], > > $> ans[[i]][,"mm"] > > or ... > > $> ans[[i]][,c(1,2)] > > > > causes R to return the error message. I don't know whether the following > > observation has to do with that or is rather neglectable: > > > > $ # for the probeset with only one probe left I get: > > $ Browse[1]> ans[[i]][1] > > $ pm > > $1051223 > > > > $ # for a probeset with more probes left I get: > > $ Browse[1]> ans[[i]][1] > > $ [1] 1051223 > > > > > > FINALLY: Sounds as if the bells should ring in ecstasy in my > head telling > > me what the problem really is, unfortunately I don't understand it yet. > > Maybe someone has an idea whether this is some kind of expected problem > > which could be corrected by the way "ans" is created. I think I > will have > > to experiment a little bit more... > > > > regards, > > Benjamin > > > > > -----Original Message----- > > > From: bioconductor-bounces at stat.math.ethz.ch > > > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Ariel > > > Chernomoretz > > > Sent: 06 December 2005 17:10 > > > To: bioconductor at stat.math.ethz.ch > > > Subject: Re: [BioC] [SPAM???] Re: Masking single Oligos in mas5 > > > > > > > > > > > > > > > Hi Benjamin > > > > > > I am sorry you have problems with the code. > > > > > > Regarding the computation time, yes. It is slow even with > small/mid sized > > > lists, so in your case I would expect a loooot of processing time > > > (more RAM > > > would help also ;-) ). I guess that to improve this, the > > > function should be > > > completely rewritten using perhaps a different approach. > > > > > > However, once you succed in generating a modified cdf > > > environment, and if you > > > are planning to compute expressions many times having the same probes > > > removed, you should keep (save) the modified cdf environment as > > > an R object. > > > I wrote a function (RemoveProbes2, included at the end) that takes the > > > clean.cdf.environment as an input argument. So everything should > > > be faster > > > for subsequent calculations. > > > > > > > > > Unfortunately I am not a lot at the office these days. My postdoc > > > is about to > > > end here, and I am moving in a couple of weeks, so things are a > > > little bit > > > caotic this days! However I will check your code asap. > > > > > > quick observations/questions: > > > 1) are you removing entire probesets specifiyng the whole set of > > > probes via > > > the rmd$Probes list AND the rmd$Set? > > > 2) if you plan to use presence calls, be cautious. After the > > > probe removal > > > step you end up with probeset with different number of probes so > > > be carefull > > > with the significance levels of the Wilcoxon'rank test > > > > > > > > > > > > Regards, > > > Ariel./ > > > > > > > > > > > > > > > ################################################ > > > # clean.cdf: a modified cdf environment. Should be generated > > > # using RemoveProbes and saving the modified > > > # environment afterwards > > > # cdfpackagename: (e.g. "hgu95av2cdf") > > > # probepackagename: (e.g. "hgu95av2probe") > > > # destructive: unimplemented option, see NOTE > > > RemoveProbes2<-function(clean.cdfenv =NULL, > > > > > > cdfpackagename,probepackagename,destructive=TRUE){ > > > > > > #default probe dataset values > > > probe.env.orig <- get(probepackagename) > > > > > > > > > > > > if(is.null(clean.cdfenv)){ > > > cat("A cdfenv should be provided\n")\ > > > return(); > > > }else{ #clean.cdf provided. much, much faster! > > > ipos<-grep(cdfpackagename,search()) > > > rm(list=c(cdfpackagename),pos=ipos) > > > assign(cdfpackagename,clean.cdfenv,pos=ipos) > > > } > > > > > > > > > # setting the PROBE env accordingly (idea from gcrma > > > compute.affinities.R) > > > tmp <- get("xy2i",paste("package:",cdfpackagename,sep="")) > > > newAB <- new("AffyBatch",cdfName=cleancdf) > > > pmIndex <- unlist(indexProbes(newAB,"pm")) > > > subIndex<- match(tmp(probe.env.orig$x,probe.env.orig$y),pmIndex) > > > rm(newAB) > > > iNA <- whichis.na(subIndex)) > > > > > > if(length(iNA)>0){ > > > ipos<-grep(probepackagename,search()) > > > assign(probepackagename,probe.env.orig[-iNA,],pos=ipos) > > > } > > > } > > > > > > ############################################### > > > > > > On December 6, 2005 04:07 am, Benjamin Otto wrote: > > > > Hi folks, > > > > > > > > sorry for taking your time again. I let the computer finish the > > > > "removeProbes"-calculation over night, so I don't know how long he > > > > took...to finally display the following error message: > > > > > > > > "Error in ans[[i]][, i.probes] : incorrect number of dimensions" > > > > > > > > You can imagine how desperately I was looking for a shotgun > > > > > > that moment. To > > > > > > > provide maybe more information I forgot to submit in my last thread > > > > here come the commands and extracts of my lists / structures I used: > > > > > > > > $# libraries used > > > > $> library(simpleaffy) > > > > $> library(affydata) > > > > $> require(gcrma) > > > > $ > > > > $# preliminary commands > > > > $> setwd ("....") > > > > $> x <- read.affy() > > > > $> x at cdfName <- "HGU133Plus2" > > > > $> cleancdf <- cleancdfname(x at cdfName,addcdf=FALSE) > > > > $> cdfpackagename <- paste(cleancdf,"cdf",sep="") > > > > $> probepackagename <- paste(cleancdf,"probe",sep="") > > > > $> ResetEnvir(cdfpackagename,probepackagename) > > > > $ > > > > $# my function for indentification of unwanted oligos > > > > $# I will append the function code below > > > > $> rmd <- get.foul.oligos(x, ratio=2, diff=100) > > > > $ > > > > $# the initial rma/mas calculation without changes > > > > $> d.rma <- rma(x) > > > > $> d.gcrma <- gcrma(x) > > > > $> d.mas <- mas5(x) > > > > $> d.mas.call <- mas5calls(x) > > > > $ > > > > $# The command for invoking the removeProbes function > > > > $> RemoveProbes(rmd$Probes,rmd$Set,cdfpackagename,probepackagename) > > > > > > > > > > > > The probe list "rmd$Probes" contains 565825 entries, the set > > > > > > list "rmd$Set" > > > > > > > 32791 entries. A little structure check: > > > > > > > > $> rmd$Set[1:5] > > > > $[1] "1007_s_at" "1053_at" "1255_g_at" "1405_i_at" "1438_at" > > > > $> rmd$Probes[1:5] > > > > $[1] "1007_s_at1" "1007_s_at2" "1007_s_at3" "1007_s_at4" > "1007_s_at5" > > > > > > > > Looks fine to me, but is it? What could cause the error > message here ? > > > > > > > > Here comes the code of my function: > > > > > > > > > > > > get.foul.oligos <- function(x,ratio=1,diff=150,verbose=FALSE) { > > > > > > > > remove <- list() > > > > remove$Probes <- list() > > > > remove$Set <- list() > > > > len.plist <- 0 > > > > len.slist <- 0 > > > > > > > > > > > > ids <- geneNames(x) > > > > num.ids <- length(ids) > > > > num.samples <- length(pm(x)[1,]) > > > > msk.table <- matrix(NA,num.ids,2) > > > > rm.string <- NA > > > > > > > > for (i in 1:num.ids) { > > > > id <- ids[i] > > > > oligos.pm <- indexProbes(x, which="pm", > genenames=id)[[id]] > > > > oligos.mm <- indexProbes(x, which="mm", > genenames=id)[[id]] > > > > num.oligos <- lengtholigos.pm) > > > > num.rm <- 0 > > > > for (j in 1:num.oligos) { > > > > cat ("i:",i," j:",j,"\n") > > > > rm.flag <- TRUE > > > > for (k in 1:num.samples) { > > > > val.pm <- > intensity(x)[oligos.pm[j],k] > > > > val.mm <- > intensity(x)[oligos.mm[j],k] > > > > if val.pm/val.mm > ratio & > > > > > > val.pm-val.mm > diff) { > > > > > > > rm.flag <- FALSE > > > > } > > > > } > > > > if (rm.flag == TRUE) { > > > > num.rm <- num.rm + 1 > > > > oligo.name <- paste(id,j,sep="") > > > > if (len.plist == 1) { > > > > remove$Probes <- > > > > > > c(remove$Probes,oligo.name) > > > > > > > } > > > > else { remove$Probes <- > coligo.name) } > > > > len.plist <- 1 > > > > > > > > if is.na(rm.string)) { > rm.string <- j } > > > > else { rm.string <- > > > > > > paste(rm.string,j,sep=",") } > > > > > > > } > > > > } > > > > if (num.rm == num.oligos) { > > > > if (len.slist == 1) { > > > > remove$Set <- c(remove$Set,id) > > > > } > > > > else { remove$Set <- c(id) } > > > > len.slist <- 1 > > > > > > > > rm.string <- "all" > > > > } > > > > } > > > > return(remove) > > > > } > > > > > > > > > -----Original Message----- > > > > > From: bioconductor-bounces at stat.math.ethz.ch > > > > > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf > Of Benjamin > > > > > Otto > > > > > Sent: 05 December 2005 17:14 > > > > > To: bioconductor at stat.math.ethz.ch > > > > > Subject: Re: [BioC] Masking single Oligos in mas5 > > > > > > > > > > > > > > > Hi Ariel and Jenny, > > > > > > > > > > thanks very much for the quick reply!!! I have been experimenting > > > > > since with > > > > > the functions and it really seems to be what I'm looking > for. A small > > > > > test with your test_script Ariel returned what I hoped > for. However > > > > > there is one > > > > > aspect confusing me. After writing a little function myself, > > > > > which computes > > > > > the probe and probeset list which should be excluded in my > > > > > > case I applied > > > > > > > > the removeProbes function on these lists. While I'm writing this > > > > > message my > > > > > computer is still calculating and he has started four > hours ago! Is > > > > > it normal for the function to be so time consuming or is > there some > > > > > chance that > > > > > I just have done something wrong? I suppose the problem is on my > > > > > side but I > > > > > just can't imagine what it could be because I followed the > > > > > > "testscript" > > > > > > > > steps. But as the reading of the CEL files usually needs just a > > > > > few minutes > > > > > (maybe five if time consuming) I wouldn't have thought > the removing > > > > > of probesets would take as long as it currently does. I > must confess > > > > > my lists are really long. Still the problem is of big importance > > > > > > for me and > > > > > > > > so it would be great to get an idea of your experience with > > > > > > the function > > > > > > > > perfomance on your systems. Here is a little summary of the > > > > > conditions in my > > > > > case: > > > > > > > > > > Chiptype: HG_U133_Plus2 > > > > > num. samples: 2 > > > > > num. Probes in listOutProbes: ~ 500.000 > > > > > num. Sets in listOutProbeSets: ~ 30.000 > > > > > System: P4 - 3 GHZ, 500 MB RAM, > > > > > R-version: RGUI 2.2.0 under WindowsXP. > > > > > > > > > > Mainly I have been wondering that the removing of probes from the > > > > > structure > > > > > takes so much longer than reading in the CEL files and > building the > > > > > structure. Has anyone of you maybe started a test before > > > > > > trying to remove > > > > > > > > all probesets from an environment? Or can you at least judge > > > > > > wether the > > > > > > > > current running time in my case is ok or looks suspicious? > > > > > > > > > > In any case just to make it clear again, thanks very much in any > > > > > case. The functions really seem to be of very, very much > help!!! :-) > > > > > > > > > > Sincerly, > > > > > Benjamin > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > From: bioconductor-bounces at stat.math.ethz.ch > > > > > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Ariel > > > > > Chernomoretz > > > > > Sent: 01 December 2005 17:32 > > > > > To: bioconductor at stat.math.ethz.ch > > > > > Subject: Re: [BioC] Masking single Oligos in mas5 > > > > > > > > > > > > > > > Hi Benjamin, > > > > > > > > > > Some time ago I wrote a couple of functions that modified > > > > > the cdf environments in order to remove bad probes from the > > > > > > affy analysis > > > > > > > > You could check: > > > > > > > > > > http://files.protsuggest.org/biocond/html/7350.html > > > > > http://files.protsuggest.org/biocond/html/7366.html > > > > > http://files.protsuggest.org/biocond/html/7367.html > > > > > > > > > > > > > > > hope that helps. > > > > > > > > > > ariel./ > > > > > > > > > > On December 1, 2005 11:06 am, Benjamin Otto wrote: > > > > > > Hi, > > > > > > > > > > > > I have been searching for nearly two days for a solution to the > > > > > > > > > > following > > > > > > > > > > > problem without finding satisfactory answers: > > > > > > > > > > > > I am working on the analysis of a HG-U133_Plus2 Chip. Can I mask > > > > > > for certain probesets single Oligos such that the > expression, p and > > > > > > fold > > > > > > > > > > change > > > > > > > > > > > values are calculated based on the remaining oligos? > > > > > > > > > > > > A better description of my problem and the background. We > > > > > > are handling > > > > > > > > > a cross-species experiment having hybradized rna from > tupaia on the > > > > > > human chip. This resulted in fairly low expression signals. > > > > > > If we just > > > > > > > > > forget about all the other putative problems in analysing > > > > > > the result I > > > > > > > > > think it seems reasonable to say, that in many cases only > > > > > > some of the > > > > > > > > > probeset oligos will have hybridized satisfyingly. So > the idea is > > > > > > masking some of the oligos by some criteria and calculate > > > > > > the results > > > > > > > > > only based upon subsets of the probesets. The problem is: > > > > > > If I set even > > > > > > > > > only one single oligo to NA, the values calculated for the > > > > > > corresponding > > > > > > > > > > probeset won't be > > > > > > > > > > > calculated but set to NA. Most of the threads I found concerning > > > > > > the masking problem handle the question of an autpmated or > > > > > > corrected form > > > > > > > > > of masking. But there seems to be no available information about > > > > > > our case. > > > > > > > > > > Has > > > > > > > > > > > anyone done something like that before? I'm sure there will > > > > > > have to be > > > > > > > > some > > > > > > > > > > > manual programing. But the major question is: Does anybody see a > > > > > > possibility to mask the single oligos on a top level like fixing > > > > > > the affybatch structure? Or do I have to change every single > > > > > > > > > > function to treat > > > > > > > > > > > NA values in the correct form? > > > > > > > > > > > > thanks for your help, > > > > > > Benjamin > > > > > > > > > > > > _______________________________________________ > > > > > > Bioconductor mailing list > > > > > > Bioconductor at stat.math.ethz.ch > > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > > > > > -- > > > > > Ariel Chernomoretz, Ph.D. > > > > > Centre de recherche du CHUL > > > > > 2705 Blv Laurier, bloc T-367 > > > > > Sainte-Foy, Qc > > > > > G1V 4G2 > > > > > (418)-525-4444 ext 46339 > > > > > > > > > > _______________________________________________ > > > > > Bioconductor mailing list > > > > > Bioconductor at stat.math.ethz.ch > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > > > > > _______________________________________________ > > > > > Bioconductor mailing list > > > > > Bioconductor at stat.math.ethz.ch > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > > > _______________________________________________ > > > > Bioconductor mailing list > > > > Bioconductor at stat.math.ethz.ch > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > -- > > > Ariel Chernomoretz, Ph.D. > > > Centre de recherche du CHUL > > > 2705 Blv Laurier, bloc T-367 > > > Sainte-Foy, Qc > > > G1V 4G2 > > > (418)-525-4444 ext 46339 > > > > > > _______________________________________________ > > > Bioconductor mailing list > > > Bioconductor at stat.math.ethz.ch > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at stat.math.ethz.ch > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > -- > Ariel Chernomoretz, Ph.D. > Centre de recherche du CHUL > 2705 Blv Laurier, bloc T-367 > Sainte-Foy, Qc > G1V 4G2 > (418)-525-4444 ext 46339 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor >

ADD REPLY • link 19.0 years ago Benjamin Otto ▴ 830

0

Entering edit mode

Hi Ariel, hi folks, I finally managed the problems discussed in the thread here so I would like to add some final notes. In the case of saving the environment I forgot to put the new cdf environment name in quotas: wrong: ------ $> newcdf <- wrapCdfEnvAffy(hgu133plus2cdf,1164,1164,"hgu133plus2cdf") $> envnewcdf <- as(newcdf,"environment") $> # here comes the error $> x at cdfName <- envnewcdf correct: -------- $> newcdf <- wrapCdfEnvAffy(hgu133plus2cdf,1164,1164,"hgu133plus2cdf") $> envnewcdf <- as(newcdf,"environment") $> x at cdfName <- "envnewcdf" The second problem was the painful removing of certain probes and probesets. The main trigger for a lot of problems in my case turned out to be the grep command in the RemoveProbes() function (in line 24, where line 1 is the function header): $ ii <-grep(paste("^",pset[i],sep=""),probes[,1]) Thats because I'm using the "hgu133plus2" cdf package which contains a huge amount of geneNames AND especially very similar names like: $[1] "121_at" "1563121_at" "217121_at" "218121_at" "227121_at" "233121_at" In this case here the command above returns: $[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 5894 5895 5896 5897 5898 5899 5900 5901 5902 35353 35354 $[26] 35355 35356 35357 35358 35359 35360 35361 37097 37098 37099 37100 37101 37102 37103 37104 37105 49109 49110 49111 49112 49113 49114 49115 53540 53541 $[51] 53542 53543 53544 53545 53546 53547 53548 while it should only return: $[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 This had two major effects in my case: 1. A lot more probes are deleted than desired. 2. The function needed nearly 6-7 hours to finish calculation in stead of a few minutes. Summary: -------- A) I would propose substituting the grep command in RemoveProbes() with the more sensitive version: $ ii <-grep(paste("^",pset[i],sep=""),probes[,1]) which now works fine for me. B) The command sequence for storing the environment afterwards seems to follow the vignette of Laurent Gautier and would be: $> newcdf <- wrapCdfEnvAffy(hgu133plus2cdf,1164,1164,"hgu133plus2cdf") $> envnewcdf <- as(newcdf,"environment") $> x at cdfName <- "envnewcdf" $> save(x,newcdf,envnewcdf,file="/home/otto/R/createdEnvTest.env") Finally, thanky very much Ariel for your help AND the definitely very helpful functions!!! Now that every thing works fine in my case too I am really enthusiastic over the speed the new environments get calculated with. :-) Regards, Benjamin > -----Original Message----- > From: bioconductor-bounces at stat.math.ethz.ch > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Benjamin > Otto > Sent: 15 December 2005 13:32 > To: Ariel Chernomoretz; bioconductor at stat.math.ethz.ch > Subject: Re: [BioC] [SPAM???] Re: Masking single Oligos in mas5 > > > Hi Ariel, > > I have been experimenting with the new function "RemoveProbes2" > unfortunately without success. However that might be due to the fact that > I'm not sure how to save the environment. For the saving steps I followed > the vignette of Laurent Gautier under > "www.bioconductor.org/repository/devel/vignette/ngenomeschips.pdf". > To be precise I used the following commands > > > cdfpackagename > [1] "hgu133plus2cdf" > > probepackagename > [1] "hgu133plus2probe" > > plcdf <- wrapCdfEnvAffy(hgu133plus2cdf,1164,1164,"hgu133plus2cdf") > > envplcdf <- as(plcdf,"environment") > > x at cdfName <- envplcdf > > save(x,plcdf,envplcdf,file="HGU133P2x200.rda") > > > > which seemed to work fine, but I can't tell for sure. Now when I > called the > new function it pretends the file doesn't contain any cdf information !? > > > cdfpackagename > [1] "hgu133plus2cdf" > > probepackagename > [1] "hgu133plus2probe" > > > RestoreCdfEnv(clean.cdf="HGU133P2x200.rda",cdfpackagename,probepac > kagename) > Warning: package 'hgu133plus2cdf' is in use and will not be installed > Error in getCdfInfo(object) : Could not obtain CDF environment, problems > encountered: > Specified environment does not contain hgu133plus2 > HGU133P2x200.rda > Data for package affy did not contain hgu133plus2cdf > HGU133P2x200.rda > > I'm aware that most probably the environment doesn't get stored > correctly in > the file at the first place. Yet, I didn't find any convinient alternative > for doing so. > > Regards, > Benjamin > > P.S.: And a merry 1st, 2nd and 3rd Advent > > > > > > -----Original Message----- > > From: bioconductor-bounces at stat.math.ethz.ch > > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Ariel > > Chernomoretz > > Sent: 06 December 2005 19:55 > > To: bioconductor at stat.math.ethz.ch > > Subject: Re: [BioC] [SPAM???] Re: Masking single Oligos in mas5 > > > > > > Hi Benjamin > > > > mmm...try replacing > > the line : > > > > $ tmp <- as.vector(ans[[i]][, i.probes]) > > > > with > > > > $ tmp <- as.vector(as.matrix(ans[[i]])[, i.probes]) > > > > > > > > > > here is why i think that may work... > > > > > l<-list(1,as.matrix(1)) > > > l[[1]][,1] > > Error in l[[1]][, 1] : incorrect number of dimensions > > > l[[2]][,1] > > [1] 1 > > > as.matrix(l[[1]])[,1] > > [1] 1 > > > > > > regards > > ariel./ > > > > > > > > On December 6, 2005 12:54 pm, Benjamin Otto wrote: > > > Hi Ariel, > > > > > > well I wish you a loooot of luck then for the pre-postdoc time. > > I hope you > > > like your new job. :-) > > > > > > I think I have localized the problem for the function crash although I > > > don't quite understand it yet. I was just writing this text when your > > > answer reached me. Thanks! > > > Back to the crash: I will just give two examples for better > explanation. > > > > > > #Example 1: Include all probes of the probeset 117_at > > > # with exception of the first probe > > > # and don't delete the set itself > > > $> listOutProbes <- c("117_at2","117_at3","117_at4", > > > $+ "117_at5","117_at6","117_at7","117_at8","117_at9", > > > $+ "117_at10","117_at11","117_at12","117_at13", > > > $+ "117_at14","117_at15","117_at16") > > > > > > $> sets <- NULL > > > $> ResetEnvir(cdfpackagename,probepackagename) > > > $> RemoveProbes(probes,sets,cdfpackagename,probepackagename) > > > > > > > > > #Example 2: Exapmle 1 again but ... > > > # ... delete the set from the list > > > $> listOutProbes <- c("117_at2","117_at3","117_at4", > > > $+ "117_at5","117_at6","117_at7","117_at8","117_at9", > > > $+ "117_at10","117_at11","117_at12","117_at13", > > > $+ "117_at14","117_at15","117_at16") > > > > > > $> sets <- ("117_at") > > > $> .. > > > $> .. > > > > > > > > > ONLY the construction in example number 1!!! returns the > error message: > > > > > > $ Error in ans[[i]][, i.probes] : incorrect number of dimensions > > > > > > and that when it reaches the command: > > > > > > $ pmIndex <- unlist(indexProbes(newAB,"pm")) > > > > > > in function "RemoveProbes". Note that the command > > "indexProbes(newAB,"pm")" > > > is causing the problem. That's why I snooped into the code of > the method > > > "indexProbes" and voila ... the problem source is finally the command: > > > > > > $ tmp <- as.vector(ans[[i]][, i.probes]) > > > > > > where "ans" was created by > > > > > > $> envir <- getCdfInfo(newAB) > > > $> genenames <- ls(envir) > > > $> ans <- mget(genenames, envir, ifnotfound = NA) > > > > > > > > > NOW: If ans[[i]] contains only ONE value for each of "pm" and > "mm" then > > > > > > $> ans[[i]][,"pm"], > > > $> ans[[i]][,"mm"] > > > or ... > > > $> ans[[i]][,c(1,2)] > > > > > > causes R to return the error message. I don't know whether > the following > > > observation has to do with that or is rather neglectable: > > > > > > $ # for the probeset with only one probe left I get: > > > $ Browse[1]> ans[[i]][1] > > > $ pm > > > $1051223 > > > > > > $ # for a probeset with more probes left I get: > > > $ Browse[1]> ans[[i]][1] > > > $ [1] 1051223 > > > > > > > > > FINALLY: Sounds as if the bells should ring in ecstasy in my > > head telling > > > me what the problem really is, unfortunately I don't > understand it yet. > > > Maybe someone has an idea whether this is some kind of > expected problem > > > which could be corrected by the way "ans" is created. I think I > > will have > > > to experiment a little bit more... > > > > > > regards, > > > Benjamin > > > > > > > -----Original Message----- > > > > From: bioconductor-bounces at stat.math.ethz.ch > > > > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Ariel > > > > Chernomoretz > > > > Sent: 06 December 2005 17:10 > > > > To: bioconductor at stat.math.ethz.ch > > > > Subject: Re: [BioC] [SPAM???] Re: Masking single Oligos in mas5 > > > > > > > > > > > > > > > > > > > > Hi Benjamin > > > > > > > > I am sorry you have problems with the code. > > > > > > > > Regarding the computation time, yes. It is slow even with > > small/mid sized > > > > lists, so in your case I would expect a loooot of processing time > > > > (more RAM > > > > would help also ;-) ). I guess that to improve this, the > > > > function should be > > > > completely rewritten using perhaps a different approach. > > > > > > > > However, once you succed in generating a modified cdf > > > > environment, and if you > > > > are planning to compute expressions many times having the > same probes > > > > removed, you should keep (save) the modified cdf environment as > > > > an R object. > > > > I wrote a function (RemoveProbes2, included at the end) > that takes the > > > > clean.cdf.environment as an input argument. So everything should > > > > be faster > > > > for subsequent calculations. > > > > > > > > > > > > Unfortunately I am not a lot at the office these days. My postdoc > > > > is about to > > > > end here, and I am moving in a couple of weeks, so things are a > > > > little bit > > > > caotic this days! However I will check your code asap. > > > > > > > > quick observations/questions: > > > > 1) are you removing entire probesets specifiyng the whole set of > > > > probes via > > > > the rmd$Probes list AND the rmd$Set? > > > > 2) if you plan to use presence calls, be cautious. After the > > > > probe removal > > > > step you end up with probeset with different number of probes so > > > > be carefull > > > > with the significance levels of the Wilcoxon'rank test > > > > > > > > > > > > > > > > Regards, > > > > Ariel./ > > > > > > > > > > > > > > > > > > > > ################################################ > > > > # clean.cdf: a modified cdf environment. Should be generated > > > > # using RemoveProbes and saving the modified > > > > # environment afterwards > > > > # cdfpackagename: (e.g. "hgu95av2cdf") > > > > # probepackagename: (e.g. "hgu95av2probe") > > > > # destructive: unimplemented option, see NOTE > > > > RemoveProbes2<-function(clean.cdfenv =NULL, > > > > > > > > cdfpackagename,probepackagename,destructive=TRUE){ > > > > > > > > #default probe dataset values > > > > probe.env.orig <- get(probepackagename) > > > > > > > > > > > > > > > > if(is.null(clean.cdfenv)){ > > > > cat("A cdfenv should be provided\n")\ > > > > return(); > > > > }else{ #clean.cdf provided. much, much faster! > > > > ipos<-grep(cdfpackagename,search()) > > > > rm(list=c(cdfpackagename),pos=ipos) > > > > assign(cdfpackagename,clean.cdfenv,pos=ipos) > > > > } > > > > > > > > > > > > # setting the PROBE env accordingly (idea from gcrma > > > > compute.affinities.R) > > > > tmp <- get("xy2i",paste("package:",cdfpackagename,sep="")) > > > > newAB <- new("AffyBatch",cdfName=cleancdf) > > > > pmIndex <- unlist(indexProbes(newAB,"pm")) > > > > subIndex<- match(tmp(probe.env.orig$x,probe.env.orig$y),pmIndex) > > > > rm(newAB) > > > > iNA <- whichis.na(subIndex)) > > > > > > > > if(length(iNA)>0){ > > > > ipos<-grep(probepackagename,search()) > > > > assign(probepackagename,probe.env.orig[-iNA,],pos=ipos) > > > > } > > > > } > > > > > > > > ############################################### > > > > > > > > On December 6, 2005 04:07 am, Benjamin Otto wrote: > > > > > Hi folks, > > > > > > > > > > sorry for taking your time again. I let the computer finish the > > > > > "removeProbes"-calculation over night, so I don't know how long he > > > > > took...to finally display the following error message: > > > > > > > > > > "Error in ans[[i]][, i.probes] : incorrect number of dimensions" > > > > > > > > > > You can imagine how desperately I was looking for a shotgun > > > > > > > > that moment. To > > > > > > > > > provide maybe more information I forgot to submit in my > last thread > > > > > here come the commands and extracts of my lists / > structures I used: > > > > > > > > > > $# libraries used > > > > > $> library(simpleaffy) > > > > > $> library(affydata) > > > > > $> require(gcrma) > > > > > $ > > > > > $# preliminary commands > > > > > $> setwd ("....") > > > > > $> x <- read.affy() > > > > > $> x at cdfName <- "HGU133Plus2" > > > > > $> cleancdf <- cleancdfname(x at cdfName,addcdf=FALSE) > > > > > $> cdfpackagename <- paste(cleancdf,"cdf",sep="") > > > > > $> probepackagename <- paste(cleancdf,"probe",sep="") > > > > > $> ResetEnvir(cdfpackagename,probepackagename) > > > > > $ > > > > > $# my function for indentification of unwanted oligos > > > > > $# I will append the function code below > > > > > $> rmd <- get.foul.oligos(x, ratio=2, diff=100) > > > > > $ > > > > > $# the initial rma/mas calculation without changes > > > > > $> d.rma <- rma(x) > > > > > $> d.gcrma <- gcrma(x) > > > > > $> d.mas <- mas5(x) > > > > > $> d.mas.call <- mas5calls(x) > > > > > $ > > > > > $# The command for invoking the removeProbes function > > > > > $> > RemoveProbes(rmd$Probes,rmd$Set,cdfpackagename,probepackagename) > > > > > > > > > > > > > > > The probe list "rmd$Probes" contains 565825 entries, the set > > > > > > > > list "rmd$Set" > > > > > > > > > 32791 entries. A little structure check: > > > > > > > > > > $> rmd$Set[1:5] > > > > > $[1] "1007_s_at" "1053_at" "1255_g_at" "1405_i_at" "1438_at" > > > > > $> rmd$Probes[1:5] > > > > > $[1] "1007_s_at1" "1007_s_at2" "1007_s_at3" "1007_s_at4" > > "1007_s_at5" > > > > > > > > > > Looks fine to me, but is it? What could cause the error > > message here ? > > > > > > > > > > Here comes the code of my function: > > > > > > > > > > > > > > > get.foul.oligos <- function(x,ratio=1,diff=150,verbose=FALSE) { > > > > > > > > > > remove <- list() > > > > > remove$Probes <- list() > > > > > remove$Set <- list() > > > > > len.plist <- 0 > > > > > len.slist <- 0 > > > > > > > > > > > > > > > ids <- geneNames(x) > > > > > num.ids <- length(ids) > > > > > num.samples <- length(pm(x)[1,]) > > > > > msk.table <- matrix(NA,num.ids,2) > > > > > rm.string <- NA > > > > > > > > > > for (i in 1:num.ids) { > > > > > id <- ids[i] > > > > > oligos.pm <- indexProbes(x, which="pm", > > genenames=id)[[id]] > > > > > oligos.mm <- indexProbes(x, which="mm", > > genenames=id)[[id]] > > > > > num.oligos <- lengtholigos.pm) > > > > > num.rm <- 0 > > > > > for (j in 1:num.oligos) { > > > > > cat ("i:",i," j:",j,"\n") > > > > > rm.flag <- TRUE > > > > > for (k in 1:num.samples) { > > > > > val.pm <- > > intensity(x)[oligos.pm[j],k] > > > > > val.mm <- > > intensity(x)[oligos.mm[j],k] > > > > > if val.pm/val.mm > ratio & > > > > > > > > val.pm-val.mm > diff) { > > > > > > > > > rm.flag <- FALSE > > > > > } > > > > > } > > > > > if (rm.flag == TRUE) { > > > > > num.rm <- num.rm + 1 > > > > > oligo.name <- paste(id,j,sep="") > > > > > if (len.plist == 1) { > > > > > remove$Probes <- > > > > > > > > c(remove$Probes,oligo.name) > > > > > > > > > } > > > > > else { remove$Probes <- > > coligo.name) } > > > > > len.plist <- 1 > > > > > > > > > > if is.na(rm.string)) { > > rm.string <- j } > > > > > else { rm.string <- > > > > > > > > paste(rm.string,j,sep=",") } > > > > > > > > > } > > > > > } > > > > > if (num.rm == num.oligos) { > > > > > if (len.slist == 1) { > > > > > remove$Set <- c(remove$Set,id) > > > > > } > > > > > else { remove$Set <- c(id) } > > > > > len.slist <- 1 > > > > > > > > > > rm.string <- "all" > > > > > } > > > > > } > > > > > return(remove) > > > > > } > > > > > > > > > > > -----Original Message----- > > > > > > From: bioconductor-bounces at stat.math.ethz.ch > > > > > > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf > > Of Benjamin > > > > > > Otto > > > > > > Sent: 05 December 2005 17:14 > > > > > > To: bioconductor at stat.math.ethz.ch > > > > > > Subject: Re: [BioC] Masking single Oligos in mas5 > > > > > > > > > > > > > > > > > > Hi Ariel and Jenny, > > > > > > > > > > > > thanks very much for the quick reply!!! I have been > experimenting > > > > > > since with > > > > > > the functions and it really seems to be what I'm looking > > for. A small > > > > > > test with your test_script Ariel returned what I hoped > > for. However > > > > > > there is one > > > > > > aspect confusing me. After writing a little function myself, > > > > > > which computes > > > > > > the probe and probeset list which should be excluded in my > > > > > > > > case I applied > > > > > > > > > > the removeProbes function on these lists. While I'm writing this > > > > > > message my > > > > > > computer is still calculating and he has started four > > hours ago! Is > > > > > > it normal for the function to be so time consuming or is > > there some > > > > > > chance that > > > > > > I just have done something wrong? I suppose the problem is on my > > > > > > side but I > > > > > > just can't imagine what it could be because I followed the > > > > > > > > "testscript" > > > > > > > > > > steps. But as the reading of the CEL files usually needs just a > > > > > > few minutes > > > > > > (maybe five if time consuming) I wouldn't have thought > > the removing > > > > > > of probesets would take as long as it currently does. I > > must confess > > > > > > my lists are really long. Still the problem is of big importance > > > > > > > > for me and > > > > > > > > > > so it would be great to get an idea of your experience with > > > > > > > > the function > > > > > > > > > > perfomance on your systems. Here is a little summary of the > > > > > > conditions in my > > > > > > case: > > > > > > > > > > > > Chiptype: HG_U133_Plus2 > > > > > > num. samples: 2 > > > > > > num. Probes in listOutProbes: ~ 500.000 > > > > > > num. Sets in listOutProbeSets: ~ 30.000 > > > > > > System: P4 - 3 GHZ, 500 MB RAM, > > > > > > R-version: RGUI 2.2.0 under WindowsXP. > > > > > > > > > > > > Mainly I have been wondering that the removing of > probes from the > > > > > > structure > > > > > > takes so much longer than reading in the CEL files and > > building the > > > > > > structure. Has anyone of you maybe started a test before > > > > > > > > trying to remove > > > > > > > > > > all probesets from an environment? Or can you at least judge > > > > > > > > wether the > > > > > > > > > > current running time in my case is ok or looks suspicious? > > > > > > > > > > > > In any case just to make it clear again, thanks very much in any > > > > > > case. The functions really seem to be of very, very much > > help!!! :-) > > > > > > > > > > > > Sincerly, > > > > > > Benjamin > > > > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > From: bioconductor-bounces at stat.math.ethz.ch > > > > > > [mailto:bioconductor-bounces at stat.math.ethz.ch]On > Behalf Of Ariel > > > > > > Chernomoretz > > > > > > Sent: 01 December 2005 17:32 > > > > > > To: bioconductor at stat.math.ethz.ch > > > > > > Subject: Re: [BioC] Masking single Oligos in mas5 > > > > > > > > > > > > > > > > > > Hi Benjamin, > > > > > > > > > > > > Some time ago I wrote a couple of functions that modified > > > > > > the cdf environments in order to remove bad probes from the > > > > > > > > affy analysis > > > > > > > > > > You could check: > > > > > > > > > > > > http://files.protsuggest.org/biocond/html/7350.html > > > > > > http://files.protsuggest.org/biocond/html/7366.html > > > > > > http://files.protsuggest.org/biocond/html/7367.html > > > > > > > > > > > > > > > > > > hope that helps. > > > > > > > > > > > > ariel./ > > > > > > > > > > > > On December 1, 2005 11:06 am, Benjamin Otto wrote: > > > > > > > Hi, > > > > > > > > > > > > > > I have been searching for nearly two days for a > solution to the > > > > > > > > > > > > following > > > > > > > > > > > > > problem without finding satisfactory answers: > > > > > > > > > > > > > > I am working on the analysis of a HG-U133_Plus2 Chip. > Can I mask > > > > > > > for certain probesets single Oligos such that the > > expression, p and > > > > > > > fold > > > > > > > > > > > > change > > > > > > > > > > > > > values are calculated based on the remaining oligos? > > > > > > > > > > > > > > A better description of my problem and the background. We > > > > > > > > are handling > > > > > > > > > > > a cross-species experiment having hybradized rna from > > tupaia on the > > > > > > > human chip. This resulted in fairly low expression signals. > > > > > > > > If we just > > > > > > > > > > > forget about all the other putative problems in analysing > > > > > > > > the result I > > > > > > > > > > > think it seems reasonable to say, that in many cases only > > > > > > > > some of the > > > > > > > > > > > probeset oligos will have hybridized satisfyingly. So > > the idea is > > > > > > > masking some of the oligos by some criteria and calculate > > > > > > > > the results > > > > > > > > > > > only based upon subsets of the probesets. The problem is: > > > > > > > > If I set even > > > > > > > > > > > only one single oligo to NA, the values calculated for the > > > > > > > corresponding > > > > > > > > > > > > probeset won't be > > > > > > > > > > > > > calculated but set to NA. Most of the threads I found > concerning > > > > > > > the masking problem handle the question of an autpmated or > > > > > > > > corrected form > > > > > > > > > > > of masking. But there seems to be no available > information about > > > > > > > our case. > > > > > > > > > > > > Has > > > > > > > > > > > > > anyone done something like that before? I'm sure there will > > > > > > > > have to be > > > > > > > > > > some > > > > > > > > > > > > > manual programing. But the major question is: Does > anybody see a > > > > > > > possibility to mask the single oligos on a top level > like fixing > > > > > > > the affybatch structure? Or do I have to change every single > > > > > > > > > > > > function to treat > > > > > > > > > > > > > NA values in the correct form? > > > > > > > > > > > > > > thanks for your help, > > > > > > > Benjamin > > > > > > > > > > > > > > _______________________________________________ > > > > > > > Bioconductor mailing list > > > > > > > Bioconductor at stat.math.ethz.ch > > > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > > > > > > > -- > > > > > > Ariel Chernomoretz, Ph.D. > > > > > > Centre de recherche du CHUL > > > > > > 2705 Blv Laurier, bloc T-367 > > > > > > Sainte-Foy, Qc > > > > > > G1V 4G2 > > > > > > (418)-525-4444 ext 46339 > > > > > > > > > > > > _______________________________________________ > > > > > > Bioconductor mailing list > > > > > > Bioconductor at stat.math.ethz.ch > > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > > > > > > > _______________________________________________ > > > > > > Bioconductor mailing list > > > > > > Bioconductor at stat.math.ethz.ch > > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > > > > > _______________________________________________ > > > > > Bioconductor mailing list > > > > > Bioconductor at stat.math.ethz.ch > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > > > -- > > > > Ariel Chernomoretz, Ph.D. > > > > Centre de recherche du CHUL > > > > 2705 Blv Laurier, bloc T-367 > > > > Sainte-Foy, Qc > > > > G1V 4G2 > > > > (418)-525-4444 ext 46339 > > > > > > > > _______________________________________________ > > > > Bioconductor mailing list > > > > Bioconductor at stat.math.ethz.ch > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > _______________________________________________ > > > Bioconductor mailing list > > > Bioconductor at stat.math.ethz.ch > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > -- > > Ariel Chernomoretz, Ph.D. > > Centre de recherche du CHUL > > 2705 Blv Laurier, bloc T-367 > > Sainte-Foy, Qc > > G1V 4G2 > > (418)-525-4444 ext 46339 > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at stat.math.ethz.ch > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor >

ADD REPLY • link 18.9 years ago Benjamin Otto ▴ 830

Login before adding your answer.