Subsetting in xps


Michael Walter ▴ 160

@michael-walter-3141

Last seen 10.5 years ago

Dear all,

I'm using xps to read Affy gene array CEL files. With more than 15 arrays I can no longer look at the feature data due to memory limitations. So I'd like to generate a ROOT tree for the entire study and then look at the values of a subset for QC. I can generate a data tree set with the intensities of only a fraction of the arrays by specifying the sample names (please see code and sessionInfo below). However, when I subsequently run RMA on the newly generated tree, the resulting data frame contains all samples from the initial ROOT tree. The code here is a small example with 4 arrays for demo purposes. If anyone has a suggestion on how to obtain the normalized signals for my subset only, I would be very happy.

Kind regards,
Michael

Welcome to Bioconductor
  Vignettes contain introductory material. To view, type
  'openVignette()'. To cite Bioconductor, see
  'citation("Biobase")' and for packages 'citation(pkgname)'.

> library(RColorBrewer)
> library(xps)

Welcome to xps version 1.4.6
    an R wrapper for XPS - eXpression Profiling System
    (c) Copyright 2001-2009 by Christian Stratowa

Attaching package: 'xps'

The following object(s) are masked from package:Biobase :
    exprs, exprs<-, se.exprs

> project = "M9R_001"
> celfile = getwd()
> filenames = list.files(path=celfile)
> filenames = filenames[grep(".CEL", filenames)]
> filenames
[1] "M9R_001c01_1_(HuGene-1_0-st-v1).CEL" "M9R_001c02_1_(HuGene-1_0-st-v1).CEL"
[3] "M9R_001c03_1_(HuGene-1_0-st-v1).CEL" "M9R_001c04_1_(HuGene-1_0-st-v1).CEL"
> scheme.HuGene10 <- root.scheme(paste("X:/affy/QC_Scripts/xps/schemes","Scheme_HuGene10stv1r4_na27_2.root",sep="/"))
> data.xps <- root.data(scheme.HuGene10,
+     paste(getwd(), paste(project, "_cel.root", sep=""), sep="/"))
> fname.tree = data.xps@treenames
> data <- attachMask(data.xps)
> data <- attachInten(data.xps, treenames=unlist(fname.tree[1:3]))
> head(data@data)
  X Y M9R_001c01_1_(HuGene-1_0-st-v1).cel_MEAN
1 0 0                                     6745
2 1 0                                      124
3 2 0                                     6719
4 3 0                                       90
5 4 0                                       61
6 5 0                                       89
  M9R_001c02_1_(HuGene-1_0-st-v1).cel_MEAN
1                                     8246
2                                      127
3                                     8231
4                                      122
5                                       61
6                                      127
  M9R_001c03_1_(HuGene-1_0-st-v1).cel_MEAN
1                                     6190
2                                      112
3                                     5958
4                                       72
5                                       68
6                                      116
> dim(data@data)
[1] 1102500       5
> data.rma <- rma(data, "tmpdt_dataRMA", background="antigenomic", normalize=T,
+     exonlevel=c(402492,402492,402492), verbose = FALSE)
> expr.rma <- validData(data.rma)
> dim(expr.rma)
[1] 33025     4
> sessionInfo()
R version 2.9.0 (2009-04-17)
i386-pc-mingw32

locale:
LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] xps_1.4.6          RColorBrewer_1.0-2 Biobase_2.4.1      ROC_1.18.0

--
MFT Services
University of Tuebingen
Calwerstr. 7
72076 Tübingen/GERMANY
Tel.: +49 (0) 7071 29 83210
Fax: +49 (0) 7071 29 5228

affy xps

15.3 years ago • Michael Walter ▴ 160


cstrato ★ 3.9k

@cstrato-908

Last seen 6.4 years ago

Austria

Dear Michael,

In the following I will try to describe some options which may be helpful in cases where you cannot load all feature data due to memory limitations:

1. The option to load only a subset of feature data using:

> data <- attachMask(data.xps)
> data <- attachInten(data.xps, treenames=unlist(fname.tree[1:3]))

allows you to do some QC such as boxplot(), hist() and pmplot(). However, when running RMA all data will be used, since running rma is independent of the imported data.

2. When you have imported all CEL files into a ROOT data file you can create a subset as follows (see ?root.data):

# load only a subset from a ROOT data file
> rootfile <- paste(getwd(), paste(project, "_cel.root", sep=""), sep="/")
> subdata.xps <- root.data(scheme.HuGene10, rootfile=rootfile,
+     celnames=c("Name2.cel","Name7.cel","Name9.cel"))

Now you can run RMA using this subset only:

> data.rma <- rma(subdata.xps,...)

3. You can also do some QC without the need to import the feature data:

# density plot:
> root.density(data.xps)

# you can also save the density plot for each chip using
> for (tree in treeNames(data.xps)) {
+     root.density(data.xps, treename=tree, canvasname=tree, save.as="png")
+ }

# image plot:
> root.image(data.xps, treename="MyName.cel")

# save image automatically
> root.image(data.xps, treename="MyName.cel", logbase="log2",
+     canvasname="Image_MyName_log2", save.as="png")

# profile plot (similar to boxplot)
> root.profile(data.xps)

However, the profile plots may also run into memory limitations. In this case you can create profile plots from subsets only by using the parameter "treename".

Maybe one more note: I am not sure if you really want to use long filenames such as "M9R_001c01_1_(HuGene-1_0-st-v1).CEL". The function "import.data" has a parameter "celnames" which you could use for alternative names, e.g. celnames=c("M9R_001c01_1",...). You can still access the original filenames using:

> filenames <- rawCELName(data.xps)

You can also find many code examples for whole genome and exon arrays in the files "script4xps.R" and "script4exon.R" located in the package directory "xps/examples".

Please let me know if this information answers your questions.

Best regards
Christian
_._._._._._._._._._._._._._._._._._
C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
V.i.e.n.n.a A.u.s.t.r.i.a
e.m.a.i.l: cstrato at aon.at
_._._._._._._._._._._._._._._._._._
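As a rough illustration of options 2 and the "celnames" note above, the following sketch derives short sample names from the long CEL filenames and wraps the subset-then-RMA steps in one function. The helper name rma_on_subset, its outname argument and the file "notes_CEL.txt" are made up for this example; the xps calls and their arguments are only those already shown in this thread, and the function is defined but not run here because it needs xps and the ROOT files.

```r
# Example file listing; "notes_CEL.txt" is a made-up name added to show a pitfall.
filenames <- c("M9R_001c01_1_(HuGene-1_0-st-v1).CEL",
               "M9R_001c02_1_(HuGene-1_0-st-v1).CEL",
               "M9R_001c03_1_(HuGene-1_0-st-v1).CEL",
               "M9R_001c04_1_(HuGene-1_0-st-v1).CEL",
               "notes_CEL.txt")

# grep(".CEL", ...) treats "." as "any character", so "_CEL" matches as well.
loose <- filenames[grep(".CEL", filenames)]

# Escaping the dot and anchoring at the end keeps only files ending in .CEL.
celfiles <- filenames[grepl("\\.CEL$", filenames, ignore.case = TRUE)]

# Short sample names: drop everything from "_(" onwards.
short_names <- sub("_\\(.*$", "", celfiles)

# Hypothetical helper (assumption, not part of xps): open only the chosen
# trees from the ROOT data file and run RMA on that subset.
# 'trees' are tree names as stored in the ROOT file, e.g. "Name2.cel".
rma_on_subset <- function(scheme, rootfile, trees, outname = "tmpdt_subsetRMA") {
  sub.xps <- xps::root.data(scheme, rootfile = rootfile, celnames = trees)
  sub.rma <- xps::rma(sub.xps, outname, background = "antigenomic",
                      normalize = TRUE,
                      exonlevel = c(402492, 402492, 402492),
                      verbose = FALSE)
  xps::validData(sub.rma)
}
```

With xps installed, a call along the lines of rma_on_subset(scheme.HuGene10, rootfile, c("Name2.cel","Name7.cel","Name9.cel")) should then return expression values for the chosen arrays only, assuming the tree names match those in the ROOT file.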

15.3 years ago • cstrato ★ 3.9k


Michael Walter ▴ 160

@michael-walter-3141

Last seen 10.5 years ago

Dear Christian,

That was a great deal of help. I have not tried all the tips yet, but I surely will.

Thank you very much,
Michael

> -----Original message-----
> From: "cstrato" <cstrato at aon.at>
> Sent: 06.11.09 19:42:14
> To: Michael Walter <michael.walter at med.uni-tuebingen.de>
> CC: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Subsetting in xps

15.3 years ago • Michael Walter ▴ 160
