rma for tiling arrays (oligo package)
Ann Hess (@ann-hess-251)
After creating an appropriate library using makePDpackage, I am using the oligo package to open and work with Affymetrix Arabidopsis Tiling 1.0R arrays. I am interested in using the rma function to background correct and normalize the data, but I am not sure how to map the processed data back to probes or directly to chromosome and position.

What do the rownames of the expression matrix created by rma correspond to? My best guess is that they correspond to chromosome position (which can be found using pmChr, but not for an ExpressionSet object). However, these positions are relative to a particular chromosome and therefore not unique. For example, there are probes corresponding to position 417 on both Chromosome 3 and Chromosome 5, but only a single row in the ExpressionSet object corresponding to 417.

Is there a way to background correct and normalize the data without the rma function? Perhaps this would allow for easier mapping to probes.

Any suggestions would be appreciated.

Ann

Code and session info are here:

> library(oligo)
> library(pd.at35b.mr.v04.2.tigrv5)
> AllArrays<-read.celfiles(list.celfiles(),pk="pd.at35b.mr.v04.2.tigrv5")
> dim(pm(AllArrays))
[1] 3092374      12
> dim(mm(AllArrays))
[1] 3092338      12
> Pos<-pmPosition(AllArrays)
> length(Pos)
[1] 3092374
> length(unique(Pos))
[1] 2921991
> RMAout<-rma(AllArrays)
> dim(exprs(RMAout))
[1] 2921991      12
> exprs(RMAout)[1:10,1:2]
         Comp5-1_1006.CEL Comp5-2_1006.CEL
0                3.344400         3.295634
1                1.988137         1.708682
1000             6.315857         7.297425
10000009         9.053133         8.754469
10000014         2.106050         2.137780
10000024        10.392988         9.385502
10000026         2.242264         5.487639
10000034         1.830658         5.239400
1000004          3.097441         5.825040
10000046         6.839724         7.221181
> sessionInfo()
R version 2.6.0 (2007-10-03)
x86_64-redhat-linux-gnu

attached base packages:
[1] splines   tools     stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] pd.at35b.mr.v04.2.tigrv5_1.2.0 oligo_1.2.2
[3] oligoClasses_1.0.3             affxparser_1.10.2
[5] AnnotationDbi_1.0.6            preprocessCore_1.0.0
[7] RSQLite_0.6-9                  DBI_0.2-4
[9] Biobase_1.16.3

loaded via a namespace (and not attached):
[1] rcompgen_0.1-17
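The collision described in the question can be reproduced on toy data. The sketch below is only an illustration: the vectors `chr` and `pos` are invented stand-ins for what pmChr() and pmPosition() would return, and the names `probe_key` and `intens` are hypothetical. It shows that position alone is not a unique key, while chromosome plus position is unique in this toy case. On real data the same check with anyDuplicated() would be needed, since nothing in the thread confirms that the combined key is unique on this array.

```r
# Toy stand-ins for pmChr(AllArrays) and pmPosition(AllArrays).
# All values are invented for illustration.
chr <- c("chr3", "chr5", "chr3", "chr1")
pos <- c(417L, 417L, 1000L, 0L)

# Position alone collides across chromosomes (417 appears twice).
pos_collides <- anyDuplicated(pos) > 0

# Chromosome plus position gives one key per probe in this toy example.
probe_key <- paste(chr, pos, sep = ":")
key_is_unique <- anyDuplicated(probe_key) == 0

# A probe-level matrix (for example PM intensities) can carry the
# composite keys as rownames, so each row maps back to one location.
intens <- matrix(c(120, 95, 300, 80,
                   110, 101, 280, 75),
                 nrow = 4,
                 dimnames = list(probe_key, c("array1", "array2")))
intens["chr5:417", ]
```

In the original output, the number of rows returned by rma() equals length(unique(Pos)), which is at least consistent with probes that share a position being grouped into one row.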
James W. MacDonald (@james-w-macdonald-5106)
Hi Ann,

I don't think you want to use rma() directly, as it is going to try to do a medianpolish on probesets, but such a thing doesn't exist for the tiling arrays.

If you want to use the background correction and normalization that are used by rma(), then I think it will take some work on your part. The functions you will want to use are part of the affy package, but you don't really want to load affy and oligo at the same time because there are so many identically named functions (they both have namespaces, so this isn't the end of the world, but it is easier if you don't have to deal with name collisions).

I would personally just copy the functions normalize.quantiles() and rma.background.correct() from affy into a file (say, affysources.R) and then source that into R. Both of these functions want you to pass a matrix, so you would want to extract the pm data from your AllArrays object, run rma.background.correct() and then normalize.quantiles() on the matrix, and then put that back into AllArrays.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
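To make the normalization step concrete, here is a minimal base-R sketch of what quantile normalization does to a probes-by-arrays matrix. It is an illustration of the idea only, not the code behind normalize.quantiles(); in particular, ties are broken by order of appearance here, which may differ from the packaged implementation. The function name `quantile_normalize` and the matrix `pm_toy` are hypothetical.

```r
# Illustrative quantile normalization in base R.
# Not the packaged implementation; ties are handled by "first" ranking.
quantile_normalize <- function(x) {
  stopifnot(is.matrix(x), nrow(x) > 1, !anyNA(x))
  # Rank of every value within its own column (array).
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: row-wise mean of the sorted columns.
  reference <- rowMeans(apply(x, 2, sort))
  # Replace each value by the reference value at its rank.
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(x)
  out
}

# Toy PM matrix: 4 probes (rows) on 3 arrays (columns).
pm_toy <- matrix(c(5, 2, 3, 4,
                   4, 1, 4, 2,
                   3, 4, 6, 8),
                 nrow = 4,
                 dimnames = list(paste0("probe", 1:4), c("a", "b", "c")))
pm_norm <- quantile_normalize(pm_toy)

# After normalization every column holds the same set of values.
sorted_cols <- apply(pm_norm, 2, sort)
sorted_cols
```

Rows stay in their original order, so a probe-level result like this keeps a one-to-one correspondence with the rows of the input matrix.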
Actually, you can avoid "copying from affy" altogether by using preprocessCore directly, which is where both normalize.quantiles() and rma.background.correct() are actually defined.

I think oligo also loads preprocessCore, so those functions should already be exposed.

Ben

On Mon, 2008-07-21 at 10:48 -0400, James W. MacDonald wrote:
> I would personally just copy the functions normalize.quantiles() and
> rma.background.correct() from affy into a file (say, affysources.R) and
> then source that into R. Both of these functions want you to pass a
> matrix, so you would want to extract the pm data from your AllArrays
> object, run rma.background.correct() and then normalize.quantiles() on
> the matrix, and then put that back into AllArrays.
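Putting the two suggestions together, a probe-level version of the workflow might look like the sketch below. It assumes the Bioconductor package preprocessCore is installed and uses a simulated matrix `pm_raw` as a stand-in for pm(AllArrays); the variable names and the simulated intensities are hypothetical. The log2 step is added here because rma() reports values on the log2 scale; it is not part of either function.

```r
# Sketch only: runs when preprocessCore is available, otherwise does nothing.
if (requireNamespace("preprocessCore", quietly = TRUE)) {
  set.seed(1)
  # Simulated stand-in for pm(AllArrays): 1000 probes x 3 arrays.
  pm_raw <- matrix(rexp(3000, rate = 1 / 200) + 50,
                   nrow = 1000,
                   dimnames = list(NULL, c("a1", "a2", "a3")))

  # RMA-style background correction, applied to each column.
  pm_bg <- preprocessCore::rma.background.correct(pm_raw, copy = TRUE)

  # Quantile normalization across arrays.
  pm_qn <- preprocessCore::normalize.quantiles(pm_bg, copy = TRUE)

  # Restore dimnames in case they were dropped along the way.
  dimnames(pm_qn) <- dimnames(pm_raw)

  # Move to the log2 scale, as rma() output is on that scale.
  pm_log2 <- log2(pm_qn)
  str(pm_log2)
}
```

Because no summarization step is run, the result still has one row per PM probe. If pmChr() and pmPosition() return their values in the same order as the rows of pm(), which the matching lengths in the original post suggest but do not prove, each row can be annotated with chromosome and position directly.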
Thanks Ben, I forgot that you moved everything into preprocessCore.

Best,

Jim

Ben Bolstad wrote:
> Actually, you can avoid "copying from affy" altogether by using
> preprocessCore directly where both normalize.quantiles() and
> rma.background.correct() are actually defined.
>
> I think oligo also loads preprocessCore, so those functions should
> already be exposed.
Hi,

I have a question that's slightly off-topic, but I don't think it's enough to change the subject.

I wanted to try rma.background.correct() on the Drosophila tiling array that I'm working with. As a disclaimer, I haven't been using all of the BioC tools to their fullest and have been doing some things on my own. My question is related to the following.

James wrote:
> I would personally just copy the functions normalize.quantiles()
> and rma.background.correct() from affy into a file (say,
> affysources.R) and then source that into R. Both of these
> functions want you to pass a matrix, so you would want to extract
> the pm data from your AllArrays object, run
> rma.background.correct() and then normalize.quantiles() on the
> matrix, and then put that back into AllArrays.

You seem to suggest that rma.background.correct() works on only the perfect match (PM) probes. I wanted to extract these probes from my data, and in doing so something is mystifying me.

I've re-BLASTed the probes on my array against the latest Drosophila genome so that I could have better annotation for my data and interpretation. If I consider only probes on the array with >= 1 perfect match to the genome:

* Some of the probes Affy annotates as PM don't have any such perfect match; and
* Some of the probes Affy annotates as MM *do* have >= 1 perfect match.

Is this expected?

Assuming that my code that re-BLASTs, parses the results and annotates my probes is bug free (I'll admit a bug is possible, but I think I've tested it well enough), what does it mean for a PM probe not to have any hit to the genome, and for MM probes to have some? I expected this not to be the case, but I don't fully understand the use of PM/MM probes.

Should I just consider any probe that has >= 1 perfect alignment to the genome as a PM? For what it's worth, I'm not planning on using the MM probes in my signal normalization technique.

Thanks for any help,

-steve

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Cornell Medical College
http://cbio.mskcc.org/~lianos
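One way to see how large the two discrepant groups are is to cross-tabulate the vendor PM/MM label against whether a probe has at least one perfect genome hit. The sketch below uses an invented data frame; the column names `affy_type` and `perfect_hits` and all values are hypothetical stand-ins for the output of a re-alignment. It only counts the cases and does not say whether they are expected.

```r
# Hypothetical probe table: vendor label plus the number of perfect
# genome hits found by a re-alignment. All values are invented.
probes <- data.frame(
  probe_id     = paste0("p", 1:6),
  affy_type    = c("PM", "PM", "PM", "MM", "MM", "MM"),
  perfect_hits = c(1L, 0L, 3L, 0L, 1L, 0L),
  stringsAsFactors = FALSE
)

# Flag probes with at least one perfect alignment.
probes$has_hit <- probes$perfect_hits >= 1L

# Cross-tabulate the vendor label against the alignment result.
tab <- table(annotation = probes$affy_type, has_hit = probes$has_hit)
tab

# Probes with exactly one perfect hit, regardless of vendor label.
unique_hit <- probes$probe_id[probes$perfect_hits == 1L]
unique_hit
```

The off-diagonal cells of `tab` are the two cases described above: PM-labelled probes without a perfect hit, and MM-labelled probes with one.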