localization of mm values in affybatch exprs matrix

Karin Lagesen ▴ 80

I have a custom affy chip that I read into R using ReadAffy():

    > newdata = ReadAffy()
    > newdata
    AffyBatch object
    size of arrays=754x754 features (17777 kb)
    cdf=E_colia530222N (11378 affyids)
    number of samples=4
    number of genes=11378
    annotation=ecolia530222n

I now want to look at different values in this object. For instance, some pm values:

    > pm(newdata)[1:5,]
         chip1.CEL chip2.CEL chip3.CEL chip4.CEL
    [1,]    1855.0    2180.8    1444.0    2932.0
    [2,]    2812.0    3451.0    2276.5    3406.0
    [3,]    4162.3    4301.0    2996.0    5088.0
    [4,]    1608.5    1758.0    1123.0    1987.0
    [5,]    2290.0    3189.0    2474.5    2838.3

I also looked at the values in the AffyBatch exprs matrix:

    > newdata@exprs[1:5,]
         chip1.CEL chip2.CEL chip3.CEL chip4.CEL
    [1,]     942.0     776.0       281      1475
    [2,]   24422.0   26071.0      8914     21826
    [3,]    1024.5     908.8       227      1594
    [4,]   26267.0   27674.0     16199     22104
    [5,]     130.0     193.0       168       145

I also notice that the exprs matrix has one column for each chip, and as many rows as there are pm plus mm values.

Are the first half of the rows the pm values, with the mm values following? Or are the pm values in every other row, with the corresponding mm value below? Or is this set up in some other way? Is there any way for me to look at a value in the exprs matrix and find out which entry in the pm/mm value list it is?

TIA,

Karin

--
Karin Lagesen

affy affy • 900 views

ADD COMMENT • link updated 17.9 years ago by James W. MacDonald 67k • written 17.9 years ago by Karin Lagesen ▴ 80

James W. MacDonald 67k

Hi Karin,

Karin Lagesen wrote:
> Are the first half of rows the pm values, with the mm values
> following, or are the pm values every other row with the corresponding
> mm value below, or is this set up in some other way? Is there any way
> for me to look at a value in the exprs matrix and find out which entry
> in the pm/mm value list it is?

The chip is read in row-wise, and the PM probes are in a given row, with the MM probes in the following row. Therefore, the data (excluding the various QC probes) will be N PM probes followed by N MM probes, where N is the row length of the chip.

If you really want to work with the exprs matrix directly (why?), you can use indexProbes() to find the indices for whatever probeset you are interested in, and then subset those rows out. Alternatively, you can get the indices for the PM and MM probes and subset those out separately (which is how pm() and mm() work). You can also use pm() or mm() with the optional genenames argument to get the PM or MM probe values for a particular probeset or probesets.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
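The last part of Karin's question (going from a row of the exprs matrix back to a PM/MM entry) can be sketched as a reverse lookup over the index lists. This is a minimal base R sketch, not affy code: `pm_idx` and `mm_idx` are small invented stand-ins for what `indexProbes(newdata, which="pm")` and `indexProbes(newdata, which="mm")` would return (one vector of row indices per probeset), and `flatten_idx` and `locate_row` are hypothetical helper names.

```r
# Invented stand-ins for indexProbes() output: one vector of
# exprs-matrix row indices per probeset (values are illustrative only).
pm_idx <- list(geneA_at = c(11L, 12L, 13L), geneB_at = c(31L, 32L))
mm_idx <- list(geneA_at = c(21L, 22L, 23L), geneB_at = c(41L, 42L))

# Flatten one index list into a table: row index, probeset, type,
# and position of the probe within its probeset.
flatten_idx <- function(idx, type) {
  data.frame(
    row      = unlist(idx, use.names = FALSE),
    probeset = rep(names(idx), lengths(idx)),
    type     = type,
    position = sequence(lengths(idx)),
    stringsAsFactors = FALSE
  )
}

lookup <- rbind(flatten_idx(pm_idx, "pm"), flatten_idx(mm_idx, "mm"))

# Hypothetical helper: report what the given exprs rows hold.
# Rows that belong to no probeset (e.g. QC features) return no match.
locate_row <- function(i) lookup[lookup$row %in% i, , drop = FALSE]

locate_row(22L)
```

On real data the two lists would come from the CDF environment rather than being typed in, and the `position` column then corresponds to the probe's place within the probeset as listed there.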

ADD COMMENT • link 17.9 years ago James W. MacDonald 67k

On Jan 25, 2007, at 10:10 AM, James W. MacDonald wrote:
> The chip is read in row-wise, and the PM probes are in a given row,
> with the MM probes in the following row. Therefore, the data
> (excluding the various QC probes) will be N PM probes followed by
> N MM probes, where N is the row length of the chip.

I believe this is not true. There is no clear order of the PMs and MMs. You need to get that information from somewhere else, usually from a CDF file.

Karin: you will need to use the makecdfenv package to make what is called a CDF package, an R representation of the PM/MM/probeset pairs.

Kasper

ADD REPLY • link 17.9 years ago Kasper Daniel Hansen ★ 6.5k

So I was a bit quick. It seems from Karin's post that she already has a CDF env.

Jim's statement that the CEL file is ordered PM then MM is probably right for most chips, but in general you can only be sure that the PM and the MM are spatially close. You should use the CDF information to link the pm/mm/(x,y) positions together; you cannot know a priori what a given coordinate corresponds to in terms of pm/mm/probeset.

Kasper
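Whether each MM really sits one chip row after its PM can be checked rather than assumed. The following is a minimal base R sketch under two assumptions: that exprs rows are numbered along chip rows, so stepping down one chip row adds `nr` to the index, and that `nr = 754` as reported for Karin's arrays. `pm_i` and `mm_i` are invented values standing in for the unlisted output of `indexProbes()` with `which="pm"` and `which="mm"` for the same probesets.

```r
nr <- 754L  # features per chip row, from "size of arrays=754x754"

# Invented indices for illustration; on real data these would come from
# the CDF environment via indexProbes(), in matching probe order.
pm_i <- c(1510L, 1511L, 3020L, 90000L)
mm_i <- c(2264L, 2265L, 3774L, 90755L)  # last pair deliberately off by one

# Distance in exprs rows between each MM and its PM.
offset <- mm_i - pm_i

# TRUE where the MM is exactly one chip row after its PM.
adjacent <- offset == nr

table(adjacent)
which(!adjacent)  # pairs that break the PM-then-MM pattern
```

If every pair comes back adjacent, Jim's description holds for that chip type; any exceptions are the cases Kasper warns about, and only the CDF information identifies them.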

ADD REPLY • link 17.9 years ago Kasper Daniel Hansen ★ 6.5k

As Kasper points out, using the information stored in the CDF looks very much like the only safe solution, rather than relying on the observed fact that an MM is beside its corresponding PM, since I do not think the manufacturer of the chips otherwise claims it to be this way.

Probe-level data in an AffyBatch are stored in a matrix, with one probe per row and one chip per column. The method "indexProbes" for "AffyBatch" returns indexes into that matrix, as a list with one vector of indexes per probeset. If you want the X/Y coordinates for an index, a convenient way is to use the function "indices2xy".

Example:

    # your AffyBatch being "abatch"
    # indexProbes() returns a list (one element per probeset),
    # so flatten it before converting to coordinates
    imm <- unlist(indexProbes(abatch, which="mm"))
    xymm <- indices2xy(imm, abatch=abatch)

Hoping this helps,

Laurent
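The arithmetic behind an index-to-coordinate conversion can be sketched in base R. This is not the affy implementation: it assumes zero-based (x, y) coordinates and the convention index = x + nr * y + 1, and `index_to_xy` and `xy_to_index` are hypothetical names. On real data, the affy functions and the CDF remain the authority.

```r
# Assumed convention: zero-based (x, y), index = x + nr * y + 1,
# where nr is the number of features per chip row.
index_to_xy <- function(i, nr) {
  cbind(x = (i - 1L) %% nr, y = (i - 1L) %/% nr)
}

xy_to_index <- function(x, y, nr) x + nr * y + 1L

nr <- 754L                        # from "size of arrays=754x754"
i  <- c(1L, 754L, 755L, 2264L)    # a few illustrative row indices

xy <- index_to_xy(i, nr)
xy

# Round trip: converting back should reproduce the original indices.
xy_to_index(xy[, "x"], xy[, "y"], nr)
```

Under this convention, two probes with the same x and consecutive y values differ by exactly `nr` in the exprs matrix, which is why an MM placed directly below its PM shows up `nr` rows later.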

ADD REPLY • link 17.9 years ago lgautier@altern.org ▴ 950
