Subsetting Affybatch objects by gene list.

Horswell, Stuart

Many thanks to those of you who replied to my earlier query about subsetting Affybatch objects. However, I fear I didn't explain what I wanted to do sufficiently well.

I have a set of 24 arrays in a single affybatch object. Ultimately I would like to perform a quantile normalization on this, using expresso or rma, for which I will need probe pair/set level data in affybatch format (and since the bg.correct and normalize functions won't display their source code as easily as, say, expresso, I can't side-step this by altering the code). However, I want to remove all genes which are called "Absent" (in the sense of MAS5.0) across all 24 arrays before I normalize (for continuity with previous analyses performed in Excel).

I use the mas5calls function to obtain a list of affy ID tags which tells me which probesets to remove. However, since expresso and rma require affybatch objects as arguments, I need to produce an affybatch object containing probe data, *not* one of the arrays one obtains from the exprs function. (Previously I used exprs purely to get a list of affy IDs I could export to Excel.)

So, I guess I should phrase my question like this: how does one replace objects in the cdf and exprs slots of an affybatch object? This would let me use the methods kindly suggested previously. (Of course, ?AffyBatch only tells me how to replace pm/mm values, not how to remove them altogether, and simply setting their values to zero would clearly distort the quantile normalization.) I can obtain an array of probe-level data containing only the data I want to normalize, and a list of gene IDs that should be excluded from the cdf, but I can't push them into expresso!
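A minimal base-R sketch of the filter-then-normalize step described above, working on plain matrices rather than an AffyBatch. The names `probe_int`, `probe_set` and `calls`, the toy data, and the helper `quantile_normalize()` are all hypothetical: with affy, probe-level intensities would come from accessors such as `pm()` and the calls from `mas5calls()`, and the normalization shown is a plain re-implementation of the quantile idea, not the package's own code.

```r
# Hypothetical toy data: 6 PM probes belonging to 3 probesets, 4 arrays.
set.seed(1)
probe_int <- matrix(2^runif(24, 6, 12), nrow = 6,
                    dimnames = list(NULL, paste0("array", 1:4)))
probe_set <- rep(c("ps1", "ps2", "ps3"), each = 2)

# Hypothetical MAS5-style calls, one row per probeset.
calls <- matrix(c("P", "A", "P", "M",
                  "A", "A", "A", "A",
                  "P", "P", "M", "P"),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("ps1", "ps2", "ps3"), colnames(probe_int)))

# A probeset is dropped only if it is called "A" on every array.
all_absent <- rownames(calls)[apply(calls == "A", 1, all)]
keep <- !(probe_set %in% all_absent)
filtered <- probe_int[keep, , drop = FALSE]
filtered_set <- probe_set[keep]

# Quantile normalization: give every array the same distribution,
# namely the row means of the column-wise sorted matrix.
quantile_normalize <- function(m) {
  target <- rowMeans(apply(m, 2, sort))
  ranks <- apply(m, 2, rank, ties.method = "first")
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(m)
  out
}

normalized <- quantile_normalize(log2(filtered))
print(round(normalized, 3))
```

Because the all-absent probes are removed before the target distribution is computed, the normalized values differ from those of a whole-chip run, which is the difference this thread is about.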
As a final note, I'm aware that I could just get the Absent list, get the (un-normalized) expression values and then write some code to normalize at expression level, but I have in fact already done this, and I now want to compare the results with what happens when one uses expresso, which, since "normalize" accepts and produces affybatch objects and is called before "computeExprsSet", presumably normalizes at the level of probe pairs rather than at expression level.

Thanks again for your time,
Stu
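To make the probe-level versus expression-level distinction concrete, here is a hedged base-R sketch that quantile-normalizes either before or after collapsing probes to probesets. Both helpers are hypothetical, and the per-probeset median is a deliberate simplification: RMA itself summarizes with median polish, so this only illustrates where the normalization step sits, not what RMA would output.

```r
# Hypothetical helper: quantile normalization of the columns of a matrix.
quantile_normalize <- function(m) {
  target <- rowMeans(apply(m, 2, sort))           # mean of sorted columns
  ranks <- apply(m, 2, rank, ties.method = "first")
  out <- apply(ranks, 2, function(r) target[r])   # map ranks to target values
  dimnames(out) <- dimnames(m)
  out
}

# Hypothetical helper: one value per probeset, the median of its probes.
# (A simplification; RMA summarizes with median polish instead.)
summarize_median <- function(m, sets) {
  groups <- split(seq_len(nrow(m)), sets)
  t(sapply(groups, function(idx) apply(m[idx, , drop = FALSE], 2, median)))
}

# Toy log2 probe-level data: 8 probes in 4 probesets, 4 arrays.
set.seed(2)
log_pm <- matrix(rnorm(32, mean = 8, sd = 2), nrow = 8,
                 dimnames = list(NULL, paste0("array", 1:4)))
sets <- rep(paste0("ps", 1:4), each = 2)

# The order Stu presumes expresso uses: normalize probes, then summarize.
probe_first <- summarize_median(quantile_normalize(log_pm), sets)
# Alternative order: summarize first, then normalize expression values.
summary_first <- quantile_normalize(summarize_median(log_pm, sets))

# The two orders generally give different probeset values.
print(round(probe_first - summary_first, 3))
```

On real data, comparing `probe_first` with `summary_first` is essentially the comparison described in the question.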

Normalization cdf probe affy

updated 21.1 years ago by Matthew Hannah • written 21.1 years ago by Horswell, Stuart

rgentleman

United States

On Tue, Mar 16, 2004 at 11:26:43AM -0000, Horswell, Stuart wrote:
> [...] (and since the bg.correct and normalize functions won't display their source code as easily as, say, expresso, I can't side-step this by altering the code). However, I want to remove all genes which are called "Absent" (in the sense of MAS5.0) across all 24 arrays before I normalize (for continuity with previous analyses performed in Excel).

Well, first, R and Bioconductor are *open source*, and that means you really can get at the source code for all functions and methods. getMethods("bg.correct") shows that it is pretty simple (you can find out about it via ?getMethods). If I understand what you are trying to do, you might want to look at the matchprobes package, where we do something similar (there we combine chips by matching on probe sequence, but conceptually it is not different from what you are doing).

> [...] So, I guess I should phrase my question like this - how does one replace objects in the cdf and exprs slots of an affybatch object? [...] I can obtain an array of probe level data which only contains the data I want to normalize and a list of gene id's which should be excluded from the cdf list but I can't push them into expresso!

I can only suggest that if you want to do reasonably sophisticated things in any language, time spent learning how to program in it will be rewarded. A bit of time with John Chambers' book Programming with Data would explain much of what you are asking (as would some time with the documents on the Developer Page at Bioconductor, under the heading Programmers Reference Library).

Robert

> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor

--
Robert Gentleman, Associate Professor
Department of Biostatistics, Harvard School of Public Health
phone: (617) 632-5250 | fax: (617) 632-2444 | office: M1B20
email: rgentlem@jimmy.harvard.edu
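The two programming points in this reply, inspecting method source and replacing slots, can be illustrated with a toy S4 class. `ToyBatch` and `dropProbesets()` are hypothetical and far simpler than a real AffyBatch; the sketch only shows the S4 mechanics (`@<-` slot replacement, `validObject()`, and `showMethods()`/`getMethod()` for printing a method's code) of the kind Programming with Data covers. In current R, `showMethods()` and `getMethod()` are the usual tools; `getMethods()` is an older interface.

```r
library(methods)

# Hypothetical, much-simplified stand-in for an AffyBatch: a probe-level
# intensity matrix plus a probe-to-probeset map.
setClass("ToyBatch",
         slots = c(intensities = "matrix", probeset = "character"),
         validity = function(object) {
           if (nrow(object@intensities) != length(object@probeset))
             "one probeset label is needed per intensity row"
           else TRUE
         })

# A generic with one method, so that its source can be inspected.
setGeneric("dropProbesets",
           function(object, ids) standardGeneric("dropProbesets"))

setMethod("dropProbesets", "ToyBatch", function(object, ids) {
  keep <- !(object@probeset %in% ids)
  # Replace both slots, then re-check the object's validity.
  object@intensities <- object@intensities[keep, , drop = FALSE]
  object@probeset <- object@probeset[keep]
  validObject(object)
  object
})

# Inspecting methods: list the signatures, then print one method's code.
showMethods("dropProbesets")
print(getMethod("dropProbesets", "ToyBatch"))

tb <- new("ToyBatch",
          intensities = matrix(1:12, nrow = 6),
          probeset = rep(c("ps1", "ps2", "ps3"), each = 2))
tb2 <- dropProbesets(tb, "ps2")
```

Note that `@<-` checks only the class of the new value, not the validity function, which is why the method calls `validObject()` after updating both slots.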

written 21.1 years ago by rgentleman

Matthew Hannah

Stu,

I'm a bit confused as to why you want to go to so much trouble to subset your data. RMA or any of the expresso functions can be called on the entire affybatch and the result written to file. I don't see how this would differ from the MAS5 analysis you presumably want consistency with, as MAS5 scales/normalises on a whole-chip basis anyway. If you import the result into Excel and paste in the corresponding P/A calls from the MAS5 software, a quick use of Excel's IF and sort functions will perform your filtering (especially as BioC P/A calls could be (very?) slightly different from MAS5's). If you want to use BioC/R for further analysis, you could save the result to a txt file and then just read it back into R.

A better thing to look into may be whether you really want to filter based on the P/A calls at all: with RMA you might find that including A genes has only a very small effect on your final list of genes (if based on fold change). And that's before considering whether the P/A call is useful, given that about 1/3 of MM values exceed the corresponding PM.

Cheers,
Matt
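Matt's alternative (normalize everything, then filter on the calls) can also stay in R. A minimal sketch, assuming a probeset-by-array expression matrix `expr` and a matching matrix of "P"/"M"/"A" calls; both are toy, hypothetical data. It drops probesets called "A" on every array, round-trips the result through a tab-delimited file that Excel can open, and computes the fraction of probe pairs with MM > PM on toy values (for a real AffyBatch, `pm()` and `mm()` from affy return the corresponding matrices).

```r
# Hypothetical expression matrix from a whole-batch, RMA-style run.
set.seed(3)
expr <- matrix(rnorm(12, mean = 8), nrow = 3,
               dimnames = list(c("ps1", "ps2", "ps3"), paste0("array", 1:4)))
calls <- matrix(c("P", "A", "P", "M",
                  "A", "A", "A", "A",
                  "P", "P", "M", "P"),
                nrow = 3, byrow = TRUE, dimnames = dimnames(expr))

# Filter after normalization: drop probesets called "A" on every array.
present_somewhere <- !apply(calls == "A", 1, all)
filtered <- expr[present_somewhere, , drop = FALSE]

# Round trip through a tab-delimited text file (e.g. to open in Excel).
# The header has one fewer field, so read.table uses column 1 as row names.
out_file <- tempfile(fileext = ".txt")
write.table(filtered, out_file, sep = "\t", quote = FALSE)
back <- as.matrix(read.table(out_file, sep = "\t", header = TRUE))

# Fraction of probe pairs whose MM exceeds PM, on toy values.
pm_toy <- matrix(c(100, 80, 120, 90), nrow = 2)
mm_toy <- matrix(c(110, 60, 90, 95), nrow = 2)
frac_mm_gt_pm <- mean(mm_toy > pm_toy)
```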

written 21.1 years ago by Matthew Hannah

> If you want to use BioC/R for more analysis you could save the result to a txt file and then just read it back into R.

Or better yet, use the save and load commands: faster and less typing. For example:

    library(affy)
    x <- ReadAffy()
    save(x, file = "x.rda", compress = TRUE)
    rm(x)
    load("x.rda")
    ## now x is back
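A runnable variant of Rafael's save/load snippet using base R only, with a small random matrix standing in for the `ReadAffy()` result (an assumption, so the example needs no CEL files). It also shows that `load()` returns the names of the objects it restores, and the single-object `saveRDS()`/`readRDS()` pair available in current R.

```r
# Toy stand-in for the AffyBatch returned by ReadAffy().
set.seed(4)
x <- matrix(rnorm(6), nrow = 2)
x_saved <- x

rda <- tempfile(fileext = ".rda")
save(x, file = rda, compress = TRUE)
rm(x)
loaded <- load(rda)   # restores x; returns its name invisibly

# For a single object, saveRDS()/readRDS() let you pick the name on reading.
rds <- tempfile(fileext = ".rds")
saveRDS(x, rds)
y <- readRDS(rds)
```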

reply written 21.1 years ago by Rafael A. Irizarry
