affy 2.0 (fwd)

Jeff Gentry

Forwarded at Rafael's request. Given how heavily members of this list use affy, it might be of interest.

---------- Forwarded message ----------
Date: Fri, 23 Aug 2002 00:07:09 -0400 (EDT)
From: Rafael A. Irizarry <ririzarr@jhsph.edu>
Reply-To: rafa@jhu.edu
To: biocore@stat.math.ethz.ch
Subject: affy 2.0

Hi!

For the next version of affy I would like to have just one main class. Because the package is a merge of two, we have redundancy: there are two approaches for storing probe-level data. This is extra work because we have to make sure methods work for both. Regardless of the approach we decide on, we will have the same methods, so the user should not see the difference. I need help deciding which approach is more convenient. I'll say "chips" instead of "arrays" so that we don't get confused with what R calls arrays.

Approach 1: for each chip we keep a matrix (Cel) where the row 10, column 12 entry represents the probe intensity read from the physical row 10, column 12 position on the chip. We then keep three-dimensional arrays to represent multiple-chip experiments. To know which position goes with which gene, a separate class (Cdf) is defined that contains a matrix with the gene name for each entry in the probe intensity matrix. So the row 10, column 12 entry in the Cdf matrix gives the gene name for the probe in the row 10, column 12 entry in the Cel matrix. The Cdf class contains the necessary information to know what's PM and what's MM.

Approach 2: we keep the PM data in a matrix with rows representing probes and columns representing chips, and similarly for the MM data. To know which row goes with which gene, we keep a vector with the gene names: to know what gene is in row, say, 10, we simply look at the 10th entry in the name vector. Similarly, we have vectors with the probe numbers, x positions, and y positions.

An advantage of approach 1 is that we don't need to keep the x,y (physical position on the chip) information.
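The two layouts can be contrasted with a small sketch. It is written in Python on plain nested lists rather than in R, and everything in it (the toy 2 x 4 chip, the `Approach1`/`Approach2` classes, the `(gene, probe number, "PM"/"MM")` form of a Cdf cell, and the `to_approach2` helper) is a hypothetical illustration, not affy's actual classes. It shows that in approach 1 a probe's physical position is simply its matrix index, whereas approach 2 has to carry positions in extra slots.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical toy types, not affy's classes. A Cdf cell is
# (gene name, probe number, "PM" or "MM"), or None for an unused position.
CdfCell = Optional[Tuple[str, int, str]]
Pos = Tuple[int, int]  # (row, column) on the chip


@dataclass
class Approach1:
    """Chip-shaped storage: cel[chip][row][col] plus one chip-shaped Cdf."""
    cel: List[List[List[float]]]
    cdf: List[List[CdfCell]]

    def gene_at(self, row: int, col: int) -> Optional[str]:
        # The Cdf entry at (row, col) names the probe at (row, col) in Cel,
        # so no separate position slots are needed.
        cell = self.cdf[row][col]
        return cell[0] if cell is not None else None


@dataclass
class Approach2:
    """Probe-shaped storage: pm[probe][chip] and mm[probe][chip]."""
    pm: List[List[float]]
    mm: List[List[float]]
    gene_names: List[str]     # gene_names[i] labels row i of pm and mm
    probe_numbers: List[int]
    pm_pos: List[Pos]         # physical positions must be stored explicitly
    mm_pos: List[Pos]


def to_approach2(a1: Approach1) -> Approach2:
    """Flatten chip-shaped data into PM and MM probe-by-chip matrices."""
    pm_at: Dict[Tuple[str, int], Pos] = {}
    mm_at: Dict[Tuple[str, int], Pos] = {}
    for r, row in enumerate(a1.cdf):
        for c, cell in enumerate(row):
            if cell is None:
                continue
            gene, number, kind = cell
            (pm_at if kind == "PM" else mm_at)[(gene, number)] = (r, c)

    keys = sorted(pm_at)  # probe rows ordered by gene name, then probe number

    def across_chips(pos: Pos) -> List[float]:
        r, c = pos
        return [chip[r][c] for chip in a1.cel]

    return Approach2(
        pm=[across_chips(pm_at[k]) for k in keys],
        mm=[across_chips(mm_at[k]) for k in keys],
        gene_names=[g for g, _ in keys],
        probe_numbers=[n for _, n in keys],
        pm_pos=[pm_at[k] for k in keys],
        mm_pos=[mm_at[k] for k in keys],
    )


# Toy 2 x 4 chip: row 0 holds PM probes, row 1 the matching MM probes.
cdf = [
    [("geneA", 1, "PM"), ("geneA", 2, "PM"), ("geneB", 1, "PM"), ("geneB", 2, "PM")],
    [("geneA", 1, "MM"), ("geneA", 2, "MM"), ("geneB", 1, "MM"), ("geneB", 2, "MM")],
]
cel = [
    [[100.0, 120.0, 300.0, 310.0], [40.0, 50.0, 90.0, 95.0]],  # chip 1
    [[110.0, 125.0, 280.0, 330.0], [45.0, 52.0, 85.0, 99.0]],  # chip 2
]
a1 = Approach1(cel=cel, cdf=cdf)
a2 = to_approach2(a1)
print(a1.gene_at(1, 2))        # geneB
print(a2.gene_names)           # ['geneA', 'geneA', 'geneB', 'geneB']
print(a2.pm[2], a2.pm_pos[2])  # [300.0, 280.0] (0, 2)
```

The conversion pairs each PM with its MM through the shared (gene, probe number) key, which is one assumed way of encoding what the Cdf "knows" about PM and MM.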
A disadvantage of approach 1 is that subsetting by genes and creating "fake" instances can be confusing, because we need to control two classes (Cel and Cdf).

An advantage of approach 2 is that the PMs and MMs are readily available and subsetting by genes is easy; as a consequence, creating "fake" instances is easy too. A disadvantage is that we need extra slots to keep the physical position information, and that we are a bit farther away from the raw data.

At first I was leaning toward approach 1 because it's closer to the raw data... now I'm a bit worried about difficulties with subsetting by genes, and how that affects "genes for hire". Any opinions? Suggestions?

Rafael

_______________________________________________
Biocore mailing list
Biocore@stat.math.ethz.ch
http://www.stat.math.ethz.ch/mailman/listinfo/biocore
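The subsetting trade-off described in the message can be sketched the same way, again as hypothetical Python on plain lists rather than affy code. Here a Cdf cell is reduced to just a gene name, with row 0 holding PM probes and row 1 the matching MM probes (an assumed toy convention), and `subset_approach1`/`subset_approach2` are illustrative names. In approach 1 the Cel data cannot shrink without losing the position-as-index property, so a gene subset is expressed by masking the Cdf and only makes sense as a (Cel, Cdf) pair; in approach 2 a single row-index list subsets every slot at once.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical toy data layouts, not affy's classes:
#   approach 1: cel[chip][row][col] plus a chip-shaped cdf[row][col]
#               holding a gene name, or None for a masked/unused position
#   approach 2: pm[probe][chip], mm[probe][chip] plus gene_names[probe]


def subset_approach1(
    cel: List[List[List[float]]], cdf: List[List[Optional[str]]], keep: Set[str]
) -> Tuple[List[List[List[float]]], List[List[Optional[str]]]]:
    """Restrict to genes in `keep` by masking the Cdf.

    The Cel matrices keep every position (position is the index), so they
    are returned unchanged and the result is only meaningful as a pair.
    """
    masked = [[g if g in keep else None for g in row] for row in cdf]
    return cel, masked


def subset_approach2(
    pm: List[List[float]], mm: List[List[float]], gene_names: List[str], keep: Set[str]
) -> Tuple[List[List[float]], List[List[float]], List[str]]:
    """Restrict to genes in `keep`; one row-index list subsets every slot."""
    rows = [i for i, g in enumerate(gene_names) if g in keep]
    return [pm[i] for i in rows], [mm[i] for i in rows], [gene_names[i] for i in rows]


# Toy 2 x 2 chip, one chip: row 0 holds PM probes, row 1 the matching MMs.
cdf = [["geneA", "geneB"], ["geneA", "geneB"]]
cel = [[[100.0, 300.0], [40.0, 90.0]]]
cel1, cdf1 = subset_approach1(cel, cdf, {"geneB"})
print(cdf1)  # [[None, 'geneB'], [None, 'geneB']]

# The same data in approach 2 form.
pm, mm, names = [[100.0], [300.0]], [[40.0], [90.0]], ["geneA", "geneB"]
print(subset_approach2(pm, mm, names, {"geneB"}))  # ([[300.0]], [[90.0]], ['geneB'])
```

The approach 2 version also shows why "fake" instances are easy there: any three aligned lists of the right shapes form a valid object, with no second chip-shaped structure to keep in sync.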