Question

The metadata in an Affymetrix CDF

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

United States

Hello Bioconductors,

I've made a CDF using the makePlatformDesign package for a Drosophila tiling array, and I'm curious whether anyone can point me to somewhere I can find out what some columns of the CDF refer to. The resulting CDF file/package has two columns that I'm not sure how to interpret.

1) "feature_ID": This is a vector of ints. It looks like it is meant to group each perfect match/mismatch pair together as a "unit."

    > head(cdf$feature_ID)
    [1] 1 1 535 535 1781830 1781830

Constructing a table of the feature_IDs shows every element in this vector having a frequency of 2, which reinforces my suspicion that it just groups perfect match/mismatch pairs.

2) "feature_set_name": It also looks like this is performing some grouping function, although the frequency of the elements in "feature sets" varies from 1 to 10. Are these feature_sets just "probe sets" (grouping a bunch of probes to a single transcript), or ... ?

I don't think I need this in my analysis, as I'm re-BLASTing my probes to get updated coordinates and whatnot, but I'm curious as to what's going on there. I think this info would also be helpful for other people who are doing different types of analyses.

I apologize if this information is already available elsewhere, but I haven't run across it.

Thanks,
-steve
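For readers who want to repeat the frequency checks described above, here is a minimal sketch in R. The `cdf` data frame below is a made-up stand-in (the real object built by makePlatformDesign may be structured differently), and the `feature_set_name` values are invented purely for illustration.

```r
# Toy stand-in for the CDF table. All values are hypothetical, and the real
# object produced by makePlatformDesign may not be a plain data frame.
cdf <- data.frame(
  feature_ID = c(1L, 1L, 535L, 535L, 1781830L, 1781830L),
  feature_set_name = c("100", "100", "200", "200", "200", "200"),
  stringsAsFactors = FALSE
)

# How many features share each feature_ID?
id_counts <- table(cdf$feature_ID)

# Distribution of those group sizes. A single entry named "2" means
# every feature_ID occurs exactly twice (consistent with PM/MM pairing).
id_size_dist <- table(id_counts)
all_pairs <- all(id_counts == 2)

# Same idea for feature_set_name: how large are the groups?
set_size_dist <- table(table(cdf$feature_set_name))

print(id_size_dist)
print(all_pairs)
print(set_size_dist)
```

If `all_pairs` is `TRUE` on the real data, that is consistent with (though not proof of) the pairing interpretation.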

cdf makePlatformDesign

16.9 years ago Steve Lianoglou ★ 13k

score 0 · Answer 1 · 2008-05-27

> 1) feature_ID is indeed pairing PM and MM features;

Cool.

> 2) feature_set_name is a string containing the feature positions. If
> one feature maps to position 123 and 456, feature_set_name will be
> "123;456".

I see. So, just out of curiosity, how would you interpret these numbers (and indeed just the 'position' column when it's singular) when the value of cdf$chromosome at that same index is <NA>?

> I'm far from claiming that this is the best storage system, but was
> sufficient for an application we had a while ago.
>
> Anyways, I'm trying to move the creation of the PDInfo packages to
> the pdInfoBuilder package, which uses SQL db to store these tables.
>
> If you're willing to give that a try, your comments and suggestion
> are also very welcome.

As I come to better grips with how I'll be using the data for my analysis, I'll be happy to provide any feedback.

Erm, wait. Are you suggesting I use the pdInfoBuilder package to work with the tiling array? I looked at it briefly after reading somewhere that the functionality of the makePlatformDesign package will be moved there. The vignette for the pdInfoBuilder package requires that I already have a CDF file in order to follow along, so I figured it wasn't for me and moved on for now (since my goal was to build the CDF itself).

I'll be happy to help you test the PDInfo packages and provide feedback, but it just wasn't clear to me that it did what I needed.

Thanks,
-steve
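The "123;456" encoding described in the quoted reply can be unpacked with base R. This is only a sketch: the vectors below are invented examples rather than values from a real CDF, and it simply flags entries with a missing chromosome without saying how they should be interpreted.

```r
# Hypothetical example values, following the "123;456" format from the reply.
feature_set_name <- c("123;456", "789", "42")
chromosome <- c("chr2L", NA, "chr3R")

# Split each string on ";" and convert the pieces to integer positions.
positions <- lapply(strsplit(feature_set_name, ";", fixed = TRUE), as.integer)

# Number of positions each feature maps to.
n_positions <- vapply(positions, length, integer(1))

# Indices where the chromosome is missing; what these entries mean is
# exactly the open question raised above.
no_chrom <- which(is.na(chromosome))

print(positions)
print(n_positions)
print(no_chrom)
```

Keeping the positions as a list (one integer vector per feature) preserves the one-to-many mapping without having to pad or duplicate rows.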