Hello Bioconductors,
I've made a CDF using the makePlatformDesign package for a Drosophila
tiling array, and I'm wondering if anyone can point me to documentation
that explains what some of the CDF's columns refer to.
For instance, the resulting cdf file/package has two columns that I'm
not sure how to interpret.
1) "feature_ID" : This is a vector of ints. It looks like it is meant
to group the perfect match and mismatch pairs together as a "unit."
> head(cdf$feature_ID)
[1] 1 1 535 535 1781830 1781830
Tabulating the feature_IDs shows that every element in this vector
has a frequency of 2, which reinforces my suspicion that it simply
groups perfect match/mismatch pairs.
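For anyone who wants to run the same check, here is a minimal sketch.
It uses a small made-up data frame in place of the real cdf object, so
the values are illustrative only; the only name taken from the actual
package is the feature_ID column.

```r
# Toy stand-in for the real cdf object (values are made up)
cdf <- data.frame(feature_ID = c(1, 1, 535, 535, 1781830, 1781830))

# How many times does each feature_ID occur?
id_counts <- table(cdf$feature_ID)

# Distribution of those frequencies: a single entry named "2" means
# every ID occurs exactly twice (presumably one PM and one MM feature)
freq_of_counts <- table(id_counts)

# TRUE if every feature_ID appears exactly twice
all_paired <- all(id_counts == 2)
```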
2) "feature_set_name" : It also looks like this is performing some
grouping function, although the frequency of the elements in "feature
sets" varies from 1 to 10. Are these feature_sets just "probe
sets" (grouping a bunch of probes to a single transcript), or ... ?
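The same kind of tabulation gives the sizes of the feature sets. This
is only a sketch on made-up data: the set names below are hypothetical
placeholders, not values from the real package.

```r
# Toy stand-in for the real cdf object (set names are hypothetical)
cdf <- data.frame(
  feature_set_name = c("setA", "setA", "setA", "setB", "setC", "setC"),
  stringsAsFactors = FALSE
)

# Number of features in each feature set
set_sizes <- table(cdf$feature_set_name)

# Smallest and largest set size, as a plain integer vector
size_range <- range(as.vector(set_sizes))

# How many sets there are of each size
size_distribution <- table(as.vector(set_sizes))
```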
I don't think I need this in my analysis, as I'm reblasting my probes
to get updated coordinates and whatnot, but I'm just curious as to
what's going on there. I think this info would also be helpful for
other people who are doing different types of analyses.
I apologize if this information is already available elsewhere, but I
haven't run across it.
Thanks,
-steve
> 1) feature_ID is indeed pairing PM and MM features;
>
Cool.
> 2) feature_set_name is a string containing the feature positions. If
> one feature maps to position 123 and 456, feature_set_name will be
> "123;456".
>
I see. So, just out of curiosity, how would you interpret these
numbers (and indeed just the 'position' column when it's singular) when
the value of cdf$chromosome at that same index is <NA>?
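To make the question concrete, here is a rough sketch of how I'd pull
the positions apart and find the rows in question. It assumes the
"123;456" format described above; the data frame and its values are
made up for illustration.

```r
# Toy stand-in for the real cdf object (values are made up)
cdf <- data.frame(
  feature_set_name = c("123;456", "789", "1011"),
  chromosome       = c("2L", NA, "X"),
  stringsAsFactors = FALSE
)

# Split each feature_set_name on ";" into its individual positions
positions <- lapply(
  strsplit(cdf$feature_set_name, ";", fixed = TRUE),
  as.numeric
)

# Number of positions each feature maps to
n_positions <- vapply(positions, length, integer(1))

# Rows that carry position information but have no chromosome
no_chrom <- cdf[is.na(cdf$chromosome), ]
```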
> I'm far from claiming that this is the best storage system, but was
> sufficient for an application we had a while ago.
>
> Anyways, I'm trying to move the creation of the PDInfo packages to
> the pdInfoBuilder package, which uses SQL db to store these tables.
>
> If you're willing to give that a try, your comments and suggestions
> are also very welcome.
>
As I come to better grips with how I'll be using the data for my
analysis, I'll be happy to provide any feedback.
Erm, wait. Are you suggesting I use the pdInfoBuilder package to work
with the tiling array? I looked at it briefly after reading somewhere
that the functionality of the makePlatformDesign package will be moved
there. The vignette for the pdInfoBuilder package requires that I
already have some CDF file in order to play along, so I figured it
wasn't for me and moved on for now (since my goal was to build the cdf
itself).
I'll be happy to help you test the PDInfo packages and provide
feedback, but it just wasn't clear to me that it did what I needed.
Thanks,
-steve