frma vectors for GPL13158 (Affymetrix HT HG-U133+ PM Array Plate)?
@ryan-c-thompson-5618
Icahn School of Medicine at Mount Sinai…

We are processing a number of arrays from the GPL13158 platform, aka Affymetrix HT HG-U133+ PM Array Plate, in a machine-learning/classification context, so we wish to use fRMA for normalization. However, this platform differs enough from ordinary HG-U133 arrays that I'm not sure whether the existing fRMA vectors would still apply and, if so, how one would apply them. (Specifically, the mismatch probes have been removed, and some probesets have been reduced from 11 probes to 9 or 10.) This platform also comes in plates of 16 or 24 arrays, which could affect how one defines "batches".

So my question is: are there fRMA vectors available for this platform? If not, can I obtain them easily by subsetting from a related platform, or should I just generate my own vectors from the available GEO data and/or my own data?

frma frmatools • 2.9k views

I mean no disrespect to the fRMA method, but an alternative would be to use the SCAN algorithm (http://www.bioconductor.org/packages/release/bioc/html/SCAN.UPC.html), which can be applied to these arrays on a single-sample basis without having to generate a vector.


I tried both SCAN and fRMA on a similar dataset on a platform for which fRMA vectors were available. In that particular dataset, fRMA seemed to work better. I'll certainly try both of them out on this dataset as well, if I can.


Did you use the barcodes or the SCAN/fRMA corrected estimates for your classifications?  Just curious.

It seems like SCAN.UPC is ultimately a bit more flexible than the current incarnation of fRMA for completely reannotated arrays (unpublished observations of my own, which with any luck will eventually make it out there) so if there are HUGE differences I'm interested.  If the differences are minor, in my applications, the biological noise appears to dwarf them.

Anyway, in your specific case, I wonder whether http://www.bioconductor.org/packages/release/data/annotation/html/hthgu133afrmavecs.html would help, since

1) RMA ignores MM probes, last time I checked, and

2) if there are no new probe sequences, the platform should (!) be a strict subset of the HT-HGU133A design.

Have you taken a peek at that, and perhaps tried subsetting the included vectors for your chips?

Hope this helps.  


I haven't messed around with the barcoding using either SCAN or fRMA. That's another item on my list of things to try. I was just using the corrected estimates from both.

I believe that this platform is in fact a strict subset of standard HG-U133 arrays, and I was thinking of maybe subsetting the vectors from another platform. But I'll have to figure out the internal structure of that package first. And I worry that the probes have maybe been moved around on the new design, or the difference between single arrays and plates of 16 or 24 arrays might affect things somehow.
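
For what it's worth, the "strict subset" assumption is something one could check before subsetting anything. Below is a minimal Python sketch of such a check, not tied to any Bioconductor package. The function names (`subset_report`, `is_strict_subset`) and the toy probeset IDs and sequences are hypothetical, and it assumes the probe sequences for both platforms have already been exported into plain dictionaries with probeset IDs harmonized between the two platforms.

```python
def subset_report(target, reference):
    """Compare two platforms given as {probeset_id: [probe sequences]}.

    Reports probesets missing from the reference, probes present only on
    the target (new), and probes present only on the reference (dropped).
    """
    report = {"missing_probesets": [], "new_probes": {}, "dropped_probes": {}}
    for probeset, probes in target.items():
        if probeset not in reference:
            report["missing_probesets"].append(probeset)
            continue
        target_set = set(probes)
        reference_set = set(reference[probeset])
        new = sorted(target_set - reference_set)
        dropped = sorted(reference_set - target_set)
        if new:
            report["new_probes"][probeset] = new
        if dropped:
            report["dropped_probes"][probeset] = dropped
    return report


def is_strict_subset(report):
    """True when the target adds no probesets and no probe sequences."""
    return not report["missing_probesets"] and not report["new_probes"]


# Toy data: made-up IDs and sequences, for illustration only.
reference = {"ps1": ["AAAC", "CCGT", "TTGA"], "ps2": ["GGTA"], "ps3": ["ACGT"]}
target = {"ps1": ["AAAC", "CCGT"], "ps2": ["GGTA"]}

report = subset_report(target, reference)
print("strict subset:", is_strict_subset(report))
print("dropped probes:", report["dropped_probes"])
```

A check like this only compares sequences. It says nothing about whether the probes have moved on the array or whether plate processing changes their behavior, so passing it would not by itself show that vectors from another platform transfer.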

@matthew-mccall-4459
United States

There are not currently fRMA vectors available for GPL13158. I just checked GEO and there are only 44 data sets (GSEs) for this platform, which is on the low end for a general fRMA implementation. 

I would strongly caution against applying fRMA vectors from one platform to another (even if the probes are the same). We saw differences in the same probe / same probeset between HGU133a and HGU133plus2. I haven't tried HT-HGU133A to HT-HGU133+PM, but I would urge caution if you go this route. 

If you decide to make your own fRMA vectors and your data set is fairly large, I would stick to building them from your own data. This would also let you explore the interesting issue you raised -- array plate as a type of batch. 
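
As a rough way to explore the plate-as-batch question, one could compare how much a probe varies within plates against how much the plate means differ from each other. The Python sketch below is a crude descriptive summary under stated assumptions, not the estimator used by the frma or frmaTools packages. The function name `batch_variance_summary` and the toy values are hypothetical. It assumes one value per array for a single probe (for example a log2 residual), at least two arrays per plate, and at least two plates.

```python
from statistics import mean, variance


def batch_variance_summary(values, batch_ids):
    """Return (within, between) variance for one probe.

    values    : one number per array for the probe (e.g. a log2 residual)
    batch_ids : the plate (batch) label of each array, in the same order
    within    : average of the per-plate sample variances
    between   : sample variance of the per-plate means
    """
    by_batch = {}
    for value, batch in zip(values, batch_ids):
        by_batch.setdefault(batch, []).append(value)
    within = mean(variance(vals) for vals in by_batch.values())
    between = variance([mean(vals) for vals in by_batch.values()])
    return within, between


# Toy example: two plates of three arrays each (made-up numbers).
values = [8.0, 8.2, 8.1, 9.0, 9.1, 8.9]
plates = ["plate1", "plate1", "plate1", "plate2", "plate2", "plate2"]

within, between = batch_variance_summary(values, plates)
print(f"within-plate variance:  {within:.3f}")
print(f"between-plate variance: {between:.3f}")
```

If the between-plate figure is consistently large relative to the within-plate figure across many probes, that would suggest treating each plate as its own batch. If plates from the same experiment look alike, the experiment may be the more natural batch unit.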

 


Thanks. I guess I was right to be suspicious of copying fRMA vectors from another platform. I think we will likely take your advice and build fRMA vectors from our own data.


What would you say is a reasonable minimum number of batches for a general fRMA implementation? GEO has 1774 samples for this platform. I'm not sure how many unique combinations of GSE/tissue there are within those 44 data sets, but I also happen to have 1344 more samples of internal data in seven separate experiments (from only two tissues, though, and each of the seven experiments focuses on a single tissue). Would that be enough to work with? Or is the bigger problem that 44 GSEs probably don't have sufficient tissue variety, regardless of the number of batches?


I usually wait until a platform has around 200 GSEs for a general fRMA implementation. Then, even after we filter out poorly annotated experiments and poor-quality samples, I still have enough data to form 200 tissue/experiment combinations, each with 5 samples. This may be more than is really needed (see Table 1 of http://www.biomedcentral.com/1471-2105/12/369 for a more in-depth look at training data for fRMA vectors), but for a general implementation I prefer to err on the side of caution.
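
To see how many usable tissue/experiment combinations a given sample annotation would yield, the counting described above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, assuming the annotation is available as (sample ID, experiment, tissue) tuples. The function name `pick_balanced_batches`, the default of 5 samples per batch, and the toy IDs are assumptions for the example, not part of any package.

```python
import random
from collections import defaultdict


def pick_balanced_batches(samples, per_batch=5, seed=0):
    """Group samples by (experiment, tissue) and draw equal-sized batches.

    samples   : iterable of (sample_id, experiment, tissue) tuples
    per_batch : number of samples to keep from each combination
    Combinations with fewer than per_batch samples are skipped.
    Returns {(experiment, tissue): [sample_ids]}.
    """
    groups = defaultdict(list)
    for sample_id, experiment, tissue in samples:
        groups[(experiment, tissue)].append(sample_id)

    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    chosen = {}
    for key in sorted(groups):
        ids = sorted(groups[key])
        if len(ids) >= per_batch:
            chosen[key] = sorted(rng.sample(ids, per_batch))
    return chosen


# Toy annotation table with made-up identifiers.
samples = (
    [(f"GSM_A{i}", "GSE_A", "liver") for i in range(8)]
    + [(f"GSM_B{i}", "GSE_A", "kidney") for i in range(3)]
    + [(f"GSM_C{i}", "GSE_B", "liver") for i in range(5)]
)

batches = pick_balanced_batches(samples)
print(len(batches), "usable tissue/experiment combinations")
for key, ids in batches.items():
    print(key, ids)
```

Running the same count on the real GEO annotation plus the internal experiments would show how far the available data falls short of, or meets, the target number of combinations.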
