Question

Summarising Probe Sets for Agilent 4x44 Arrays


Samantha Jane England ▴ 10

@samantha-jane-england-4995

Last seen 10.3 years ago

Dear Bioconductor Mailing List,

We are working with custom-designed Agilent 4x44 arrays in our lab, and we use Agi4x44PreProcess for the pre-processing. I am looking for advice on whether or not to summarise probe sets and, if we should, recommendations for methods or approaches to use.

I have been looking over the GMANE archive, and I know that probe-set summarisation is a common feature of Affymetrix data analysis because of the shorter oligos used. In contrast, given the longer oligos on Agilent chips, the consensus opinion seems to be that summarising Agilent probe sets isn't necessarily a good idea. Indeed, this may explain why there doesn't appear to be an obvious R routine to summarise Agilent one-colour probe sets (but please correct me if I am wrong).

We have agreed with this consensus opinion (and agree that probes can behave differently within a probe set), and have so far been running the statistics and clustering analysis with each individual probe that passes filtering by flags kept in the list. A colleague suggested that we use a non-R-based stats and clustering program for differential expression. In principle this works fine; the problem is that it can't cope with the large data sets in which the individual probes remain. We have to break the data set up into chunks and perform the analysis that way, which sits a little uncomfortably with me. So the conflict is: do we summarise the probe sets to overcome this problem, or should we keep individual probes separate and look for alternative clustering programs that would let us process the intact data set?

In summary, the questions I have are these:

1. Is summarisation ever a good idea for Agilent probe sets (we have 8 probes per transcript), and if so, are there routines in R that would enable us to do this?
2. If summarisation is a bad idea for Agilent data sets, would taking the median signal intensity be a better strategy?
3. Can anybody recommend a good hierarchical clustering routine in R that would be suitable for our Agilent one-colour data, whether we take all individual probes or just the median signal intensity? (I thought maybe oompa or BiClust?)

I would really appreciate any advice or suggestions that people can give me. Thank you all very much in anticipation of your help.

With very best wishes,
Sam England

Samantha England, PhD
Lewis Lab, Department of Biology, Syracuse University

Clustering probe PROcess • 1.6k views

updated 13.0 years ago by Francois Pepin ▴ 80 • written 13.0 years ago by Samantha Jane England ▴ 10

score 0 · Answer 1 · 2011-12-08

Hi Sam,

It depends a lot on how you have designed your custom array. One of the reasons for having multiple probes on the whole-genome 44k arrays is that those probes gave different results in Agilent's test datasets. In that case, summarizing can be counterproductive.

> 1. Is summarisation ever a good idea for Agilent probe sets (we have 8 probes per transcript), and if so, are there routines in R that would enable us to do this?

It could be, depending on the probe design and what your goal is. One way would be to just average over them. If your probes behave in more complicated ways, then an RMA-style summarization could work well. Without knowing what your design is and what your data look like, it's hard to tell. I'm not aware of R routines that do this out of the box, but I haven't checked in a while, and they could be easy to write.

Another type of "summarization" would be to choose a representative probe per gene (e.g. genefilter::findLargest). You'd end up throwing away 7/8 of your array, but it works well if some probes are definitely better than others.

> 2. If summarisation is a bad idea for Agilent data sets would taking the median signal intensity be a better strategy?

I'd consider taking the median to be a form of summarization, like the average I suggested above. If all your probes show a very similar signal, then it could be a good option.

> 3. Can anybody recommend a good hierarchical clustering routine in R that would be suitable for our Agi one-colour data, whether we take all individual probes or just the median signal intensity? (I thought maybe oompa or BiClust?)

I'm a fan of the basic hclust routine with method='ward', but that's not saying the others aren't good.

Hope this helps,
François Pepin
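The three options discussed above (a per-transcript median, a single representative probe, and Ward clustering with hclust) can be sketched in base R roughly as follows. This is only an illustration on simulated log2 intensities, not code from the thread: the helper names `summarise_by_median` and `pick_representative`, the toy dimensions, and the choice of the row mean as the "best probe" statistic are all assumptions. Note also that since R 3.1.0 the hclust method formerly called "ward" is named "ward.D", and a separate "ward.D2" variant exists.

```r
# Simulated data (assumption): 20 transcripts x 8 probes each, 6 arrays,
# values standing in for filtered, normalised log2 intensities.
set.seed(1)
n_transcripts <- 20
probes_per_transcript <- 8
n_arrays <- 6

transcript_id <- rep(paste0("tx", seq_len(n_transcripts)),
                     each = probes_per_transcript)
expr <- matrix(
  rnorm(length(transcript_id) * n_arrays, mean = 8, sd = 1),
  ncol = n_arrays,
  dimnames = list(paste0("probe", seq_along(transcript_id)),
                  paste0("array", seq_len(n_arrays)))
)

# Hypothetical helper: one row per transcript, holding the median of that
# transcript's probes on each array.
summarise_by_median <- function(x, groups) {
  idx <- split(seq_len(nrow(x)), groups)
  do.call(rbind, lapply(idx, function(i) {
    apply(x[i, , drop = FALSE], 2, median)
  }))
}

# Hypothetical helper: keep a single probe per transcript, here the one with
# the largest value of `stat` (row mean by default; any per-probe statistic
# could be substituted).
pick_representative <- function(x, groups, stat = rowMeans(x)) {
  best <- tapply(seq_len(nrow(x)), groups,
                 function(i) i[which.max(stat[i])])
  x[as.integer(best), , drop = FALSE]
}

expr_median <- summarise_by_median(expr, transcript_id)
expr_best <- pick_representative(expr, transcript_id)

# Ward clustering of the summarised transcripts. "ward.D" is the current name
# of the method that older R versions called "ward".
d <- dist(expr_median)
hc <- hclust(d, method = "ward.D")
clusters <- cutree(hc, k = 2)
```

To cluster arrays rather than transcripts, the same calls can be applied to `t(expr_median)`. Summarising first also shrinks the distance matrix that `dist` has to build, which may help with the data-size problem described in the question.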