Handling clustering and replicated probes in Agilent 4x44K: a "philosophical" question?


Daniela Marconi ▴ 30

@daniela-marconi-1538

Last seen 10.6 years ago

Hi everybody,

I have to come back to the issue of replicated probes on the Agilent 4x44K arrays. Reading, for example, Gordon Smyth's answer at

http://article.gmane.org/gmane.science.biology.informatics.conductor/13846/match=agilent+probe+replicates

I completely agree with him that the replicated probes should be treated as independent during the analysis that selects the differentially expressed probes. In fact, I think that averaging these probes (as Feature Extraction software and Rosetta Resolver do) before the differential expression analysis is not a safe solution in general (for example, when there are within-array problems).

Now the question is: once we have identified a set of differentially expressed probes, suppose we want to run a hierarchical clustering to "visualize" the differential expression profile, adding a third class to evaluate how similar it is to the profiles of the other two. What should we do with the replicates?

1) CONSIDER THE PROBES AS INDEPENDENT IN THE HIERARCHICAL CLUSTERING AS WELL?
In my opinion, the implicit drawback of this approach is that it introduces a "literature bias", because the replicated genes are those best known in the literature as central players in many different processes (for example p53, ER and so on). In this way we implicitly force the algorithm to be guided by those genes if all (or most) of them appear as differentially expressed in the list. In my experience, however, this kind of bias is introduced anyway by biologists or clinicians when they go through the list of differentially expressed genes to decide which genes to focus on (for validation and further investigation).

2) REDUCE "THE PROBES" TO JUST ONE "GENE"?
In this case the problem is how. I was thinking of selecting the probe with the best adjusted p-value, for example, or at least averaging only the probes that are identified as differentially expressed. The p-value could be the best choice in my opinion, but at the moment that is just an opinion.

Has anyone faced this issue? Thank you for any help, suggestion or comment.

Daniela

Daniela Marconi
PhD Student
Physics Department
University of Bologna
Viale Berti Pichat 6/2
Bologna, Italy
office: +39 051 2095136
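The two reductions proposed under option 2 can be sketched roughly as follows. This is only an illustration in plain Python, not code from the thread: the gene names, probe IDs, adjusted p-values, log-ratios and the 0.05 cutoff are all made-up assumptions, and in practice the values would come from the differential expression analysis.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (gene, probe_id, adjusted p-value, log-ratios per array).
probes = [
    ("TP53", "probe_a", 0.001, [1.2, 1.0, -0.8, -1.1]),
    ("TP53", "probe_b", 0.040, [0.9, 0.7, -0.5, -0.6]),
    ("TP53", "probe_c", 0.300, [0.1, -0.2, 0.0, 0.2]),
    ("ESR1", "probe_d", 0.010, [-1.5, -1.3, 1.4, 1.6]),
]


def best_pvalue_probe(records):
    """Keep, for each gene, the single probe with the smallest adjusted p-value."""
    best = {}
    for gene, probe_id, adj_p, values in records:
        if gene not in best or adj_p < best[gene][1]:
            best[gene] = (probe_id, adj_p, values)
    return {gene: (pid, vals) for gene, (pid, _, vals) in best.items()}


def average_de_probes(records, alpha=0.05):
    """Average, for each gene, only the probes called differentially expressed.

    alpha is an assumed significance cutoff on the adjusted p-value.
    """
    grouped = defaultdict(list)
    for gene, _, adj_p, values in records:
        if adj_p < alpha:
            grouped[gene].append(values)
    # zip(*rows) walks the arrays column by column across the kept probes.
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}


one_per_gene = best_pvalue_probe(probes)
averaged = average_de_probes(probes)
```

Either function yields one row per gene, so each gene contributes once to the distance matrix used for clustering, whichever summary is preferred.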


Updated 17.2 years ago by Francois Pepin ★ 1.3k • written 17.2 years ago by Daniela Marconi ▴ 30


Francois Pepin ★ 1.3k

@francois-pepin-1012

Last seen 10.6 years ago

Hi Daniela,

There are replicated probes (same probe ID), and then there are genes that have several different probes.

In the first case, I would simply suggest that you choose one. We arbitrarily choose the first one, because the expression of the replicates is basically identical for all our probes. Averaging them would probably be a better way of doing it, but the advantage is likely quite small.

In the second case, the probes generally behave similarly, but they can also give fairly different expression values. I usually use a representative probe when doing hierarchical clustering. I don't have any papers to back me up, but I've found that most distance metrics give too much weight to the duplicated probes when doing hierarchical clustering.

If the probes come from a differential expression analysis, choosing the one with the best p-value is reasonable. If you are doing class discovery, then you would need to use an unbiased method, such as the variance or interquartile range.

I hope this helps,
Francois
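For the class-discovery case, picking a representative probe by interquartile range could look something like the sketch below. It is a plain-Python illustration under assumed data: the gene-to-probe mapping and the expression values are hypothetical, and the IQR is computed with the standard library's default quantile method, which may differ slightly from other tools' definitions.

```python
from statistics import quantiles

# Hypothetical expression data: gene -> {probe_id: values across arrays}.
expression = {
    "TP53": {
        "probe_a": [0.1, 0.2, 0.1, 0.0, 0.2, 0.1],
        "probe_b": [-1.0, 1.2, -0.8, 0.9, -1.1, 1.0],
    },
    "ESR1": {
        "probe_c": [2.0, -2.0, 1.5, -1.5, 1.8, -1.7],
    },
}


def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


def most_variable_probe(by_gene):
    """Pick, for each gene, the probe with the largest IQR across arrays.

    No class labels are used, so the choice does not favor any grouping.
    """
    return {
        gene: max(probe_values, key=lambda pid: iqr(probe_values[pid]))
        for gene, probe_values in by_gene.items()
    }


representatives = most_variable_probe(expression)
```

Because the selection ignores sample labels, it does not tilt the clustering toward a known grouping, which is the point of using variance or IQR here rather than a p-value.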

17.2 years ago • Francois Pepin ★ 1.3k
