Question

gene classification problem

Kimpel, Mark W ▴ 890

My apologies to those with far more statistical expertise than I, but I have what may (or may not) be a straightforward question.

After performing SAM analysis of an experiment comparing two strains of rats, I have a list of about 200 significant affy rat probesets (genes) that I have mapped to their chromosomal locations. Some of the genes appear to cluster into discrete physical chromosomal regions, which I suspect is related to underlying genetic differences between the two inbred strains. Based on their chromosomal location, I have clustered these significant genes into discrete bins. One thing to remember when solving this problem is that the distribution along chromosomes of all affy rat probesets is not uniform. Thus my fear is that some of the granularity in the chromosomal locations of significant genes could be due not only to chance but also to granularity of the underlying distribution.

At this point I would like to test:

1. if the distribution of sig. genes amongst the bins is statistically different from that of the population of all affy genes from which they were drawn.
2. if the above distribution of sig. genes is, as I suspect, different, which of the bins are responsible for this significant difference. It would be great to assign a p-value to each bin.

I believe this is similar to the problem faced in analyzing the distribution of genes in GO categories, but I am not familiar with the proper solution.

Any sample code would be greatly appreciated. For an example, assume that I have two matrices, each of two columns, with genes represented by rows. The first column is the probeset ID, the second column the "bin" that it falls into. One matrix is of all rat affy genes, the second one is only the significant genes.

Thanks,

Mark W. Kimpel MD
Department of Psychiatry
Indiana University School of Medicine
Biotechnology, Research, & Training Center
1345 W. 16th Street
Indianapolis, IN 46202

GO affy ASSIGN

updated 20.2 years ago by Charles Berry • written 20.2 years ago by Kimpel, Mark W

score 0 · Answer 1 · 2004-12-09

Mark,

In

Borevitz, J.O., Liang, D., Plouffe, D., Chang, H., Zhu, T., Weigel, D., Berry, C.C., Winzeler, E., and Chory, J. (2003) Large Scale Identification of Single Feature Polymorphisms in Complex Genomes. Genome Research 13, 513-523.

we used individual probesets on Affy arrays to search for polymorphisms among inbred strains (hybridizing genomic DNA rather than RNA).

A collection of the tools we used to identify probesets and/or regions that differentially bind according to strain may be found at:

http://naturalvariation.org/sfp

and the 'Methods' link will connect you to some newer work and scripts.

Although you seem to have somewhat different objectives, it looks like similar statistical tools would apply to your situation.

Chuck

Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
UC San Diego
La Jolla, San Diego 92093-0717
E mailto:cberry@tajo.ucsd.edu
http://hacuna.ucsd.edu/members/ccb.html
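
For readers looking for the sample code the question asks for, here is a minimal sketch of the two tests. It is not taken from this thread or from the tools linked above. It assumes the two matrices have been read into a dict mapping every probeset ID on the array to its bin (`all_bins`) and a list of significant probeset IDs (`sig_ids`); those names and all function names are hypothetical. Test 1 is done by Monte Carlo: random gene lists of the same size are drawn from the full array, so the non-uniform background is built into the null. Test 2 uses a one-sided hypergeometric tail per bin with a Bonferroni correction, which is the same calculation commonly used for GO category over-representation. Both tests treat probesets as independent and assume the bins were defined without looking at the significant list, which may not hold here.

```python
import random
from collections import Counter
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which lie in the bin of interest."""
    numerator = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(k, min(K, n) + 1))
    return numerator / comb(N, n)


def bin_enrichment(all_bins, sig_ids):
    """Per-bin test (question 2).

    Returns {bin: (observed, expected, Bonferroni-adjusted p)} where the
    p-value is the one-sided probability of seeing at least `observed`
    significant genes in that bin by chance.
    """
    background = Counter(all_bins.values())      # bin sizes on the whole array
    observed = Counter(all_bins[g] for g in sig_ids)
    N, n = len(all_bins), len(sig_ids)
    results = {}
    for b, K in background.items():
        k = observed.get(b, 0)
        p = hypergeom_upper_tail(k, N, K, n)
        results[b] = (k, n * K / N, min(1.0, p * len(background)))
    return results


def chi_square_stat(counts, background, N, n):
    """Pearson chi-square of bin counts against background proportions."""
    return sum((counts.get(b, 0) - n * K / N) ** 2 / (n * K / N)
               for b, K in background.items())


def global_test(all_bins, sig_ids, n_perm=2000, seed=0):
    """Overall test (question 1), by Monte Carlo.

    Compares the observed chi-square statistic with its distribution under
    random draws of len(sig_ids) genes from the full array.
    """
    rng = random.Random(seed)
    background = Counter(all_bins.values())
    labels = list(all_bins.values())
    N, n = len(labels), len(sig_ids)
    obs = chi_square_stat(Counter(all_bins[g] for g in sig_ids),
                          background, N, n)
    hits = sum(
        chi_square_stat(Counter(rng.sample(labels, n)), background, N, n) >= obs
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)   # add-one so p is never exactly 0
```

Calling `global_test(all_bins, sig_ids)` gives a single p-value for the overall distribution, and `bin_enrichment(all_bins, sig_ids)` gives one adjusted p-value per bin. Bonferroni is conservative; with many bins a false discovery rate procedure may be preferable.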