Kimpel, Mark W
My apologies to those with far more statistical expertise than I, but
I have what may (or may not) be a straightforward question.
After performing SAM analysis of an experiment comparing two strains
of rats, I have a list of about 200 significant affy rat probesets
(genes) that I have mapped to their chromosomal locations. Some of the
genes appear to cluster into discrete physical chromosomal regions,
which I suspect is related to underlying genetic differences between
the two inbred strains. Based on their chromosomal location, I have
clustered these significant genes into discrete bins. One thing
to remember when solving this problem is that the distribution along
chromosomes of all affy rat probesets is not uniform. Thus my fear is
that some of the apparent clustering of significant genes could be due
not only to chance, but also to the uneven underlying distribution of
probesets.
At this point I would like to test:
1. whether the distribution of significant genes amongst the bins is
statistically different from that of the population of all affy genes
from which they were drawn.
2. if the distribution of significant genes is, as I suspect,
different, which of the bins are responsible for this
difference. It would be great to assign a p-value to each bin.
I believe this is similar to the problem faced in analyzing the
distribution of genes in GO categories but I am not familiar with the
proper solution.
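For reference, the test commonly used for GO categories is the hypergeometric test, and the same form would apply to a chromosomal bin. The symbols here are my own notation, not from any package: N is the number of all affy probesets, K the number of those falling in a given bin, n the number of significant probesets, and k the number of significant probesets in that bin. Assuming the significant genes behave like a random draw without replacement from all probesets, the chance of seeing k or more in the bin is:

```latex
P(X \ge k) \;=\; \sum_{i=k}^{\min(K,\,n)} \frac{\binom{K}{i}\,\binom{N-K}{n-i}}{\binom{N}{n}}
```

Because K/N enters the formula directly, a bin that is dense in probesets is not flagged merely for being dense.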
Any sample code would be greatly appreciated. For an example, assume
that I have two matrices, each of two columns with genes represented
by rows. The first column is the probeset ID, the second column the
"bin" that it falls into. One matrix is of all rat affy genes, the
second one contains only the significant genes.
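To make the data layout above concrete, here is a minimal sketch of a per-bin test in Python using only the standard library. It is one possible approach (a one-sided hypergeometric test per bin with a Bonferroni correction), not an established package method. The function names, the toy data, and the choice of Bonferroni are all assumptions made for illustration, and the sketch assumes every significant probeset also appears in the list of all probesets.

```python
from collections import Counter
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n items are drawn without replacement from N,
    of which K belong to the bin of interest."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return hits / total


def bin_enrichment(all_genes, sig_genes):
    """all_genes, sig_genes: lists of (probeset_id, bin) pairs
    (hypothetical layout mirroring the two-column matrices).
    Returns a dict keyed by bin."""
    all_counts = Counter(b for _, b in all_genes)
    sig_counts = Counter(b for _, b in sig_genes)
    N, n = len(all_genes), len(sig_genes)
    m = len(all_counts)  # number of bins tested
    results = {}
    for b, K in all_counts.items():
        k = sig_counts.get(b, 0)
        p = hypergeom_upper_tail(k, N, K, n)
        results[b] = {
            "observed": k,
            "expected": n * K / N,  # count expected from probeset density
            "p": p,
            "p_bonferroni": min(1.0, p * m),  # assumed correction method
        }
    return results


# Toy data (made up): 100 probesets in two bins, 10 significant.
all_genes = [(f"p{i}", "A" if i < 20 else "B") for i in range(100)]
sig_genes = all_genes[:8] + all_genes[20:22]  # 8 from bin A, 2 from bin B
results = bin_enrichment(all_genes, sig_genes)
for b in sorted(results):
    print(b, results[b])
```

The "expected" column is what addresses the non-uniform probeset density: each bin is compared against its own share of all probesets rather than against a uniform spread. For the global question (item 1), a chi-square goodness-of-fit test of the observed bin counts against these expected counts would be one candidate, although with about 200 genes many bins would have small expected counts.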
Thanks,
Mark W. Kimpel MD
Department of Psychiatry
Indiana University School of Medicine
Biotechnology, Research, & Training Center
1345 W. 16th Street
Indianapolis, IN 46202