Alexander C Cambon
I apologize if this is not the right forum to post this question - I
am asking about annotation for a particular kind of Affymetrix array.
I have all the files from their site. The problem is not too little
information, it is too much information.
The array platform is the Affymetrix "Rat Gene 1.0 ST" array. I am sure
this is similar to the Mouse Gene ST arrays, etc., which have been
discussed on this list.
Using instructions from James McDonald posted on this "gmane" website,
http://article.gmane.org/gmane.science.biology.informatics.conductor/18963/match=oligo+s ,
I was able to process the CEL files for this array. The instructions
worked just fine, and I was able to do all the analysis in oligo that I
am usually able to do using affy - including quality plots,
normalization, differential expression, etc.
I am finding the annotation to be a bit of a challenge though. After
using rma in oligo, the ExpressionSet object has the probe IDs along
with the expression values. There is also a file,
"RaGene-1_0-st-v1.na26.rn4.transcript.csv", that can be downloaded from
the Affymetrix web site along with the appropriate readme files. This
csv file maps the probe IDs to "gene_assignment", "mrna_assignment",
etc. So far so good ...
The challenge is that both the gene assignment and the mRNA assignment
columns often have multiple genes and multiple mRNAs or Ensembl IDs.
The readme file
("RaGene-1_0-st-v1.na26.AFFX_README.NetAffx-CSV-Files.txt"), also from
the Affymetrix website, describes what is in these columns. For
example, the columns contain "assignment scores" and "coverage scores"
"between a public mRNA and a transcript cluster". The higher the
scores, the better the probe (or transcript cluster) matches the mRNA,
or vice versa.
My challenge is: how do I condense this annotation down in an
efficient manner for the principal investigator? I was thinking of
just taking the first transcript assignment from the
"gene_assignment" and "mrna_assignment" columns, but I am not sure
this is the right thing to do. I suppose I could somehow take the
assignments with the highest scores, but I think someone may know a
better and faster way.
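To make the "take the first assignment" idea concrete, here is a
minimal sketch in R of what I have in mind. It rests on assumptions
that should be checked against the readme: that entries within
"gene_assignment" are separated by " /// ", that fields within an
entry are separated by " // " in the order accession, gene symbol,
gene title, and that "---" marks an empty assignment. The function
name first_gene_assignment and the toy strings are made up for
illustration; they are not real annotation.

```r
# Hypothetical helper: keep only the first entry of each gene_assignment
# string and split it into its leading fields.
# ASSUMPTIONS (verify against the NetAffx readme for your release):
#   entries are separated by " /// ", fields by " // ",
#   field order is accession, symbol, title, and "---" means "no assignment".
first_gene_assignment <- function(x) {
  x <- as.character(x)
  first <- sub(" /// .*$", "", x)          # drop everything after the first entry
  first[is.na(x) | x == "---"] <- NA       # treat missing/empty assignments as NA
  fields <- strsplit(first, " // ", fixed = TRUE)
  # Pull the i-th field of every entry, or NA when it is absent
  get_field <- function(i) {
    vapply(fields,
           function(f) if (length(f) >= i) f[i] else NA_character_,
           character(1))
  }
  data.frame(accession = get_field(1),
             symbol    = get_field(2),
             title     = get_field(3),
             stringsAsFactors = FALSE)
}

# Toy input (invented values, not real rat annotation)
toy <- c(paste("ACC_0001 // GeneA // gene A title // 1q11 // 1001",
               "ACC_0002 // GeneA // gene A title // 1q11 // 1001",
               sep = " /// "),
         "---")
res <- first_gene_assignment(toy)
print(res)
```

This only implements the "first entry" rule. Picking the entry with
the highest assignment score would need the same kind of splitting
applied to "mrna_assignment", using whatever field positions the
readme gives for the scores.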
I did try using the "annaffy" package (for example, the function
"aafGenBank"), but the ragene10st.db package cannot be found on the
Bioconductor website (I do see the mogene10st.db package there,
though). I was also going to try exonmap (even though this is not an
exon array), but have had trouble loading the package so far.
Has anyone run into this annotation problem for these types of arrays?
Any suggestions on how to come up with reasonable annotation for each
probe id?
I am using R 2.8.0 and the latest release of Bioconductor (2.3) on a
Windows XP machine.
Thanks,
Alex Cambon
Biostatistician
Department of Bioinformatics and Biostatistics
School of Public Health and Information Sciences
University of Louisville, Louisville, KY 40292
502-852-4111