Hi,
I'm trying to convert the annotation of some old microarray results to the latest Entrez and Ensembl IDs.
I'm using biomaRt for that.
library(biomaRt)
library(tidyverse)
# IDs from the first column of the old results table
d <- as.character(data[, 1])
mart.current <- useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
att <- listAttributes(mart.current)
current.results <- getBM(attributes = c("affy_mogene_1_0_st_v1", "entrezgene_id", "ensembl_gene_id"), filters = "affy_mogene_1_0_st_v1", values = d, mart = mart.current)
current.results
I'm getting many NAs in both entrezgene_id and ensembl_gene_id.
Is this the correct way of doing it? Is there any other way? Can we use the AnnotationHub package? Why am I getting so many NAs?
Thanks.
D.
This is the head of d: 10338002 10338008 10338011 10338016 10338018 10338019 10338024 10338027 10338028 10338033 10338034 10338040 10338046 10338051 10338054 10338055 10338061 10338162 10338214 10338244 10338336 10338396 10338446 10338654 10338760 10338890 10338969 10338979 10338988 10338994 10339070
I see. I am not sure at which level these IDs are (probe or transcript cluster IDs).
You could try either of the following commands to see if you get a better mapping
Thanks a lot. I'm still getting many NAs.
1 10338002 <NA> <NA>
2 10338008 <NA> <NA>
3 10338011 <NA> <NA>
4 10338016 <NA> <NA>
5 10338018 <NA> <NA>
6 10338019 <NA> <NA>
7 10338024 <NA> <NA>
8 10338027 <NA> <NA>
There's a post from 2008 at https://stat.ethz.ch/pipermail/bioconductor/2008-September/024167.html which seems to mention most of the transcript cluster IDs you're reporting as NA, along with a note from Affymetrix.
It's been a long time since I worked with Affy ST arrays, so I'd suggest reading that post yourself, but maybe you don't need to worry about missing annotation for those transcript clusters if that was Affymetrix's opinion at the time.
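To judge how much the missing annotation matters for a given dataset, it can help to tally the unmapped IDs first. The following is a minimal base-R sketch, assuming a result with the three columns requested from getBM in the question; the toy data frame only re-creates the eight rows shown earlier in the thread, and the variable names (`unmapped`, `n_unmapped`, `frac_unmapped`, `unmapped_ids`) are made up for illustration.

```r
# Toy stand-in for the getBM() result: the eight rows posted in the thread,
# all of which lack Entrez and Ensembl IDs.
current.results <- data.frame(
  affy_mogene_1_0_st_v1 = c("10338002", "10338008", "10338011", "10338016",
                            "10338018", "10338019", "10338024", "10338027"),
  entrezgene_id = NA_integer_,
  ensembl_gene_id = NA_character_,
  stringsAsFactors = FALSE
)

# A row counts as unmapped when both gene identifiers are missing.
unmapped <- is.na(current.results$entrezgene_id) &
  is.na(current.results$ensembl_gene_id)

n_unmapped <- sum(unmapped)        # number of unmapped rows
frac_unmapped <- mean(unmapped)    # fraction of unmapped rows
unmapped_ids <- current.results$affy_mogene_1_0_st_v1[unmapped]

print(n_unmapped)
print(frac_unmapped)
```

On a full result table, `unmapped_ids` could then be compared against the IDs listed in the 2008 post to see whether the unmapped rows are the same transcript clusters.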
We can go straight to Affy for more info as well.
Thanks a lot. It's quite clear now. How do I use the AffyCompatible library to get the Ensembl annotation? I didn't know about this package. I tried
It didn't work.
It's a bit difficult to work with the Affy data directly, so I am not sure I would recommend doing that. The column with the mRNA targets is formatted as ID // Source // Name // Chr // Other stuff /// Repeat. So as a random example,
Where there are four RefSeq IDs, then an Ensembl ID, then a Havana transcript ID, then two GenCode IDs, and finally a UCSC transcript.
And if I pick another random one, it's different:
So you could parse that and get the Ensembl or GenCode IDs, but that sounds boring and hard and pointless when you could just use the mogene10sttranscriptcluster.db package instead.
OR you could realize that Affy haven't even updated their data for years now, so it's unlikely that whatever you get will be materially different from what you already have (I made these annotation packages for every release up until a few years ago, and now they just get re-processed with a new version since they are essentially static now).
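For the package-based route, a minimal sketch could look like the following. It assumes the IDs are transcript cluster IDs and that the mogene10sttranscriptcluster.db and AnnotationDbi packages are installed from Bioconductor; the helper name `annotate_clusters` is hypothetical, and the four IDs are simply the first ones from the question. Keys that the package does not know are dropped before the lookup, because `select()` stops with an error if none of the keys are valid.

```r
# First few IDs from the question, as character keys.
ids <- c("10338002", "10338008", "10338011", "10338016")

# Hypothetical helper: look up Entrez, Ensembl and symbol annotation for
# transcript cluster IDs. Returns NULL if the packages are not installed.
annotate_clusters <- function(ids) {
  if (!requireNamespace("AnnotationDbi", quietly = TRUE) ||
      !requireNamespace("mogene10sttranscriptcluster.db", quietly = TRUE)) {
    message("Annotation packages not installed; returning NULL")
    return(NULL)
  }
  db <- mogene10sttranscriptcluster.db::mogene10sttranscriptcluster.db
  # Keep only IDs that exist as keys in the package.
  valid <- intersect(ids, AnnotationDbi::keys(db, keytype = "PROBEID"))
  if (length(valid) == 0) {
    return(data.frame(PROBEID = character(0), stringsAsFactors = FALSE))
  }
  # One input ID can map to several rows (1:many mappings).
  AnnotationDbi::select(db, keys = valid,
                        columns = c("ENTREZID", "ENSEMBL", "SYMBOL"),
                        keytype = "PROBEID")
}

res <- annotate_clusters(ids)
```

Rows with NA in the gene columns are transcript clusters for which the package has no gene-level annotation.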
Thanks a lot.
So what strategy would you advise for getting the latest Ensembl annotation from an old Affy annotation that comes from an old microarray analysis?
We want to compare those results with new RNA-seq results.
Thanks again.
I think I already answered that question. Use the mogene10sttranscriptcluster.db package.
Thank you so much for your help.
One of our collaborators suggested using the updated annotation from the BRAINARRAY project. Is this annotation included in the mogene10sttranscriptcluster.db package?
If not, is there a package that takes this annotation into account?
Thanks a lot.
The MBNI group makes the annotation available for Entrez Gene-based remappings (only); see the columns labelled 'A' on their website with downloads. Note that the remapped probeset IDs simply consist of (in this case) the Entrez ID with an '_at' suffix. If you strip these off, you can also use/query the org.Mm.eg.db package; I usually use this approach.
If you would like to use Ensembl-based remappings, I recommend using the EnsDb packages that can be retrieved from AnnotationHub. To get you started with this, see for example here and here. Note that you need to strip off the '_at' suffix from these probeset IDs as well!
Just specifically on the BrainArray part: I recently re-analysed a study that used a custom BrainArray. For annotation, I downloaded the *.db package from the following location and installed it locally: http://brainarray.mbni.med.umich.edu/bioc/src/contrib/
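The suffix-stripping approach for the Entrez Gene-based remapping could be sketched as follows. This is only an illustration: the probeset IDs are placeholders, the presence of non-gene control probesets is an assumption, and the helper name `lookup_entrez` is hypothetical. The lookup itself needs the org.Mm.eg.db and AnnotationDbi packages from Bioconductor and is skipped if they are not installed.

```r
# Placeholder BrainArray-style probeset IDs: an Entrez Gene ID plus "_at".
# The last entry stands in for a control probeset (assumed to be present).
probesets <- c("11287_at", "11298_at", "AFFX-BioB-5_at")

# Keep only probesets of the form <digits>_at, then strip the suffix.
is_gene <- grepl("^[0-9]+_at$", probesets)
entrez <- sub("_at$", "", probesets[is_gene])
print(entrez)

# Hypothetical helper: query org.Mm.eg.db with the stripped Entrez IDs.
lookup_entrez <- function(entrez) {
  if (!requireNamespace("AnnotationDbi", quietly = TRUE) ||
      !requireNamespace("org.Mm.eg.db", quietly = TRUE)) {
    message("Annotation packages not installed; returning NULL")
    return(NULL)
  }
  db <- org.Mm.eg.db::org.Mm.eg.db
  # Drop IDs the package does not know, since select() errors on no valid keys.
  valid <- intersect(entrez, AnnotationDbi::keys(db, keytype = "ENTREZID"))
  if (length(valid) == 0) {
    return(data.frame(ENTREZID = character(0), stringsAsFactors = FALSE))
  }
  AnnotationDbi::select(db, keys = valid,
                        columns = c("SYMBOL", "ENSEMBL"),
                        keytype = "ENTREZID")
}

annot <- lookup_entrez(entrez)
```

The same stripping step would apply to Ensembl-based probeset IDs before querying an EnsDb object.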
It's really informative. Thanks a lot.