hugene10sttranscriptcluster.db missing some genes?
colonppg ▴ 30
@colonppg-7771

Dear all:

I have a project using the HuGene 1.0 ST v1 array.

I collected all the probeset IDs into idsi, then tried to get the corresponding Entrez IDs and gene symbols:

library(hugene10sttranscriptcluster.db)
idsi <- probe.gs$PROBEID
annot <- select(hugene10sttranscriptcluster.db, keys = as.character(idsi), columns = c("ENTREZID", "SYMBOL"), keytype = "PROBEID")

I got this error message:

Warning message:
In .generateExtraRows(tab, keys, jointype) :
  'select' resulted in 1:many mapping between keys and return rows

Strangely, some of the genes are apparently missing from the resulting data frame.

I do not understand why. Has anyone had the same issue?

Thanks

 

@james-w-macdonald-5106

That's not an error, it's a warning. What it says is that some probeset IDs map to more than one Entrez Gene ID or HUGO symbol (a one-to-many mapping), so select() returns more than one row for those probesets.

 

> z <- select(hugene10sttranscriptcluster.db, keys(hugene10sttranscriptcluster.db), c("SYMBOL","ENTREZID"))
> head(z[duplicated(z[,1]),])
     PROBEID       SYMBOL  ENTREZID
4205 7896740        OR4F4     26682
4206 7896740        OR4F5     79501
4208 7896742 LOC100134822 100134822
4209 7896742       PCMTD2     55251
4210 7896742  LINC00266-1    140849
4211 7896742 LOC101059936 101059936
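
If one row per probeset is needed, the one-to-many rows can be reduced in a couple of ways. The following is only a sketch in base R on a toy copy of the rows shown above; the names `z_toy`, `first_hit` and `collapsed` are made up for illustration, and which option is appropriate depends on the analysis.

```r
# Toy copy of the rows printed above; in practice this would be the
# data frame returned by select().
z_toy <- data.frame(
  PROBEID  = c("7896740", "7896740", "7896742", "7896742", "7896742", "7896742"),
  SYMBOL   = c("OR4F4", "OR4F5", "LOC100134822", "PCMTD2", "LINC00266-1", "LOC101059936"),
  ENTREZID = c("26682", "79501", "100134822", "55251", "140849", "101059936"),
  stringsAsFactors = FALSE
)

# Option 1: keep only the first row returned for each probeset ID.
first_hit <- z_toy[!duplicated(z_toy$PROBEID), ]

# Option 2: keep every mapping, pasted into one string per probeset ID.
collapsed <- aggregate(z_toy[c("SYMBOL", "ENTREZID")],
                       by = z_toy["PROBEID"],
                       FUN = paste, collapse = ";")
```

AnnotationDbi also provides mapIds(), whose multiVals argument controls how multiple matches are handled when a single column is requested.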

 


Dear James:

Thanks for your response. I do not think the warning is an issue, but the package appears to be buggy because it is missing a lot of genes. I downloaded the annotation from Affy and processed it under Unix, and those genes are there.

I wonder if anyone has encountered this issue and has a workaround.

thanks

 


The package isn't buggy - it reports exactly what we get from Affy. There are caveats however.

  1. We base annotations on RefSeq/GenBank and Entrez Gene IDs. Any transcript that isn't in one of those databases is invisible to the process. This could/should/might change, but for now there it is.
  2. The current annotation packages are based on the na34 annotation files that were current when we released. Affy has since admitted to a co-worker of mine that (at least for the HuEx 1.0 annotations) there are problems with these files, and has released the na35 (and, just two days ago, some na35.1) versions. I am in the process of rebuilding the annotation packages, and hypothetically the newer versions will be better in some substantive way.
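
To see how many probesets fall under the first caveat, one can count the keys for which select() returned NA. This is a minimal sketch on a toy data frame, not real package output: `annot_toy` is a hypothetical stand-in for the select() result, and its last row (probeset "0000001") is invented to represent a probeset with no Entrez Gene ID.

```r
# Hypothetical select() result. The last row is invented for illustration
# and stands for a probeset without an Entrez Gene ID or symbol.
annot_toy <- data.frame(
  PROBEID  = c("7896740", "7896742", "0000001"),
  SYMBOL   = c("OR4F4", "PCMTD2", NA),
  ENTREZID = c("26682", "55251", NA),
  stringsAsFactors = FALSE
)

# Probeset IDs that came back without an Entrez Gene ID.
unannotated <- unique(annot_toy$PROBEID[is.na(annot_toy$ENTREZID)])

# Share of distinct probesets that are unannotated.
frac_unannotated <- length(unannotated) / length(unique(annot_toy$PROBEID))
```

Comparing the unannotated IDs against the Affy annotation file would show whether the "missing" genes are transcripts that lack a RefSeq/GenBank or Entrez Gene entry.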
     

Hi James:

Thanks, that explains it:

mydata<-frma(affyobj, target="core")

I think target="core" is what caused a lot of genes to go missing. Thanks for your explanation.

Great help.

 


Using 'core' shouldn't make a lot of genes go missing. It simply summarizes the probesets at the transcript level. This is different from Affy's concept of core probesets for the Exon arrays, where the core probesets are those with the most evidence that they actually exist. The Gene arrays only have (what Affy calls) core probesets to begin with (and for those, individual probes are often dropped for various reasons), so for oligo and frma, core == transcript.
 
