(missing?) UCSCKG -> SYMBOL mappings in Homo.sapiens (etc.)
Tim Triche (United States)
re: '[BioC] question about Gviz' thread fallout:

Yesterday I rolled a relatively simple programmatic way to label UCSC KnownGene entries with their symbols. However, some isoforms (e.g. some for NRIP1 and CDKN2B) seem to be missing from the mappings.

Investigating a bit, and referring to ?org.Hs.egUCSCKG, I find:

> ...This mapping is based on the very latest build available at UCSC for this organism as of March 2010. 2.6 is the last release where you can expect it to be here. The GenomicFeatures package contains functionality that replaces the need for this mapping...

Alas, I'm too thick to find where, in the TxDb or elsewhere, I could retrieve HUGO gene symbols for UCSC KnownGene entries without using org.Hs.egSYMBOL. The latter is what I usually do:

    library(Homo.sapiens)

    txs <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene)
    head(names(txs))
    ## [1] "1" "10" "100" "1000" "10000" "100008586"

    names(txs) <- mget(names(txs), org.Hs.egSYMBOL, ifnotfound=NA)
    head(names(txs))
    ## [1] "A1BG" "NAT2" "ADA" "CDH2" "AKT3" "GAGE12F"

Now, I thought for a while, hell, this gets them all! But, not really...

    txs$NRIP1
    ## GRanges with 1 range and 2 metadata columns:
    ##       seqnames               ranges strand |     tx_id     tx_name
    ##          <rle>            <iranges>  <rle> | <integer> <character>
    ##   [1]    chr21 [16333556, 16437126]      - |     71301  uc002yjx.2

Well, that's one of the isoforms. But what about the other ones?

    org.Hs.egUCSCKG[[ "c002yjx.1" ]]
    ## NULL

    org.Hs.egUCSCKG[[ "uc010gkz.1" ]]
    ## NULL

I know UCSC identifiers can be a bit of a pain in the ass, but mappings for these do exist. If they're going to be used as primary identifiers for the TxDb packages, would it be possible to update them?

If it's an issue of time constraints, I will take a stab at it, but that will almost guarantee more prattling from me on the mailing list. On the other hand, it might move GAF3.0 annotations out of the station.

Much obliged for any insights from the core developers.
--
*A model is a lie that helps you see the truth.*
Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
Organism GenomicFeatures
Marc Carlson (United States)
Hi Tim,

First of all, let me assure you that we have NOT abandoned UCSC known gene IDs. They have just been migrated to another field (TXNAME). The reason for the deprecation is simply so that people don't rely on getting them from this location (UCSCKG). The rationale is that people should get something that is actually a transcript ID from a transcript-oriented object.

In spite of the severe-sounding deprecation warning, these IDs have actually been updated with every release. I have (so far) kept updating them simply because I did not want to inconvenience anyone by making them go away too soon. My hope was that after enough time had elapsed I could quietly remove them with minimal pain. So don't panic. But please don't keep using them either.

So the most important thing to know is that you should get things like UCSC known gene IDs from the TXNAME field of a TranscriptDb or OrganismDb (where appropriate, since not all transcriptomes even have known gene IDs). To look up a gene symbol from a knownGene name, you should do it like this:

    library(Homo.sapiens)
    select(Homo.sapiens, columns=c("SYMBOL","TXNAME"), keys=c("uc002yjx.2"), keytype="TXNAME")

As for the other issues you are having with the specific IDs you were looking for, I have been investigating, and it appears to trace back to UCSC's genome browser (and its associated resources). I will therefore be moving this thread to the bioc-devel list for the rest of the discussion. Any interested parties can tune in over there.

Marc

On 02/12/2013 10:04 AM, Tim Triche, Jr. wrote:
> re: '[BioC] question about Gviz' thread fallout:
> [...]
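Because select() returns a data.frame that can hold several rows per key (or no usable value for a key), labelling every transcript with exactly one symbol needs a small collapsing step, much as the mget(..., ifnotfound=NA) approach in the question needed one. A minimal base-R sketch, assuming the select() result has columns named after the keytype and the requested column (e.g. TXNAME and SYMBOL); `collapse_map` is a hypothetical helper, not part of AnnotationDbi:

```r
# Hypothetical helper (not part of AnnotationDbi): collapse a possibly 1:many
# key -> value table, such as the data.frame returned by select(), into one
# label per key. Multiple values are joined with `sep`; keys with no
# (non-NA) value come back as NA.
collapse_map <- function(keys, map, keycol, valcol, sep = ";") {
  # one string per distinct key, with duplicate and NA values dropped
  vals <- vapply(
    split(as.character(map[[valcol]]), as.character(map[[keycol]])),
    function(v) paste(unique(v[!is.na(v)]), collapse = sep),
    character(1)
  )
  # look up in the order of `keys`; keys absent from `map` give NA
  out <- unname(vals[as.character(keys)])
  # empty strings mean the key only had NA values
  out[is.na(out) | out == ""] <- NA_character_
  out
}
```

As a usage sketch (not run here), with `tx <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene)` and `map <- select(Homo.sapiens, keys=mcols(tx)$tx_name, columns="SYMBOL", keytype="TXNAME")`, calling `collapse_map(mcols(tx)$tx_name, map, "TXNAME", "SYMBOL")` should yield one symbol string per transcript, in transcript order, with NA wherever no symbol was found. Joining multiple symbols rather than keeping the first keeps ambiguous mappings visible instead of silently hiding them.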
