I am trying to annotate a table of genes with transcript type (TXTYPE) using the Homo.sapiens annotation package through AnnotationDbi. I want to do this so I can select only protein-coding genes and remove all the RNA-encoding genes and pseudogenes.
I'm using the following code:
all_genes$TxType <- mapIds(Homo.sapiens,
                           keys = row.names(all_genes),
                           column = "TXTYPE",
                           keytype = "SYMBOL",
                           multiVals = "first")
However, the TxType column is just full of NAs. The same code works with GENENAME as the column.
Is the problem that TXTYPE information can only be attached to transcripts, not to genes?
sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=Swedish_Sweden.1252 LC_CTYPE=Swedish_Sweden.1252 LC_MONETARY=Swedish_Sweden.1252
[4] LC_NUMERIC=C LC_TIME=Swedish_Sweden.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.4.3 Homo.sapiens_1.3.1
[3] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 org.Hs.eg.db_3.2.3
[5] GO.db_3.2.2 RSQLite_1.0.0
[7] DBI_0.3.1 OrganismDbi_1.12.0
[9] GenomicFeatures_1.22.2 GenomicRanges_1.22.0
[11] GenomeInfoDb_1.6.0 AnnotationDbi_1.32.0
[13] IRanges_2.4.0 S4Vectors_0.8.0
[15] Biobase_2.30.0 BiocGenerics_0.16.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.1 graph_1.48.0 magrittr_1.5
[4] XVector_0.10.0 zlibbioc_1.16.0 GenomicAlignments_1.6.1
[7] BiocParallel_1.4.0 R6_2.1.1 tools_3.2.2
[10] SummarizedExperiment_1.0.0 lambda.r_1.1.7 futile.logger_1.4.1
[13] lazyeval_0.1.10 assertthat_0.1 RBGL_1.46.0
[16] rtracklayer_1.30.1 futile.options_1.0.0 bitops_1.0-6
[19] RCurl_1.95-4.7 biomaRt_2.26.0 BiocInstaller_1.20.0
[22] Biostrings_2.38.0 Rsamtools_1.22.0 XML_3.98-1.3
But do note that you CAN get these data via either transcripts() or transcriptsBy():
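A minimal sketch of the transcripts() route, assuming an Ensembl-derived TxDb built with makeTxDbFromBiomart() (UCSC knownGene TxDbs do not carry transcript types, which matches the NAs reported at the end of this thread). It also assumes the metadata columns keep the upper-case names requested via `columns`. Building the full human TxDb needs network access and takes a while; in recent Bioconductor releases makeTxDbFromBiomart() lives in the txdbmaker package rather than GenomicFeatures.

```r
library(GenomicFeatures)  # transcripts(); makeTxDbFromBiomart() in older releases
## In recent Bioconductor releases, also: library(txdbmaker)

## Assumption: an Ensembl-based TxDb, where every transcript has a biotype
txdb <- makeTxDbFromBiomart(dataset = "hsapiens_gene_ensembl")

## Transcript ranges with name, parent gene(s) and type as metadata columns
tx <- transcripts(txdb, columns = c("TXNAME", "GENEID", "TXTYPE"))

## %in% rather than == so that any NA types are dropped instead of breaking the subset
pc_tx <- tx[mcols(tx)$TXTYPE %in% "protein_coding"]

## GENEID comes back as a CharacterList (one element per transcript)
pc_genes <- unique(unlist(mcols(pc_tx)$GENEID))
```

Filtering at the transcript level and then collapsing to gene IDs keeps any gene that has at least one protein-coding transcript.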
Hi Jim, sjmonkley,
Looks like a bug in the SQL generated by select(). This is good timing because I actually started to revisit/refactor all the SQL generation in GenomicFeatures a couple of weeks ago, with some interesting speed improvements. All the extractors (except select()) now use the new SQL generator. I was going to do select() next but got distracted by other things. Moving it back close to the top of my TODO list.
H.
Edit: After closer examination, the problem seems to be in the select() method for OrganismDb objects (Homo.sapiens), not in the method for TxDb objects as I initially thought. Calling select() directly on the TxDb seems to work as expected, so it's not clear that my current work on SQL generation in GenomicFeatures will help address this.
Hi Hervé,
It looks like the problem comes up in OrganismDbi, specifically in OrganismDbi:::.getSelects(), which makes the (probably warranted) assumption that the underlying data sources are the same for all packages wrapped up under Homo.sapiens. In other words, in OrganismDbi:::.getSelects() the TxDb is inspected for the correct keytype to use, but it isn't inspected to ensure that the 'fromKeys' match the type of keys in 'toKey'. select() is then run with skipValidKeysTest = TRUE, so no error is generated. For example, Entrez Gene IDs are passed into select() with GENEID as the keytype, but in the TxDb.Hsapiens.BioMart.ensembl.GRCh38.p3 package the GENEIDs are Ensembl Gene IDs, not Entrez Gene IDs.
It may well be that mixing and matching annotation sources under an OrganismDbi object is a non-starter. If the OP wants to use a Homo.sapiens package for this sort of thing, it should be constructed entirely from Ensembl-based objects. Or, probably easier, just use the EnsDb objects instead.
Jim
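A minimal sketch of the EnsDb route for the original question, assuming the EnsDb.Hsapiens.v75 package (Ensembl 75, GRCh37, matching the hg19-based setup in the sessionInfo). EnsDb objects store a gene-level biotype, so no transcript-to-gene collapsing is needed. The symbol vector is a hypothetical stand-in for row.names(all_genes).

```r
library(EnsDb.Hsapiens.v75)  # also loads ensembldb, which provides mapIds() for EnsDb
library(AnnotationDbi)       # mapIds() generic

edb <- EnsDb.Hsapiens.v75

## Hypothetical stand-in for row.names(all_genes) in the question
symbols <- c("TP53", "BRCA2", "MALAT1")

## Gene-level biotype per symbol; "first" picks one value if a symbol maps to several genes
biotype <- mapIds(edb, keys = symbols,
                  column = "GENEBIOTYPE",
                  keytype = "GENENAME",
                  multiVals = "first")

## Keep only symbols whose gene is annotated as protein coding
protein_coding <- names(biotype)[biotype %in% "protein_coding"]
```

In current ensembldb releases the filtering can also be pushed into the query itself, e.g. genes(edb, filter = GeneBiotypeFilter("protein_coding")).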
Thanks Jim for taking the time to dig into this. That's helpful. The ability for users to plug their own TxDb into an OrganismDb object is a feature that Marc had on his list, and he actually started some work on it a few months ago. I don't know how much progress was made, but it's definitely a useful feature and something we want to pursue. Some compatibility checks would be needed so the user can't plug anything into anything, or at least gets a warning when trying to put together Mouse, Pig, and Human in an OrganismDb object for Frankenstein. But plugging a TxDb from Ensembl into Homo.sapiens should be supported.
H.
The above doesn't work for me. That is,
transcripts(TxDb.Hsapiens.UCSC.hg38.knownGene, columns="TXTYPE")
just yields a column of NAs.
Please don't just add a comment to a five-year-old post. If you have a question, please make a new post.