Dear all,
I am attempting to retrieve transcript biotypes for ncRNAs in GRCh37 using Bioconductor's biomaRt, as follows:
library(biomaRt)
ensembl <- useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", dataset="hsapiens_gene_ensembl")
# biotypes for mRNAs are obtained fine
refseqids_nm = c("NM_152486","NM_080605", "NM_031921")
getBM(attributes=c("refseq_mrna", "transcript_biotype"), filters="refseq_mrna", values=refseqids_nm, mart=ensembl)
# refseq_mrna transcript_biotype
#1 NM_031921 protein_coding
#2 NM_080605 protein_coding
#3 NM_152486 protein_coding
# However, the same kind of query returns no rows for ncRNAs
refseqids_nr = c("NR_015434", "NR_036637")
getBM(attributes=c("refseq_ncrna", "transcript_biotype"), filters="refseq_ncrna", values=refseqids_nr, mart=ensembl)
#[1] refseq_ncrna transcript_biotype
#<0 rows> (or 0-length row.names)
When I run the same query against the current release of Ensembl:
ensembl <- useMart(biomart="ENSEMBL_MART_ENSEMBL", dataset="hsapiens_gene_ensembl")
getBM(attributes=c("refseq_ncrna", "transcript_biotype"), filters="refseq_ncrna", values=refseqids_nr, mart=ensembl)
# refseq_ncrna transcript_biotype
#1 NR_015434 antisense
#2 NR_036637 processed_transcript
Then I get biotypes for ncRNAs just fine.
Perhaps there is something I am missing here. Does GRCh37 have annotations for ncRNAs? If so, any input on how I can obtain transcript biotypes using biomaRt as above?
Thanks,
Sergio
sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
 [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] data.table_1.10.4 biomaRt_2.30.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.9 IRanges_2.8.1 XML_3.98-1.5 digest_0.6.12 bitops_1.0-6 DBI_0.6-1 stats4_3.3.2 RSQLite_1.1-2 S4Vectors_0.12.1 tools_3.3.2 Biobase_2.34.0 RCurl_1.95-4.8 parallel_3.3.2 BiocGenerics_0.20.0
[15] AnnotationDbi_1.36.2 memoise_1.1.0