Dear Chris,
We actually released Ensembl 79 last week, so you are looking at our new release, since your host points at ensembl.org. The Drosophila assembly was updated from BDGP5 to BDGP6 and the gene set to version 6.02 (FB2014_05). More information is available on the Ensembl declaration page:
http://www.ensembl.org/Drosophila_melanogaster/Info/WhatsNew?db=core#change_1793
I had a look at your query and the result looks sound to me. The issue might be coming from the makeTranscriptDbFromBiomart function.
In the Ensembl databases we store all genomic coordinates in forward-strand orientation. If we look at the transcript "FBtr0290037", which is on the forward strand, on the Ensembl website:
http://www.ensembl.org/Drosophila_melanogaster/Transcript/Exons?db=core;g=FBgn0038135;r=3R:13294748-13298288;t=FBtr0290037
and compare the result with BioMart, you can see that the two sources match:
> ensembl_79 <- useMart(biomart="ENSEMBL_MART_ENSEMBL", host="ensembl.org", path="/biomart/martservice", dataset="dmelanogaster_gene_ensembl")
> fruitfly_FBtr0290037 <- getBM(attributes=c('chromosome_name','exon_chrom_start','exon_chrom_end','ensembl_exon_id','5_utr_start','5_utr_end','3_utr_start','3_utr_end','cds_start','cds_end','strand','ensembl_transcript_id'), filters = 'ensembl_transcript_id', values = 'FBtr0290037', mart = ensembl_79)
> fruitfly_FBtr0290037
  chromosome_name exon_chrom_start exon_chrom_end ensembl_exon_id 5_utr_start 5_utr_end 3_utr_start 3_utr_end cds_start cds_end strand ensembl_transcript_id
1              3R         13294748       13296098  FBtr0290037-E1    13294748  13294789          NA        NA         1    1309      1           FBtr0290037
2              3R         13296170       13296267  FBtr0290037-E2          NA        NA          NA        NA      1310    1407      1           FBtr0290037
3              3R         13296327       13296644  FBtr0290037-E3          NA        NA          NA        NA      1408    1725      1           FBtr0290037
4              3R         13296709       13296827  FBtr0290037-E4          NA        NA          NA        NA      1726    1844      1           FBtr0290037
5              3R         13296893       13296978  FBtr0290037-E5          NA        NA          NA        NA      1845    1930      1           FBtr0290037
6              3R         13297058       13297463  FBtr0290037-E6          NA        NA          NA        NA      1931    2336      1           FBtr0290037
7              3R         13297522       13298288  FBtr0290037-E7          NA        NA    13298171  13298288      2337    2985      1           FBtr0290037
In this example, 5_utr_start matches exon_chrom_start of the first exon (13294748) and 3_utr_end matches exon_chrom_end of the last exon (13298288). You can see the same information on the Exons sequence page of the Ensembl website:
http://www.ensembl.org/Drosophila_melanogaster/Transcript/Exons?db=core;g=FBgn0038135;r=3R:13294748-13298288;t=FBtr0290037
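The comparison above can also be checked offline. The following is only a sketch: the values are copied by hand from the getBM() output for FBtr0290037, the column names (exon_start, utr5_start, and so on) are shortened stand-ins for the BioMart attribute names, and the span() helper is hypothetical. It assumes cds_start and cds_end are cumulative positions within the coding sequence, as they appear in the output.

```r
# Exon table for FBtr0290037, copied from the getBM() output.
# Column names are simplified stand-ins for the BioMart attributes.
fwd <- data.frame(
  exon_start = c(13294748, 13296170, 13296327, 13296709, 13296893, 13297058, 13297522),
  exon_end   = c(13296098, 13296267, 13296644, 13296827, 13296978, 13297463, 13298288),
  utr5_start = c(13294748, NA, NA, NA, NA, NA, NA),
  utr5_end   = c(13294789, NA, NA, NA, NA, NA, NA),
  utr3_start = c(NA, NA, NA, NA, NA, NA, 13298171),
  utr3_end   = c(NA, NA, NA, NA, NA, NA, 13298288),
  cds_start  = c(1, 1310, 1408, 1726, 1845, 1931, 2337),
  cds_end    = c(1309, 1407, 1725, 1844, 1930, 2336, 2985)
)

# Hypothetical helper: inclusive interval length, with missing (NA)
# intervals counted as zero bases.
span <- function(start, end) ifelse(is.na(start), 0, end - start + 1)

exon_len   <- span(fwd$exon_start, fwd$exon_end)
utr_len    <- span(fwd$utr5_start, fwd$utr5_end) + span(fwd$utr3_start, fwd$utr3_end)
coding_len <- fwd$cds_end - fwd$cds_start + 1

# Forward strand: the 5' UTR opens the first exon and the 3' UTR closes the last.
last <- nrow(fwd)
utr5_ok <- fwd$utr5_start[1] == fwd$exon_start[1]
utr3_ok <- fwd$utr3_end[last] == fwd$exon_end[last]

# Every exon should be fully accounted for by UTR bases plus coding bases.
lengths_ok <- all(exon_len == utr_len + coding_len)
```

With these values all three checks hold, which is the consistency described for the forward strand.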
Now if we look at your example:
> ensembl_79 <- useMart(biomart="ENSEMBL_MART_ENSEMBL", host="ensembl.org", path="/biomart/martservice", dataset="dmelanogaster_gene_ensembl")
> fruitfly_FBtr0082757 <- getBM(attributes=c('chromosome_name','exon_chrom_start','exon_chrom_end','ensembl_exon_id','5_utr_start','5_utr_end','3_utr_start','3_utr_end','strand','ensembl_transcript_id'), filters = 'ensembl_transcript_id', values = 'FBtr0082757', mart = ensembl_79)
> fruitfly_FBtr0082757
  chromosome_name exon_chrom_start exon_chrom_end ensembl_exon_id 5_utr_start 5_utr_end 3_utr_start 3_utr_end strand ensembl_transcript_id
1              3R         13415139       13415286  FBtr0082757-E1    13415139  13415286          NA        NA     -1           FBtr0082757
2              3R         13410791       13411615  FBtr0082757-E2    13411603  13411615          NA        NA     -1           FBtr0082757
3              3R         13409543       13410413  FBtr0082757-E3          NA        NA    13409543  13409632     -1           FBtr0082757
Because this transcript is on the reverse strand, the value in a BioMart start column is actually the end of the feature when read in transcript orientation, and the value in an end column is its start:
BioMart 5_utr_end matches exon_chrom_end (13415286). This means that the 5' UTR starts where the exon starts (FBtr0082757-E1), as you can see on the Exons sequence page of the Ensembl website:
http://www.ensembl.org/Drosophila_melanogaster/Transcript/Exons?db=core;g=FBgn0041711;r=3R:13409543-13415286;t=FBtr0082757
BioMart 3_utr_start matches exon_chrom_start (13409543). This means that the 3' UTR ends where the exon ends (FBtr0082757-E3), as you can see on the same Exons sequence page.
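To make the reverse-strand reading concrete, here is a small sketch that re-expresses the BioMart rows for FBtr0082757 in transcript (5' to 3') orientation. The values are copied by hand from the output above; the shortened column names and the five_prime() and three_prime() helpers are hypothetical and not part of biomaRt.

```r
# Rows for FBtr0082757, copied from the getBM() output.
# Column names are simplified stand-ins for the BioMart attributes.
rev_tx <- data.frame(
  exon_id    = c("FBtr0082757-E1", "FBtr0082757-E2", "FBtr0082757-E3"),
  exon_start = c(13415139, 13410791, 13409543),
  exon_end   = c(13415286, 13411615, 13410413),
  utr5_start = c(13415139, 13411603, NA),
  utr5_end   = c(13415286, 13411615, NA),
  utr3_start = c(NA, NA, 13409543),
  utr3_end   = c(NA, NA, 13409632),
  strand     = -1
)

# On the reverse strand the transcript is read from high to low coordinates,
# so a feature's 5' boundary sits in its "end" column and its 3' boundary
# sits in its "start" column.
five_prime  <- function(start, end, strand) ifelse(strand == 1, start, end)
three_prime <- function(start, end, strand) ifelse(strand == 1, end, start)

oriented <- with(rev_tx, data.frame(
  exon_id = exon_id,
  exon_5p = five_prime(exon_start, exon_end, strand),
  exon_3p = three_prime(exon_start, exon_end, strand),
  utr5_5p = five_prime(utr5_start, utr5_end, strand),
  utr3_3p = three_prime(utr3_start, utr3_end, strand)
))

print(oriented)
```

In the reoriented table the 5' UTR of FBtr0082757-E1 begins at the exon's 5' boundary (13415286) and the 3' UTR of FBtr0082757-E3 finishes at the exon's 3' boundary (13409543).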
My feeling is that makeTranscriptDbFromBiomart needs to be updated, since the 3' UTR start doesn't have to match the exon start; only the 3' UTR end needs to match the exon end.
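As an illustration of the kind of strand-aware check I have in mind, here is a minimal sketch. It is not the code used by GenomicFeatures: utr_boundaries_ok() is a hypothetical helper, and it assumes that "start" and "end" are meant in transcript orientation, so that a 5' UTR must begin at its exon's 5' boundary and a 3' UTR must finish at its exon's 3' boundary.

```r
# Hypothetical helper, not part of GenomicFeatures or biomaRt.
# Returns one logical per exon row: TRUE when every UTR present in that row
# touches the expected exon boundary, taken in transcript orientation.
utr_boundaries_ok <- function(exon_start, exon_end,
                              utr5_start, utr5_end,
                              utr3_start, utr3_end, strand) {
  # Recycle a single strand value across all exon rows.
  plus <- rep_len(strand, length(exon_start)) == 1

  exon_5p   <- ifelse(plus, exon_start, exon_end)
  exon_3p   <- ifelse(plus, exon_end, exon_start)
  utr5_edge <- ifelse(plus, utr5_start, utr5_end)  # where the 5' UTR begins
  utr3_edge <- ifelse(plus, utr3_end, utr3_start)  # where the 3' UTR finishes

  # Rows without a UTR (NA) pass trivially.
  ok5 <- is.na(utr5_edge) | utr5_edge == exon_5p
  ok3 <- is.na(utr3_edge) | utr3_edge == exon_3p
  ok5 & ok3
}

# Reverse-strand transcript FBtr0082757 (all three exons).
minus_ok <- utr_boundaries_ok(
  exon_start = c(13415139, 13410791, 13409543),
  exon_end   = c(13415286, 13411615, 13410413),
  utr5_start = c(13415139, 13411603, NA),
  utr5_end   = c(13415286, 13411615, NA),
  utr3_start = c(NA, NA, 13409543),
  utr3_end   = c(NA, NA, 13409632),
  strand     = -1
)

# Forward-strand transcript FBtr0290037 (first and last exons only).
plus_ok <- utr_boundaries_ok(
  exon_start = c(13294748, 13297522),
  exon_end   = c(13296098, 13298288),
  utr5_start = c(13294748, NA),
  utr5_end   = c(13294789, NA),
  utr3_start = c(NA, 13298171),
  utr3_end   = c(NA, 13298288),
  strand     = 1
)
```

Both transcripts pass under this reading, using the values returned by BioMart for Ensembl 79.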
Hope this helps,
Regards,
Thomas
> On 18 Mar 2015, at 17:57, Chris Seidel [bioc] <noreply@bioconductor.org> wrote:
>
> Activity on a post you are following on support.bioconductor.org
> User Chris Seidel wrote Question: makeTranscriptDbFromBiomart failure from Data Anomaly:
>
>
> I get an error while trying to create a TranscriptDb object using the makeTranscriptDbFromBiomart() function from the GenomicFeatures library. The error message is below (truncated to show just the first transcript). The last time I ran this code was with Ensembl74, whereas today it is failing with Ensembl78. Any tips? Is there anything I can do, or is it simply a problem with data at biomaRt?
>
>
> > library(GenomicFeatures)
> > txdb <- makeTranscriptDbFromBiomart(host="ensembl.org"
> + ,biomart ="ENSEMBL_MART_ENSEMBL"
> + ,dataset = "dmelanogaster_gene_ensembl")
> Download and preprocess the 'transcripts' data frame ... OK
> Download and preprocess the 'splicings' data frame ... Error in .stopWithBioMartDataAnomalyReport(bm_table, idx, id_prefix, msg) :
> BioMart data anomaly: in the following transcripts,
> located on the minus strand, the 3' UTRs don't start
> where their corresponding exon starts.
> (Showing only the first 6 out of 15131 transcripts.)
> 1. Transcript FBtr0082757:
>   strand rank exon_chrom_start exon_chrom_end ensembl_exon_id 5_utr_start
> 1     -1    1         13415139       13415286  FBtr0082757-E1    13415139
> 2     -1    2         13410791       13411615  FBtr0082757-E2    13411603
> 3     -1    3         13409543       13410413  FBtr0082757-E3          NA
>   5_utr_end 3_utr_start 3_utr_end cds_length
> 1  13415286          NA        NA       1593
> 2  13411615          NA        NA       1593
> 3        NA    13409543  13409632       1593
>
> > sessionInfo()
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] GenomicFeatures_1.16.3 AnnotationDbi_1.26.1 Biobase_2.24.0
> [4] GenomicRanges_1.16.4 GenomeInfoDb_1.0.2 IRanges_1.22.10
> [7] BiocGenerics_0.10.0
>
> loaded via a namespace (and not attached):
> [1] base64enc_0.1-2 BatchJobs_1.5 BBmisc_1.8
> [4] BiocParallel_0.6.1 biomaRt_2.20.0 Biostrings_2.32.1
> [7] bitops_1.0-6 brew_1.0-6 BSgenome_1.32.0
> [10] checkmate_1.5.1 codetools_0.2-11 DBI_0.3.1
> [13] digest_0.6.6 fail_1.2 foreach_1.4.2
> [16] GenomicAlignments_1.0.6 iterators_1.0.7 Rcpp_0.11.5
> [19] RCurl_1.95-4.3 Rsamtools_1.16.1 RSQLite_1.0.0
> [22] rtracklayer_1.24.2 sendmailR_1.2-1 stats4_3.1.0
> [25] stringr_0.6.2 tools_3.1.0 XML_3.98-1.1
> [28] XVector_0.4.0 zlibbioc_1.10.0
>
>
--
Thomas Maurel
Bioinformatician - Ensembl Production Team
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom
Thanks very much. I'll update. And next time I'll check versions first.