Error running makeTranscriptDbFromGFF in GenomicFeatures
Jon Bråte
@jon-brate-6263
Norway
Hi list,

I am trying to create a TranscriptDb using GenomicFeatures, but I get an error message. I think there might be something wrong with my GFF file, but I am not sure. I also tried converting the GFF file to GTF, but that also gives an error.

My goal is to plot the number of exons per gene.

Code:

#GFF-file
> txdb = makeTranscriptDbFromGFF(file = "~/Documents/Prosjekter/RNA-project/Data/Sycon_ciliatum/sycon-from-Bergen/gff-files-and-expression-levels/cds.gb.gff3",
+     format = "gff")
extracting transcript information
Extracting gene IDs
extracting transcript information
Processing splicing information for gff3 file.
Deducing exon rank from relative coordinates provided
Warning message:
In .deduceExonRankings(exs, format = "gff") :
  Infering Exon Rankings. If this is not what you expected, then please be sure that you have provided a valid attribute for exonRankAttributeName
Error in unlist(mapply(.assignRankings, starts, strands)) :
  error in evaluating the argument 'x' in selecting a method for function 'unlist': Error in (function (starts, strands)  :
  Exon rank inference cannot accomodate trans-splicing.
#GTF-file
> txdbGTF = makeTranscriptDbFromGFF(file = "~/Documents/Prosjekter/RNA-project/Data/Sycon_ciliatum/sycon-from-Bergen/gff-files-and-expression-levels/cds.gb.gtf",
+     format = "gtf")
Error in .parse_attrCol(attrCol, file, colnames) :
  Some attributes do not conform to 'tag value' format

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicFeatures_1.16.2 AnnotationDbi_1.26.0   Biobase_2.24.0         GenomicRanges_1.16.3
[5] GenomeInfoDb_1.0.2     IRanges_1.22.10        BiocGenerics_0.10.0

loaded via a namespace (and not attached):
 [1] BBmisc_1.7              BSgenome_1.32.0         BatchJobs_1.3           BiocParallel_0.6.1
 [5] Biostrings_2.32.1       DBI_0.2-7               GenomicAlignments_1.0.5 RCurl_1.95-4.3
 [9] RSQLite_0.11.4          Rcpp_0.11.2             Rsamtools_1.16.1        XML_3.98-1.1
[13] XVector_0.4.0           biomaRt_2.20.0          bitops_1.0-6            brew_1.0-6
[17] checkmate_1.3           codetools_0.2-9         digest_0.6.4            fail_1.2
[21] foreach_1.4.2           iterators_1.0.7         rtracklayer_1.24.2      sendmailR_1.1-2
[25] stats4_3.1.0            stringr_0.6.2           tools_3.1.0             zlibbioc_1.10.0

----------------------------------------------------------------
Jon Bråte

Section for Genetics and Evolutionary Biology (EVOGENE)
Department of Biosciences
University of Oslo
P.B. 1066 Blindern
N-0316, Norway
Email: jon.brate at ibv.uio.no
Phone: 922 44 582
Web: mn.uio.no/ibv/english/people/aca/jonbra/index.html
Genetics TranscriptDb GenomicFeatures
@michael-lawrence-3846
United States
I think the error messages are a pretty good clue to what's wrong here.

The TxDb needs to know the "rank" (the order within the transcript) of each exon. When no exon-rank attribute is supplied (the exonRankAttributeName argument mentioned in the warning), it tries to infer the ranks from the exon positions, but this fails when exons within the same transcript fall on multiple chromosomes (trans-splicing).

The GTF error is a separate problem: the attribute column of some lines is not in the 'tag value' format. You could find the offending line(s) by cutting the file in half recursively until the error goes away.

If you want, you could put the files up on Dropbox, and I'll take a look at them.

Michael

On Thu, Sep 4, 2014 at 3:23 AM, Jon Bråte <jon.brate at ibv.uio.no> wrote:
> I am trying to create a TranscriptDb using GenomicFeatures, but I get an error message.
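Instead of bisecting the GTF by hand, a quick scan can list candidate lines directly. The sketch below is a hypothetical base-R helper, `find_bad_gtf_lines()`, which is not part of GenomicFeatures. It assumes that every non-comment line should have nine tab-separated columns and that the ninth column should consist of `;`-separated `tag value` pairs whose value is either a quoted string or a single space-free token. That is only an approximation of what the GenomicFeatures parser checks, and the naive split on `;` would misjudge a quoted value that itself contains a semicolon.

```r
# Hypothetical helper (not part of GenomicFeatures): return the indices of
# lines whose attribute column does not look like `tag value;` pairs.
find_bad_gtf_lines <- function(lines) {
  # Attribute pattern: a tag without spaces, whitespace, then either a quoted
  # string or a single space-free token (e.g. exon_number 2).
  pair_re <- '^[^[:space:]"]+[[:space:]]+("[^"]*"|[^[:space:]"]+)$'
  bad <- integer(0)
  for (i in which(nzchar(lines) & !startsWith(lines, "#"))) {
    fields <- strsplit(lines[i], "\t", fixed = TRUE)[[1]]
    if (length(fields) < 9) {          # fewer than nine columns: malformed
      bad <- c(bad, i)
      next
    }
    # Naive split on ';' (a quoted value containing ';' would be misjudged).
    attrs <- trimws(strsplit(fields[9], ";", fixed = TRUE)[[1]])
    attrs <- attrs[nzchar(attrs)]      # drop empty pieces around a trailing ';'
    if (length(attrs) == 0 || !all(grepl(pair_re, attrs))) bad <- c(bad, i)
  }
  bad
}

# Small in-memory example with one good line and three variants.
gtf <- c(
  'scaf1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
  'scaf1\tsrc\texon\t200\t300\t.\t+\t.\tgene_id=g1;transcript_id=t1',
  'scaf1\tsrc\texon\t400\t500\t.\t+\t.\tgene_id "g1"; transcript_id "t1"; exon_number 3;',
  'scaf1\tsrc\texon\t600\t700\t.\t+\t.'
)
bad <- find_bad_gtf_lines(gtf)
print(bad)  # 2 4: GFF3-style key=value attributes, and a missing 9th column
```

On the real file, something like `find_bad_gtf_lines(readLines("cds.gb.gtf"))` (with the path adjusted) would return the line numbers to inspect.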
Thanks Michael,

Yes, you are right. Many of the transcripts come from multiple chromosomes (or rather scaffolds: this is a poorly assembled genome, which is probably why there is so much trans-splicing).

I think removing the trans-spliced genes would remove too many genes, so I will try to do this another way.

Thank you,
Jon

On 4. sep. 2014, at 13:56, Michael Lawrence wrote:
> I think the error messages are a pretty good clue to what's wrong here.
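Before deciding how many genes would be lost, it can help to count the affected transcripts. The following is a hypothetical base-R sketch (`find_trans_spliced()` is not a GenomicFeatures function) that reads GFF3 exon lines and reports every Parent whose exons lie on more than one seqid or on more than one strand. Counting a strand mix as trans-splicing as well is an assumption; the error above only says that rank inference cannot handle trans-splicing.

```r
# Hypothetical helper: list the transcripts (exon Parent IDs) in GFF3 lines
# whose exons sit on more than one seqid or on more than one strand.
find_trans_spliced <- function(lines) {
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  f <- strsplit(lines, "\t", fixed = TRUE)
  f <- f[lengths(f) >= 9]                         # keep 9-column feature lines
  f <- f[vapply(f, `[`, character(1), 3) == "exon"]
  seqid  <- vapply(f, `[`, character(1), 1)
  strand <- vapply(f, `[`, character(1), 7)
  # Parent= value of each exon (NA if missing); may list several transcripts.
  parent <- vapply(f, function(x) {
    kv <- trimws(strsplit(x[9], ";", fixed = TRUE)[[1]])
    hit <- kv[startsWith(kv, "Parent=")]
    if (length(hit) == 0) NA_character_ else substring(hit[1], 8)
  }, character(1))
  parents <- strsplit(parent, ",", fixed = TRUE)
  idx <- rep(seq_along(parents), lengths(parents)) # one entry per exon-parent pair
  tx  <- unlist(parents)
  loc <- paste(seqid[idx], strand[idx])
  keep <- !is.na(tx)
  # Number of distinct seqid/strand combinations per transcript.
  n_loc <- tapply(loc[keep], tx[keep], function(x) length(unique(x)))
  sort(names(n_loc)[n_loc > 1])
}

gff <- c(
  "s1\tsrc\texon\t1\t100\t.\t+\t.\tID=e1;Parent=t1",
  "s1\tsrc\texon\t200\t300\t.\t+\t.\tID=e2;Parent=t1",
  "s1\tsrc\texon\t1\t100\t.\t+\t.\tID=e3;Parent=t2",
  "s7\tsrc\texon\t50\t90\t.\t+\t.\tID=e4;Parent=t2",
  "s3\tsrc\texon\t10\t20\t.\t+\t.\tID=e5;Parent=t3",
  "s3\tsrc\texon\t40\t60\t.\t-\t.\tID=e6;Parent=t3"
)
affected <- find_trans_spliced(gff)
print(affected)  # "t2" (two scaffolds) and "t3" (two strands)
```

On the real data, something like `length(find_trans_spliced(readLines("cds.gb.gff3")))` (path adjusted) would give the number of affected transcripts.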
I would recommend calling rtracklayer's import():

gr <- import(gff)

and then subsetting to the features whose type is exon and tabulating them by Parent.

Michael

On Thu, Sep 4, 2014 at 8:14 AM, Jon Bråte wrote:
> I think removing the trans-spliced genes removes too many genes so I will try to do this in another way.
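Michael's suggestion uses `import()` from rtracklayer; the equivalent calls are shown as a comment at the top of the block. The sketch itself does the same kind of tabulation in base R only, so it can be tried without Bioconductor installed; `exons_per_gene()`, `parse_gff3()` and `get_attr()` are hypothetical helpers. It assumes a GFF3 file in which exon lines carry `Parent=` (possibly several comma-separated transcript IDs) and transcript lines carry `ID=` and a `Parent=` that points at the gene. Exons are rolled up to genes, an exon shared by several transcripts of one gene is counted once, and a transcript with no gene parent is treated as its own gene. Attribute values are not URL-decoded. Because no exon ranks are needed, trans-spliced transcripts do not cause an error here.

```r
# Hypothetical base-R sketch: count distinct exons per gene from GFF3 lines.
# The rtracklayer route Michael describes is roughly (per transcript, not gene):
#   gr <- rtracklayer::import("cds.gb.gff3")
#   ex <- gr[gr$type == "exon"]
#   table(unlist(ex$Parent))

# Read the nine GFF3 columns from lines that have them (comments skipped).
parse_gff3 <- function(lines) {
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  f <- strsplit(lines, "\t", fixed = TRUE)
  f <- f[lengths(f) >= 9]
  m <- do.call(rbind, lapply(f, function(x) x[1:9]))
  data.frame(seqid = m[, 1], type = m[, 3],
             start = as.integer(m[, 4]), end = as.integer(m[, 5]),
             strand = m[, 7], attrs = m[, 9], stringsAsFactors = FALSE)
}

# Value of one key=value attribute per line, NA when absent (no URL-decoding).
get_attr <- function(attrs, key) {
  prefix <- paste0(key, "=")
  vapply(strsplit(attrs, ";", fixed = TRUE), function(kv) {
    kv <- trimws(kv)
    hit <- kv[startsWith(kv, prefix)]
    if (length(hit) == 0) NA_character_ else substring(hit[1], nchar(prefix) + 1)
  }, character(1))
}

exons_per_gene <- function(lines) {
  g <- parse_gff3(lines)
  ex <- g[g$type == "exon", ]
  # One row per (exon, parent transcript) pair; Parent may list several IDs.
  parents <- strsplit(get_attr(ex$attrs, "Parent"), ",", fixed = TRUE)
  idx <- rep(seq_len(nrow(ex)), lengths(parents))
  pairs <- data.frame(
    tx  = unlist(parents),
    key = paste(ex$seqid[idx], ex$start[idx], ex$end[idx], ex$strand[idx]),
    stringsAsFactors = FALSE)
  pairs <- pairs[!is.na(pairs$tx), ]
  # Map transcript ID -> gene ID via the transcript's own Parent attribute.
  gene_of <- setNames(get_attr(g$attrs, "Parent"), get_attr(g$attrs, "ID"))
  gene <- unname(gene_of[pairs$tx])
  gene[is.na(gene)] <- pairs$tx[is.na(gene)]  # no gene parent: use transcript
  # Count each distinct exon location once per gene.
  uniq <- unique(data.frame(gene = gene, key = pairs$key,
                            stringsAsFactors = FALSE))
  table(uniq$gene)
}

gff <- c(
  "##gff-version 3",
  "s1\tsrc\tgene\t1\t900\t.\t+\t.\tID=g1",
  "s1\tsrc\tmRNA\t1\t900\t.\t+\t.\tID=t1;Parent=g1",
  "s1\tsrc\tmRNA\t1\t900\t.\t+\t.\tID=t2;Parent=g1",
  "s1\tsrc\texon\t1\t100\t.\t+\t.\tID=e1;Parent=t1,t2",
  "s1\tsrc\texon\t300\t400\t.\t+\t.\tID=e2;Parent=t1",
  "s1\tsrc\texon\t800\t900\t.\t+\t.\tID=e3;Parent=t2",
  "s2\tsrc\texon\t5\t50\t.\t-\t.\tID=e4;Parent=t3"
)
counts <- exons_per_gene(gff)
print(counts)  # g1 has 3 distinct exons (e1 shared by t1/t2); t3 has 1
```

To plot the distribution, something like `barplot(table(as.vector(counts)), xlab = "Exons per gene", ylab = "Number of genes")` should work on the result.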