Error in .merge_transcript_parts(transcripts)
csijcs (@csijcs-18327)

Hello,

I am trying to make a TxDb from a GFF file of sncRNAs obtained from the DASHR database (DASHR v2.0 hg38 sncRNA annotation [GFF]). I had to do a little formatting to remove the 10th column from some lines, but once that was done I tried importing the file and making a TxDb with makeTxDbFromGFF, and received the following error:

> TxDb <- makeTxDbFromGFF(file = "/data2/csijcs/hg38/dashr.v2.sncRNA.annotation.hg38.edited.gff", format="auto")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Error in .merge_transcript_parts(transcripts) : 
  The following transcripts have multiple parts that cannot be merged
  because of incompatible type: U13, U3, U8

I tried removing those lines, but got even more errors:

> TxDb <- makeTxDbFromGFF(file = "/data2/csijcs/hg38/dashr.v2.sncRNA.annotation.hg38.edited.noU13U3U6U8.gff", format="auto")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Error in .merge_transcript_parts(transcripts) : 
  The following transcripts have multiple parts that cannot be merged
  because of incompatible seqnames: 5S, LSU-rRNA_Hsa, SSU-rRNA_Hsa, U1,
  U14, U17, U2, U4, U5, U6, U7
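The second error only appeared after the first batch of IDs had been removed, so it may be quicker to list every repeated ID in one go. Below is a minimal Python sketch, assuming tab-separated GFF3 lines with an ID=... attribute in column 9 and assuming that rows sharing an ID are what makeTxDbFromGFF() tries to merge (the diagnosis given in the reply to this post). For each repeated ID it reports the distinct seqnames (column 1) and feature types (column 3) it spans, which presumably correspond to the "incompatible seqnames" and "incompatible type" messages. It only reads the file and does not reproduce the checks inside GenomicFeatures; the function names are illustrative.

```python
from collections import defaultdict


def get_id(attr_field):
    """Return the value of the ID= attribute in a GFF3 column-9 string, or None."""
    for field in attr_field.split(";"):
        key, sep, value = field.partition("=")
        if sep and key.strip() == "ID":
            return value.strip()
    return None


def find_duplicate_ids(lines):
    """For each ID on more than one feature line, report its line count, seqnames and types."""
    parts = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip directives, comments and blank lines
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # skip malformed lines
        fid = get_id(cols[8])
        if fid is not None:
            parts[fid].append((cols[0], cols[2]))  # (seqname, type)

    report = {}
    for fid, rows in parts.items():
        if len(rows) > 1:
            report[fid] = {
                "count": len(rows),
                "seqnames": sorted({seqname for seqname, _ in rows}),
                "types": sorted({ftype for _, ftype in rows}),
            }
    return report
```

Because the function only iterates over lines, an open file handle for the edited GFF can be passed to find_duplicate_ids() directly.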

 

Is it possible to make a TxDb for this annotation file? I am trying to perform differential expression analysis with DESeq2.

Here is my sessionInfo:

> sessionInfo() 
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /home/csijcs/anaconda2/lib/R/lib/libRblas.so
LAPACK: /home/csijcs/anaconda2/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] rtracklayer_1.40.6                     
 [2] TxDb.Hsapiens.UCSC.hg38.knownGene_3.4.0
 [3] apeglm_1.2.1                           
 [4] tximportData_1.8.0                     
 [5] readr_1.1.1                            
 [6] tximport_1.8.0                         
 [7] RColorBrewer_1.1-2                     
 [8] ggplot2_3.1.0                          
 [9] DESeq2_1.20.0                          
[10] SummarizedExperiment_1.10.1            
[11] DelayedArray_0.6.6                     
[12] BiocParallel_1.14.2                    
[13] matrixStats_0.54.0                     
[14] GenomicFeatures_1.32.3                 
[15] AnnotationDbi_1.42.1                   
[16] Biobase_2.40.0                         
[17] GenomicRanges_1.32.7                   
[18] GenomeInfoDb_1.16.0                    
[19] IRanges_2.14.12                        
[20] S4Vectors_0.18.3                       
[21] BiocGenerics_0.26.0                    

loaded via a namespace (and not attached):
 [1] bitops_1.0-6             mirbase.db_1.2.0         bit64_0.9-7             
 [4] progress_1.2.0           httr_1.3.1               numDeriv_2016.8-1       
 [7] tools_3.5.0              backports_1.1.2          R6_2.3.0                
[10] rpart_4.1-13             Hmisc_4.1-1              DBI_1.0.0               
[13] lazyeval_0.2.1           colorspace_1.3-2         nnet_7.3-12             
[16] withr_2.1.2              tidyselect_0.2.5         gridExtra_2.3           
[19] prettyunits_1.0.2        bit_1.1-14               compiler_3.5.0          
[22] htmlTable_1.12           scales_1.0.0             checkmate_1.8.5         
[25] genefilter_1.62.0        stringr_1.3.1            digest_0.6.18           
[28] Rsamtools_1.32.3         foreign_0.8-71           XVector_0.20.0          
[31] base64enc_0.1-3          pkgconfig_2.0.2          htmltools_0.3.6         
[34] bbmle_1.0.20             htmlwidgets_1.3          rlang_0.3.0.1           
[37] rstudioapi_0.8           RSQLite_2.1.1            bindr_0.1.1             
[40] acepack_1.4.1            dplyr_0.7.8              RCurl_1.95-4.11         
[43] magrittr_1.5             GenomeInfoDbData_1.1.0   Formula_1.2-3           
[46] Matrix_1.2-15            Rcpp_1.0.0               munsell_0.5.0           
[49] stringi_1.2.4            MASS_7.3-51.1            zlibbioc_1.26.0         
[52] plyr_1.8.4               grid_3.5.0               blob_1.1.1              
[55] crayon_1.3.4             lattice_0.20-38          Biostrings_2.48.0       
[58] splines_3.5.0            annotate_1.58.0          hms_0.4.2               
[61] locfit_1.5-9.1           knitr_1.20               pillar_1.3.0            
[64] geneplotter_1.58.0       biomaRt_2.36.1           XML_3.98-1.16           
[67] glue_1.3.0               latticeExtra_0.6-28      data.table_1.11.8       
[70] BiocManager_1.30.4       gtable_0.2.0             purrr_0.2.5             
[73] assertthat_0.2.0         emdbook_1.3.10           xtable_1.8-3            
[76] coda_0.19-2              survival_2.43-1          tibble_1.4.2            
[79] GenomicAlignments_1.16.0 memoise_1.1.0            bindrcpp_0.2.2          
[82] cluster_2.0.7-1         
 

Answer:

I've tested this with a modified file (10th column removed, since GFF files have 9 columns) and got the same error as you. The error is thrown by the .merge_transcript_parts() function in GenomicFeatures. In essence, it seems the reason you are getting the error is that the ID=<something> values in the 9th column of your file are not unique (e.g. ID=U4 appears on more than one line). The function treats lines that share an ID as parts of the same transcript, and then fails to merge those parts when they differ in type or lie on different seqnames. To get makeTxDbFromGFF() to work, it seems that you need to make all of the ID values in the 9th column of your GFF file unique, or remove the lines whose IDs are not unique.
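A minimal Python sketch of that preprocessing, combining the 10th-column removal with the first option (making the IDs unique). It assumes tab-separated GFF3 lines with an ID=... attribute in column 9; the function names and the _1, _2, ... suffix scheme are illustrative choices, not something GenomicFeatures requires. It leaves Name= untouched, does not update any Parent= attributes that might point at a renamed ID, and does not check that a new name such as U4_1 is not already used elsewhere in the file.

```python
from collections import Counter


def get_id(attr_field):
    """Return the value of the ID= attribute in a GFF3 column-9 string, or None."""
    for field in attr_field.split(";"):
        key, sep, value = field.partition("=")
        if sep and key.strip() == "ID":
            return value.strip()
    return None


def feature_id(line):
    """ID of a tab-separated feature line; None for comments, blanks or short lines."""
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.split("\t")
    return get_id(cols[8]) if len(cols) >= 9 else None


def fix_gff_lines(lines, ncols=9):
    """Cut feature lines to `ncols` columns, then suffix every repeated ID with _1, _2, ..."""
    # Step 1: drop any columns past the 9th; comments and blank lines pass through
    cleaned = []
    for line in lines:
        line = line.rstrip("\n")
        if line.strip() and not line.startswith("#"):
            line = "\t".join(line.split("\t")[:ncols])
        cleaned.append(line)

    # Step 2: count how many feature lines carry each ID
    counts = Counter(fid for fid in map(feature_id, cleaned) if fid is not None)

    # Step 3: rewrite only the IDs that occur on more than one line
    seen = Counter()
    out = []
    for line in cleaned:
        fid = feature_id(line)
        if fid is None or counts[fid] == 1:
            out.append(line)
            continue
        seen[fid] += 1
        cols = line.split("\t")
        fields = cols[8].split(";")
        for j, field in enumerate(fields):
            key, sep, value = field.partition("=")
            if sep and key.strip() == "ID":
                fields[j] = f"{key}={value.strip()}_{seen[fid]}"
                break  # only the first ID= field is renamed
        cols[8] = ";".join(fields)
        out.append("\t".join(cols))
    return out
```

To use it, read the edited GFF, pass its lines through fix_gff_lines(), write the result back out joined with newlines, and point makeTxDbFromGFF() at the new file. If you would rather take the other route and drop the repeated features, skip the lines where counts[fid] > 1 in step 3 instead of renaming them.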
 
