ParseMetaFromGtfFile() from SCAN.UPC package fails to produce annotation file
lhuang7
@lhuang7-7824
United States

Hi,

I am trying to create an annotation file using the function ParseMetaFromGtfFile() from the SCAN.UPC package, but I get a warning message and no output file is generated.

After searching the archive, I found a three-year-old post about the same issue (ParseMetaFromGtfFile is.na() error).

The following is the code snippet I used:

library(SCAN.UPC)

ParseMetaFromGtfFile(gtfFilePath = "gencode.v25.annotation.gtf", 
                     fastaFilePattern = "GRCh38.primary_assembly.genome.fa", 
                     outFilePath = "GRCh38_Annotation.txt",  
                     featureTypes = "protein_coding", 
                     attributeType = "gene_id")

# Saving GTF data to temporary files
# Done parsing 10000 lines from gencode.v25.annotation.gtf
# Done parsing 20000 lines from gencode.v25.annotation.gtf
# ...
# Done parsing 2570000 lines from gencode.v25.annotation.gtf
# Warning message:
# In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
sessionInfo()
# R version 3.4.1 (2017-06-30)
# Platform: x86_64-apple-darwin15.6.0 (64-bit)
# Running under: macOS Sierra 10.12.6
# 
# Matrix products: default
# BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
# LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
# 
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
# 
# attached base packages:
# [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
# [8] methods   base     
# 
# other attached packages:
#  [1] SCAN.UPC_2.18.0     sva_3.25.4          BiocParallel_1.11.6
#  [4] genefilter_1.59.0   mgcv_1.8-19         nlme_3.1-131       
#  [7] foreach_1.4.3       affyio_1.47.0       affy_1.55.0        
# [10] GEOquery_2.43.0     oligo_1.41.1        Biostrings_2.45.3  
# [13] XVector_0.17.0      IRanges_2.11.12     S4Vectors_0.15.5   
# [16] oligoClasses_1.39.1 Biobase_2.37.2      BiocGenerics_0.23.0
# 
# loaded via a namespace (and not attached):
#  [1] SummarizedExperiment_1.7.5 splines_3.4.1             
#  [3] lattice_0.20-35            colorspace_1.3-3          
#  [5] yaml_2.1.14                blob_1.1.0                
#  [7] XML_3.98-1.9               survival_2.41-3           
#  [9] rlang_0.1.2                DBI_0.7                   
# [11] bit64_0.9-7                matrixStats_0.52.2        
# [13] GenomeInfoDbData_0.99.1    stringr_1.2.0             
# [15] zlibbioc_1.23.0            codetools_0.2-15          
# [17] memoise_1.1.0              ff_2.2-13                 
# [19] GenomeInfoDb_1.13.4        BiocInstaller_1.26.1      
# [21] AnnotationDbi_1.39.2       preprocessCore_1.39.0     
# [23] Rcpp_0.12.12               xtable_1.8-2              
# [25] limma_3.33.7               DelayedArray_0.3.19       
# [27] annotate_1.55.0            affxparser_1.49.0         
# [29] bit_1.1-12                 digest_0.6.12             
# [31] stringi_1.1.5              GenomicRanges_1.29.12     
# [33] grid_3.4.1                 tools_3.4.1               
# [35] bitops_1.0-6               magrittr_1.5              
# [37] RCurl_1.95-4.8             RSQLite_2.0               
# [39] tibble_1.3.4               MASS_7.3-47               
# [41] autoinst_0.0.0.9000        Matrix_1.2-11             
# [43] lubridate_1.6.0            httr_1.3.1                
# [45] iterators_1.0.8            R6_2.2.2                  
# [47] compiler_3.4.1

Did I do something wrong? Could anyone guide me on how to fix this problem?

Thanks,

Lei

annotation SCAN.UPC

I'll look into this and get back to you.

Thanks Stephen!

@stephen-piccolo-6761
United States

Thanks for letting me know about this. Some of the information that is usually stored in the second column of a GTF file was stored in a different location within this file. I believe I have fixed the problem and will post the fix to the devel server as soon as I can. In the meantime, send me an email and I'll send you the fix.
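
To make "stored in a different location" concrete, here is a minimal base-R sketch that compares column 2 with the attributes in column 9 for two made-up records laid out like GENCODE GTF lines. It assumes, without confirmation from the thread, that the information in question is the gene biotype. In GENCODE GTFs, column 2 holds the annotation source (HAVANA or ENSEMBL) and the biotype is stored as a gene_type attribute in column 9, so a parser that looked for "protein_coding" in column 2 would find nothing. The helper get_attribute() and the gene IDs are hypothetical and are not part of SCAN.UPC.

```r
# Minimal sketch (base R only). Assumption: the "second column" information
# discussed above is the gene biotype. The records below are made up but follow
# the GENCODE GTF layout: 9 tab-separated columns, attributes in column 9.
gtf_lines <- c(
  paste("chr1", "HAVANA", "gene", "11869", "14409", ".", "+", ".",
        'gene_id "ENSG_EXAMPLE_1"; gene_type "transcribed_unprocessed_pseudogene";',
        sep = "\t"),
  paste("chr1", "ENSEMBL", "gene", "69091", "70008", ".", "+", ".",
        'gene_id "ENSG_EXAMPLE_2"; gene_type "protein_coding";',
        sep = "\t")
)

# Hypothetical helper: pull the value of one key (e.g. gene_type) out of
# column 9; returns NA when the key is absent.
get_attribute <- function(attr_col, key) {
  pattern <- paste0('.*\\b', key, ' "([^"]*)".*')
  hit <- grepl(pattern, attr_col, perl = TRUE)
  out <- rep(NA_character_, length(attr_col))
  out[hit] <- sub(pattern, "\\1", attr_col[hit], perl = TRUE)
  out
}

# Split each record on tabs and keep the columns of interest.
fields <- strsplit(gtf_lines, "\t", fixed = TRUE)
gtf <- data.frame(
  source    = vapply(fields, `[`, character(1), 2),  # column 2
  feature   = vapply(fields, `[`, character(1), 3),  # column 3
  attribute = vapply(fields, `[`, character(1), 9),  # column 9
  stringsAsFactors = FALSE
)
gtf$gene_id   <- get_attribute(gtf$attribute, "gene_id")
gtf$gene_type <- get_attribute(gtf$attribute, "gene_type")

# Column 2 only says HAVANA/ENSEMBL, so matching "protein_coding" there fails...
n_from_source <- sum(gtf$source == "protein_coding")
# ...while the gene_type attribute does identify the protein-coding gene.
coding_ids <- gtf$gene_id[gtf$feature == "gene" & gtf$gene_type == "protein_coding"]

print(n_from_source)
print(coding_ids)
```

To run the same check on the real file, the first records could be read with read.delim("gencode.v25.annotation.gtf", header = FALSE, comment.char = "#", quote = "", nrows = 1000, stringsAsFactors = FALSE), after which V2, V3 and V9 play the roles of source, feature and attribute. Setting quote = "" matters because the attribute values contain double quotes, and comment.char = "#" skips the "##" header lines at the top of the file.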

As an aside, I implemented this parser before other GTF parsers were widely available. For more advanced GTF parsing, it would be best to use one of those (e.g., GenomicFeatures: https://bioconductor.org/packages/devel/bioc/manuals/GenomicFeatures/man/GenomicFeatures.pdf).

Thanks for the quick fix!
