Hello, I have been trying to use customProDB to build a customized protein database for my datasets, but whenever I run it through the easyRun() function it fails with the same error:
Calculate RPKMs and Output proteins pass the cutoff into FASTA file ...
Error in keepSeqlevels(anno, seqlevels(galn), pruning.mode = "coarse") : invalid seqlevels: chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr1, chr1_GL456210_random, chr1_GL456211_random, chr1_GL456212_random, chr1_GL456213_random, chr1_GL456221_random, chr2, chr3, chr4, chr4_GL456216_random, chr4_GL456350_random, chr4_JH584292_random, chr4_JH584293_random, chr4_JH584294_random, chr4_JH584295_random, chr5, chr5_GL456354_random, chr5_JH584296_random, chr5_JH584297_random, chr5_JH584298_random, chr5_JH584299_random, chr6, chr7, chr7_GL456219_random, chr8, chr9, chrM, chrUn_GL456239, chrUn_GL456359, chrUn_GL456360, chrUn_GL456366, chrUn_GL456367, chrUn_GL456368, chrUn_GL456370, chrUn_GL456372, chrUn_GL456378, chrUn_GL456379, chrUn_GL456381, chrUn_GL456382, chrUn_GL456383, chrUn_GL456385, chrUn_GL456387, chrUn_GL456389, chrUn_GL456390, chrUn_GL456392, chrUn_GL456393, chrUn_GL456394, chrUn_GL456396, chrUn_JH584304, chrX, chrX_GL456233_random, chrY, chrY_JH584300_random, chrY_JH584301_random, chrY_JH584
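For reference, the failing call can be reproduced outside customProDB with toy data. This is an illustrative sketch under assumptions, not customProDB's own code: `anno` and `bam_levels` below are made-up stand-ins for the exon annotation and for `seqlevels(galn)` (the sequence names taken from the BAM). It shows one situation that makes `keepSeqlevels()` stop with an error: being asked to keep a sequence name (here an unplaced contig) that the annotation does not contain. Restricting to the names both objects share avoids the error.

```r
suppressPackageStartupMessages(library(GenomicRanges))  # also attaches IRanges and GenomeInfoDb

# Hypothetical stand-in for the exon annotation: primary chromosomes only
anno <- GRanges(seqnames = c("chr1", "chr2"),
                ranges = IRanges(start = c(100, 500), width = 50))

# Hypothetical stand-in for seqlevels(galn): the BAM header also lists an unplaced contig
bam_levels <- c("chr1", "chr2", "chrUn_GL456239")

# Sequence names present in the BAM header but absent from the annotation
extra <- setdiff(bam_levels, seqlevels(anno))
print(extra)

# Asking keepSeqlevels() to keep a name that 'anno' lacks raises an error
failed <- tryCatch({
  keepSeqlevels(anno, bam_levels, pruning.mode = "coarse")
  FALSE
}, error = function(e) TRUE)

# Keeping only the names shared by both objects succeeds
shared <- intersect(seqlevels(anno), bam_levels)
anno_kept <- keepSeqlevels(anno, shared, pruning.mode = "coarse")
print(seqlevels(anno_kept))
```

With a real BAM, `seqlevels(Rsamtools::BamFile(bamFile))` lists the names from its header, which can then be compared with the chromosome names in the generated annotation in the same way.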
The error appears with several different human and mouse datasets. My workflow is as follows:
1) Prepare the annotation files:
- Download the coding sequence FASTA files (genome and protein) from UCSC, following the instructions in the manual.
- Run the following code in R from the terminal:
library(customProDB)
pepfasta <- system.file("extdata/mm10", "Mouse__refGene_(protein)].fasta", package="customProDB")
CDSfasta <- system.file("extdata/mm10", "UCSC_Main_on_Mouse__refGene_(genome).fasta", package="customProDB")
annotation_path <- tempdir()
PrepareAnnotationRefseq(genome='mm10', CDSfasta, pepfasta, annotation_path, dbsnp=NULL, transcript_ids=NULL, splice_matrix=TRUE, ClinVar=FALSE)
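One thing worth checking between steps 1 and 2: `tempdir()` is a per-session directory that R normally deletes when the session ends, and step 2 reads the annotation from a different `annotation_path` than the one written here. Below is a minimal sketch of a hypothetical helper, `missing_annotation_files()`, that reports which expected files are absent from a directory. The default file names are an assumption about what `PrepareAnnotationRefseq()` writes, not something confirmed in this post.

```r
# Hypothetical helper: return the expected annotation files that are NOT present in 'path'.
# The default file names are an assumption about PrepareAnnotationRefseq()'s output.
missing_annotation_files <- function(path,
                                     expected = c("ids.RData", "exon_anno.RData",
                                                  "proseq.RData", "procodingseq.RData")) {
  if (!nzchar(path) || !dir.exists(path)) {
    return(expected)  # empty path or nonexistent directory: every file is missing
  }
  expected[!file.exists(file.path(path, expected))]
}

# Demo with a throwaway directory that contains only one of the expected files
demo_dir <- file.path(tempdir(), "annotation_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, "ids.RData")))

missing_now <- missing_annotation_files(demo_dir)
print(missing_now)
```

In the workflow above, the same call would be `missing_annotation_files(annotation_path)`, run once right after step 1 and again with the `annotation_path` used in step 2.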
2) After the R annotation files are generated, I run the easyRun() function as described in the manual:
library(customProDB)
bamFile <- system.file("extdata/mm10", "mm10_aligned_sorted.bam", package="customProDB")
vcffile <- system.file("extdata/mm10", "freebayes.vcf", package="customProDB")
annotation_path <- system.file("extdata/mm10", package="customProDB")
outfile_path <- tempdir()
outfile_name <- 'test_mm10'
easyRun(bamFile, RPKM=NULL, vcffile, annotation_path, outfile_path, outfile_name, rpkm_cutoff=1, INDEL=TRUE, lablersid=FALSE, COSMIC=FALSE, nov_junction=FALSE)
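A related sanity check for step 2: `system.file()` returns an empty string, not an error, when the requested file is not found inside the installed package, so an input that was never placed under `extdata/mm10` silently becomes `""`. The sketch below uses a hypothetical helper, `empty_paths()`, and demonstrates the behaviour by looking up a file name that does not exist in the base `stats` package.

```r
# Hypothetical helper: names of the inputs whose path is empty or does not exist on disk
empty_paths <- function(...) {
  paths <- c(...)
  names(paths)[!nzchar(paths) | !file.exists(paths)]
}

# system.file() returns "" when the file is not found in the package
missing_bam <- system.file("extdata", "no_such_file.bam", package = "stats")
existing_dir <- tempdir()

bad <- empty_paths(bamFile = missing_bam, outfile_path = existing_dir)
print(bad)
```

Applied to the code above, this would be `empty_paths(bamFile = bamFile, vcffile = vcffile, annotation_path = annotation_path)`.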
This produces the error shown above. I would really like to know what I am doing wrong or how to fix it, so any help would be greatly appreciated. Thank you very much in advance.
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] customProDB_1.40.0 biomaRt_2.56.0 AnnotationDbi_1.62.1
[4] Biobase_2.60.0 IRanges_2.34.0 S4Vectors_0.38.1
[7] BiocGenerics_0.46.0
loaded via a namespace (and not attached):
[1] KEGGREST_1.40.0 SummarizedExperiment_1.30.1
[3] AhoCorasickTrie_0.1.2 rjson_0.2.21
[5] lattice_0.21-8 vctrs_0.6.2
[7] tools_4.3.0 bitops_1.0-7
[9] generics_0.1.3 curl_5.0.0
[11] parallel_4.3.0 tibble_3.2.1
[13] fansi_1.0.4 RSQLite_2.3.1
[15] blob_1.2.4 pkgconfig_2.0.3
[17] Matrix_1.5-1 BSgenome_1.68.0
[19] dbplyr_2.3.2 lifecycle_1.0.3
[21] GenomeInfoDbData_1.2.10 compiler_4.3.0
[23] stringr_1.5.0 Rsamtools_2.16.0
[25] Biostrings_2.68.1 progress_1.2.2
[27] codetools_0.2-19 GenomeInfoDb_1.36.0
[29] yaml_2.3.7 RCurl_1.98-1.12
[31] pillar_1.9.0 crayon_1.5.2
[33] BiocParallel_1.34.1 DelayedArray_0.26.2
[35] cachem_1.0.8 tidyselect_1.2.0
[37] digest_0.6.31 stringi_1.7.12
[39] VariantAnnotation_1.46.0 restfulr_0.0.15
[41] dplyr_1.1.2 fastmap_1.1.1
[43] grid_4.3.0 cli_3.6.1
[45] magrittr_2.0.3 GenomicFeatures_1.52.0
[47] S4Arrays_1.0.4 XML_3.99-0.14
[49] utf8_1.2.3 prettyunits_1.1.1
[51] filelock_1.0.2 rappdirs_0.3.3
[53] bit64_4.0.5 XVector_0.40.0
[55] httr_1.4.6 matrixStats_0.63.0
[57] bit_4.0.5 png_0.1-8
[59] hms_1.1.3 memoise_2.0.1
[61] BiocIO_1.10.0 GenomicRanges_1.52.0
[63] BiocFileCache_2.8.0 rtracklayer_1.60.0
[65] rlang_1.1.1 Rcpp_1.0.10
[67] glue_1.6.2 DBI_1.1.3
[69] xml2_1.3.4 plyr_1.8.8
[71] R6_2.5.1 MatrixGenerics_1.12.0
[73] GenomicAlignments_1.36.0 zlibbioc_1.46.0