Hi,
I'm trying to match a vector of peptide sequences against an AAStringSet to get all perfect matches.
I thought the most straightforward way to do this is to create a PDict object from the vector of peptide sequences using:
PDict(peptide.seq.vec)
And then use one of the matchPDict functions of the PDict object vs. the AAStringSet reference to get all perfect matches.
However, running:
PDict(peptide.seq.vec)
already throws this error:
Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW), : key 73 (char 'I') not in lookup table
peptide.seq.vec[1] is "KNVSIGIVGKD"
Is it expecting DNA sequences only? The documentation of PDict says it accepts a character vector, not necessarily a DNA string.
Any ideas?
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8
 [9] LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8 LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
 [1] stats4 parallel grid stats graphics grDevices utils datasets methods base

other attached packages:
 [1] Biostrings_2.42.1 XVector_0.14.0 matrixStats_0.51.0 topGO_2.26.0 SparseM_1.72
 [6] graph_1.50.0 fastcluster_1.1.22 cluster_2.0.5 GO.db_3.4.0 org.Hs.eg.db_3.4.0
[11] AnnotationDbi_1.36.0 Biobase_2.34.0 gageData_2.12.0 gage_2.24.0 biomaRt_2.30.0
[16] rtracklayer_1.34.1 GenomicRanges_1.26.2 GenomeInfoDb_1.10.0 IRanges_2.8.1 S4Vectors_0.12.1
[21] BiocGenerics_0.20.0 doBy_4.5-15 yaml_2.1.14 doParallel_1.0.10 iterators_1.0.8
[26] foreach_1.4.3 snpEnrichment_1.7.0 fgsea_1.0.2 Rcpp_0.12.8 data.tree_0.6.2
[31] zoo_1.7-13 gplots_3.0.1 ggdendro_0.1-20 RColorBrewer_1.1-2 venneuler_1.1-0
[36] rJava_0.9-8 scales_0.4.1 reshape2_1.4.2 plotrix_3.6-3 outliers_0.14
[41] Hmisc_3.17-4 Formula_1.2-1 survival_2.40-1 lattice_0.20-34 data.table_1.9.6
[46] edgeR_3.16.1 limma_3.30.2 ggpmisc_0.2.12 dplyr_0.5.0 plyr_1.8.4
[51] magrittr_1.5 gridExtra_2.2.1 ggplot2_2.2.1 dendextend_1.3.0 ape_4.0

loaded via a namespace (and not attached):
 [1] colorspace_1.2-7 class_7.3-14 modeltools_0.2-21 mclust_5.2
 [5] rstudioapi_0.6 flexmix_2.3-13 mvtnorm_1.0-5 codetools_0.2-15
 [9] splines_3.3.2 snpStats_1.24.0 robustbase_0.92-6 jsonlite_1.1
[13] Rsamtools_1.26.1 kernlab_0.9-25 png_0.1-7 DiagrammeR_0.9.0
[17] httr_1.2.1 assertthat_0.1 Matrix_1.2-7.1 lazyeval_0.2.0
[21] acepack_1.4.1 visNetwork_1.0.3 htmltools_0.3.5 tools_3.3.2
[25] igraph_1.0.1 gtable_0.2.0 fastmatch_1.0-4 rgexf_0.15.3
[29] trimcluster_0.1-2 gdata_2.17.0 nlme_3.1-128 fpc_2.1-10
[33] stringr_1.1.0 gtools_3.5.0 XML_3.98-1.4 DEoptimR_1.0-6
[37] zlibbioc_1.20.0 MASS_7.3-45 SummarizedExperiment_1.2.3 rpart_4.1-10
[41] latticeExtra_0.6-28 stringi_1.1.2 RSQLite_1.0.0 Rook_1.1-1
[45] caTools_1.17.1 BiocParallel_1.8.1 chron_2.3-47 prabclus_2.2-6
[49] bitops_1.0-6 GenomicAlignments_1.8.4 htmlwidgets_0.8 R6_2.2.0
[53] DBI_0.5-1 whisker_0.3-2 foreign_0.8-67 KEGGREST_1.14.0
[57] RCurl_1.95-4.8 nnet_7.3-12 tibble_1.2 KernSmooth_2.23-15
[61] viridis_0.3.4 locfit_1.5-9.1 influenceR_0.1.0 digest_0.6.11
[65] diptest_0.75-7 brew_1.0-6 munsell_0.4.3
Also, please see the post "matching of AAStringSet vs. another AAStringSet" for a similar question and an efficient solution for the exact-matching case based on the CRAN package AhoCorasickTrie.
H.
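
For small inputs, the exact-matching case can also be sketched in plain base R, with no PDict involved. As far as I know, PDict is built for DNA dictionaries, which would explain the lookup-table error for the amino acid letter 'I'. The sketch below is only an illustration under assumptions: the peptides and reference sequences are made-up toy data (only "KNVSIGIVGKD" comes from the question), find_exact is a hypothetical helper, and the reference is assumed to be a named character vector such as one obtained with as.character() on the AAStringSet. It is a naive scan, not the Aho-Corasick approach mentioned above, so it will be slow for large dictionaries.

```r
# Toy data (hypothetical, not from the original post, except the first peptide).
peptides  <- c("KNVSIGIVGKD", "GIVG", "AAA")
reference <- c(prot1 = "MKNVSIGIVGKDAAAA", prot2 = "GIVGIVG")

# Hypothetical helper: start positions of every exact occurrence of
# `pattern` in `text`, including overlapping occurrences.
find_exact <- function(pattern, text) {
  n <- nchar(text) - nchar(pattern) + 1L
  if (n < 1L) return(integer(0))          # pattern longer than text
  starts <- seq_len(n)
  windows <- substring(text, starts, starts + nchar(pattern) - 1L)
  starts[windows == pattern]
}

# Collect all hits in a long table: one row per (peptide, protein, start).
hits <- do.call(rbind, lapply(seq_along(peptides), function(i) {
  do.call(rbind, lapply(seq_along(reference), function(j) {
    s <- find_exact(peptides[i], reference[[j]])
    if (length(s) == 0L) return(NULL)     # no match in this protein
    data.frame(peptide = peptides[i],
               protein = names(reference)[j],
               start   = s,
               stringsAsFactors = FALSE)
  }))
}))

print(hits)
```

Each row of hits gives a peptide, the name of the reference sequence it was found in, and the 1-based start position of the match. For a real proteome, a trie-based method such as the AhoCorasickTrie solution referenced above should scale much better.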