I am attempting a proteome analysis in pRoloc, following the workflow of a recently published hyperLOPIT paper (https://www.nature.com/articles/nprot.2017.026/tables/3). I am currently on step 87, where I am trying to use supervised machine learning to predict the localisation of unlabelled proteins. I am trying to optimise the SVM parameters (as also described in section 5.2.1 here: https://bioconductor.org/packages/3.7/bioc/vignettes/pRoloc/inst/doc/pRoloc-tutorial.html#52_supervised_ml), and keep getting an error message saying:
```
Error in table(data, reference, dnn = dnn, ...) : all arguments must have the same length
```
I assume this is related to my marker list, but I am unsure why it is happening. I have successfully produced the profile plots and PCA plots (sections 3.1 and 3.3 here: https://bioconductor.org/packages/3.7/bioc/vignettes/pRoloc/inst/doc/pRoloc-tutorial.html#52_supervised_ml), so I am fairly sure my MSnSet is set up correctly.
I have attached my code below.
`msnsetmarkers` is my MSnSet, and its feature-data column `markers` holds the marker list.
```r
library("pRoloc")

# Class weights: inverse of the number of markers per class,
# excluding the unlabelled ("unknown") proteins
w <- table(fData(msnsetmarkers)[, "markers"])
w <- 1/w[names(w) != "unknown"]

# The part I get an error message for:
params <- svmOptimisation(msnsetmarkers, times = 100, xval = 5, class.weights = w)
```

Below is the traceback:

```
Error in table(data, reference, dnn = dnn, ...) : all arguments must have the same length
5. stop("all arguments must have the same length")
4. table(data, reference, dnn = dnn, ...)
3. confusionMatrix.default(ans, .test2$markers)
2. confusionMatrix(ans, .test2$markers)
1. svmOptimization(msnsetmarkers, times = 100, xval = 5, class.weights = w)
```
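For reference, the first two lines of the code above build the `class.weights` argument: they count the markers in each class, drop the `unknown` label and take the reciprocal, so that classes with few markers receive larger weights. The sketch below repeats that computation in base R on a small label vector; `markers` here is a hypothetical stand-in for `fData(msnsetmarkers)[, "markers"]`, not real data.

```r
# Hypothetical stand-in for fData(msnsetmarkers)[, "markers"]:
# class labels for nine proteins, two of them unlabelled ("unknown").
markers <- c("ER", "ER", "ER", "ER", "GOLGI", "PM", "PM", "unknown", "unknown")

# Number of proteins carrying each label (table() sorts the labels).
w <- table(markers)

# Drop the "unknown" placeholder and invert the counts,
# so classes with fewer markers get larger weights.
w <- 1 / w[names(w) != "unknown"]

print(w)
```

With these toy labels, `GOLGI` (1 marker) gets four times the weight of `ER` (4 markers).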
Here is the sessionInfo output:
```
sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.1

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
 [1] BiocInstaller_1.28.0 tidyr_0.8.0 dplyr_0.7.4 pRoloc_1.18.0
 [5] MLInterfaces_1.58.0 cluster_2.0.6 annotate_1.56.1 XML_3.98-1.10
 [9] AnnotationDbi_1.40.0 IRanges_2.12.0 S4Vectors_0.16.0 MSnbase_2.4.2
[13] ProtGenerics_1.10.0 BiocParallel_1.12.0 mzR_2.12.0 Rcpp_0.12.15
[17] Biobase_2.38.0 BiocGenerics_0.24.0

loaded via a namespace (and not attached):
  [1] plyr_1.8.4 igraph_1.1.2 lazyeval_0.2.1 splines_3.4.3
  [5] ggvis_0.4.3 crosstalk_1.0.0 ggplot2_2.2.1 digest_0.6.15
  [9] foreach_1.4.4 htmltools_0.3.6 viridis_0.5.0 gdata_2.18.0
 [13] magrittr_1.5 memoise_1.1.0 doParallel_1.0.11 sfsmisc_1.1-2
 [17] limma_3.34.9 recipes_0.1.2 gower_0.1.2 rda_1.0.2-2
 [21] dimRed_0.1.0 lpSolve_5.6.13 prettyunits_1.0.2 colorspace_1.3-2
 [25] blob_1.1.0 RCurl_1.95-4.10 hexbin_1.27.2 genefilter_1.60.0
 [29] bindr_0.1.1 impute_1.52.0 survival_2.41-3 iterators_1.0.9
 [33] glue_1.2.0 DRR_0.0.3 gtable_0.2.0 ipred_0.9-6
 [37] zlibbioc_1.24.0 kernlab_0.9-25 ddalpha_1.3.1.1 prabclus_2.2-6
 [41] DEoptimR_1.0-8 scales_0.5.0 vsn_3.46.0 mvtnorm_1.0-7
 [45] DBI_0.8 viridisLite_0.3.0 xtable_1.8-2 progress_1.1.2
 [49] foreign_0.8-69 bit_1.1-12 proxy_0.4-21 mclust_5.4
 [53] preprocessCore_1.40.0 lava_1.6 prodlim_1.6.1 sampling_2.8
 [57] htmlwidgets_1.0 httr_1.3.1 threejs_0.3.1 FNN_1.1
 [61] RColorBrewer_1.1-2 fpc_2.1-11 modeltools_0.2-21 pkgconfig_2.0.1
 [65] flexmix_2.3-14 nnet_7.3-12 caret_6.0-78 tidyselect_0.2.4
 [69] rlang_0.2.0 reshape2_1.4.3 munsell_0.4.3 mlbench_2.1-1
 [73] tools_3.4.3 RSQLite_2.0 pls_2.6-0 broom_0.4.3
 [77] stringr_1.3.0 mzID_1.16.0 ModelMetrics_1.1.0 knitr_1.20
 [81] bit64_0.9-7 robustbase_0.92-8 randomForest_4.6-12 purrr_0.2.4
 [85] dendextend_1.7.0 bindrcpp_0.2 nlme_3.1-131.1 whisker_0.3-2
 [89] mime_0.5 RcppRoll_0.2.2 biomaRt_2.34.2 compiler_3.4.3
 [93] e1071_1.6-8 affyio_1.48.0 tibble_1.4.2 stringi_1.1.7
 [97] lattice_0.20-35 trimcluster_0.1-2 Matrix_1.2-12 psych_1.7.8
[101] gbm_2.1.3 pillar_1.2.1 MALDIquant_1.17 bitops_1.0-6
[105] httpuv_1.3.6.2 R6_2.2.2 pcaMethods_1.70.0 affy_1.56.0
[109] hwriter_1.3.2 gridExtra_2.3 codetools_0.2-15 MASS_7.3-49
[113] gtools_3.5.0 assertthat_0.2.0 CVST_0.2-1 withr_2.1.1
[117] mnormt_1.5-5 diptest_0.75-7 grid_3.4.3 rpart_4.1-13
[121] timeDate_3043.102 class_7.3-14 Rtsne_0.13 shiny_1.0.5
[125] lubridate_1.7.3 base64enc_0.1-3
```
Thanks very much!!!
Here are my markers. Some classes have fewer than 13 markers; will this make a big difference?
```
organelleMarkers
          CYTOSOL                ER             GOLGI          LYSOSOME
               25                48                 3                 9
     MITOCHONDRIA           NUCLEUS NUCLEUS-CHROMATIN        PEROXISOME
               34                11                 2                 2
               PM        PROTEASOME      RIBOSOME 40S      RIBOSOME 60S
               12                24                25                41
          unknown
             1141
```
Thank you. My first suggestion would be to increase the number of markers, especially for the Golgi, nucleus-chromatin and peroxisome classes. As I said, ideally try to get 13 or more for each class.
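As a quick check against that 13-marker rule of thumb, the counts from the `organelleMarkers` table posted above can be compared with the threshold. This is a minimal base R sketch: the counts are copied by hand from the question, and `min_markers`, `short` and `needed` are illustrative names, not pRoloc functions.

```r
# Marker counts per class, copied from the organelleMarkers table above
# ("unknown" is left out because it marks unlabelled proteins).
counts <- c(
  "CYTOSOL" = 25, "ER" = 48, "GOLGI" = 3, "LYSOSOME" = 9,
  "MITOCHONDRIA" = 34, "NUCLEUS" = 11, "NUCLEUS-CHROMATIN" = 2,
  "PEROXISOME" = 2, "PM" = 12, "PROTEASOME" = 24,
  "RIBOSOME 40S" = 25, "RIBOSOME 60S" = 41
)

# Rule-of-thumb minimum number of markers per class.
min_markers <- 13

# Classes below the threshold, and how many extra markers each would need.
short <- counts[counts < min_markers]
needed <- min_markers - short
print(needed)
```

On the posted counts, six classes fall short (Golgi, lysosome, nucleus, nucleus-chromatin, peroxisome and PM), and the Golgi, nucleus-chromatin and peroxisome classes each need 10 or 11 more markers.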
I can't say if this is the reason for the error you see (although I suspect it is), but even if it isn't, with the current markers (1) you won't be able to get reliable model hyper-parameters, because there are not enough markers to train your model (that is what the `svmOptimisation` function helps with), and (2) it is unlikely that more proteins will be assigned to these classes (and if they are, the assignments won't be very reliable).

To help you identify markers, you may want to have a look at those we propose (based on previous studies); here's what we have at the moment:
See the documentation of the `addMarkers` function for help on how to add them.

I have tried this and am still getting the same error.
Code for new markers and adding them:
Error:

```
Error in table(data, reference, dnn = dnn, ...) : all arguments must have the same length
```
Markers now look like this:
The markers still look a bit weak (the Golgi apparatus has 2 proteins, the peroxisome has 4).
The error comes from somewhere within the code where `table` expects two vectors of the same length, and somehow, with your data, they aren't:

I could look into it if you send me your data.
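The length requirement that `table()` enforces can be reproduced outside pRoloc. In this minimal base R sketch, two hypothetical factors stand in for the predictions (`ans`) and the reference labels (`.test2$markers`) that appear in the traceback; it only shows the check that fails, not why the lengths differ inside `svmOptimisation`.

```r
# Hypothetical stand-ins: 3 predicted labels but 4 reference labels.
predicted <- factor(c("ER", "PM", "ER"), levels = c("ER", "PM"))
reference <- factor(c("ER", "PM", "PM", "ER"), levels = c("ER", "PM"))

# table() refuses to cross-tabulate vectors of different lengths;
# capture the error message instead of stopping.
msg <- tryCatch(
  table(predicted, reference),
  error = function(e) conditionMessage(e)
)
print(msg)

# Printing both lengths makes the mismatch explicit.
cat("predicted:", length(predicted), "reference:", length(reference), "\n")
```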