Hi,
I'm stuck at the first step: reading in data and generating an IBSpectra object with readIBSpectra(). According to the documentation, the id.file parameter can be an mzIdentML or a .csv file. I have both formats, exported from a Mascot search. With either the .mzid or the .csv file I get an error:
ib <- readIBSpectra("iTRAQ8plexSpectra",id.file=list.files(pattern=".mzid"), peaklist.file=list.files(pattern=".mgf"))
reading id file F004684_merged.mzid [type: mzid] ...Error in t.default(do.call(cbind, xpathApply(doc, paste0(root,"/x:AnalysisProtocolCollection/x:SpectrumIdentificationProtocol/x:ModificationParams/x:SearchModification"), : argument is not a matrix
ib <- readIBSpectra("iTRAQ8plexSpectra",id.file=list.files(pattern=".csv"), peaklist.file=list.files(pattern=".mgf"))
reading id file Mascot_search_results.csv [type: ibspectra] ... done
Error in `[.data.frame`(id.data, , .SPECTRUM.COLS["SPECTRUM"]) :
  undefined columns selected
Could someone help me understand what these errors mean? Is the formatting of my input files the problem? If so, what format does isobar expect?
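For what it's worth, "undefined columns selected" is the generic R error for indexing a data frame by a column name it does not contain, so the CSV presumably lacks a column that readIBSpectra() looks up via .SPECTRUM.COLS["SPECTRUM"]. Below is a minimal base-R sketch of that check. The helper missing_columns(), the toy table, and the names in `required` are all assumptions made for illustration; they are not isobar's actual column list.

```r
# Hypothetical helper: which expected columns are absent from an id table?
missing_columns <- function(id.data, required) {
  setdiff(required, names(id.data))
}

# Toy stand-in for a Mascot CSV export; a real export has other columns.
id.data <- data.frame(pep_seq   = c("PEPTIDEK", "SAMPLER"),
                      pep_score = c(45.2, 31.7),
                      stringsAsFactors = FALSE)

# Assumed names only. With isobar loaded, inspecting isobar:::.SPECTRUM.COLS
# should show the names the package really indexes by.
required <- c(SPECTRUM = "spectrum", PEPTIDE = "peptide")

# List the expected columns that the table does not have.
missing_columns(id.data, required)

# Indexing by an absent column name raises the error quoted above.
res <- tryCatch(id.data[, required["SPECTRUM"]],
                error = function(e) conditionMessage(e))
res
```

Running the same comparison on the header of the real CSV (read with read.csv or read.delim, whichever matches the file) would show which columns need to be added or renamed.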
I have .mgf files from ProteoWizard (one per sample) as well as a single consolidated .mgf file from Mascot Distiller, which I assume contains the spectra for all samples rolled into one file. Which is the better one to use?
I appreciate your help, thanks!
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
 [1] XML_3.98-1.1 isobar_1.10.0 plyr_1.8.1 Biobase_2.24.0 RColorBrewer_1.1-2
 [6] DESeq2_1.4.5 RcppArmadillo_0.4.550.1.0 Rcpp_0.11.3 GenomicRanges_1.16.4 GenomeInfoDb_1.0.2
[11] IRanges_1.22.10 BiocGenerics_0.10.0

loaded via a namespace (and not attached):
 [1] annotate_1.42.1 AnnotationDbi_1.26.1 DBI_0.3.1 distr_2.5.3 genefilter_1.46.1
 [6] geneplotter_1.42.0 grid_3.1.0 lattice_0.20-29 locfit_1.5-9.1 RSQLite_1.0.0
[11] sfsmisc_1.0-26 splines_3.1.0 startupmsg_0.9 stats4_3.1.0 survival_2.37-7
[16] SweaveListingUtils_0.6.2 tools_3.1.0 xtable_1.7-4 XVector_0.4.0
Hi Thomas,
Thanks for your suggestions! I was able to parse my mzIdentML file with the mzID package and extract the following fields:
> names(flatResults)
[1] "spectrumid" "acquisitionnum" "calculatedmasstocharge"
[4] "chargestate" "experimentalmasstocharge" "rank"
[7] "passthreshold" "mascot:expectation value" "mascot:score"
[10] "peptide shared in multiple proteins" "peptide unique to one protein" "pepseq"
[13] "modified" "modification" "start"
[16] "end" "pre" "post"
[19] "isdecoy" "accession" "description"
[22] "databaseFile"
I'm waiting to hear back from the author, but perhaps isobar is looking for fields that are not in the file. Some of the elements named in the error message (AnalysisProtocolCollection, SearchModification) do not seem to be present.
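That guess would at least fit the wording of the first error. If the XPath query matches no SearchModification nodes, xpathApply() presumably returns an empty list, cbind() over an empty list gives NULL, and transposing NULL fails with "argument is not a matrix". The base-R sketch below shows only that mechanism; it does not touch isobar or the .mzid file, and the empty list is an assumed stand-in for the query result.

```r
# Assumed stand-in for the result of an XPath query that matched no nodes.
no_matches <- list()

# cbind() called with zero arguments returns NULL, not a matrix.
combined <- do.call(cbind, no_matches)
is.null(combined)

# Transposing NULL raises the message seen in the mzid error above.
msg <- tryCatch(t(combined), error = function(e) conditionMessage(e))
msg
```

Counting the SearchModification elements in the file itself (for example with the XML package) would confirm or rule this out.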
Anyway, I just wanted to follow up on your suggestion. Any other ideas are welcome.
Thanks again!