Spectra and PSMatch: trouble in reducePSMs, joinSpectraData, or countIdentifications?


I ran MS-GF+ on the mzML file, using the .faa file as the reference database, to produce the .mzid file. All files are available here: https://figshare.com/s/b65fc594da19f0f9347f
The raw data were acquired on a Bruker Impact II (Q-TOF MS/MS) and converted to mzML with ProteoWizard msconvert.

However, when I tried to use countIdentifications(), I got an error. I am not sure whether the problem lies in reducePSMs(), joinSpectraData(), or countIdentifications().

My code is below. MS-GF+ was run in the macOS Terminal; the rest was run in RStudio.

java -version 
java version "1.8.0_421"

java -Xmx3500M -jar MSGFPlus.jar -s "HH090441864_2024-09-09_2648.mzML" -d "protein.faa" -tda 0 -inst 2 -e 1 -maxMissedCleavages 2 -o HH090441864_2.mzid


sp_ident <- joinSpectraData(sp, id_f_r,
                            by.x = "spectrumId",
                            by.y = "spectrumID")

sp_ident_count <- countIdentifications(sp_ident)

# Error: BiocParallel errors
#   1 remote errors, element index: 1
#   0 unevaluated and other errors
#   first remote error:
# Error in as.vector(x, mode): coercing an AtomicList object to an atomic vector is supported only for
#   objects with top-level elements of length <= 1

Thank you in advance for any thoughts!

Answer:

This happens because the sequence spectra variable is expected to be an atomic vector or a List whose elements have length at most 1, but yours is not: some of your MS2 scans have more than one sequence:

> lengths(sp_ident[["sequence"]])
  [1] 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 2 1 0 1 1 0 1 0
 [38] 0 1 1 1 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1
 [75] 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 2 1 1 0 0 0 0 1 0 0
> sp_ident[["sequence"]][30]
CharacterList of length 1
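The failure can be reproduced in isolation with a toy CharacterList from IRanges (made-up sequences, not taken from your files). This is a minimal sketch that assumes, going by the error message, that the failing step is the as.vector() coercion of the sequence variable:

```r
library(IRanges)  # provides CharacterList

## Toy identifications for three MS2 scans (made-up sequences)
one_per_scan <- CharacterList(list("PEPTIDEK", character(0), "SAMPLER"))
two_in_scan2 <- CharacterList(list("PEPTIDEK", c("SAMPLER", "SAMPLEK"), "ACDEFK"))

## Coercion to an atomic vector works when every element has length <= 1
ok <- as.vector(one_per_scan, mode = "character")

## ...and fails with the error seen above as soon as one scan has 2 sequences
bad <- tryCatch(as.vector(two_in_scan2, mode = "character"),
                error = function(e) e)

## The same lengths() check used on sp_ident flags the offending scan
which(lengths(two_in_scan2) > 1)
```

Empty elements (scans without an identification) are allowed by the "length <= 1" rule; only elements of length 2 or more trigger the error.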

You should filter your PSMs first and keep only the best one for each MS2 scan.
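Here is a minimal sketch of that filtering step in base R, on a toy data.frame so it runs on its own. keep_best_psm() is a hypothetical helper, not a PSMatch function, and it assumes the score column is MS-GF+'s spectral E-value (MS.GF.SpecEValue, lower is better) and that scans are identified by spectrumID; check names() on your PSM table for the exact column names. Applying the same order()/duplicated() subsetting to a PSM object before reducePSMs() should work the same way, but treat that as an assumption.

```r
## Hypothetical helper: keep a single, best-scoring PSM per MS2 scan.
## Assumes lower scores are better, as for MS-GF+'s spectral E-value.
keep_best_psm <- function(psms,
                          id_col = "spectrumID",
                          score_col = "MS.GF.SpecEValue") {
    ## Sort PSMs from best (smallest E-value) to worst; ties keep input order
    psms <- psms[order(psms[[score_col]]), , drop = FALSE]
    ## The first occurrence of each spectrum is now its best PSM
    psms[!duplicated(psms[[id_col]]), , drop = FALSE]
}

## Toy PSM table (made-up values): scans 1 and 3 have two candidate peptides
toy <- data.frame(
    spectrumID = c("scan=1", "scan=1", "scan=2", "scan=3", "scan=3"),
    sequence = c("PEPTIDEK", "PEPTLDEK", "SAMPLER", "ACDEFGHIK", "KIHGFEDCA"),
    MS.GF.SpecEValue = c(1e-5, 1e-3, 2e-6, 5e-2, 5e-4),
    stringsAsFactors = FALSE
)

best <- keep_best_psm(toy)
best
```

Ties (identical E-values) are broken by the original row order, i.e. arbitrarily, so scans with tied best matches may deserve a closer look.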

This can be traced back to the identification data:

> which(lengths(id_f_r$sequence) > 1)
scan=1054  scan=109  scan=127 scan=1284  scan=133 scan=1377 scan=1459 scan=1464 
       22        26        72        79        93       106       135       138 
scan=1611 scan=1629 scan=1833 scan=2296 scan=2301 scan=2369 scan=2433 scan=2543 
      190       196       272       419       422       448       474       510 
scan=2547 scan=2586   scan=30 scan=3089 scan=3157 scan=3201  scan=322 scan=3269 
      512       517       650       680       705       720       728       750 
scan=3333 scan=3718 scan=3758 scan=3864 scan=4104 scan=4108 scan=4111 scan=4126 
      775       914       925       956      1035      1037      1038      1044 
scan=4168 scan=4289 scan=4292 scan=4709 scan=4742 scan=4802 scan=4822 scan=4876 
     1053      1090      1091      1230      1241      1267      1272      1297 
scan=4923 scan=4961 scan=5122 scan=5524 scan=5692  scan=571 scan=5713 scan=5718 
     1318      1328      1392      1538      1579      1585      1586      1588 
scan=5948 scan=6086 scan=6177 scan=6206 scan=6234 scan=6414 scan=6422 scan=6424 
     1671      1735      1757      1772      1790      1852      1856      1858 
scan=6429 scan=6463 scan=6618 scan=6668 scan=6788 scan=6943 scan=6954 scan=7136 
     1861      1872      1924      1942      1984      2060      2065      2137 
scan=7152 scan=7171 scan=7282 scan=7369 scan=7650 scan=7660 scan=7896 scan=7904 
     2145      2155      2194      2215      2314      2316      2375      2376 
scan=7934  scan=804 scan=8308   scan=91 
     2380      2412      2498      2568

I suggest you check why these scans are matched to more than one sequence.
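A possible starting point is to list, for each affected scan, the distinct sequences it was matched to, using the PSM table before reducePSMs(). Below is a base-R sketch on a toy table; multi_hit_scans() is a hypothetical helper, and the spectrumID and sequence column names are assumed to match your PSM table:

```r
## Hypothetical helper: for every scan, collect the distinct peptide sequences
## it was matched to and return only the scans with more than one
multi_hit_scans <- function(psms, id_col = "spectrumID", seq_col = "sequence") {
    seqs <- split(psms[[seq_col]], psms[[id_col]])
    ## Repeated rows with the same sequence (e.g. one peptide listed against
    ## several proteins) are collapsed, so only genuinely different peptides count
    seqs <- lapply(seqs, unique)
    seqs[lengths(seqs) > 1]
}

## Toy PSM table (made-up values)
toy <- data.frame(
    spectrumID = c("scan=1", "scan=1", "scan=2", "scan=3", "scan=3", "scan=3"),
    sequence = c("PEPTIDEK", "PEPTLDEK", "SAMPLER",
                 "ACDEFGHIK", "ACDEFGHIK", "KIHGFEDCA"),
    stringsAsFactors = FALSE
)

hits <- multi_hit_scans(toy)
hits
```

In the toy data, the two candidates for scan=1 differ only by an isobaric I/L substitution, which is one way two peptides can score identically; whether that explains your scans is something to check, not a given.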

