Hello,
I am developing an R script that uses the MSnbase Bioconductor package, and I am having trouble getting the MSnbase commands to work the way I want inside a custom R function. I have used this approach in the past to retrieve intensity values from a raw MS data file for individually specified mass-to-charge values, but I have never managed to get it working when the mass-to-charge values are pulled from a data frame, so that it returns the summed intensity for each input value.
I use the MSnbase and plyr packages in the following code. If these are not the right packages for this task, any recommendations would be appreciated.
I think it is best to look at the code first to get an idea of what I am going for. Basically, I would like to take mass-to-charge values from a data frame, run each one through an MSnbase chromatogram query, and return the summed intensity values as a new column in the data frame. Right now I am working with a simple data set of 5 entries, but in the future I need to scale this up to several thousand, so I cannot simply do this by hand. The problem is that I cannot figure out how to do this in MSnbase with a data frame as the source of the mass-to-charge values. I would like to pull one mass-to-charge (MZHplus) value at a time from the data frame and look up the summed intensity for that value. I believe this is a simple syntax error, or that I am using the wrong commands for the job.
Here is the current code with comments that I am working with…
#Load libraries
library("MSnbase", lib.loc="~/R/win-library/3.6")
library(plyr)
#File to load - F1
#MSData.mzML
#Read raw MS mzML data file
#Note, according to forums the following error can be ignored...
# "Error in x$.self$finalize() : attempt to apply non-function"
msd <- readMSData("MSData.mzML", verbose = FALSE)
#Load the .csv file with peptide mz values for use in the following function
#Change input to desired list of mz to be used for ion search
peptideTable <- read.csv("test-data.csv")
#Creates the peptide_intensity_sum function
peptide_intensity_sum <- function(mz){
#set up the retention time range
rtr <- c(1, 60000)
#set up the mass to charge range
minmz <- (mz - 0.015)
maxmz <- (mz + 0.015)
mzr <- c(minmz, maxmz)
#Chromatogram query to get all intensity values from the mass spec data
chrs <- chromatogram(msd, rt = rtr, mz = mzr, aggregationFun = "sum", msLevel = 2)
#Store intensity values in int var
int <- intensity(chrs[1, 1])
#Compute sum of intensity values, remove NA values
summedInts <- sum(int, na.rm = TRUE)
#Return summedInts value
return(summedInts)
#Clean up function environment
rm(c(minmz, maxmz, mzr, chrs, int, summedInts))
}
#Run the function above on the MZHplus value and place summed intensities into new column
#the following runs, but it puts the same calculated value in every row?
intSum <- mutate(peptideTable, peptideIntSum = peptide_intensity_sum(MZHplus))
write.csv(intSum, "intensities.csv")
Here is the Input peptide data...
ID Sequence Master.Protein.Accessions MZHplus
1 QNAQCLHGDIAQSQR Q99MJ9 1870.937
2 VGNLGLATSFFNER Q62095 1669.909
3 QLCDNAGFDATNILNK P80313 2084.105
4 IIDGGSGYLCEMEPVAHFGLGR Q8R555 2523.255
5 LSECLQEVYEPEWPGRDEANK O08539 2840.402
Here is the output…
ID Sequence Master.Protein.Accessions MZHplus peptideIntSum
1 QNAQCLHGDIAQSQR Q99MJ9 1870.937 546252843
2 VGNLGLATSFFNER Q62095 1669.909 546252843
3 QLCDNAGFDATNILNK P80313 2084.105 546252843
4 IIDGGSGYLCEMEPVAHFGLGR Q8R555 2523.255 546252843
5 LSECLQEVYEPEWPGRDEANK O08539 2840.402 546252843
As can be seen in the output data frame above, the same value “546252843” is placed in every row of the peptideIntSum column, instead of a different summed intensity for each mass-to-charge value (MZHplus). I think this is a syntax error or something similar. I would just like the chromatogram intensity function to run one row at a time and write its result into the peptideIntSum column. Or maybe the MSnbase package cannot do this. Any help would be appreciated.
Thank you
Let me know if more information is needed. :)
Edit: added input and output, fixed file name.
There's no input/output data to look at.
Ah OK, I'll fix that. I tried adding images, but I'll do it as text. Thanks. Edit: it was a little tricky to get the input/output data frames entered as text. Still learning the formatting for this forum. Thanks :)
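One likely reason for the repeated value, sketched here without MSnbase so it can be run anywhere: mutate() hands the whole MZHplus column to peptide_intensity_sum() in a single call, so mz is a vector of 5 values rather than one value. The function then returns a single number, and that number is recycled into every row. The toy function and peak list below are hypothetical stand-ins (not MSnbase code and not real data), and the assumption that the real chromatogram() call ends up using the overall m/z range of the vector is a guess that should be checked against the MSnbase documentation.

```r
# Hypothetical stand-in for peptide_intensity_sum(): it is written for ONE
# m/z value and always returns a single number, like the original function.
toy_intensity_sum <- function(mz) {
  mzr <- c(mz - 0.015, mz + 0.015)
  # Made-up peaks, for illustration only
  peaks <- data.frame(mz        = c(1669.91, 1870.94, 1870.95, 2084.10),
                      intensity = c(10, 20, 30, 40))
  # With a vector 'mz', min/max span the whole range of all values
  in_window <- peaks$mz >= min(mzr) & peaks$mz <= max(mzr)
  sum(peaks$intensity[in_window])
}

peptideTable <- data.frame(ID = 1:3,
                           MZHplus = c(1870.937, 1669.909, 2084.105))

# Whole column in one call: one number, recycled into every row
peptideTable$allAtOnce <- toy_intensity_sum(peptideTable$MZHplus)

# One call per m/z value: one number per row
peptideTable$perRow <- vapply(peptideTable$MZHplus, toy_intensity_sum,
                              numeric(1))

print(peptideTable)
```

If this is indeed what is happening, the same pattern in the original script would be peptideTable$peptideIntSum <- vapply(peptideTable$MZHplus, peptide_intensity_sum, numeric(1)), which calls the function once per row. For several thousand values it may also be worth checking whether chromatogram() accepts a two-column matrix of m/z ranges (one row per peptide), which would avoid one query per value.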