I am using xcms for metabolomic work, and I'm coming across a bug that I can not seem to track down. I've scoured the online forums and am trying a grid search of various parameters, but I'd love to get your input on this problem.
My Goal:
To identify features in an experiment with various samples that are independently injected into the mass spec and correct for retention time.
The commands I am using (in the order that I am using them):
Reading in the data: readMSData
Identify chromatographic peaks: findChromPeaks() with a CentWaveParam object
Retention time correction: adjustRtime() with a ObiWarpParam object
Grouping peaks: groupChromPeaks() with a PeakDensityParam object
Output: featureValues() to list all features with each column representing the intensity of the features in one sample.
My Problem:
All the features identified in the one sample have the very similar intensities (+/- 10%) in the featureValues() output. said differently: Each feature has different intensities in all samples, but all features in the same sample differ by approximately 10%.
This trend is not consistent with the raw traces of the data. I see much more variability between peak intensities in each sample.
I've been reading the xcms documentation to track down what could be changing/controlling peak intensities, but I can not seem to find any parameter in the commands I'm using that could affect peak intensities. I'm very much working in the dark here. Any insight on your part would be HIGHLY appreciated.
Without having a good understanding of your experimental setup and without looking into the actual data or seeing all commands with all parameters that you used it is pretty hard to tell what happens or where the problem is. Some suggestions:
1) do never use the default parameters, especially not for the peak detection part. This very much depends on your LCMS setup (i.e. how broad the peaks are etc.). Have eventually a look at https://jotsetung.github.io/metabolomics2018/xcms-preprocessing.html for some details how to determine data set specific settings.
2) Check again the parameters that you used, especially make sure that you used
featureValues
with the parametervalue = "into"
. By default the function returns just the index of the corresponding peak in the peak matrix but not the integrated peak intensity (which you get with"into"
).3) Check how many peaks you have and how many features. i.e. do
table(chromPeaks(your_data)[, "sample"])
to get an overview how many peaks were detected per sample and compare that number tonrow(featureValues(your_data))
. Eventually only few peaks were grouped into a feature. Alternatively it could be that multiple peaks in each sample were added to the feature.4) It is always good to have a look at how known compounds or internal standards look like, i.e. use the
chromatogram
function to extract the ion chromatogram for some andplot
them, eventually highlighting the identified chromatographic peaks in that region using thehighlightChromPeaks
function (check the corresponding help page for details). This can help you understand what might have happened and you get a better feeling for your data.5) If you think this is indeed a bug in xcms you could also make an issue at https://github.com/sneumann/xcms
For future posts it would be very helpful to post also the output of the
sessionInfo()
and to show the actual commands that you used (with all parameters).Thank you so much, Johannes. This was very helpful. I do think I might be implementing my code sub-optimally, which is producing these weird results. Looking at the raw data, I don't think xcms is changing the intensity of the features, I think I'm not identifying the peaks with the right parameters or I'm not grouping the peaks properly, which is why I'm identifying VERY similar features through out the entirety of the experiment (but concentrated within a range of retention times).
I'd love to give you a little more detail about my code and maybe you can tell me if I'm 1) not grouping peaks enough times (I see that some people group peaks 2-3 times in their code), or 2) I'm using suboptimal parameters for identifying peaks. Please note that I am using a mixture of parameters suggested by xcms-online for my experimental set-up and parameters determined by using "IPO: a tool for automated optimization of XCMS parameters" (https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0562-8). I'm limited by the character number, so I will add another comment
code (note that I am running this through the command line and not through RStudio):
library(xcms)
fols <- commandArgs(trailingOnly=TRUE)
setwd(fols[1])
#defines the .mzxml files to be processed and which samples are technical replicates
filegroups = read.csv("filegroups.csv", row.names = 1, stringsAsFactors = F)
mzxml_pos <- filegroups$File
fileaccess <- readMSData(mzxml_pos, pdata = NULL, verbose = isMSnbaseVerbose(), msLevel. = 1, centroided. = T, smoothed. = NA,
mode = "onDisk")
cparam <- CentWaveParam(ppm = 10, peakwidth = c(5,30), snthresh = 3., mzCenterFun = "wMean", firstBaselineCheck = FALSE, prefilter = c(3,100), mzdiff = -0.01, noise = 0, fitgauss = FALSE, verboseColumns = FALSE, integrate=1)
bparam <- SnowParam(workers = 16)
xset <- findChromPeaks(fileaccess, cparam, BPPARAM = bparam, return.type = "XCMSnExp")
fgroups <- as.integer(as.factor(filegroups$Group))
gparam <- PeakDensityParam(sampleGroups = fgroups, minFraction = .5, bw = 5, binSize = 0.025, maxFeatures = 100, minSamples = 1)
rparam <- ObiwarpParam(binSize = 1, #suggested by xcms online
response = 1, # not specified in xcms online
distFun = "cor_opt", # not specified in xcms online -- using default
gapInit = 0.3, # not specified in xcms online - default
gapExtend = 2.4, # not specified in xcms online
factorDiag = 2, # not specified in xcms online - penalty/weight for diagonal moves
factorGap = 1, # not specified in xcms online - penalty/weigh for gap moves in alignment
localAlignment = F, # not specified in xcms online - Default = False
initPenalty = 0) # not specified in xcms online - penalty for initiating alignment
xset <- adjustRtime(xset, param = rparam)
xset <- groupChromPeaks(xset, gparam)
xset2 <- featureValues(xset, method = "maxint", value="index", intensity = "into", filled = TRUE)
xset3 <- featureDefinitions(xset)
write.csv(xset2, file = "xcms_output_features_members.csv")
write.csv(xset3, file = "xcms_output_features.csv")
Thanks for providing the commands you're using for the analysis. I can not tell if the parameters are correct for your data but using IPO is definitely a good starting point. However, as mentioned in my previous reply, you should use
featureValues
withvalue = "into"
and notvalue = "index"
. As it is also stated in the help of thefeatureValues
function (which you can see by typing?featureValues
in the R terminal),value = "index"
returns the index of the peaks in the peak matrix but not the integrated peak area that you want to extract. These can be returned withvalue = "into"
. I'll update the help page of the function to make this more explicit.