Question

Intensity discrepancy between raw data and XCMS output

0

Entering edit mode

cpataki • 0

@cpataki-17190

Last seen 6.6 years ago

I am using xcms for metabolomic work, and I'm coming across a bug that I can not seem to track down. I've scoured the online forums and am trying a grid search of various parameters, but I'd love to get your input on this problem.

My Goal:

To identify features in an experiment with various samples that are independently injected into the mass spec and correct for retention time.

The commands I am using (in the order that I am using them):

Reading in the data: readMSData

Identify chromatographic peaks: findChromPeaks() with a CentWaveParam object

Retention time correction: adjustRtime() with a ObiWarpParam object

Grouping peaks: groupChromPeaks() with a PeakDensityParam object

Output: featureValues() to list all features with each column representing the intensity of the features in one sample.

My Problem:

All the features identified in the one sample have the very similar intensities (+/- 10%) in the featureValues() output. said differently: Each feature has different intensities in all samples, but all features in the same sample differ by approximately 10%.

This trend is not consistent with the raw traces of the data. I see much more variability between peak intensities in each sample.

I've been reading the xcms documentation to track down what could be changing/controlling peak intensities, but I can not seem to find any parameter in the commands I'm using that could affect peak intensities. I'm very much working in the dark here. Any insight on your part would be HIGHLY appreciated.

xcms xcmsset intensity • 1.8k views

ADD COMMENT • link 6.6 years ago cpataki • 0

0

Entering edit mode

Without having a good understanding of your experimental setup and without looking into the actual data or seeing all commands with all parameters that you used it is pretty hard to tell what happens or where the problem is. Some suggestions:

1) do never use the default parameters, especially not for the peak detection part. This very much depends on your LCMS setup (i.e. how broad the peaks are etc.). Have eventually a look at https://jotsetung.github.io/metabolomics2018/xcms-preprocessing.html for some details how to determine data set specific settings.

2) Check again the parameters that you used, especially make sure that you used featureValues with the parameter value = "into". By default the function returns just the index of the corresponding peak in the peak matrix but not the integrated peak intensity (which you get with "into").

3) Check how many peaks you have and how many features. i.e. do table(chromPeaks(your_data)[, "sample"]) to get an overview how many peaks were detected per sample and compare that number to nrow(featureValues(your_data)). Eventually only few peaks were grouped into a feature. Alternatively it could be that multiple peaks in each sample were added to the feature.

4) It is always good to have a look at how known compounds or internal standards look like, i.e. use the chromatogram function to extract the ion chromatogram for some and plot them, eventually highlighting the identified chromatographic peaks in that region using the highlightChromPeaks function (check the corresponding help page for details). This can help you understand what might have happened and you get a better feeling for your data.

5) If you think this is indeed a bug in xcms you could also make an issue at https://github.com/sneumann/xcms

For future posts it would be very helpful to post also the output of the sessionInfo() and to show the actual commands that you used (with all parameters).

ADD REPLY • link 6.6 years ago Johannes Rainer ★ 2.1k

0

Entering edit mode

Thank you so much, Johannes. This was very helpful. I do think I might be implementing my code sub-optimally, which is producing these weird results. Looking at the raw data, I don't think xcms is changing the intensity of the features, I think I'm not identifying the peaks with the right parameters or I'm not grouping the peaks properly, which is why I'm identifying VERY similar features through out the entirety of the experiment (but concentrated within a range of retention times).

I'd love to give you a little more detail about my code and maybe you can tell me if I'm 1) not grouping peaks enough times (I see that some people group peaks 2-3 times in their code), or 2) I'm using suboptimal parameters for identifying peaks. Please note that I am using a mixture of parameters suggested by xcms-online for my experimental set-up and parameters determined by using "IPO: a tool for automated optimization of XCMS parameters" (https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0562-8). I'm limited by the character number, so I will add another comment

ADD REPLY • link 6.6 years ago cpataki • 0

0

Entering edit mode

code (note that I am running this through the command line and not through RStudio):

library(xcms)

fols <- commandArgs(trailingOnly=TRUE)

setwd(fols[1])

#defines the .mzxml files to be processed and which samples are technical replicates

filegroups = read.csv("filegroups.csv", row.names = 1, stringsAsFactors = F)

mzxml_pos <- filegroups$File

fileaccess <- readMSData(mzxml_pos, pdata = NULL, verbose = isMSnbaseVerbose(), msLevel. = 1, centroided. = T, smoothed. = NA,

mode = "onDisk")

cparam <- CentWaveParam(ppm = 10, peakwidth = c(5,30), snthresh = 3., mzCenterFun = "wMean", firstBaselineCheck = FALSE, prefilter = c(3,100), mzdiff = -0.01, noise = 0, fitgauss = FALSE, verboseColumns = FALSE, integrate=1)

bparam <- SnowParam(workers = 16)

xset <- findChromPeaks(fileaccess, cparam, BPPARAM = bparam, return.type = "XCMSnExp")

fgroups <- as.integer(as.factor(filegroups$Group))

gparam <- PeakDensityParam(sampleGroups = fgroups, minFraction = .5, bw = 5, binSize = 0.025, maxFeatures = 100, minSamples = 1)

rparam <- ObiwarpParam(binSize = 1, #suggested by xcms online

response = 1, # not specified in xcms online

distFun = "cor_opt", # not specified in xcms online -- using default

gapInit = 0.3, # not specified in xcms online - default

gapExtend = 2.4, # not specified in xcms online

factorDiag = 2, # not specified in xcms online - penalty/weight for diagonal moves

factorGap = 1, # not specified in xcms online - penalty/weigh for gap moves in alignment

localAlignment = F, # not specified in xcms online - Default = False

initPenalty = 0) # not specified in xcms online - penalty for initiating alignment

xset <- adjustRtime(xset, param = rparam)

xset <- groupChromPeaks(xset, gparam)

xset2 <- featureValues(xset, method = "maxint", value="index", intensity = "into", filled = TRUE)

xset3 <- featureDefinitions(xset)

write.csv(xset2, file = "xcms_output_features_members.csv")

write.csv(xset3, file = "xcms_output_features.csv")

ADD REPLY • link 6.6 years ago cpataki • 0

0

Entering edit mode

Thanks for providing the commands you're using for the analysis. I can not tell if the parameters are correct for your data but using IPO is definitely a good starting point. However, as mentioned in my previous reply, you should use featureValues with value = "into" and not value = "index". As it is also stated in the help of the featureValues function (which you can see by typing ?featureValues in the R terminal), value = "index" returns the index of the peaks in the peak matrix but not the integrated peak area that you want to extract. These can be returned with value = "into". I'll update the help page of the function to make this more explicit.

ADD REPLY • link 6.6 years ago Johannes Rainer ★ 2.1k