Question

metaMS for analysis of GC-MS data

lucie.conchou • 0

@lucieconchou-10787

Last seen 8.6 years ago

Hello,

I am currently trying out metaMS for the analysis of GC-MS data from plant volatile samples, using a dataset I have already analysed manually. My chromatograms are usually very complex, with mostly medium- to low-intensity peaks (many are close to the detection limit and don't stand out from the background on the total ion chromatogram).

I read the tutorial dedicated to the runGC workflow and used the example set of parameters demonstrated there.

I changed only the parameters below, to ensure that all detected pseudospectra are displayed in the result table, even if found in only one sample:

data(FEMsettings)
myparam <- TSQXLS.GC
metaSetting(myparam, "betweenSamples")$min.class.fraction <- 0.0001
metaSetting(myparam, "betweenSamples")$min.class.size <- 1

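Here is the call and its output:

cdfdir <- "C:/R folder/metaMS/lilian_data"
# full.names = TRUE returns complete paths, so the files are found whatever the working directory is
cdffiles <- list.files(cdfdir, full.names = TRUE, ignore.case = TRUE)
result <- runGC(files = cdffiles, settings = myparam, DB = NULL, nSlaves = 2)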


> result$PeakTable[, 1:6]
         Name   Class  rt.sd     rt LC_18april2016_a LC_18april2016_blank
1   Unknown 1 Unknown 0.0023  4.808           173184               165661
2   Unknown 2 Unknown 0.0045  4.375           141888               136037
3   Unknown 3 Unknown 0.0015 22.912           113896                    0
[...]
13 Unknown 13 Unknown 0.0019 13.849            11562                    0
14 Unknown 14 Unknown 0.0023 37.338             5850                    0
15 Unknown 15 Unknown 0.0023  9.094             4081                    0
[...]
19 Unknown 19 Unknown 0.0032 12.820
20 Unknown 20 Unknown 0.0019 37.337                0                 5846
21 Unknown 21 Unknown 0.0042 18.210                0                 1516
22 Unknown 22 Unknown 0.0018  9.082                0                    0
23 Unknown 23 Unknown 0.0018 10.933                0                    0
[...]
31 Unknown 31 Unknown 0.0028 16.402                0                    0
32 Unknown 32 Unknown 0.0024 10.919                0                    0
[...]
50 Unknown 50 Unknown 0.0023 12.818                0                    0
51 Unknown 51 Unknown 0.0024 13.770                0                    0
52 Unknown 52 Unknown 0.0019 12.819                0                    0
53 Unknown 53 Unknown 0.0028 12.039                0                    0
54 Unknown 54 Unknown 0.0070 13.301                0                    0

I see two main issues from this table.

1) There are cases where two peaks that the algorithm treated as different are obviously the same molecule (same retention time and similar spectrum), for example peaks #15 and #22 (below). My hypothesis is that variations in peak intensity and in background intensity result in statistically different pseudospectra. I suppose that making the peak matching conditions less stringent would help, but I couldn't find out which parameters do that.

2) Far more problematic, peaks that I know are there and are relatively intense are not even detected. For example, peak #19 corresponds to limonene, which is present in every single sample. I have no trouble detecting it when looking at any of my samples, but it is reported in the result table for only half of them (even after correcting for problem 1)! This worries me a lot, since there are many compounds in those samples that are much smaller and more difficult to detect than limonene. But again, I couldn't find out which parameters could improve peak detection.
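For issue 1, candidate duplicates can be listed directly from the retention times in the peak table. The following is a minimal base-R sketch, not part of metaMS: the helper `rt_duplicates` and the 0.02 min tolerance are assumptions chosen for illustration, and the data frame holds a few rows copied from the table above. It only flags features that elute together; the pseudospectra still have to be compared before two features are treated as one compound.

```r
# A few rows copied from the PeakTable above (rt in minutes).
pt <- data.frame(
  Name = paste("Unknown", c(14, 15, 19, 20, 22, 50, 52)),
  rt   = c(37.338, 9.094, 12.820, 37.337, 9.082, 12.818, 12.819),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a metaMS function): return every pair of
# features whose retention times differ by less than `tol` minutes.
rt_duplicates <- function(tab, tol = 0.02) {
  # Matrix of pairwise absolute rt differences, thresholded at tol.
  idx <- which(abs(outer(tab$rt, tab$rt, "-")) < tol, arr.ind = TRUE)
  # Keep each pair once and drop the diagonal (a feature vs. itself).
  idx <- idx[idx[, "row"] < idx[, "col"], , drop = FALSE]
  data.frame(
    first   = tab$Name[idx[, "row"]],
    second  = tab$Name[idx[, "col"]],
    rt.diff = abs(tab$rt[idx[, "row"]] - tab$rt[idx[, "col"]]),
    stringsAsFactors = FALSE
  )
}

dups <- rt_duplicates(pt)
print(dups)
```

With the full table, `result$PeakTable[, c("Name", "rt")]` could be passed in place of `pt`, assuming the column names shown in the output above.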

So here come my questions:

- Can anyone recommend a user-friendly document that clearly explains what the parameters in xcms and metaMS are and what they do?

- (and/or) Can anyone recommend a set of parameters suitable for me? (I work on a GC-MS quadrupole, scanning m/z 29–400, 3.8 scans/s, usual peak width 0.1 min, retention time variation <0.1 min; please let me know if anything important is missing.)

- Any idea how powerful/accurate metaMS is on this kind of data (GC-MS, volatile compound profiles), especially for the detection of small peaks?

- Any suggestions for solving the issues pointed out above?

Thank you in advance for your advice and suggestions.

Lucie

metaMS xcms GC-MS metabolomics volatile organic compounds • 1.8k views

updated 8.9 years ago by ron.wehrens • written 8.9 years ago by lucie.conchou

score 0 · Answer 1 · 2016-05-31

Hi Lucie,

thanks for your feedback on metaMS!

Just to clarify matters, what metaMS does is basically to use XCMS for peak picking, and CAMERA for the definition of pseudospectra. It allows sets of machine-specific parameters to be defined and stored as objects which can then be used in the package. Then there are several tools to set up a database of pure standards, measured in house, and annotation of new data using these standards, but this part is not being used by you.

So to come to the point (your questions):

- parameters in xcms are most completely explained in the xcmsPreprocess vignette that comes with the package. There are also quite active discussion groups about setting parameters for specific machines; maybe you can find some hints there. If important peaks are missed you can visualize the EICs to see what is going on. You could even consider using another peak picking method (default in metaMS is matchedFilter, but xcms has others). I think in general optimizing these and other parameters is the most painful part of data analysis - there are also attempts to automatically identify optimal settings for xcms, have a google.

- parameters in CAMERA are likewise explained in the vignette - in metaMS the default is to use a very simple rt-based clustering, so that parameter should be quite easy to set.
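As a starting point for the peak picking side, the instrument figures quoted in the question can be converted into the units matchedFilter works in (its `fwhm` argument is given in seconds). This is only a rough sketch in base R: it assumes the quoted 0.1 min is the base width of a roughly Gaussian peak (about 4 sigma), which the question does not state, and the resulting numbers are a first guess to be checked against the EICs, not recommended settings.

```r
# Instrument figures quoted in the question.
peak_width_min <- 0.1   # usual peak width, minutes
scan_rate_hz   <- 3.8   # scans per second

# Peak width in seconds and the number of scans across one peak.
peak_width_s   <- peak_width_min * 60
scans_per_peak <- peak_width_s * scan_rate_hz

# ASSUMPTION: 0.1 min is the base width of a Gaussian peak (4 sigma).
# For a Gaussian, FWHM = 2 * sqrt(2 * log(2)) * sigma (about 2.355 sigma).
sigma_s <- peak_width_s / 4
fwhm_s  <- 2 * sqrt(2 * log(2)) * sigma_s

cat(sprintf("peak width: %.1f s, scans per peak: %.1f, FWHM: %.2f s\n",
            peak_width_s, scans_per_peak, fwhm_s))

# The value could then be written into the settings object with the same
# replacement pattern used in the question, for example
#   metaSetting(myparam, "PeakPicking")$fwhm <- round(fwhm_s, 1)
# ASSUMPTION: the "PeakPicking" field holds an `fwhm` element; check with
# metaSetting(myparam, "PeakPicking") before relying on this.
```

If the 0.1 min already refers to the width at half height, `fwhm_s` would simply be 6 s instead.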

Hope these pointers are useful...

Cheers,

Ron