Hello,
I am currently trying out metaMS for the analysis of GC-MS data from plant volatile samples, using a dataset I have already analysed manually. My chromatograms are usually very complex, with mostly medium to low intensity peaks (many are close to the detection limit and don't stand out from the background on the total ion chromatogram).
I read the tutorial dedicated to the runGC workflow (this one), and I used the example set of parameters, as demonstrated in the workflow tutorial.
I changed only the parameters below, to ensure that all detected pseudospectra are displayed in the result table, even if found in only one sample:
data(FEMsettings)
myparam <- TSQXLS.GC
metaSetting(myparam, "betweenSamples")$min.class.fraction <- 0.0001
metaSetting(myparam, "betweenSamples")$min.class.size <- 1
Here is the output:
cdfdir <- "C:/R folder/metaMS/lilian_data"
cdffiles <- list.files(cdfdir, full.names = FALSE, ignore.case = TRUE)
result <- runGC(files = cdffiles, settings = myparam, DB = NULL, nSlaves = 2)
> result$PeakTable[,1:4]
         Name   Class  rt.sd     rt LC_18april2016_a LC_18april2016_blank
1   Unknown 1 Unknown 0.0023  4.808           173184               165661
2   Unknown 2 Unknown 0.0045  4.375           141888               136037
3   Unknown 3 Unknown 0.0015 22.912           113896                    0
[...]
13 Unknown 13 Unknown 0.0019 13.849            11562                    0
14 Unknown 14 Unknown 0.0023 37.338             5850                    0
15 Unknown 15 Unknown 0.0023  9.094             4081                    0
[...]
19 Unknown 19 Unknown 0.0032 12.820
20 Unknown 20 Unknown 0.0019 37.337                0                 5846
21 Unknown 21 Unknown 0.0042 18.210                0                 1516
22 Unknown 22 Unknown 0.0018  9.082                0                    0
23 Unknown 23 Unknown 0.0018 10.933                0                    0
[...]
31 Unknown 31 Unknown 0.0028 16.402                0                    0
32 Unknown 32 Unknown 0.0024 10.919                0                    0
[...]
50 Unknown 50 Unknown 0.0023 12.818                0                    0
51 Unknown 51 Unknown 0.0024 13.770                0                    0
52 Unknown 52 Unknown 0.0019 12.819                0                    0
53 Unknown 53 Unknown 0.0028 12.039                0                    0
54 Unknown 54 Unknown 0.0070 13.301                0                    0
I see two main issues in this table.
1) There are cases where two peaks that the algorithm treated as different are obviously the same molecule (same retention time + similar spectrum). For example, peaks #15 and #22 (below). My hypothesis is that variations in peak intensity and in background intensity result in statistically different pseudospectra. I suppose that making the peak matching conditions less stringent would help, but I couldn't find out which parameters do that.
2) Far more problematic, peaks that I know are there and are relatively intense are not even detected. For example, peak #19 corresponds to limonene, which is present in every single sample. I have no trouble detecting it when looking at any of my samples, but it is reported in the result table for only half of them (even after correcting for problem 1)! This worries me a lot, since there are many compounds in those samples that are much smaller and more difficult to detect than limonene. But again, I couldn't find out which parameters could improve peak detection.
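The "same retention time + similar spectrum" criterion in point 1 can be written out explicitly. What follows is a minimal sketch in base R, not metaMS's own matching code: the two spectra are made-up values, `cosine_similarity()` and `same_compound()` are hypothetical helpers, cosine similarity is only a generic stand-in for whatever score metaMS uses internally, and the thresholds `rt_tol` and `sim_min` are arbitrary. Only the two retention times (9.094 and 9.082, peaks #15 and #22) come from the table above.

```r
# Hypothetical pseudospectra as two-column matrices (m/z, intensity).
# The values are invented for illustration, not taken from the dataset.
spec_a <- cbind(mz = c(68, 93, 107, 121, 136), intensity = c(100, 85, 20, 18, 25))
spec_b <- cbind(mz = c(68, 93, 121, 136), intensity = c(95, 90, 15, 30))

# Cosine similarity on unit-mass bins; an ion missing from one spectrum counts as 0.
# Simplification: if several ions round to the same integer m/z, only the first is used.
cosine_similarity <- function(a, b) {
  mz <- sort(union(round(a[, "mz"]), round(b[, "mz"])))
  ia <- a[match(mz, round(a[, "mz"])), "intensity"]
  ib <- b[match(mz, round(b[, "mz"])), "intensity"]
  ia[is.na(ia)] <- 0
  ib[is.na(ib)] <- 0
  sum(ia * ib) / sqrt(sum(ia^2) * sum(ib^2))
}

# Two entries are merge candidates when they co-elute and their spectra agree.
# rt_tol (minutes) and sim_min are assumed values, not metaMS defaults.
same_compound <- function(a, b, rt_a, rt_b, rt_tol = 0.05, sim_min = 0.9) {
  abs(rt_a - rt_b) <= rt_tol && cosine_similarity(a, b) >= sim_min
}

sim <- cosine_similarity(spec_a, spec_b)
candidate <- same_compound(spec_a, spec_b, rt_a = 9.094, rt_b = 9.082)
```

With real data, the two matrices would be replaced by the pseudospectra of the two entries being compared, which shows directly how far below a given similarity threshold a suspected duplicate falls.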
So here come my questions:
-Can anyone recommend a user-friendly document that clearly explains what the parameters in xcms and metaMS are and what they do?
-(and/or) Can anyone recommend a set of parameters suitable for my data? (I work on a GC-MS quadrupole, scanning m/z 29–400 at 3.8 scans/s, with a usual peak width of 0.1 min and retention time variation < 0.1 min; please let me know if anything important is missing.)
-Any idea how powerful/accurate metaMS is on this kind of data (GC-MS, volatile compound profiles), especially for the detection of small peaks?
-Any suggestions to solve the issues pointed out above?
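For anyone suggesting parameters, the acquisition figures given above translate into seconds and scans as follows. This is only a sketch of the arithmetic in base R. The variable names are made up, and the conversion to a full width at half maximum assumes a Gaussian peak whose stated 0.1 min width is the base width (about 4 sigma); if 0.1 min is already the width at half height, the FWHM is simply 6 s.

```r
# Values stated in the post
scan_rate_hz   <- 3.8        # scans per second
peak_width_min <- 0.1        # usual peak width, in minutes
mz_range       <- c(29, 400) # scanned m/z range

# Peak width in seconds and number of scans across one peak
peak_width_s   <- peak_width_min * 60          # 6 s
scans_per_peak <- peak_width_s * scan_rate_hz  # 22.8 scans

# ASSUMPTION: Gaussian peak, stated width = base width (~4 sigma).
# FWHM = 2 * sqrt(2 * ln 2) * sigma, roughly 2.355 * sigma.
sigma_s <- peak_width_s / 4
fwhm_s  <- 2 * sqrt(2 * log(2)) * sigma_s      # about 3.5 s

# Number of unit-mass channels in the scanned range
n_mz_channels <- diff(mz_range) + 1            # 372
```

The FWHM in seconds is the relevant figure because the matchedFilter peak picking in xcms takes its `fwhm` argument in seconds.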
Thank you in advance for your advice and suggestions.
Lucie