Hello,
I have a large dataset of MS and MSMS data on crude chemical extracts from tropical tree leaves. I have successfully created and run an XCMS pipeline that has allowed us to classify and identify several compounds within these samples that have been run on a 150mm C18 column using reverse phase chromatography. However, when I attempt to use our pipeline for the same samples instead run on a 100mm amide column using normal phase chromatography, I am met with a fatal error that occurs within the XCMS function adjustRtime():
Sample number 4 used as center sample.
Aligning BN_amide_090.mzXML against DN1620_polar.mzXML ... Found gaps in scan times of the center sample: cut scantime-vector at NA seconds.
Found gaps in scan time of file BN_amide_090.mzXML: cut scantime-vector at NA seconds.
Error: BiocParallel errors
element index: 1, 2, 3, 4, 5, 6
first error: argument is of length zero
Before this step, we have a step that uses the XCMS function findChromPeaks() which outputted warning messages that no peaks were found in samples 1 or 2, corresponding to our two associated blank runs (used to make sure we are not considering any potential contaminants as part of our samples). I attempted to adjust the CentWaveParam() for this function, but was still unable to pick up peaks in these blanks. I thought these warnings were contributing to the adjustRtime() fatal error, so I subsetted the data object to exclude the empty blanks, but even so the adjustRtime() error still occurred.
The code where the error occurs is a part of a for loop that loops through the five sites and 173 plant unique species x site combinations in our dataset. We are running this peak picking step on a Linux node as it takes a substantial amount of memory and computing power for a dataset as large as ours. In context, with the error line bolded:
setwd("/uufs/chpc.utah.edu/common/home/inga-group1/4_directories_chem_evolution_2019_03_26/")
# Define absolute path to mzxml files, ending with folder containing folders named by site
mzxml_location <- "/uufs/chpc.utah.edu/common/home/inga-group1/4_directories_chem_evolution_2019_03_26/polar/"
# set retention time cutoffs (in minutes) for peaks that should be kept in analysis (typically after 1 minute and before 32 minutes
# for C18 column standard run)
min_rt <- 1.1
max_rt <- 10.5
# This code assumes .mzXML files are arranged into folders by species, with species being arranged in folders by site.
# Within each species folder, there should be two folders, one named "Sample" containing all samples for that species
# And the other named "Blank" containing all blanks for that species.
# Create vector with all sites you wish to analyze
sites <- c("BCI","Manaus","LA","FG","Tiputini")
####### Loop through directories containing .mzXML files. Samples and their associated blanks should be in separate folders named "Sample" and "Blank". Depending on how files are arranged, you may or may not need to use the outer loop for site #######
for(i in 1:length(sites)) {
species <- list.dirs(paste(mzxml_location,sites[i],sep=""), recursive=FALSE, full.names=FALSE)
for(j in 1:length(species)) {
print(species[j])
# make vector with relative filepaths (used by XCMS to find files), sample groups (used by XCMS to group files into blank and sample), and sample names (used by XCMS in peak list output)
files_1 <- list.files(paste(mzxml_location, sites[i], "/",species[j],sep=""),
recursive = TRUE, full.names = TRUE)
files_2 <- files_1[endsWith(files_1, "mzXML")]
files <- files_2[!grepl("Standard", files_2)]
s_groups <- sapply(files, function(x) {
temp = unlist(strsplit(x, split = "/"))
return(temp[length(temp)-1])
})
is.na(s_groups) <- 0
s_names <- sapply(files, function(x) {
temp = unlist(strsplit(x, split = "/"))
return(sub(temp[length(temp)], pattern = ".mzXML", replacement = "", fixed = TRUE))
})
# set parameters for XCMS functions
# centwave is the method used to perform initial peak picking/finding - THIS HAS CHANGED FROM THE C18 METHOD PARAMETERS
cwparam <- CentWaveParam(ppm=15, peakwidth=c(0.2,5), snthresh=5, prefilter=c(1,500))
# peakdensity is the method used to group peaks across samples based on retention time. For the first peak grouping step, use more lenient parameters because retention time has not yet been corrected
dparam1 <- PeakDensityParam(sampleGroups = s_groups, bw=10, binSize=0.05, minSamples=1, minFraction = 0.01)
# Obiwarp is the method used for retention time correction
oparam <- ObiwarpParam(binSize=1)
# second round of peak grouping uses more stringent parameters
dparam2 <- PeakDensityParam(sampleGroups = s_groups, bw = 3, binSize = 0.025, minSamples = 1, minFraction = 0.01)
# FillChromPeaks look for all peaks in all samples to find peaks that were below threshold set during peak picking
fpparam <- FillChromPeaksParam(expandMz = 0.25, expandRt = 0.5)
# reads mzXML files into OnDiskMSnExp object
raw_data_polar <- readMSData(files, msLevel. = 1, mode="onDisk")
# Run XCMS functions with parameters set above
xcmsexp <- findChromPeaks(object = raw_data_polar, param = cwparam)
xcmsexp <- groupChromPeaks(object = xcmsexp, param = dparam1)
**xcmsexp <- adjustRtime(object = xcmsexp, param = oparam)**
xcmsexp <- groupChromPeaks(object = xcmsexp, param = dparam2)
xcmsexp <- fillChromPeaks(xcmsexp, param = fpparam)
I checked the raw files, and found that the data run on the amide column are comparable in quality to those run on our C18 column. I also ran a histogram of the retention times for the amide column data and found no apparent gaps. In fact, there appears to be no significant difference in the XCMS experiment object data, accessed as follows:
raw_object@featureData@data
Additionally, I use the same ObiWarpParam() for both C18 and amide analysis where binSize=1.
Because the workflow works for the C18 and not for the amide column, I’m still suspicious of the amide data as the source of the issue. However, I am not sure where to look to determine why the fatal adjustRtime() error is occurring. This issue occurs consistently for all samples in the amide column dataset, so it is not an issue with a single file conversion/upload/etc. The methods for C18 and amide analysis also specify identical scan time parameters, so not sure why the scan time is only an issue for the latter data.
For reference, all of our files are in .mzXML format and have been converted from .raw using msconvert.exe in Python.
Is this error occurring because I should be treating this data differently due to the difference in chromatography between the amide (normal phase) and C18 (reverse phase) column? Where else might be good to look at the data to find the source of the fatal error?
Any help or advice with this would be greatly appreciated, as I am unsure how to proceed and our amide column data is vital to the completion of this project.
Thanks in advance. Abbey
R sessionInfo() copied below:
R version 3.5.1 (2018-07-02)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] RMySQL_0.10.17 DBI_1.0.0 reshape2_1.4.3 CAMERA_1.38.1 splitstackshape_1.4.8
[6] vegan_2.5-5 lattice_0.20-35 permute_0.9-5 xcms_3.4.4 MSnbase_2.8.3
[11] ProtGenerics_1.14.0 S4Vectors_0.20.1 mzR_2.16.2 Rcpp_1.0.1 BiocParallel_1.16.6
[16] multtest_2.38.0 Biobase_2.42.0 BiocGenerics_0.28.0
loaded via a namespace (and not attached):
[1] nlme_3.1-137 doParallel_1.0.14 RColorBrewer_1.1-2 tools_3.5.1
[5] backports_1.1.4 R6_2.4.0 affyio_1.52.0 rpart_4.1-13
[9] Hmisc_4.2-0 lazyeval_0.2.2 mgcv_1.8-24 colorspace_1.4-1
[13] nnet_7.3-12 tidyselect_0.2.5 gridExtra_2.3 compiler_3.5.1
[17] MassSpecWavelet_1.48.1 preprocessCore_1.44.0 graph_1.60.0 htmlTable_1.13.1
[21] scales_1.0.0 checkmate_1.9.3 DEoptimR_1.0-8 robustbase_0.93-5
[25] affy_1.60.0 RBGL_1.58.2 stringr_1.4.0 digest_0.6.19
[29] foreign_0.8-70 base64enc_0.1-3 pkgconfig_2.0.2 htmltools_0.3.6
[33] limma_3.38.3 htmlwidgets_1.3 rlang_0.4.0 rstudioapi_0.10
[37] impute_1.56.0 mzID_1.20.1 acepack_1.4.1 dplyr_0.8.3
[41] magrittr_1.5 Formula_1.2-3 MALDIquant_1.19.3 Matrix_1.2-14
[45] munsell_0.5.0 vsn_3.50.0 stringi_1.4.3 MASS_7.3-50
[49] zlibbioc_1.28.0 plyr_1.8.4 grid_3.5.1 crayon_1.3.4
[53] splines_3.5.1 knitr_1.23 pillar_1.4.1 igraph_1.2.4.1
[57] codetools_0.2-15 XML_3.98-1.20 glue_1.3.1 latticeExtra_0.6-28
[61] pcaMethods_1.74.0 data.table_1.12.2 BiocManager_1.30.4 foreach_1.4.4
[65] gtable_0.3.0 RANN_2.6.1 purrr_0.3.2 assertthat_0.2.1
[69] ggplot2_3.1.1 xfun_0.7 ncdf4_1.16.1 survival_2.42-3
[73] tibble_2.1.3 iterators_1.0.10 IRanges_2.16.0 cluster_2.0.7-1
Using traceback() for the fatal error:
Dear Abbey,
thank you for your detailed description of your issue! The error that you see comes from the obiwarp alignment. Without knowing your data it is now difficult to get to the exact cause, but I guess that obiwarp has a problem with the retention times in one file (e.g. because, as the error message says, there are larger gaps in the sequence of the retention times).
Possible solutions/suggestions could be: 1) If possible you should use a more recent version or R and xcms. The current stable version of
xcms
is 3.6.1 for R version 3.6. 2) If you use obiwarp for alignment you don't need the first peak grouping step (xcmsexp <- groupChromPeaks(object = xcmsexp, param = dparam1)
) - obiwarp aligns the spectra and does not need any identified chromatographic peaks. 3) Obiwarp can be quite picky with the provided data, alignment based on peak density can be more robust. So, you could use that method instead of obiwarp. You would then however need the additional peak grouping step after peak detection. 4) Since you mention that you have blanks you should use subset-based alignment as described in thexcms
vignette - that requires however thexcms
package version mentioned above. 5) I strongly suggest that you inspect also the data before/after correspondence analysis (i.e. peak grouping) - thebw
parameter that you use (10
and3
) is actually quite large for the peak widths you expect in your data (guessing from thepeakwidth = c(0.2, 5)
, i.e. between 0.2 and 5 seconds). Have a look at this workshop (or here for the github repository) for some hints and suggestion to define/tune parameter settings forxcms
.I hope this helps.
cheers, jo
Hi jo,
Thanks for the quick response about this issue. I have attempted to update both XCMS and R to their most recent versions (XCMS = 3.6.1, R = 3.6.1) on the linux machine we use for this analysis and found that there is a compiler issue with the mzR package on which XCMS is dependent. According to our computer scientists who run and maintain the linux machine we use, mzR was last updated 4 years ago and uses a rather outdated form of boost from C++. The compiler issue has been preventing us from updating the XCMS package. Additionally, when I try and make the update on my PC, the XCMS package can be updated and loaded from the library but package loading ends in a warning message:
The provided github link has not been updated in over 5 years, so not sure how relevant the fixes listed are for the issue.
Just wondering how the newest version of XCMS was built to avoid these compiler issues with mzR?
Thanks, Abbey
We were able to find a temporary fix to the mzR compiling issue and load the newest version of R 3.6.1 and XCMS 3.6.1. However, I am now getting this output after the initial peak picking step:
And then skipping the first peak grouping step:
Which results in the same error as before. The only changes I'm noticing are the
Error in x$.self$finalize() : attempt to apply non-function
- any ideas what could be causing this, or if it points to a different issue entirely?Any help would be appreciated.
Thanks, Abbey
Don't worry about the
x$.self$finalize()
calls - I did not find a way yet to get rid of them, but they don't affect the analysis. They occurr randomly.Now regarding the issue that you get, would it be possible for you to share the BN_amide_092.mzXML file and a second one of your files? It's hard to tell what causes your problem, but I guess it has to do with this one file. The info Found gaps in scan time of file BN_amide_092.mzXML: cut scantime-vector at NA seconds. tells that the retention times of spectra in this file are not equally spaced (that's what the Found gaps means). Usually a failback mechanism kicks in in such cases, but somehow it fails for your data.
cheers, jo
Good to know about the
x$.self$finalize()
calls. I can send you a few of the files we are working with - I do not htink it is an issue with just one file, as the same error occurs for all of the samples in this project. I can send you all of the samples for three species, as the xcms peak picking / peak alignment we are attempting occurs within each species (see above code).What's the best way for me to send you some of our files?
Could you please try to make a minimal example with the fewest files possible? you could send me an email to johannes.rainer at eurac.edu
Yes, I have sent you an email with two species and six corresponding files.
Hi, I have the same problem. Have you solved this? Thanks!
Hi, I have the same problem. Have you solved this? Thanks!