Dear All,
I was hoping you could help me with a problem I am having. I have run RRBS paired-end methyl-sequencing, and am currently analysing the data via an R pipeline. I am searching for Differentially Methylated Regions (DMRs) between 2 groups, analysing by sex, treatment, and age.
I have been using the Bioconductor BiSeq Package. In the manual, it says to perform the function: ‘trimClusters’ to remove the regions which aren’t differentially methylated, to leave me with my final result. But the following error appears:
> locCor.Mcord.all.3 <- estLocCor(vario.sm.Mcord.all.3)
(warning of NaNs produced, but the function seems to allow the pipeline to progress, despite the NaNs).
> clusters.rej.Mcord.all.3 <- testClusters(locCor.Mcord.all.3, FDR.cluster = 0.05)
> clusters.trimmed.Mcord.all.3 <- trimClusters(clusters.rej.Mcord.all.3, FDR.loc = 0.05)
Error in integrate(integrand, lower = z.li[loc], upper = Inf) :
non-finite function value
In addition: Warning messages:
1: In sqrt(1 - rho.li^2) : NaNs produced
2: In sqrt(1 - rho.li^2) : NaNs produced
I have analysed my clusters.rej. This has a few components, one of which is ‘clusters.rej.Mcord.all.3$clusters.reject’. This has no Na values that I can spot. Other components, such as ‘clusters.rej.Mcord.all.3$clusters.not.reject’ do have rows with p values of ‘Na’, which I have tried my best to remove.
I have 2 questions:
1) What is the function of ‘trimClusters’? I thought it just meant removing all the clusters that weren’t rejected, leaving those that were.
2) Why am I getting this error message? My supervisor said that perhaps there were CpG regions that didn’t have a high coverage during the initial sequencing, which therefore gave errors in downstream processing. If this is the case, how could I correct this? Failing this, is there a way to remove all the NaNs in the ‘clusters.rej’ to allow the subsequent step to progress?
What could have possibly caused this error?
Please let me know if you require any further information. I am happy to send you, for example, my R pipeline if necessary.
Thanks for your help!
Manu
Part 2 BA Undergraduate Student
University of Cambridge
Email: kms68@cam.ac.uk
My sessionInfo() is:
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2008 R2 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiSeq_1.10.0 Formula_1.2-1 SummarizedExperiment_1.0.1
[4] Biobase_2.30.0 GenomicRanges_1.22.1 GenomeInfoDb_1.6.1
[7] IRanges_2.4.4 S4Vectors_0.8.3 BiocGenerics_0.16.1
loaded via a namespace (and not attached):
[1] futile.logger_1.4.1 XVector_0.10.0 bitops_1.0-6 futile.options_1.0.0
[5] tools_3.2.2 zlibbioc_1.16.0 RSQLite_1.0.0 annotate_1.48.0
[9] lattice_0.20-33 DBI_0.3.1 rtracklayer_1.30.1 Biostrings_2.38.2
[13] lmtest_0.9-34 grid_3.2.2 nnet_7.3-11 globaltest_5.24.0
[17] flexmix_2.3-13 AnnotationDbi_1.32.0 survival_2.38-3 XML_3.98-1.3
[21] BiocParallel_1.4.0 lokern_1.1-5 lambda.r_1.1.7 splines_3.2.2
[25] Rsamtools_1.22.0 modeltools_0.2-21 sfsmisc_1.0-28 GenomicAlignments_1.6.1
[29] xtable_1.8-0 betareg_3.0-5 sandwich_2.3-4 RCurl_1.95-4.7
[33] zoo_1.7-12
Dear Katja,
Thank you so much for your swift reply, and for your explanation. I have been reading up about variograms, and have attached a representative one of mine as a URL: http://tinypic.com/r/15guj60/9
These were the steps I performed to create the variogram:
vario.Cordplacebo.all.3 <- makeVariogram(betaResultsNull.Cordplacebo.all.3)
plot(vario.Cordplacebo.all.3$variogram$v)
vario.sm.Cordplacebo.all.3 <- smoothVariogram(vario.Cordplacebo.all.3, sill = 1.15)
lines(vario.sm.Cordplacebo.all.3$variogram[,c("h", "v.sm")], col = "red", lwd = 1.5)
grid()
The variogram seems ok (to me at least), perhaps the concerning thing is that there is not as much variation between the points as there is in the example variogram in the BiSeq manual. But I don't have much experience of variograms and haven't been able to spot anything else odd.
I have been trying to maniupulate my R script but I am genuinely quite confused about what I'm doing wrong. Please let me know if you want me to send me further information.
Regards,
Manu