I use minfi to import my EPIC data, and use pfilter from wateRmelon as the next step to remove probes and samples with high detection p-values and low bead counts etc.
I submit the RGset object to pfilter and get an Mset object back.
Mset.pf <- pfilter(RGset, perCount = 5, pnthresh = 0.01, perc = 1, pthresh = 1)
When I do this on my computer, the rownames of the Mset.pf object are of the form cg05575921, but if I do the same on another computer I get e.g. 1600101, which I think may be address keys from Illumina.
Does anyone know why I get these other numbers? Or how I can convert them to cg-numbers?
I can see that the Illumina Manifest file gets loaded...
Kind regards, Anne-Kristin
PS: My computer is not powerful enough to do the preprocessing on the full dataset, which is why I need this to work on another computer.
Thank you so much for your help. Session info for the system where I get the weird numbers is below. I have close to 1300 samples, and will definitely try bigmelon.
Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale: [1] LC_CTYPE=en_DK.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_DK.UTF-8 LC_COLLATE=en_DK.UTF-8 [5] LC_MONETARY=nb_NO.UTF-8 LC_MESSAGES=en_DK.UTF-8 [7] LC_PAPER=nb_NO.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=nb_NO.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base
other attached packages: [1] IlluminaHumanMethylationEPICmanifest_0.3.0 [2] wateRmelon_1.30.0 [3] illuminaio_0.28.0 [4] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0 [5] ROC_1.62.0 [6] lumi_2.38.0 [7] methylumi_2.32.0 [8] FDb.InfiniumMethylation.hg19_2.2.0 [9] org.Hs.eg.db_3.10.0 [10] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 [11] GenomicFeatures_1.38.0 [12] AnnotationDbi_1.48.0 [13] ggplot2_3.2.1 [14] reshape2_1.4.3 [15] scales_1.1.0 [16] limma_3.42.0 [17] minfi_1.32.0 [18] bumphunter_1.28.0 [19] locfit_1.5-9.1 [20] iterators_1.0.12 [21] foreach_1.4.7 [22] Biostrings_2.54.0 [23] XVector_0.26.0 [24] SummarizedExperiment_1.16.0 [25] DelayedArray_0.12.0 [26] BiocParallel_1.20.0 [27] matrixStats_0.55.0 [28] Biobase_2.46.0 [29] GenomicRanges_1.38.0 [30] GenomeInfoDb_1.22.0 [31] IRanges_2.20.0 [32] S4Vectors_0.24.0 [33] BiocGenerics_0.32.0
loaded via a namespace (and not attached): [1] colorspace_1.4-1 siggenes_1.60.0 mclust_5.4.5 [4] base64_2.0 affyio_1.56.0 bit64_0.9-7 [7] xml2_1.2.2 codetools_0.2-16 splines_3.6.1 [10] scrime_1.3.5 knitr_1.26 zeallot_0.1.0 [13] Rsamtools_2.2.1 annotate_1.64.0 dbplyr_1.4.2 [16] HDF5Array_1.14.0 BiocManager_1.30.10 readr_1.3.1 [19] compiler_3.6.1 httr_1.4.1 backports_1.1.5 [22] assertthat_0.2.1 Matrix_1.2-17 lazyeval_0.2.2 [25] prettyunits_1.0.2 tools_3.6.1 affy_1.64.0 [28] gtable_0.3.0 glue_1.3.1 GenomeInfoDbData_1.2.2 [31] dplyr_0.8.3 rappdirs_0.3.1 doRNG_1.7.1 [34] Rcpp_1.0.3 vctrs_0.2.0 multtest_2.42.0 [37] preprocessCore_1.48.0 nlme_3.1-142 rtracklayer_1.46.0 [40] DelayedMatrixStats_1.8.0 xfun_0.11 stringr_1.4.0 [43] lifecycle_0.1.0 rngtools_1.4 XML_3.98-1.20 [46] beanplot_1.2 nleqslv_3.3.2 zlibbioc_1.32.0 [49] MASS_7.3-51.4 hms_0.5.2 rhdf5_2.30.0 [52] GEOquery_2.54.0 RColorBrewer_1.1-2 curl_4.2 [55] memoise_1.1.0 pkgmaker_0.27 biomaRt_2.42.0 [58] reshape_0.8.8 stringi_1.4.3 RSQLite_2.1.2 [61] genefilter_1.68.0 bibtex_0.4.2 rlang_0.4.1 [64] pkgconfig_2.0.3 bitops_1.0-6 nor1mix_1.3-0 [67] lattice_0.20-38 purrr_0.3.3 Rhdf5lib_1.8.0 [70] GenomicAlignments_1.22.1 bit_1.1-14 tidyselect_0.2.5 [73] plyr_1.8.4 magrittr_1.5 R6_2.4.1 [76] DBI_1.0.0 pillar_1.4.2 withr_2.1.2 [79] mgcv_1.8-31 survival_3.1-7 RCurl_1.95-4.12 [82] tibble_2.1.3 crayon_1.3.4 KernSmooth_2.23-16 [85] BiocFileCache_1.10.2 progress_1.2.2 grid_3.6.1 [88] data.table_1.12.6 blob_1.2.0 digest_0.6.22 [91] xtable_1.8-4 tidyr_1.0.0 openssl_1.4.1 [94] munsell_0.5.0 registry_0.5-1 askpass_1.1 [97] quadprog_1.5-7
Session info for my computer where I get the expected cg-numbers:
Matrix products: default BLAS: /usr/lib/libblas/libblas.so.3.0 LAPACK: /usr/lib/lapack/liblapack.so.3.0
locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=nb_NO.UTF-8 LC_MESSAGES=en_GB.UTF-8 LC_PAPER=nb_NO.UTF-8 [8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=nb_NO.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages: [1] IlluminaHumanMethylationEPICmanifest_0.3.0 wateRmelon_1.28.0 illuminaio_0.26.0 [4] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0 ROC_1.60.0 lumi_2.36.0 [7] methylumi_2.30.0 FDb.InfiniumMethylation.hg19_2.2.0 org.Hs.eg.db_3.8.2 [10] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.36.1 AnnotationDbi_1.46.1 [13] ggplot2_3.2.1 reshape2_1.4.3 scales_1.0.0 [16] limma_3.40.6 minfi_1.30.0 bumphunter_1.26.0 [19] locfit_1.5-9.1 iterators_1.0.10 foreach_1.4.4 [22] Biostrings_2.52.0 XVector_0.24.0 SummarizedExperiment_1.14.0 [25] DelayedArray_0.10.0 BiocParallel_1.18.0 matrixStats_0.54.0 [28] Biobase_2.44.0 GenomicRanges_1.36.1 GenomeInfoDb_1.20.0 [31] IRanges_2.18.2 S4Vectors_0.22.1 BiocGenerics_0.30.0
loaded via a namespace (and not attached): [1] colorspace_1.4-1 siggenes_1.58.0 mclust_5.4.3 base64_2.0 rstudioapi_0.10 affyio_1.54.0 bit64_0.9-7 [8] xml2_1.2.0 codetools_0.2-16 splines_3.6.0 scrime_1.3.5 zeallot_0.1.0 Rsamtools_2.0.0 annotate_1.62.0 [15] HDF5Array_1.12.1 BiocManager_1.30.4 readr_1.3.1 compiler_3.6.0 httr_1.4.0 backports_1.1.4 assertthat_0.2.1 [22] Matrix_1.2-17 lazyeval_0.2.2 prettyunits_1.0.2 tools_3.6.0 affy_1.62.0 gtable_0.3.0 glue_1.3.1 [29] GenomeInfoDbData_1.2.1 dplyr_0.8.3 doRNG_1.7.1 Rcpp_1.0.2 vctrs_0.2.0 multtest_2.40.0 preprocessCore_1.46.0 [36] nlme_3.1-140 rtracklayer_1.44.0 DelayedMatrixStats_1.6.0 stringr_1.4.0 lifecycle_0.1.0 rngtools_1.3.1.1 XML_3.98-1.20 [43] beanplot_1.2 nleqslv_3.3.2 zlibbioc_1.30.0 MASS_7.3-51.4 hms_0.5.1 rhdf5_2.28.0 GEOquery_2.52.0 [50] RColorBrewer_1.1-2 memoise_1.1.0 pkgmaker_0.27 biomaRt_2.40.0 reshape_0.8.8 stringi_1.4.3 RSQLite_2.1.1 [57] genefilter_1.66.0 bibtex_0.4.2 rlang_0.4.0 pkgconfig_2.0.3 bitops_1.0-6 nor1mix_1.2-3 lattice_0.20-38 [64] purrr_0.3.2 Rhdf5lib_1.6.0 GenomicAlignments_1.20.0 bit_1.1-14 tidyselect_0.2.5 plyr_1.8.4 magrittr_1.5 [71] R6_2.4.0 DBI_1.0.0 pillar_1.4.2 withr_2.1.2 mgcv_1.8-28 survival_2.44-1.1 RCurl_1.95-4.12 [78] tibble_2.1.3 crayon_1.3.4 KernSmooth_2.23-15 progress_1.2.2 grid_3.6.0 data.table_1.12.2 blob_1.1.1 [85] digest_0.6.19 xtable_1.8-4 tidyr_1.0.0 openssl_1.4 munsell_0.5.0 registry_0.5-1 askpass_1.1 [92] quadprog_1.5-7
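The two listings differ in several package versions, including wateRmelon (1.30.0 on the other computer, 1.28.0 on mine) and minfi (1.32.0 vs 1.30.0). A quick way to spot such differences is to compare the `name_version` tokens that sessionInfo() prints. The sketch below is base R only; `parse_pkgs` and `version_diff` are hypothetical helper names made up for this illustration, and the input vectors are a hand-copied subset of the listings above.

```r
# Hypothetical helper: split "pkg_version" tokens (as printed by sessionInfo())
# into a character vector of versions named by package.
parse_pkgs <- function(tokens) {
  parts <- regmatches(tokens, regexec("^(.*)_([^_]+)$", tokens))
  versions <- vapply(parts, `[`, character(1), 3)
  names(versions) <- vapply(parts, `[`, character(1), 2)
  versions
}

# Hypothetical helper: list packages present in both sets whose versions differ.
version_diff <- function(a, b) {
  va <- parse_pkgs(a)
  vb <- parse_pkgs(b)
  common <- intersect(names(va), names(vb))
  changed <- common[va[common] != vb[common]]
  data.frame(package = changed,
             a = unname(va[changed]),
             b = unname(vb[changed]),
             stringsAsFactors = FALSE)
}

# Subset of the attached packages from the two session listings.
server <- c("wateRmelon_1.30.0", "minfi_1.32.0", "ggplot2_3.2.1")
laptop <- c("wateRmelon_1.28.0", "minfi_1.30.0", "ggplot2_3.2.1")

version_diff(server, laptop)  # reports wateRmelon and minfi, not ggplot2
```

On a live system, `packageVersion("wateRmelon")` on each machine gives the same information for a single package.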
Dear Anne-Kristin:
There were a few changes in the most recent version of wateRmelon that we didn't communicate to users (or each other!) adequately. pfilter now returns a filtered RGChannelSet instead of a MethylSet. The MethylSet is what you get from preprocessing (the rows then have Illumina probe names, i.e. cg numbers, rather than the address IDs you are seeing). To get the same thing as you had with the older versions, the object needs to go through preprocessRaw or a normaliser. Obviously, we recommend dasen (Pidsley 2013).
With your numbers of arrays, that's clearly a use case for bigmelon. Get in touch if you have any trouble using it.
best wishes
Leo
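A minimal sketch of the workflow described in this reply, assuming minfi and wateRmelon are installed and `RGset` is the object from the question. `id_type` is a hypothetical helper written for this illustration (it is not part of either package); it only guesses from the row names whether an object is still indexed by addresses or already by probe names. The package calls are left commented out because they need the real data.

```r
# Hypothetical helper (not part of minfi or wateRmelon): guess what kind of
# identifiers a set of row names contains.
id_type <- function(ids) {
  ids <- as.character(ids)
  if (length(ids) == 0L) return("unknown")
  if (all(grepl("^[0-9]+$", ids))) return("address")   # e.g. "1600101"
  if (all(grepl("^(cg|ch|rs)", ids))) return("probe")  # e.g. "cg05575921"
  "mixed"
}

id_type(c("1600101", "1600111"))    # "address"
id_type(c("cg05575921", "rs1234"))  # "probe"

# Applied to the objects in this thread (requires minfi, wateRmelon and the
# RGset from the question, so it is commented out here):
# library(minfi)
# library(wateRmelon)
# RGset.pf <- pfilter(RGset, perCount = 5, pnthresh = 0.01, perc = 1, pthresh = 1)
# if (is(RGset.pf, "RGChannelSet")) {
#   # Newer wateRmelon: still an RGChannelSet, so preprocess it afterwards.
#   Mset.pf <- preprocessRaw(RGset.pf)
# } else {
#   # Older wateRmelon: pfilter already returned a MethylSet.
#   Mset.pf <- RGset.pf
# }
# id_type(rownames(Mset.pf))
```

Branching on the class rather than on the package version keeps the same script usable on both computers.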
Thank you so much for clarifying! I want to remove the probes that are on the X and Y chromosomes before normalising the data, so will use preprocessRaw before proceeding.
Thanks again! Anne-Kristin
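One way the X/Y filtering step could look, as a sketch under assumptions: `autosomal_ids` is a hypothetical helper, the annotation table is assumed to have probe IDs as row names and a `chr` column with values like "chrX", and the toy IDs below are made up. The minfi lines are commented out because they need the real MethylSet and the matching annotation package.

```r
# Hypothetical helper: keep only the IDs that are not annotated to a sex
# chromosome. IDs missing from the annotation table are kept.
autosomal_ids <- function(ids, ann, sex_chr = c("chrX", "chrY")) {
  sex_ids <- rownames(ann)[as.character(ann$chr) %in% sex_chr]
  ids[!(ids %in% sex_ids)]
}

# Toy annotation standing in for the real array annotation (made-up IDs).
toy_ann <- data.frame(
  chr = c("chr1", "chrX", "chrY", "chr7"),
  row.names = c("cg00000001", "cg00000002", "cg00000003", "cg00000004"),
  stringsAsFactors = FALSE
)

autosomal_ids(c("cg00000004", "cg00000002", "cg00000001"), toy_ann)
# "cg00000004" "cg00000001"

# With real data (requires minfi and the annotation package for the array):
# ann <- minfi::getAnnotation(Mset.pf)
# Mset.auto <- Mset.pf[autosomal_ids(rownames(Mset.pf), ann), ]
```

Matching by probe name instead of by position avoids relying on the annotation table being in the same row order as the MethylSet.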