Getting wrong rownames of MSet object when using pfilter from wateRmelon
@anne-kristinstavrum-13056
Last seen 5.1 years ago

I use minfi to import my EPIC data, and use pfilter from wateRmelon as the next step to remove probes and samples with high detection p-values and low bead counts etc.

I submit the RGset object to pfilter and get an Mset object back.

Mset.pf <- pfilter(RGset, perCount = 5, pnthresh = 0.01, perc = 1, pthresh = 1)

When I do this on my computer, the rownames of the Mset.pf object are of the form cg05575921, but if I do the same on another computer I get e.g. 1600101, which I think may be address keys from Illumina.

Does anyone know why I get these other numbers, or how I can convert them to cg-numbers?

I can see that the Illumina Manifest file gets loaded...

Kind regards, Anne-Kristin

PS: My computer is not powerful enough to do the preprocessing on the full dataset, which is why I need it to work on another computer.

wateRmelon Illumina EPIC pfilter • 1.3k views

Thank you so much for your help. The session info for the system where I get the weird numbers is below. I have close to 1300 samples, and will definitely try bigmelon.

sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
[1] LC_CTYPE=en_DK.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_DK.UTF-8 LC_COLLATE=en_DK.UTF-8
[5] LC_MONETARY=nb_NO.UTF-8 LC_MESSAGES=en_DK.UTF-8
[7] LC_PAPER=nb_NO.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=nb_NO.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] IlluminaHumanMethylationEPICmanifest_0.3.0
[2] wateRmelon_1.30.0
[3] illuminaio_0.28.0
[4] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0
[5] ROC_1.62.0
[6] lumi_2.38.0
[7] methylumi_2.32.0
[8] FDb.InfiniumMethylation.hg19_2.2.0
[9] org.Hs.eg.db_3.10.0
[10] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[11] GenomicFeatures_1.38.0
[12] AnnotationDbi_1.48.0
[13] ggplot2_3.2.1
[14] reshape2_1.4.3
[15] scales_1.1.0
[16] limma_3.42.0
[17] minfi_1.32.0
[18] bumphunter_1.28.0
[19] locfit_1.5-9.1
[20] iterators_1.0.12
[21] foreach_1.4.7
[22] Biostrings_2.54.0
[23] XVector_0.26.0
[24] SummarizedExperiment_1.16.0
[25] DelayedArray_0.12.0
[26] BiocParallel_1.20.0
[27] matrixStats_0.55.0
[28] Biobase_2.46.0
[29] GenomicRanges_1.38.0
[30] GenomeInfoDb_1.22.0
[31] IRanges_2.20.0
[32] S4Vectors_0.24.0
[33] BiocGenerics_0.32.0

loaded via a namespace (and not attached):
[1] colorspace_1.4-1 siggenes_1.60.0 mclust_5.4.5
[4] base64_2.0 affyio_1.56.0 bit64_0.9-7
[7] xml2_1.2.2 codetools_0.2-16 splines_3.6.1
[10] scrime_1.3.5 knitr_1.26 zeallot_0.1.0
[13] Rsamtools_2.2.1 annotate_1.64.0 dbplyr_1.4.2
[16] HDF5Array_1.14.0 BiocManager_1.30.10 readr_1.3.1
[19] compiler_3.6.1 httr_1.4.1 backports_1.1.5
[22] assertthat_0.2.1 Matrix_1.2-17 lazyeval_0.2.2
[25] prettyunits_1.0.2 tools_3.6.1 affy_1.64.0
[28] gtable_0.3.0 glue_1.3.1 GenomeInfoDbData_1.2.2
[31] dplyr_0.8.3 rappdirs_0.3.1 doRNG_1.7.1
[34] Rcpp_1.0.3 vctrs_0.2.0 multtest_2.42.0
[37] preprocessCore_1.48.0 nlme_3.1-142 rtracklayer_1.46.0
[40] DelayedMatrixStats_1.8.0 xfun_0.11 stringr_1.4.0
[43] lifecycle_0.1.0 rngtools_1.4 XML_3.98-1.20
[46] beanplot_1.2 nleqslv_3.3.2 zlibbioc_1.32.0
[49] MASS_7.3-51.4 hms_0.5.2 rhdf5_2.30.0
[52] GEOquery_2.54.0 RColorBrewer_1.1-2 curl_4.2
[55] memoise_1.1.0 pkgmaker_0.27 biomaRt_2.42.0
[58] reshape_0.8.8 stringi_1.4.3 RSQLite_2.1.2
[61] genefilter_1.68.0 bibtex_0.4.2 rlang_0.4.1
[64] pkgconfig_2.0.3 bitops_1.0-6 nor1mix_1.3-0
[67] lattice_0.20-38 purrr_0.3.3 Rhdf5lib_1.8.0
[70] GenomicAlignments_1.22.1 bit_1.1-14 tidyselect_0.2.5
[73] plyr_1.8.4 magrittr_1.5 R6_2.4.1
[76] DBI_1.0.0 pillar_1.4.2 withr_2.1.2
[79] mgcv_1.8-31 survival_3.1-7 RCurl_1.95-4.12
[82] tibble_2.1.3 crayon_1.3.4 KernSmooth_2.23-16
[85] BiocFileCache_1.10.2 progress_1.2.2 grid_3.6.1
[88] data.table_1.12.6 blob_1.2.0 digest_0.6.22
[91] xtable_1.8-4 tidyr_1.0.0 openssl_1.4.1
[94] munsell_0.5.0 registry_0.5-1 askpass_1.1
[97] quadprog_1.5-7


Session info for my computer where I get the expected cg-numbers:

sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.6 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=nb_NO.UTF-8 LC_MESSAGES=en_GB.UTF-8 LC_PAPER=nb_NO.UTF-8
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=nb_NO.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] IlluminaHumanMethylationEPICmanifest_0.3.0 wateRmelon_1.28.0 illuminaio_0.26.0
[4] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0 ROC_1.60.0 lumi_2.36.0
[7] methylumi_2.30.0 FDb.InfiniumMethylation.hg19_2.2.0 org.Hs.eg.db_3.8.2
[10] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.36.1 AnnotationDbi_1.46.1
[13] ggplot2_3.2.1 reshape2_1.4.3 scales_1.0.0
[16] limma_3.40.6 minfi_1.30.0 bumphunter_1.26.0
[19] locfit_1.5-9.1 iterators_1.0.10 foreach_1.4.4
[22] Biostrings_2.52.0 XVector_0.24.0 SummarizedExperiment_1.14.0
[25] DelayedArray_0.10.0 BiocParallel_1.18.0 matrixStats_0.54.0
[28] Biobase_2.44.0 GenomicRanges_1.36.1 GenomeInfoDb_1.20.0
[31] IRanges_2.18.2 S4Vectors_0.22.1 BiocGenerics_0.30.0

loaded via a namespace (and not attached):
[1] colorspace_1.4-1 siggenes_1.58.0 mclust_5.4.3 base64_2.0 rstudioapi_0.10 affyio_1.54.0 bit64_0.9-7
[8] xml2_1.2.0 codetools_0.2-16 splines_3.6.0 scrime_1.3.5 zeallot_0.1.0 Rsamtools_2.0.0 annotate_1.62.0
[15] HDF5Array_1.12.1 BiocManager_1.30.4 readr_1.3.1 compiler_3.6.0 httr_1.4.0 backports_1.1.4 assertthat_0.2.1
[22] Matrix_1.2-17 lazyeval_0.2.2 prettyunits_1.0.2 tools_3.6.0 affy_1.62.0 gtable_0.3.0 glue_1.3.1
[29] GenomeInfoDbData_1.2.1 dplyr_0.8.3 doRNG_1.7.1 Rcpp_1.0.2 vctrs_0.2.0 multtest_2.40.0 preprocessCore_1.46.0
[36] nlme_3.1-140 rtracklayer_1.44.0 DelayedMatrixStats_1.6.0 stringr_1.4.0 lifecycle_0.1.0 rngtools_1.3.1.1 XML_3.98-1.20
[43] beanplot_1.2 nleqslv_3.3.2 zlibbioc_1.30.0 MASS_7.3-51.4 hms_0.5.1 rhdf5_2.28.0 GEOquery_2.52.0
[50] RColorBrewer_1.1-2 memoise_1.1.0 pkgmaker_0.27 biomaRt_2.40.0 reshape_0.8.8 stringi_1.4.3 RSQLite_2.1.1
[57] genefilter_1.66.0 bibtex_0.4.2 rlang_0.4.0 pkgconfig_2.0.3 bitops_1.0-6 nor1mix_1.2-3 lattice_0.20-38
[64] purrr_0.3.2 Rhdf5lib_1.6.0 GenomicAlignments_1.20.0 bit_1.1-14 tidyselect_0.2.5 plyr_1.8.4 magrittr_1.5
[71] R6_2.4.0 DBI_1.0.0 pillar_1.4.2 withr_2.1.2 mgcv_1.8-28 survival_2.44-1.1 RCurl_1.95-4.12
[78] tibble_2.1.3 crayon_1.3.4 KernSmooth_2.23-15 progress_1.2.2 grid_3.6.0 data.table_1.12.2 blob_1.1.1
[85] digest_0.6.19 xtable_1.8-4 tidyr_1.0.0 openssl_1.4 munsell_0.5.0 registry_0.5-1 askpass_1.1
[92] quadprog_1.5-7


Dear Anne-Kristin:

There were a few changes in the most recent version of wateRmelon that we didn't communicate to users (or each other!) adequately. pfilter now returns a filtered RGChannelSet instead of a MethylSet. The MethylSet is what you get from preprocessing (the rows then have Illumina probe names such as cg05575921, rather than the bead addresses you are seeing). To get the same thing as you had with the older versions, the object needs to go through preprocessRaw or a normaliser. Obviously, we recommend dasen (Pidsley 2013).
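
As a rough illustration of the step described above, the sketch below wraps the filtering and the conversion in one function. The function name `filter_and_name_probes` and its `normalise` argument are made up for this example. It assumes that minfi and wateRmelon 1.30.0 or later are installed, that `RGset` is an RGChannelSet, and that `dasen` accepts an RGChannelSet directly; check the package documentation for your installed version before relying on it.

```r
# Sketch only: filter an RGChannelSet, then convert it so rows carry cg IDs.
# Assumes minfi and wateRmelon (>= 1.30.0) are installed.
# filter_and_name_probes() is a hypothetical helper, not part of either package.
filter_and_name_probes <- function(rgset, normalise = FALSE) {
  # In wateRmelon 1.30.0, pfilter() on an RGChannelSet returns a filtered
  # RGChannelSet (thresholds copied from the original question).
  rg_pf <- wateRmelon::pfilter(rgset, perCount = 5, pnthresh = 0.01,
                               perc = 1, pthresh = 1)
  if (normalise) {
    # Assumption: dasen has a method for RGChannelSet input.
    wateRmelon::dasen(rg_pf)
  } else {
    # preprocessRaw() converts the RGChannelSet to a MethylSet without
    # normalisation; its rownames are probe IDs such as cg05575921.
    minfi::preprocessRaw(rg_pf)
  }
}

# Usage, with RGset read in as in the question:
# Mset.pf <- filter_and_name_probes(RGset)
# head(rownames(Mset.pf))
```

Using `preprocessRaw` keeps the data unnormalised, so any further probe filtering can still be done before a normaliser such as dasen is applied.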

With your numbers of arrays, that's clearly a use case for bigmelon. Get in touch if you have any trouble using it.

best wishes

Leo


Thank you so much for clarifying! I want to remove the probes that are on the X and Y chromosomes before normalising the data, so will use preprocessRaw before proceeding.
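
One possible way to drop the X and Y probes after preprocessRaw is sketched below. The helper `keep_autosomal` is hypothetical and written for this example. The commented usage assumes minfi is installed together with the annotation package matching the array, and that the annotation table has a `chr` column with values such as "chrX" and "chrY".

```r
# Hypothetical helper: returns TRUE for probes NOT annotated to chrX or chrY.
# probe_ids: rownames of the methylation object
# ann_ids:   probe IDs in the annotation table
# ann_chr:   chromosome of each annotated probe, e.g. "chr1", "chrX"
keep_autosomal <- function(probe_ids, ann_ids, ann_chr) {
  sex_ids <- ann_ids[ann_chr %in% c("chrX", "chrY")]
  !(probe_ids %in% sex_ids)
}

# Toy illustration with made-up probe names:
keep_autosomal(c("probe1", "probe2", "probe3"),
               c("probe1", "probe2", "probe3"),
               c("chr1", "chrX", "chrY"))

# With real data (assumes Mset is the MethylSet returned by preprocessRaw):
# ann       <- minfi::getAnnotation(Mset)
# keep      <- keep_autosomal(rownames(Mset), rownames(ann), ann$chr)
# Mset.auto <- Mset[keep, ]
```

Matching on probe IDs rather than row position means the filter does not depend on the annotation table being in the same order as the MethylSet.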

Thanks again! Anne-Kristin

lschal • 0
@lschal-10014
Last seen 8.7 years ago
University of Essex

Dear Anne-Kristin:

We need a bit more information to understand what's going on here. Could you send me the output of sessionInfo() from both computers (with the packages you are using loaded)? Also, how many EPIC arrays are you working with? You could possibly do everything on your own computer (if you wish to) using the bigmelon package, which doesn't store everything in RAM.

Leo

