I have to analyse BeadChip Methylation Data. Since I don't have replicates in my experiment, I'm thinking of using the package 'DSS' for the analysis. This package takes data in count format for each CG position: chromosome number, genomic coordinate, total number of reads, and number of reads showing methylation, like:
chr pos N X
chr18 3014904 26 2
chr18 3031032 33 12
chr18 3031044 33 13
chr18 3031065 48 24
I could read the Illumina .idat files using the library 'illuminaio', which gives this result.
> library(illuminaio)
> idat <- readIDAT("205715840012_R01C01_Grn.idat")
> names(idat)
[1] "fileSize" "versionNumber" "nFields" "fields" "nSNPsRead" "Quants" "MidBlock"
[8] "RedGreen" "Barcode" "ChipType" "RunInfo" "Unknowns"
> idat$Quants[1:5,]
Mean SD NBeads
1600101 8827 870 20
1600111 2972 355 16
1600115 2550 484 16
1600123 1266 221 12
1600131 180 94 19
Now, I do not know how to convert this information into the 'count data' format shown above, with chr, pos, N and X. Any help would be appreciated.
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale:
[1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252 LC_MONETARY=English_India.1252 LC_NUMERIC=C
[5] LC_TIME=English_India.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] IlluminaDataTestFiles_1.30.0 illuminaio_0.34.0 DSS_2.40.0 bsseq_1.28.0
[5] SummarizedExperiment_1.22.0 MatrixGenerics_1.4.3 matrixStats_0.61.0 GenomicRanges_1.44.0
[9] GenomeInfoDb_1.28.4 IRanges_2.26.0 S4Vectors_0.30.2 BiocParallel_1.26.2
[13] Biobase_2.52.0 BiocGenerics_0.38.0
loaded via a namespace (and not attached):
[1] base64_2.0 Rcpp_1.0.8 locfit_1.5-9.4 lattice_0.20-44 Rsamtools_2.8.0
[6] Biostrings_2.60.2 gtools_3.9.2 digest_0.6.29 R6_2.5.1 evaluate_0.14
[11] sparseMatrixStats_1.4.2 zlibbioc_1.38.0 rlang_1.0.1 rstudioapi_0.13 data.table_1.14.2
[16] jquerylib_0.1.4 R.utils_2.11.0 R.oo_1.24.0 Matrix_1.4-0 rmarkdown_2.11
[21] splines_4.1.0 stringr_1.4.0 RCurl_1.98-1.5 munsell_0.5.0 DelayedArray_0.18.0
[26] HDF5Array_1.20.0 compiler_4.1.0 rtracklayer_1.52.1 xfun_0.29 askpass_1.1
[31] htmltools_0.5.2 openssl_1.4.6 GenomeInfoDbData_1.2.6 XML_3.99-0.8 permute_0.9-7
[36] crayon_1.4.2 GenomicAlignments_1.28.0 bitops_1.0-7 rhdf5filters_1.4.0 R.methodsS3_1.8.1
[41] grid_4.1.0 jsonlite_1.7.3 lifecycle_1.0.1 magrittr_2.0.2 scales_1.1.1
[46] stringi_1.7.6 cli_3.1.1 XVector_0.32.0 limma_3.48.3 bslib_0.3.1
[51] DelayedMatrixStats_1.14.3 Rhdf5lib_1.14.2 rjson_0.2.21 restfulr_0.0.13 tools_4.1.0
[56] BSgenome_1.60.0 fastmap_1.1.0 yaml_2.2.2 colorspace_2.0-2 rhdf5_2.36.0
[61] BiocManager_1.30.16 knitr_1.37 sass_0.4.0 BiocIO_1.2.0
I don't think DSS is designed for array data; it is meant for analysing BS-seq data.
Yes, I read that, but I thought it might be possible to convert the information to BS-seq format somehow.
Since methylation arrays measure fluorescence signal intensities rather than sequencing read counts, I don't think this conversion makes sense. There are plenty of packages specifically designed for methylation array analysis: minfi and ChAMP are among the most popular.
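To illustrate the point about fluorescence signals, here is a minimal base R sketch of how array intensities are usually summarised as beta values rather than read counts. The probe names and intensity values are made up for illustration, and the offset of 100 is an assumed stabilising constant, not something taken from this thread.

```r
# Hypothetical methylated / unmethylated probe intensities
# (made-up values, not taken from the .idat file above)
meth   <- c(cg_a = 900, cg_b = 500, cg_c = 100)
unmeth <- c(cg_a = 0,   cg_b = 500, cg_c = 900)

# Assumed stabilising offset that keeps the ratio defined at low signal
offset <- 100

# Beta value: share of the total signal coming from the methylated probe
beta <- meth / (meth + unmeth + offset)
print(round(beta, 3))
```

Each probe ends up with a continuous proportion between 0 and 1. There is no integer total-read count (N) or methylated-read count (X) behind it, which is what DSS expects as input.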
I have been using ChAMP for such analyses, but as far as I know, ChAMP doesn't offer a way to handle a 'no-replicate' design. That is why I tried to move to DSS. But I understand your point. Thanks!