Hello,
I have used the excellent package DMRcate and I am trying to extract the CpG names included in each DMR from my results.ranges. I have found this question many times over online forums but I think they are outdated as when I do this code that the creators of the package recommended:
locs <- IlluminaHumanMethylationEPICanno.ilm10b2.hg19::Locations
locs.ranges <- GRanges(locs$chr, IRanges(locs$pos, locs$pos))
names(locs.ranges) <- rownames(locs)
DMR.IDs <- lapply(results.ranges, function (x) names(locs.ranges[locs.ranges %over% x]))
I get this error on the last line of code: Error in getListElement(x, i, ...) : GRanges objects don't support [[, as.list(), lapply(), or unlist() at the moment
Found on other forums that a "for loop" can be used on granges objects instead of lapply. But I don't know how to convert this lapply function to a for loop in this context for it to work.
Can anyone help with converting this last line of code to a for loop, or anything else that would extract the CpG names from a DMR?
Thank you, Shanie
Thanks so much Tim for the quick reply. However, I am getting "NULL" when I try that code.
I don't know whether this is the issue, but I am getting NULL for results.ranges@ranges@NAMES. This seems wrong and I cant figure out why cpg names aren't appearing there as they do in the annotation@ranges@NAMES.
Thanks again, Shanie
Hi Shanie,
Did you get NULL from the vignette example, or your own example? If the latter could you please post your workflow with sessionInfo()? If the former just post the sessionInfo() and I'll figure out what is going on.
DMR ranges won't have CpG names because, well, they're DMRs, not CpGs. So you won't find the CpG IDs there.
Cheers, Tim
Hi Tim,
I get NULL for my own data. I have 92 DMRs and in my results.ranges I can see the chrom, start, no. cpgs etc..
Here is my work flow:
And session info:
R version 4.0.0 (2020-04-24) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 17763)
Matrix products: default
locale: [1] LCCOLLATE=EnglishUnited States.1252 LCCTYPE=EnglishUnited States.1252 LCMONETARY=EnglishUnited States.1252 [4] LCNUMERIC=C LCTIME=English_United States.1252
attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages: [1] DMRcatedata2.6.0 ExperimentHub1.14.0 AnnotationHub2.20.0 BiocFileCache1.12.0
[5] dbplyr1.4.3 GenoGAM2.6.0 data.table1.12.8 Matrix1.2-18
[9] HDF5Array1.16.0 rhdf52.32.0 DMRcate2.2.0 minfi1.34.0
[13] bumphunter1.30.0 locfit1.5-9.4 iterators1.0.12 foreach1.5.0
[17] Biostrings2.56.0 XVector0.28.0 SummarizedExperiment1.18.1 DelayedArray0.14.0
[21] matrixStats0.56.0 Biobase2.48.0 GenomicRanges1.40.0 GenomeInfoDb1.24.0
[25] IRanges2.22.1 S4Vectors0.26.1 BiocGenerics_0.34.0
loaded via a namespace (and not attached): [1] R.utils2.9.2 tidyselect1.1.0
[3] IlluminaHumanMethylationEPICanno.ilm10b2.hg190.6.0 RSQLite2.2.0
[5] AnnotationDbi1.50.0 htmlwidgets1.5.1
[7] grid4.0.0 BiocParallel1.22.0
[9] munsell0.5.0 codetools0.2-16
[11] preprocessCore1.50.0 statmod1.4.34
[13] colorspace1.4-1 knitr1.28
[15] rstudioapi0.11 GenomeInfoDbData1.2.3
[17] bit640.9-7 vctrs0.3.0
[19] lambda.r1.2.4 xfun0.14
[21] biovizBase1.36.0 R62.4.1
[23] illuminaio0.30.0 AnnotationFilter1.12.0
[25] bitops1.0-6 reshape0.8.8
[27] assertthat0.2.1 IlluminaHumanMethylation450kanno.ilmn12.hg190.6.0 [29] promises1.1.0 scales1.1.1
[31] bsseq1.24.0 nnet7.3-14
[33] gtable0.3.0 ensembldb2.12.1
[35] rlang0.4.6 genefilter1.70.0
[37] splines4.0.0 rtracklayer1.48.0
[39] lazyeval0.2.2 acepack1.4.1
[41] DSS2.36.0 GEOquery2.56.0
[43] dichromat2.0-0 checkmate2.0.0
[45] BiocManager1.30.10 yaml2.2.1
[47] GenomicFeatures1.40.0 backports1.1.6
[49] httpuv1.5.2 Hmisc4.4-0
[51] tools4.0.0 nor1mix1.3-0
[53] ggplot23.3.0 ellipsis0.3.1
[55] RColorBrewer1.1-2 siggenes1.62.0
[57] Rcpp1.0.4.6 plyr1.8.6
[59] base64enc0.1-3 progress1.2.2
[61] zlibbioc1.34.0 purrr0.3.4
[63] RCurl1.98-1.2 prettyunits1.1.1
[65] rpart4.1-15 openssl1.4.1
[67] cluster2.1.0 magrittr1.5
[69] futile.options1.0.1 IlluminaHumanMethylationEPICanno.ilm10b4.hg190.6.0 [71] ProtGenerics1.20.0 missMethyl1.22.0
[73] hms0.5.3 mime0.9
[75] xtable1.8-4 XML3.99-0.3
[77] jpeg0.1-8.1 readxl1.3.1
[79] mclust5.4.6 gridExtra2.3
I have until row 147 but it wont let me post anymore. Really appreciate your help :)
Hi Shanie,
I can see 2 problems here:
1) Why are you constructing your own S4
CpGannotated
object? Thecpg.annotate()
function specifically exists for that. Do you have aGenomicRatioSet
or M-value matrix that houses your methylation data?2) You are also constructing your own custom CpG ranges object
locs.ranges
to pass tosubsetByOverlaps
which is aGRanges
object, not aGenomicRatioSet
object. The latter has rownames but the former does not; evidenced by your linenames(locs.ranges) <- rownames(locs)
. So that's why you are getting NULL. You might get the result if you need if you callnames(subsetByOverlaps(...
instead ofrownames(subsetByOverlaps(...
, but first I would go right back to your normalised data and runcpg.annotate()
first.Cheers, Tim
Thanks so much! I got it :) Some happy dances and squeels were had.
We use the annotation from Zhou et al, and we have added genehancer and chromatin states information to it.
That was helpful very helpful. I had to use the annotated Granges object before I turned it into a CpGannotated object. Then indeed it worked to use names(... instead of rownames(...
Thanks so much for the tips in the right direction! All the best, Shanie