How to find probe enhancer annotation for Illumina EPIC array?
Matilda • 0
@83c11ba6
Sweden

How can I find annotation for enhancers for Illumina's EPIC methylation array?

I am using the ChAMP package and have found that its EPIC probe features do not include this information. They are loaded with:

data(probe.features.epic)

However, I have seen that in the corresponding probe features dataset for 450K data:

data(probe.features)

There is also information about whether the probe is located in an enhancer. This is not included in the probe features for the EPIC array.

How can I retrieve enhancer annotation for the EPIC array, either in the ChAMP package or from somewhere else?
Thank you!

IlluminaHumanMethylationEPICmanifest ChAMP annotation
@james-w-macdonald-5106
United States

Not to be completely pedantic, but enhancers are genomic regions and don't really have anything to do with the Illumina EPIC array. In other words, I think you want to know which of the CpGs on the EPIC array land within a given enhancer region. There are at least two tables on UCSC's Genome Browser that might be useful in this context, but one (the geneHancerRegElementsDoubleElite table from the Weizmann Institute) is not available for bulk download. We can use the ENCODE encodeCcreCombined table, however. Note that I only took a cursory look at the available tables and am no expert in enhancer regions, so there might be a better choice. I am just showing how to get the data.

> library(rtracklayer)
> session <- browserSession()
> genome(session) <- "hg38"
> enh <- getTable(ucscTableQuery(session, table = "encodeCcreCombined"))
> class(enh)
[1] "data.frame"
## ugh
## convert to a GRanges object
> enhgr <- with(enh, GRanges(chrom, IRanges(chromStart, chromEnd), ccre = ccre, encodeLabel = encodeLabel, ucscLabel = ucscLabel, description = description))
> enhgr
GRanges object with 926535 ranges and 4 metadata columns:
           seqnames            ranges strand |                 ccre encodeLabel
              <Rle>         <IRanges>  <Rle> |          <character> <character>
       [1]     chr1     181251-181601      * |      pELS,CTCF-bound        pELS
       [2]     chr1     190865-191071      * |      dELS,CTCF-bound        dELS
       [3]     chr1     778562-778912      * |       PLS,CTCF-bound         PLS
       [4]     chr1     779086-779355      * |       PLS,CTCF-bound         PLS
       [5]     chr1     779727-780060      * |      pELS,CTCF-bound        pELS
       ...      ...               ...    ... .                  ...         ...
  [926531]     chrY 56842374-56842545      * |      dELS,CTCF-bound        dELS
  [926532]     chrY 56844431-56844674      * |      dELS,CTCF-bound        dELS
  [926533]     chrY 56857410-56857680      * | CTCF-only,CTCF-bound   CTCF-only
  [926534]     chrY 56857917-56858119      * | CTCF-only,CTCF-bound   CTCF-only
  [926535]     chrY 56868183-56868435      * | CTCF-only,CTCF-bound   CTCF-only
             ucscLabel            description
           <character>            <character>
       [1]        enhP EH38E1310153 proxima..
       [2]        enhD EH38E1310154 distal ..
       [3]        prom EH38E1310158 promote..
       [4]        prom EH38E1310159 promote..
       [5]        enhP EH38E1310160 proxima..
       ...         ...                    ...
  [926531]        enhD EH38E2776491 distal ..
  [926532]        enhD EH38E2776496 distal ..
  [926533]        CTCF EH38E2776512 CTCF-only
  [926534]        CTCF EH38E2776513 CTCF-only
  [926535]        CTCF EH38E2776514 CTCF-only
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths
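
One caveat on the conversion above: UCSC tables store 0-based, half-open start positions, whereas GRanges is 1-based and closed, so ranges built directly from chromStart begin one base early. A minimal sketch of a conversion that accounts for this, using a toy data frame with invented coordinates as a stand-in for the real table:

```r
library(GenomicRanges)

## toy stand-in for the data.frame returned by getTable();
## the coordinates and labels here are made up for illustration
enh <- data.frame(chrom = c("chr1", "chr1", "chr2"),
                  chromStart = c(100L, 500L, 1000L),
                  chromEnd = c(200L, 650L, 1300L),
                  ucscLabel = c("enhP", "prom", "enhD"))

## starts.in.df.are.0based = TRUE shifts each start by +1;
## keep.extra.columns = TRUE carries ucscLabel over as a metadata column
enhgr <- makeGRangesFromDataFrame(enh,
                                  keep.extra.columns = TRUE,
                                  seqnames.field = "chrom",
                                  start.field = "chromStart",
                                  end.field = "chromEnd",
                                  starts.in.df.are.0based = TRUE)
enhgr
```

The off-by-one only matters for CpGs sitting exactly on a region boundary, but it costs nothing to get it right.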

## could subset to just enhancers if you like
> enhonlygr <- subset(enhgr, ucscLabel %in% c("enhP","enhD"))
> enhonlygr
GRanges object with 809429 ranges and 4 metadata columns:
           seqnames            ranges strand |            ccre encodeLabel
              <Rle>         <IRanges>  <Rle> |     <character> <character>
       [1]     chr1     181251-181601      * | pELS,CTCF-bound        pELS
       [2]     chr1     190865-191071      * | dELS,CTCF-bound        dELS
       [3]     chr1     779727-780060      * | pELS,CTCF-bound        pELS
       [4]     chr1     807736-807916      * |            dELS        dELS
       [5]     chr1     812113-812266      * |            dELS        dELS
       ...      ...               ...    ... .             ...         ...
  [809425]     chrY 26447188-26447390      * |            dELS        dELS
  [809426]     chrY 56831604-56831942      * |            dELS        dELS
  [809427]     chrY 56838999-56839348      * | dELS,CTCF-bound        dELS
  [809428]     chrY 56842374-56842545      * | dELS,CTCF-bound        dELS
  [809429]     chrY 56844431-56844674      * | dELS,CTCF-bound        dELS
             ucscLabel            description
           <character>            <character>
       [1]        enhP EH38E1310153 proxima..
       [2]        enhD EH38E1310154 distal ..
       [3]        enhP EH38E1310160 proxima..
       [4]        enhD EH38E1310164 distal ..
       [5]        enhD EH38E1310165 distal ..
       ...         ...                    ...
  [809425]        enhD EH38E2776441 distal ..
  [809426]        enhD EH38E2776463 distal ..
  [809427]        enhD EH38E2776482 distal ..
  [809428]        enhD EH38E2776491 distal ..
  [809429]        enhD EH38E2776496 distal ..
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

I don't really use ChAMP, but I presume whatever object you end up with is either a RangedSummarizedExperiment or inherits from that class, in which case there should be a rowRanges slot that contains a GRanges object. Depending on your objective, you could subset your object by doing

enhancer.champ.object <- subsetByOverlaps(champ.object, enhonlygr)

Or you could just add a boolean to the rowRanges by doing something like

rowRanges(champ.object)$Enhancer <- champ.object %over% enhonlygr
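
If the ChAMP annotation turns out to be a plain data frame rather than a ranged object, the same overlap test can be run after building a GRanges from the probe coordinates. A rough sketch, assuming the annotation has `CHR` and `MAPINFO` columns (that is my recollection of the probe features table, so check `colnames()` first); the probe IDs, positions, and enhancer ranges below are invented. Also worth checking: the EPIC manifest is, as far as I know, in hg19 coordinates while the table above is hg38, so the probe positions would need lifting over (or an hg19 enhancer table used) before a real comparison.

```r
library(GenomicRanges)

## toy stand-in for the ChAMP probe annotation; column names are an assumption
probes <- data.frame(CHR = c("1", "1", "2"),
                     MAPINFO = c(150L, 400L, 1200L),
                     row.names = c("cg_toy1", "cg_toy2", "cg_toy3"))

## toy enhancer ranges standing in for enhonlygr
enhonlygr <- GRanges(c("chr1", "chr2"),
                     IRanges(c(101, 1001), c(200, 1300)),
                     ucscLabel = c("enhP", "enhD"))

## one width-1 range per CpG; prepend "chr" so seqnames match UCSC style
probegr <- GRanges(paste0("chr", as.character(probes$CHR)),
                   IRanges(probes$MAPINFO, width = 1))
names(probegr) <- rownames(probes)

## TRUE where the CpG falls inside any enhancer range
probes$Enhancer <- probegr %over% enhonlygr
probes
```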

Or now that I think about it, you could also just inject something from enhonlygr into the rowRanges of your champ.object.

## untested, so you might have to check
fo <- findOverlaps(champ.object, enhonlygr)
rowRanges(champ.object)$Enhancer <- ""
## put the ucscLabel in there
rowRanges(champ.object)$Enhancer[queryHits(fo)] <- enhonlygr$ucscLabel[subjectHits(fo)]

If this is confusing, please read the vignettes for the GenomicRanges, GenomicFeatures, and SummarizedExperiment packages.
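
One thing to watch with the findOverlaps snippet: a CpG that overlaps more than one enhancer region appears several times in `queryHits(fo)`, and the assignment then keeps only the last label. A small sketch of one way to collapse all labels per CpG instead, on invented toy ranges (`cpgs` and `enh` are hypothetical stand-ins for the real objects):

```r
library(GenomicRanges)

## toy CpG positions and enhancer regions; the first CpG hits two regions
cpgs <- GRanges("chr1", IRanges(c(150, 400, 620), width = 1))
enh <- GRanges("chr1",
               IRanges(c(101, 140, 600), c(200, 160, 650)),
               ucscLabel = c("enhP", "enhD", "enhD"))

fo <- findOverlaps(cpgs, enh)

## for each CpG with a hit, paste its unique labels (sorted) into one string
labs <- tapply(enh$ucscLabel[subjectHits(fo)], queryHits(fo),
               function(x) paste(sort(unique(x)), collapse = ","))

## CpGs without any hit keep the empty string
enhancer <- character(length(cpgs))
enhancer[as.integer(names(labs))] <- as.character(labs)
cpgs$Enhancer <- enhancer
cpgs$Enhancer
```

Whether to keep all labels, the first one, or just a TRUE/FALSE flag depends on what the downstream analysis needs.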
