Question

How to find nearest CpG islands around miRNA genes using granges?

0

Entering edit mode

Adam_M • 0

@adam_m-13707

Last seen 6.0 years ago

Hello,

I'm trying to find the nearest CpG islands around each miRNA of interest(gene) and also return distance between them.
For example for hsa-mir-137 the nearest CpG islands are 73 and 91 where CpG 73 overlaps with this miRNA gene and CpG 91 is ~7500 bp away. What is more is it possible to filter only miRNAs where CpG overlaps it or where distance between them is less than 1000 bp?

Here is what I got:

    hub <- AnnotationHub()
    query(hub, c("cpg","hg19"))
    cpgs <- hub[["AH5086"]] ## the granges of all CpG's (hg19)

    gffRangedData<-import.gff("allmirna.gff3")
    mirna_granges<-as(gffRangedData, "GRanges") ## I downloaded all annotated miRNAs from miRBase and I converted them to granges object

    sample_mirnas <- read.table("sample_data.txt", header = TRUE, sep ='\t') #here are my 3 miRNAs of interest (hsa-mir-106b,hsa-mir-150,hsa-mir-107)
    final_mirnas <- mirna_granges[mirna_granges$Name %in% sample_mirnas$mirna] # now I wanted to have granges only for my miRNA so I subset data from miRBase

So as a result I have:

1) CpG's granges

> cpgs GRanges object with 28691 ranges and 1 metadata column: seqnames ranges strand | name <Rle> <IRanges> <Rle> | <character> [1] chr1 [ 28736, 29810] * | CpG:_116 [2] chr1 [135125, 135563] * | CpG:_30 [3] chr1 [327791, 328229] * | CpG:_29 [4] chr1 [437152, 438164] * | CpG:_84 [5] chr1 [449274, 450544] * | CpG:_99 ... ... ... ... . ... [28687] chr9_gl000201_random [ 15651, 15909] * | CpG:_30 [28688] chr9_gl000201_random [ 26397, 26873] * | CpG:_43 [28689] chr11_gl000202_random [ 16284, 16540] * | CpG:_23 [28690] chr17_gl000204_random [ 54686, 57368] * | CpG:_228 [28691] chr17_gl000205_random [117501, 117801] * | CpG:_23 ------- seqinfo: 93 sequences (1 circular) from hg19 genome

2) Granges of miRNA's of interest

> final_mirnas GRanges object with 3 ranges and 8 metadata columns: seqnames ranges strand | source type score <Rle> <IRanges> <Rle> | <factor> <factor> <numeric> [1] chr10 [ 89592747, 89592827] - | <NA> miRNA_primary_transcript <NA> [2] chr19 [ 49500785, 49500868] - | <NA> miRNA_primary_transcript <NA> [3] chr7 [100093993, 100094074] - | <NA> miRNA_primary_transcript <NA> phase ID Alias Name Derives_from <integer> <character> <CharacterList> <character> <character> [1] <NA> MI0000114 MI0000114 hsa-mir-107 <NA> [2] <NA> MI0000479 MI0000479 hsa-mir-150 <NA> [3] <NA> MI0000734 MI0000734 hsa-mir-106b <NA> ------- seqinfo: 24 sequences from an unspecified genome; no seqlengths

Now, how to combine this data togesher to obtain something like (results are fake, made them up for question):

    miRNA            CpG_upstream   CpG_downstream       CpG_up_distance          CpG_down_distance
    hsa-mir-106b       57                    23                            852                               15
    hsa-mir-150        NA                   77                              NA                               987
    hsa-mir-107        12                    NA                            -12                               NA

I put `NA`If distance between miRNA and CpG is greater than 1000bp but it can be actual value, what is more i put negative number when CpG and miRNA gene overlap.

I tried to use distance() and follow() functions but there are only bare numbers, can you help me how to figure it out?

granges bioconductor R • 1.5k views

ADD COMMENT • link 7.3 years ago Adam_M • 0

0

Entering edit mode

Adam_M • 0

@adam_m-13707

Last seen 6.0 years ago

Julie,

I read the documentation and this function returned what I needed.

Thank you so much, great package!

ADD COMMENT • link 7.3 years ago Adam_M • 0

0

Entering edit mode

Adam, Great! Thanks for letting me know and the kind words! Best regards, Julie On Aug 10, 2017, at 8:43 AM, Adam_M [bioc] <noreply@bioconductor.org<mailto:noreply@bioconductor.org>> wrote: Activity on a post you are following on support.bioconductor.org<https: support.bioconductor.org=""> User Adam_M<https: support.bioconductor.org="" u="" 13707=""/> wrote Answer: How to find nearest CpG islands around miRNA genes using granges?<https: support.bioconductor.org="" p="" 99050="" #99070="">: Julie, I read the documentation and this function returned what I needed. Thank you so much, great package! ________________________________ Post tags: granges, bioconductor, R You may reply via email or visit A: How to find nearest CpG islands around miRNA genes using granges?

ADD REPLY • link 7.3 years ago Julie Zhu ★ 4.3k

score 2 · Accepted Answer · 2017-08-10

Adam, Please try annotatePeakInBatch function in ChIPPeakAnno package. Best regards, Julie On Aug 10, 2017, at 2:13 AM, Adam_M [bioc] <noreply@bioconductor.org<mailto:noreply@bioconductor.org>> wrote: Activity on a post you are following on support.bioconductor.org<https: support.bioconductor.org=""> User Adam_M<https: support.bioconductor.org="" u="" 13707=""/> wrote Question: How to find nearest CpG islands around miRNA genes using granges?<https: support.bioconductor.org="" p="" 99050=""/>: Hello, I'm trying to find the nearest CpG islands around each miRNA of interest(gene) and also return distance between them. For example for hsa-mir-137 the nearest CpG islands are 73 and 91 where CpG 73 overlaps with this miRNA gene and CpG 91 is ~7500 bp away. What is more is it possible to filter only miRNAs where CpG overlaps it or where distance between them is less than 1000 bp? Here is what I got: hub <- AnnotationHub() query(hub, c("cpg","hg19")) cpgs <- hub[["AH5086"]] ## the granges of all CpG's (hg19) gffRangedData<-import.gff("allmirna.gff3") mirna_granges<-as(gffRangedData, "GRanges") ## I downloaded all annotated miRNAs from miRBase and I converted them to granges object sample_mirnas <- read.table("sample_data.txt", header = TRUE, sep ='\t') #here are my 3 miRNAs of interest (hsa-mir-106b,hsa-mir-150,hsa-mir-107) final_mirnas <- mirna_granges[mirna_granges$Name %in% sample_mirnas$mirna] # now I wanted to have granges only for my miRNA so I subset data from miRBase **So as a result I have:** **1) CpG's granges** > cpgs GRanges object with 28691 ranges and 1 metadata column: seqnames ranges strand | name <rle> <iranges> <rle> | <character> [1] chr1 [ 28736, 29810] * | CpG:_116 [2] chr1 [135125, 135563] * | CpG:_30 [3] chr1 [327791, 328229] * | CpG:_29 [4] chr1 [437152, 438164] * | CpG:_84 [5] chr1 [449274, 450544] * | CpG:_99 ... ... ... ... . ... [28687] chr9_gl000201_random [ 15651, 15909] * | CpG:_30 [28688] chr9_gl000201_random [ 26397, 26873] * | CpG:_43 [28689] chr11_gl000202_random [ 16284, 16540] * | CpG:_23 [28690] chr17_gl000204_random [ 54686, 57368] * | CpG:_228 [28691] chr17_gl000205_random [117501, 117801] * | CpG:_23 ------- seqinfo: 93 sequences (1 circular) from hg19 genome **2) Granges of miRNA's of interest** > final_mirnas GRanges object with 3 ranges and 8 metadata columns: seqnames ranges strand | source type score <rle> <iranges> <rle> | <factor> <factor> <numeric> [1] chr10 [ 89592747, 89592827] - | <na> miRNA_primary_transcript <na> [2] chr19 [ 49500785, 49500868] - | <na> miRNA_primary_transcript <na> [3] chr7 [100093993, 100094074] - | <na> miRNA_primary_transcript <na> phase ID Alias Name Derives_from <integer> <character> <characterlist> <character> <character> [1] <na> MI0000114 MI0000114 hsa-mir-107 <na> [2] <na> MI0000479 MI0000479 hsa-mir-150 <na> [3] <na> MI0000734 MI0000734 hsa-mir-106b <na> ------- seqinfo: 24 sequences from an unspecified genome; no seqlengths **Now, how to combine this data togesher to obtain something like (results are fake, made them up for question):** miRNA CpG_upstream CpG_downstream CpG_up_distance CpG_down_distance hsa-mir-106b 57 23 852 15 hsa-mir-150 NA 77 NA 987 hsa-mir-107 12 NA -12 NA **I put `NA`If distance between miRNA and CpG is greater than 1000bp but it can be actual value, what is more i put negative number when CpG and miRNA gene overlap.** **I tried to use distance() and follow() functions but there are only bare numbers, can you help me how to figure it out?** ________________________________ Post tags: granges, bioconductor, R You may reply via email or visit How to find nearest CpG islands around miRNA genes using granges?