Question

How does the function MEDIPS.CpGenrich work?

stb • 0

Hi,

I have a question regarding the MEDIPS.CpGenrich function.

I have done a MeDIP-seq experiment. I got many more reads back from sequencing than expected. As an example, after filtering in Galaxy, a total of 58,712,585 first-mate reads are imported into MEDIPS for one of the samples.

With this depth of sequencing I have reads aligned to almost the entire genome (since MeDIP only enriches for the methylated fraction of the genome). I can, however, see that the number of reads across e.g. CpG islands drops as expected.

My question is how the MEDIPS.CpGenrich function counts the Cs, Gs and CpGs. From the supplementary Methods for "Computational analysis of genome-wide DNA-methylation during the differentiation of human embryonic stem cells along the endodermal lineage", Chavez et al., Genome Research 2010, I read that:

"the CpG enrichment approach examines how strong the genomic regions underlying the obtained short reads are enriched for CpGs compared to the frequency of CpGs present in the reference genome".

... As I understand this, the CpGs in the genomic regions underlying the obtained short reads are counted, and the number of reads covering a given region is not taken into account? So, if I have reads covering almost the entire genome, will I get low or no enrichment? Or will regions with many reads carry more weight in the calculation?

Thanks

Stine

medips • 1.3k views

updated 8.0 years ago by Lukas Chavez ▴ 570 • written 8.0 years ago by stb • 0

score 0 · Answer 1 · 2017-04-27

Dear Stine,

it should always be the enrichment of CpGs within the sequencing data relative to the total amount of sequenced DNA in a MeDIP experiment. This is compared to the relative abundance of CpGs in the reference genome and results in the enrichment score.

Sincerely,

Lukas
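To make the two readings in the question concrete, here is a minimal Python sketch on toy data. It is not taken from the MEDIPS source code: the function names `cpg_fraction` and `cpg_enrichment` and the sequences are made up for illustration, and the score is assumed to be a simple ratio of CpG frequencies (CpG dinucleotides per base). The first call counts the sequence under every read, which is how the answer above seems to read ("CpGs within the sequencing data" relative to the total sequenced DNA); the second call counts each covered region only once.

```python
def cpg_fraction(sequences):
    """Return CpG dinucleotides per base, pooled over all given sequences."""
    total_bases = 0
    total_cpg = 0
    for seq in sequences:
        seq = seq.upper()
        total_bases += len(seq)
        total_cpg += seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return total_cpg / total_bases


def cpg_enrichment(region_sequences, reference_sequence):
    """Ratio of CpG frequency in the given regions to that of the reference."""
    return cpg_fraction(region_sequences) / cpg_fraction([reference_sequence])


# Hypothetical toy reference: one CpG-rich stretch inside AT-rich sequence.
reference = "ATATCGCGCGATATATATATATAT"

cpg_rich = "TCGCGCGA"   # sequence under reads from the CpG-rich stretch
cpg_poor = "ATATATAT"   # sequence under a read from the AT-rich part

# Reading 1 (assumption): every read contributes its underlying sequence,
# so a region hit by three reads is counted three times.
per_read = cpg_enrichment([cpg_rich, cpg_rich, cpg_rich, cpg_poor], reference)

# Reading 2 (assumption): each covered region is counted once,
# regardless of how many reads fall on it.
per_region = cpg_enrichment([cpg_rich, cpg_poor], reference)

print(per_read, per_region)
```

Under the first reading, regions with many reads carry more weight, so deep sequencing that touches almost the whole genome would not by itself push the score towards 1. Under the second reading it would. Whether duplicate reads are collapsed before counting is a separate setting that this sketch does not model.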