Question

MEDIPS vs MACS: why does MEDIPS generate more conservative p-values

tptacek3050 • 0

I'm working on getting MEDIPS running for some collaborators who had previously been using MACS to analyze their MethylCap-seq data.

I've run MEDIPS on a data set that they had previously run through MACS. I only got one solid hit (p<0.05 after correction for multiple comparisons) using MEDIPS, while MACS yielded ~200. The single positive hit from MEDIPS was replicated in MACS, and the MACS data were corrected for multiple comparisons using the same method (BH).

After looking at the data, it appears that the issue is that MEDIPS is wiping out all of the p-values when adjusting for multiple comparisons. I spot-checked some of the hits from MACS, and for every hit in MACS I see an interval or two in MEDIPS with a significant raw p-value (<0.05) that gets adjusted to 1 or so. I've tried a few ways to "fix" this: increasing the window size (idea: larger windows = fewer intervals = fewer comparisons) and changing the correction method (I tried fdr, as it isn't overly conservative in my experience). Neither changed the outcome (i.e. a single hit), although the exact corrected p-value did fluctuate.

Can anyone with a better understanding of the MEDIPS and MACS algorithms explain these differences?

p-value medips MACS • 2.5k views

updated 9.8 years ago by Lukas Chavez ▴ 570 • written 9.8 years ago by tptacek3050 • 0

score 1 · Answer 1 · 2015-06-09

Dear tptacek3050,

thank you for your detailed comparison of MACS and MEDIPS!

MEDIPS was not designed to identify genomic regions enriched in DNA IP-seq over Input ("peaks"), because great tools already exist for this (e.g. MACS). Instead, MEDIPS aims to identify genomic regions that are differentially enriched between two conditions, taking technical and biological variation across replicates into account (enabled by edgeR). For methylation-specific DNA IP-seq assays such as MeDIP-seq, MEDIPS can transform the DNA IP-seq data into methylation values by normalising for local CpG densities. However, differential DNA IP-seq coverage (or differential methylation, for methylation-specific assays) between conditions is calculated from the actual read counts without any CpG density normalisation; please also see section 6.7 (Comments on the experimental design and Input data) of the latest MEDIPS vignette.

What you are encountering is a multiple testing problem caused by the vast number of small genomic windows that MEDIPS tests for differential coverage. This is especially problematic when the number of replicates per group is small. To address it, you could increase the minRowSum parameter of the MEDIPS.meth() function (default=10 in MEDIPS v. 1.18.0), which removes a large number of genomic windows with no or low coverage and so reduces the total number of tests. In fact, we are currently working on an optimised way to exclude genomic windows with low coverage before testing for differential coverage between conditions.

Alternatively, you could test only predefined regions of interest by using MEDIPS.createROIset() instead of MEDIPS.createSet(). With MEDIPS.createROIset() you can pass any set of genomic regions (e.g. peaks, CGIs, promoters) via the ROI parameter. By the way, the bn parameter enables binning of the ROIs into smaller bins, if desired.
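To illustrate why reducing the number of tested windows matters, here is a minimal Python sketch on purely synthetic numbers. It is not MEDIPS code: `bh_adjust`, `make_windows` and `count_hits` are hypothetical helpers written for this example, the p-values and read counts are invented, and the `min_row_sum` argument only mimics the idea of a coverage filter such as minRowSum. It applies the Benjamini-Hochberg adjustment once to all windows and once to the windows that pass the coverage filter.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest to the smallest p-value, keeping a running minimum.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, idx in enumerate(order):
        rank = m - k  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def make_windows(n_background=200_000, n_signal=5):
    """Synthetic (raw_p, total_count) pairs; every value here is made up."""
    windows = []
    for i in range(n_background):
        raw_p = (i + 0.5) / n_background  # evenly spread "null" p-values
        # Assumption: about 1% of background windows are well covered,
        # the rest are sparse (0 to 9 reads summed over all samples).
        total_count = 40 if i % 100 == 50 else i % 10
        windows.append((raw_p, total_count))
    # A handful of well-covered windows with a small raw p-value.
    windows += [(1e-4, 60)] * n_signal
    return windows


def count_hits(windows, min_row_sum=0, alpha=0.05):
    """Return (number of tests, number of windows with adjusted p < alpha)."""
    kept = [raw_p for raw_p, total_count in windows if total_count >= min_row_sum]
    adjusted = bh_adjust(kept)
    return len(kept), sum(q < alpha for q in adjusted)


if __name__ == "__main__":
    windows = make_windows()
    print("all windows:       ", count_hits(windows))
    print("coverage filtered: ", count_hits(windows, min_row_sum=10))
```

In this toy setup the five windows with a raw p-value of 1e-4 do not survive the adjustment when all windows are tested, but they do once the sparse windows are excluded. Note that the filter uses only the total read count and not the test result itself; filtering on the p-value or on the direction of the difference would bias the adjustment.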
Please see also the example in ?MEDIPS.createROIset.

All the best,
Lukas