Hi all,
I have two related questions regarding bumphunter:
1) To get a reasonable number of candidate bumps, I need to set pickCutoffQ = .999, and even then the output contains ~18,000 candidate bumps. If I set the quantile to 95% or even 99%, I get at least 80,000 bumps (I did a quick check with B=1). The minfi tutorial (https://www.bioconductor.org/help/course-materials/2015/BioC2015/methylation450k.html) recommends no more than 30,000 candidate bumps. Is there any reason to be concerned about my data? Code:
dmrs <- bumphunter(GRSet, design = designMatrix, pickCutoff = TRUE, pickCutoffQ = 0.999, B = 1000, type = "M", nullMethod = "bootstrap")
My design matrix includes surrogate variables (SVs) as covariates and looks like this:
(Intercept) pheno.treat$CohortFollowup sv.treat$V1 sv.treat$V2 sv.treat$V3 sv.treat$V4 sv.treat$V5 sv.treat$V6
1 1 0 -0.061862297 -0.043804370 0.155861131 -0.1339645911 0.122396963 -0.071573199
2 1 0 0.126700203 -0.081453396 0.150190568 -0.1488971458 0.019325485 0.088171420
3 1 0 -0.042331161 -0.086767917 0.109915301 0.0101167599 0.060133643 0.127854026
4 1 0 0.266824697 -0.295717842 -0.006385234 -0.0003092789 -0.168059079 0.013780865
5 1 0 -0.058048360 -0.060320928 0.028131679 -0.1240023344 0.064810238 0.106945146
6 1 0 0.093184655 -0.255128273 0.003195866 -0.0219380222 -0.011526742 0.007678827
(Truncated for simplicity.)
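For a rough sense of scale on the numbers in question 1, here is a small back-of-the-envelope sketch. It assumes, as I understand the bumphunter documentation, that pickCutoffQ is a quantile of the permutation/bootstrap (null) distribution of the coefficient estimates, so under no group difference only about a fraction 1 - pickCutoffQ of probes would exceed the cutoff. The variable names are made up for illustration, and the probe count is the approximate ~790,000 mentioned in this post.

```r
# Approximate number of probes after normalisation and QC (from this post).
n_probes <- 790000

# The pickCutoffQ values tried above.
q <- c(0.95, 0.99, 0.999)

# If the data behaved like the null distribution, roughly this many probes
# would lie above the cutoff. Bumps group neighbouring probes, so the number
# of bumps could not exceed these counts.
expected_null <- n_probes * (1 - q)

print(data.frame(pickCutoffQ = q, expected_null_probes = round(expected_null)))
```

This gives roughly 39,500, 7,900 and 790 probes. Seeing ~18,000 bumps at 0.999 is far above the ~790 probes a null-like data set would yield, which would be consistent with many probes differing strongly between the groups rather than with a problem in the call itself.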
2) Using the code shown in question 1, the resulting list of candidate bumps includes a large number of regions consisting of only 1 probe, as shown below.
head(dmrs.b1000.cutoff.999.sva.treat$table, 20)
chr start end value area cluster indexStart indexEnd L clusterL p.value fwer p.valueArea fwerArea
12297 chr12 133000178 133000178 -4.353422 4.353422 98843 527238 527238 1 8 0.000000e+00 0.000 0.0025740044 0.770
1126 chr10 15210264 15210264 4.345061 4.345061 40704 409605 409605 1 14 0.000000e+00 0.000 0.0025935786 0.773
10539 chr1 78444904 78444904 -3.589283 3.589283 16759 34556 34556 1 16 2.796311e-06 0.002 0.0043790232 0.889
13678 chr17 45266772 45266772 -3.314790 3.314790 161285 656273 656273 1 17 2.796311e-06 0.002 0.0053689173 0.930
14037 chr19 797342 797342 -3.311399 3.311399 178278 689503 689503 1 12 2.796311e-06 0.002 0.0053856952 0.931
15910 chr3 156392701 156392703 -1.995375 3.990751 268542 169109 169110 2 29 1.537971e-05 0.011 0.0032772766 0.824
11050 chr10 17659399 17659399 -2.681196 2.681196 40985 410113 410113 1 8 4.474098e-05 0.032 0.0090111125 0.976
14236 chr19 19779476 19779476 -2.676268 2.676268 184718 705014 705014 1 12 4.613913e-05 0.033 0.0090572517 0.976
16173 chr4 48485301 48485301 -2.666049 2.666049 280261 191412 191412 1 12 4.753729e-05 0.034 0.0091481318 0.976
13545 chr17 18965556 18965556 -2.451483 2.451483 156147 645050 645050 1 2 1.090561e-04 0.072 0.0111209293 0.985
14464 chr2 10588646 10588646 -2.437339 2.437339 194619 79031 79031 1 19 1.202414e-04 0.078 0.0112481614 0.985
11657 chr11 64684723 64684723 -2.244582 2.244582 67553 463915 463915 1 13 2.348901e-04 0.140 0.0135495255 0.993
6431 chr20 62367632 62367893 0.989776 6.928432 235817 744686 744692 7 27 2.334920e-04 0.153 0.0007857634 0.413
14307 chr19 41119278 41119278 -2.222419 2.222419 187608 711367 711367 1 10 2.642514e-04 0.154 0.0138473326 0.994
In my previous experience, bumphunter has never produced this many 1-probe regions. Does anyone know why this happens? And how can I change it so that my list consists of regions with multiple CpGs? The data set contains ~790,000 probes after normalisation and QC.
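One option I can think of is to filter the table after the fact on the L column (number of probes in the bump). Below is a minimal sketch of that. The data frame `tab` is only a toy stand-in for dmrs$table, built from three rows copied from the output above and keeping just a few of the columns. Note that this only hides the 1-probe rows: it does not recompute p.value, fwer, p.valueArea or fwerArea, which were calculated with all candidate bumps included.

```r
# Toy stand-in for dmrs$table: three rows copied from the output above.
tab <- data.frame(
  chr      = c("chr12", "chr3", "chr20"),
  start    = c(133000178, 156392701, 62367632),
  end      = c(133000178, 156392703, 62367893),
  value    = c(-4.353422, -1.995375, 0.989776),
  L        = c(1, 2, 7),
  fwerArea = c(0.770, 0.824, 0.413)
)

# Keep only bumps that span at least two probes.
multi <- tab[tab$L >= 2, ]

# Rank the remaining regions by the area-based FWER instead of the default order.
multi <- multi[order(multi$fwerArea), ]

print(multi)
```

On the real object the same two lines would be applied to the full table in place of `tab`.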
Any help is highly appreciated!
Thanks, Ina
Thank you so much for answering. My data has been thoroughly explored and checked beforehand, and we do expect quite big differences between the groups. Reading your answer now, I understand how that would result in lots of 1-probe regions. I just wasn't able to see that before.
Thanks for making it clear!
Ina