I am aware that non-reproducible questions are annoying. However, I am not sure how to reproduce my problem without my original data, which are too large to include here.
I have two sets of genomic ranges, 'Nre' and 'Re', and for each set separately I used the regioneR package to test whether its overlap with CpG sites in the genome differs from what is expected by chance.
library(regioneR)
ptNre <- overlapPermTest(A=Nre, B=CpG, ntimes=100, mc.cores=8, genome=genome,
                         force.parallel=TRUE, mc.set.seed=FALSE, non.overlapping=FALSE)
ptRe <- overlapPermTest(A=Re, B=CpG, ntimes=100, mc.cores=8, genome=genome,
                        force.parallel=TRUE, mc.set.seed=FALSE, non.overlapping=FALSE)
Then, in a loop of simulations, I checked what I could expect by chance, using the same randomization function as overlapPermTest (randomizeRegions):
library(foreach)
library(doMC)
RanNreNumOv <- GRangesList()
RanReNumOv <- GRangesList()
RanNreNumOv <- foreach(i=1:100) %dopar% {
  length(subsetByOverlaps(FEATURE,
                          randomizeRegions(Nre, genome=genome, non.overlapping=TRUE),
                          ignore.strand=TRUE))
}
RanReNumOv <- foreach(i=1:100) %dopar% {
  length(subsetByOverlaps(FEATURE,
                          randomizeRegions(Re, genome=genome, non.overlapping=TRUE),
                          ignore.strand=TRUE))
}

> ptNre
[[1]]
P-value: 0.0008999100089991
Z-score: -3.0158
Number of iterations: 10000
Alternative: less
Evaluation of the original region set: 44678
Evaluation function: numOverlaps
Randomization function: randomizeRegions

> mean(unlist(RanNreNumOv))
[1] 43016.93

> ptRe
[[1]]
P-value: 9.99900009999e-05
Z-score: 9.9826
Number of iterations: 10000
Alternative: greater
Evaluation of the original region set: 11950
Evaluation function: numOverlaps
Randomization function: randomizeRegions

> mean(unlist(RanReNumOv))
[1] 7151.644
Both sets of genomic ranges showed a higher number of overlaps than the average of those I simulated by chance (44678 vs. 43016.93 for 'Nre', and 11950 vs. 7151.644 for 'Re'). However, overlapPermTest reported the alternative as 'less' for the 'Nre' set and 'greater' for the 'Re' set.
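To make the comparison concrete, here is a minimal base-R sketch of how I understand a permutation test to derive the z-score and the direction of the alternative from the observed value and the permuted values. The function name `summarize_perm` and the toy counts are hypothetical, and the logic is my assumption about how an automatically chosen alternative works, not code taken from regioneR.

```r
# Sketch only: my assumption of how the z-score and an automatically
# chosen alternative relate to the observed and permuted values.
# `summarize_perm` is a hypothetical helper, not a regioneR function.
summarize_perm <- function(observed, permuted) {
  perm_mean <- mean(permuted)
  # Z-score: distance of the observed value from the permuted mean, in SDs
  z <- (observed - perm_mean) / sd(permuted)
  # Direction is taken from the side of the permuted mean the observed value falls on
  alternative <- if (observed < perm_mean) "less" else "greater"
  # One-sided empirical p-value with a +1 correction
  extreme <- if (alternative == "less") {
    sum(permuted <= observed)
  } else {
    sum(permuted >= observed)
  }
  p <- (extreme + 1) / (length(permuted) + 1)
  list(zscore = z, alternative = alternative, pvalue = p, perm_mean = perm_mean)
}

# Made-up permuted counts, for illustration only (not my real data)
toy_permuted <- c(100, 104, 98, 102, 96)
res_low  <- summarize_perm(observed = 90,  permuted = toy_permuted)
res_high <- summarize_perm(observed = 110, permuted = toy_permuted)

res_low$alternative   # "less": observed is below the permuted mean
res_high$alternative  # "greater": observed is above the permuted mean
```

If this reading is right, the negative z-score for 'Nre' would mean that the permuted mean inside overlapPermTest was above 44678, unlike the 43016.93 from my own loop. I believe the permuted values can be inspected with something like `ptNre[[1]]$permuted`, but I have not verified that field name.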
Am I missing something? I would be grateful for any help to interpret the results here.