Question

Conflicting results in 'overlapPermTest' function from regioneR package

Vinicius Henrique da Silva ▴ 40

I am aware that non-reproducible questions are annoying. However, I am not sure how to reproduce my problem without my original data, which is too large to include here.

I have two sets of genomic ranges, 'Nre' and 'Re'. For each set separately, I used the regioneR package to test whether its overlap with CpG sites in the genome differs from what is expected by chance.

library(regioneR)
ptNre <- overlapPermTest(A=Nre, B=CpG, ntimes=100, mc.cores=8, genome=genome, force.parallel=TRUE, mc.set.seed=FALSE, non.overlapping=FALSE)

ptRe <- overlapPermTest(A=Re, B=CpG, ntimes=100, mc.cores=8, genome=genome, force.parallel=TRUE, mc.set.seed=FALSE, non.overlapping=FALSE)

Then I ran a loop of simulations to see what I could expect by chance, using the same randomization function that overlapPermTest uses (randomizeRegions):

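library(foreach)
library(doMC)
# Register a parallel backend; without one, %dopar% runs sequentially with a warning.
# cores=8 is assumed here to match mc.cores=8 in the calls above.
registerDoMC(cores=8)

# For each of 100 randomizations, count the FEATURE ranges that overlap the randomized set
RanNreNumOv <- foreach(i=1:100) %dopar% {
  length(subsetByOverlaps(FEATURE, randomizeRegions(Nre, genome=genome, non.overlapping=TRUE), ignore.strand=TRUE))
}

RanReNumOv <- foreach(i=1:100) %dopar% {
  length(subsetByOverlaps(FEATURE, randomizeRegions(Re, genome=genome, non.overlapping=TRUE), ignore.strand=TRUE))
}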

> ptNre
[[1]]
P-value: 0.0008999100089991
Z-score: -3.0158
Number of iterations: 10000
Alternative: less
Evaluation of the original region set: 44678
Evaluation function: numOverlaps
Randomization function: randomizeRegions

> mean(unlist(RanNreNumOv))
[1] 43016.93
> ptRe
[[1]]
P-value: 9.99900009999e-05
Z-score: 9.9826
Number of iterations: 10000
Alternative: greater
Evaluation of the original region set: 11950
Evaluation function: numOverlaps
Randomization function: randomizeRegions

> mean(unlist(RanReNumOv))
[1] 7151.644

Both sets of genomic ranges showed a higher number of overlaps than the average I obtained from my own simulations (44678 vs. 43016.93 for 'Nre', and 11950 vs. 7151.644 for 'Re'). However, overlapPermTest reported the alternative as 'less' for the 'Nre' set and as 'greater' for the 'Re' set.

Am I missing something? I would be grateful for any help to interpret the results here.

updated 8.3 years ago by bernatgel ▴ 150 • written 8.3 years ago by Vinicius Henrique da Silva ▴ 40

score 2 · Accepted Answer · 2017-01-09

Hi Vinicius,

The problem may be due to how the overlaps are counted. When a region in A overlaps multiple regions in B, it can be counted as one overlap or as several. By default, the function numOverlaps in regioneR counts every such overlap separately, while your code counts each overlapping region only once. The function numOverlaps, which overlapPermTest uses internally, has an additional parameter called count.once that changes this behaviour, and you can set it to TRUE in the call to overlapPermTest:

ptNre <- overlapPermTest(A=Nre, B=CpG, ntimes=100, mc.cores=8, genome=genome, force.parallel=TRUE, mc.set.seed=FALSE, non.overlapping=FALSE, count.once=TRUE)
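To illustrate the difference between the counting conventions, here is a minimal base-R sketch on made-up intervals (the data frames A and B and all variable names are hypothetical, and it deliberately avoids Bioconductor so it runs anywhere). It assumes that count.once=FALSE corresponds to counting every overlapping (A, B) pair, that count.once=TRUE corresponds to counting each A region at most once, and that length(subsetByOverlaps(B, A)) counts each B region at most once.

```r
# Toy intervals on a single chromosome (hypothetical data, not from the question).
A <- data.frame(start = c(1, 50, 200), end = c(100, 60, 300))
B <- data.frame(start = c(10, 40, 90, 250), end = c(20, 55, 95, 260))

# hits[i, j] is TRUE when A[i] and B[j] share at least one base (closed intervals).
hits <- outer(seq_len(nrow(A)), seq_len(nrow(B)),
              function(i, j) A$start[i] <= B$end[j] & B$start[j] <= A$end[i])

n_pairs  <- sum(hits)               # every (A, B) pair (assumed count.once=FALSE convention)
n_A_once <- sum(rowSums(hits) > 0)  # A regions with at least one hit (assumed count.once=TRUE convention)
n_B_once <- sum(colSums(hits) > 0)  # B regions with at least one hit (subsetByOverlaps(B, A) style)

print(c(pairs = n_pairs, A_once = n_A_once, B_once = n_B_once))
# pairs = 5, A_once = 3, B_once = 4
```

The three numbers differ even on this tiny example. Note that the simulation in the question counts from the FEATURE side (the third convention), so its values need not match numOverlaps exactly even with count.once=TRUE.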

In addition, the PermTest object contains a vector with the evaluation of the randomized region sets and it is accessible as pt$permuted. Therefore, you can see if the mean of the randomized evaluations differs from your simulation with:

mean(ptNre$permuted)

mean(ptRe$permuted)
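For reference, the sketch below shows how a direction, z-score and empirical p-value can be derived from a vector of permuted evaluations. The helper perm_summary is hypothetical and not part of regioneR. It assumes the direction is chosen automatically by comparing the observed value with the mean of the permuted values, and that the p-value has the form (k + 1) / (ntimes + 1). The p-values reported in the question are consistent with that form: 9.999e-05 is 1/10001 and 0.00089991 is 9/10001.

```r
# Hypothetical helper (not part of regioneR): summarise an observed value
# against a vector of evaluations obtained from randomized region sets.
perm_summary <- function(observed, permuted) {
  ntimes <- length(permuted)
  # Direction is picked from the data: "less" if observed is below the permuted mean.
  alternative <- if (observed < mean(permuted)) "less" else "greater"
  # Number of permuted values at least as extreme as the observed one.
  extreme <- if (alternative == "less") sum(permuted <= observed) else sum(permuted >= observed)
  list(alternative = alternative,
       zscore = (observed - mean(permuted)) / sd(permuted),
       pvalue = (extreme + 1) / (ntimes + 1))
}

# Made-up permuted counts for illustration only (their mean is 100).
permuted <- c(98, 100, 102, 99, 101, 100, 97, 103, 100, 100)
res_low  <- perm_summary(90, permuted)   # observed below every permuted value
res_high <- perm_summary(110, permuted)  # observed above every permuted value
print(res_low$alternative)   # "less"
print(res_high$alternative)  # "greater"
```

Under these assumptions, an alternative of "less" for 'Nre' implies that the mean of ptNre$permuted lies above the observed 44678, so it cannot agree with the 43016.93 obtained from the separate simulation. That points to the two procedures counting overlaps differently.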

In any case, if this does not help we would be happy to take a closer look if you can send us the three region sets.

Hope this helps