Conflicting results in 'overlapPermTest' function from regioneR package
1
0
Entering edit mode
@vinicius-henrique-da-silva-6713
Last seen 18 months ago
Brazil

I am aware that non-reproducible questions are annoying. However, I am not sure how to reproduce my problem without my original data (and consequently to large to be included here). 

I have two groups of genomic ranges, 'Nre' and 'Re', and I compared separately how random are their overlap with CpG sites in genome with regioneR package. 

library(regioneR)
ptNre <- overlapPermTest(A=Nre, B=CpG, ntimes=100, mc.cores=8, genome=genome, force.parallel=TRUE, mc.set.seed=FALSE, non.overlapping=FALSE)

ptRe <- overlapPermTest(A=Re, B=CpG, ntimes=100, mc.cores=8, genome=genome, force.parallel=TRUE, mc.set.seed=FALSE, non.overlapping=FALSE)

Then I checked in a loop of simulations what I could expect by random using the same function as in

overlapPermTest (randomizeRegions):

library(foreach)
library(doMC)

RanNreNumOv <- GRangesList()
RanReNumOv <- GRangesList()

RanNreNumOv <- foreach(i=1:100) %dopar% {
length(subsetByOverlaps(FEATURE, randomizeRegions(Nre, genome=genome, non.overlapping=TRUE), ignore.strand=TRUE))}

RanReNumOv <- foreach(i=1:100) %dopar% {
length(subsetByOverlaps(FEATURE, randomizeRegions(Re, genome=genome, non.overlapping=TRUE), ignore.strand=TRUE))}

> ptNre
[[1]]
P-value: 0.0008999100089991
Z-score: -3.0158
Number of iterations: 10000
Alternative: less
Evaluation of the original region set: 44678
Evaluation function: numOverlaps
Randomization function: randomizeRegions

> mean(unlist(RanNreNumOv))
[1] 43016.93
> ptRe
[[1]]
P-value: 9.99900009999e-05
Z-score: 9.9826
Number of iterations: 10000
Alternative: greater
Evaluation of the original region set: 11950
Evaluation function: numOverlaps
Randomization function: randomizeRegions

> mean(unlist(RanReNumOv))
[1] 7151.644

Both sets of genomic ranges displayed higher number of overlaps than the average of those that I simulated by chance. However, in the 'Nre' set the alternative was less and in the 'Re' was greater in the overlapPermTest.

Am I missing something? I would be grateful for any help to interpret the results here. 

 

regioneR • 1.3k views
ADD COMMENT
2
Entering edit mode
bernatgel ▴ 150
@bernatgel-7226
Last seen 13 hours ago
Spain

Hi Vinicius,

The problem may be due to how the overlaps are counted. When a region in A overlaps multiple regions in B it can be counted as one or multiple overlaps. By default, the function numOverlaps in regioneR counts them as multiple overlaps while your code counts the overlap only once. Actually, the function numOverlaps, which is internally used by overlapPermTest, has an additional parameter called count.once than can change this behaviour and you can set it to TRUE in the call to overlapPermTest:

ptNre <- overlapPermTest(A=Nre, B=CpG, ntimes=100, mc.cores=8, genome=genome, force.parallel=TRUE, mc.set.seed=FALSE, non.overlapping=FALSE, count.once=TRUE)

In addition, the PermTest object contains a vector with the evaluation of the randomized region sets and it is accessible as pt$permuted. Therefore, you can see if the mean of the randomized evaluations differs from your simulation with:

mean(ptNre$permuted)
mean(ptRe$permuted)

In any case, if this does not help we would be happy to take a closer look if you can send us the three region sets.

Hope this helps

 

 

ADD COMMENT

Login before adding your answer.

Traffic: 641 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6