need help with overlap overrepresentation using permutation tests for composite genomic ranges
@angel-liang-23540
Last seen 4.4 years ago

Hi, I have a list of composite regions, meaning each region is composed of one or more non-overlapping (disjoint) continuous intervals. Biologically, each number is the genome-relative coordinate of a single nucleotide of a transcript that spans one or more exon junctions. The numbers look something like this (for ease of reading, I've bolded the nucleotides at the exon junctions):

121328495,121328496,121328497,121330673,121330674,121330675,121330676,121330677,121330678,121330991

As you can see, there are gaps/skips in the region.
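To make the structure concrete, here is a minimal sketch (plain Python, not part of regioneR) that collapses a list of single-base coordinates like the one above into its continuous blocks. The function name `positions_to_blocks` is hypothetical, and intervals are assumed to be 1-based and inclusive.

```python
def positions_to_blocks(positions):
    """Collapse single-base coordinates into inclusive (start, end) runs.

    Hypothetical helper for illustration only; not a regioneR function.
    """
    blocks = []
    for pos in sorted(set(positions)):
        if blocks and pos == blocks[-1][1] + 1:
            # Adjacent to the previous base: extend the current block.
            blocks[-1] = (blocks[-1][0], pos)
        else:
            # A gap (or the first base): start a new block.
            blocks.append((pos, pos))
    return blocks


example = [
    121328495, 121328496, 121328497,
    121330673, 121330674, 121330675, 121330676, 121330677, 121330678,
    121330991,
]
blocks = positions_to_blocks(example)
# blocks == [(121328495, 121328497), (121330673, 121330678), (121330991, 121330991)]
```

The example region therefore consists of three blocks separated by two gaps.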

My goal is to use a permutation test to get a p-value for whether these composite regions overlap a set of standard reference genomic ranges more often than expected at random. So far, I've been thoroughly enjoying the regioneR package (by the way, it is SO useful. thank God it exists) and the permutation test in regioneR does exactly what I want to do, except it only seems to be able to permute continuous intervals.

I was wondering if there's a way to still use regioneR to solve the problem? I can imagine it would be trivial if there were a way to "lock" the distances between some but not all intervals together - mutually locked intervals would represent composite regions, which could then be run through standard permtest.

However, if this is not possible, then is there any other way to accomplish this goal?

Regards, Angel

regioneR granges iranges • 1.1k views
bernatgel ▴ 150
@bernatgel-7226
Last seen 28 days ago
Spain

Hi Angel,

Sorry I didn't see this question before. There's currently no way to do what you need with regioneR out-of-the-box, but I think this could be done with custom evaluation functions.

I think a valid approach would be to convert each set of locked regions into a single region spanning them, randomize those with one of the standard permutation strategies, and create a new evaluation function based on overlapRegions. That function would split each of these longer regions back into 2 regions, one for its start base and one for its end base, and then call overlapRegions. Does that make sense?
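As a rough, language-agnostic illustration of that idea, here is a small pure-Python sketch. It is not regioneR code and does not use its API; every function name here (`overlaps`, `count_hits`, `shift_composite`, `perm_test`) is hypothetical, and so is the toy data. It assumes a single chromosome, 1-based inclusive coordinates, and blocks sorted by position within each composite region. Unlike the start/end-base variant described above, it splits each span back into all of its blocks before counting overlaps. In regioneR the analogous step would presumably live in a custom function passed to `permTest` via `evaluate.function`.

```python
import random


def overlaps(a, b):
    """True if inclusive intervals a=(start, end) and b=(start, end) share a base."""
    return a[0] <= b[1] and b[0] <= a[1]


def count_hits(composites, reference):
    """Count composite regions with at least one block touching a reference interval."""
    return sum(
        any(overlaps(block, ref) for block in blocks for ref in reference)
        for blocks in composites
    )


def shift_composite(blocks, chrom_length, rng):
    """Move a whole composite region to a random place, keeping its gaps locked."""
    span_start = blocks[0][0]
    span_len = blocks[-1][1] - span_start + 1
    # Place the bounding span so that it still fits on the chromosome.
    new_start = rng.randint(1, chrom_length - span_len + 1)
    delta = new_start - span_start
    return [(s + delta, e + delta) for s, e in blocks]


def perm_test(composites, reference, chrom_length, ntimes=1000, seed=0):
    """One-sided permutation test for more overlaps than expected at random."""
    rng = random.Random(seed)
    observed = count_hits(composites, reference)
    at_least = 0
    for _ in range(ntimes):
        shuffled = [shift_composite(b, chrom_length, rng) for b in composites]
        if count_hits(shuffled, reference) >= observed:
            at_least += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (at_least + 1) / (ntimes + 1)


# Toy data, made up for illustration only.
reference = [(100, 200), (5000, 5100)]
composites = [
    [(150, 152), (400, 405)],
    [(5050, 5055), (5300, 5300)],
    [(190, 195), (900, 910)],
]
observed, p_value = perm_test(composites, reference, chrom_length=10000)
```

Because each composite region is moved as one unit, the distances between its blocks never change, while the gaps themselves are not counted as overlapping anything.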

I don't know if you still needed this, but just in case, hope this helps :)

Bernat
