Using GRanges to find overlapping pairs at exactly 10 kb distance
gshweta95 • 0
Germany

Dear all, I am trying to extract overlapping pairs from two GRanges objects. Here, I extract all the pairs that overlap with either the start or the end of the gene. Then I extract the names of the overlapping ranges from their respective data frames by merging on the start and end positions.


overlaped_pairs <- findOverlapPairs(gr3, gr1, type="equal")

# Look up the names of the overlapping ranges in the second object
names2 <- merge(overlaped_pairs@second, df2, by = c("start", "end"))

# Look up the names of the overlapping ranges in the first object
names1 <- merge(overlaped_pairs@first, df1, by = c("start", "end"))

However, what I would like is to find pairs that are exactly 10 kb apart, so I ran the following code:


overlaped_pairs10kb <- findOverlapPairs(gr3, gr1, type="equal", maxgap = 10000)

# Look up the names of the overlapping ranges in the second object
names2_10kb <- merge(overlaped_pairs10kb@second, df2, by = c("start", "end"))

# Look up the names of the overlapping ranges in the first object
names1_10kb <- merge(overlaped_pairs10kb@first, df1, by = c("start", "end"))

But this returns pairs whose start or end lies anywhere up to 10,000 bp from the start or end of the gene. However, I want the distance to be exactly 10,000 bp.
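
To illustrate this behaviour, here is a minimal sketch on invented toy ranges. The `query` and `subject` objects and their coordinates are assumptions made up for the example, not the real `gr3` and `gr1`. It calls `findOverlaps()` with the same `type` and `maxgap` arguments and reports the start and end differences of each hit, which can be anything up to the tolerance rather than exactly 10,000 bp.

```r
library(GenomicRanges)

# Toy ranges; the coordinates are invented purely for illustration
query   <- GRanges("chr1", IRanges(start = 1000, end = 2000))
subject <- GRanges("chr1", IRanges(start = c(1500, 6000, 30000),
                                   end   = c(2500, 7000, 31000)))

# With type = "equal", maxgap acts as a tolerance (in bp) on start and end
hits <- findOverlaps(query, subject, type = "equal", maxgap = 10000)

# Differences between the start and end coordinates of each hit
start_diff <- abs(start(query)[queryHits(hits)] - start(subject)[subjectHits(hits)])
end_diff   <- abs(end(query)[queryHits(hits)]   - end(subject)[subjectHits(hits)])

data.frame(subject = subjectHits(hits), start_diff, end_diff)
```

In this toy case the subjects offset by 500 bp and 5,000 bp are both reported, while the one offset by 29,000 bp is not.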

So my questions are as follows:

  1. Is it correct to interpret maxgap in base pairs?
  2. Is there a way to find pairs at an exact distance?
  3. Are there any other packages that could help me with this?

Another idea would be to use a larger maxgap and then filter on the computed distances:

# Find pairs with a larger maxgap (50 kb), to be filtered by distance afterwards
overlaped_pairs50kb <- findOverlapPairs(gr3, gr1, type="equal", maxgap = 50000)

# Extract the ranges of each pair member as data frames
second <- overlaped_pairs50kb@second@ranges
second_df <- data.frame(second)

first <- overlaped_pairs50kb@first@unlistData@ranges
first_df <- data.frame(first)


# Calculate the distances between the start and end positions
start_distance <- abs(first_df$start - second_df$start)
end_distance <- abs(first_df$end - second_df$end)

# Keep the pairs whose start or end distance is exactly 10000
is_exact <- start_distance == 10000 | end_distance == 10000
exact_distance_pairs_first <- first_df[is_exact, ]
exact_distance_pairs_second <- second_df[is_exact, ]
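
One possible alternative, shown here only as a sketch under stated assumptions, is to shift one set of ranges by the desired offset and then ask for identical starts with `findOverlaps(..., type = "start")`. The `query` and `subject` objects, their coordinates, and the `offset` variable are hypothetical stand-ins for the real data. Using `type = "end"` in the same way would match on end positions instead.

```r
library(GenomicRanges)

# Toy ranges; the coordinates are invented purely for illustration
query   <- GRanges("chr1", IRanges(start = c(50000, 80000), width = 1000))
subject <- GRanges("chr1", IRanges(start = c(60000, 85000, 40000),
                                   width = c(1000, 1000, 500)))

offset <- 10000L  # required distance in bp (assumed value)

# Shift the query downstream / upstream and require identical start positions
hits_down <- findOverlaps(shift(query,  offset), subject, type = "start")
hits_up   <- findOverlaps(shift(query, -offset), subject, type = "start")

# Collect the index pairs whose starts are exactly `offset` bp apart
exact_pairs <- rbind(
  data.frame(query = queryHits(hits_down), subject = subjectHits(hits_down),
             offset = offset),
  data.frame(query = queryHits(hits_up), subject = subjectHits(hits_up),
             offset = -offset)
)
exact_pairs
```

The `query` and `subject` columns are row indices into the two input objects, so the matching ranges can be retrieved with `query[exact_pairs$query]` and `subject[exact_pairs$subject]`.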

I was wondering if there is a better solution for this, and I would really appreciate some help!

Thanks and best, Shweta

genomeIntervals GenomicRanges • 389 views