extract HiC contacts for a given BEDPE set of coordinates
1
0
Entering edit mode
gryderart • 0
@gryderart-17061
Last seen 6.4 years ago
Bethesda, MD

I was wondering if you could answer a technical question about computational methods.  I have generated contact maps (.hic format), but I can’t seem to figure out a nice way to extract the contacts for a list of paired region sets.  Example input in BEDPE format:

 

chr5      1500000             1505000             chr5      1520000             1525000

chr11    1701000             1706000             chr5      1730000             1735000

 

 

with the output being something like:

chr5      1500000             1505000             chr5      1520000             1525000              105

chr11    1701000             1706000             chr5      1730000             1735000              78

 

where the new final column is the number of valid contact pairs between the two regions. 

 

Are there tools for doing this that I just don’t know about?  The juicer tool “dump” only works one coordinate set at a time, for large regions typically, and gets a whole matrix.

 

Thanks!

Berkley

 

 

Berkley Gryder, Ph.D.

Genetics Branch

CCR, NCI, NIH

Building 37, Room 2016

37 Convent Dr.

Bethesda, MD 20892

240-760-6898 (lab), 864-906-2506 (cell)

@GryderArt

 

hic diffhic chromatin bedpe • 1.7k views
ADD COMMENT
1
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 1 hour ago
The city by the bay

It's always easier to demonstrate when there's a mock-up data set ready to go:

writeLines(c(
    rep(paste(c("chr5", "1500000", "1505000", "chr5", "1520000", "1525000"), collapse="\t"), 105),
    rep(paste(c("chr11", "1701000", "1706000", "chr5", "1730000", "1735000"), collapse="\t"), 74)
    ), con="example.bedpe"
)

In theory, you should be able to use import from rtracklayer to load this in, but it seems that import for BEDPE files fails without strand information. So let's just construct our GInteractions object manually.

pairings <- read.delim("example.bedpe", header=FALSE)
anchor1 <- GRanges(pairings[[1]], IRanges(pairings[[2]], pairings[[3]]))
anchor2 <- GRanges(pairings[[4]], IRanges(pairings[[5]], pairings[[6]]))

library(InteractionSet)
gi <- GInteractions(anchor1, anchor2, mode="strict")

The strict mode is important as it ensures that interactions are not considered to be different when the order of anchor regions differs. However, if the order of anchor regions has some meaning, then you should stick with mode="normal".

After that, you can use a variety of strategies to get the number of times each interaction occurs. The easiest is probably:

unique.gi <- unique(gi)
uid <- match(gi, unique.gi)
n <- tabulate(uid)

... where n is the number of times each entry of unique.gi is present in gi. Then you can just do:

unique.gi$Count <- n
to.save <- as.data.frame(unique.gi)

... and use write.table to save whatever you want to a new file. You could also export it with rtracklayer:

export(con="output.bedpe", as(unique.gi, "Pairs"))

... where the last field is the count.

ADD COMMENT

Login before adding your answer.

Traffic: 702 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6