I have mapped and filtered my Hi-C reads using HiCUP (https://www.bioinformatics.babraham.ac.uk/projects/hicup/), and it would be too computationally intensive to align the reads again with diffHic (each library is ~400 million 150 bp read pairs). Could someone recommend the best way to get these reads into diffHic? HiCUP provides conversion scripts for HOMER, Fit-Hi-C, GOTHiC, hicpipe, and HiC-Pro. I have also generated raw interaction matrices with HOMER, so I could input these as suggested in a previous post (https://support.bioconductor.org/p/90184/).
In section 2.5 of the user guide, the 'savePairs' function is described as a good entry point from other pipelines, but I'm unclear about the format the input data needs to be in. Should it be one data frame arranged as:
anchor1.id anchor1.pos anchor1.len anchor2.id anchor2.pos anchor2.len raw_counts
or two data frames: one with the anchor IDs and counts, and one with the alignment information?
To reiterate: what is the most efficient way to get aligned Hi-C reads (either as mapped fragments or in bins) from HiCUP (or HOMER) into diffHic, and how should the data be formatted?
Thanks for your help.