The "part of your pipeline that sets the contrasts and performs the statistical tests" is basically just edgeR. Most of the work that diffHic does is to process the data to get the read pair counts for each bin pair in each sample. So, if you've already got a matrix of counts (where each bin pair is a row, and each column is a sample - see below), you can just go straight to testing for differences between conditions with edgeR.
However, from your original post, it seems that you have normalized intensities rather than counts. If you want to use edgeR (or DESeq2, or limma + voom
), you need the raw counts. Information about normalization is introduced via GLM offsets, rather than being applied directly to the counts. To this end, you could use an approach similar to the one described in Section 5.3 of the user's guide, where the log-transformed correction factor (required to get from the count to the normalized intensity in the first place) is used as the offset.
With all that said, though, is ICE really necessary for detecting differences? Remember that you're comparing the same bin pair between samples, such that the genomic biases that are removed by ICE would largely cancel out anyway. More important for a differential analysis are the biases between samples, e.g., due to variable cross-linking/ligation efficiency. I would suggest working from the raw counts and using the normOffsets
function to compute offsets that eliminate library-specific biases. Alternatively, use normalizeCNV
if your genomic biases are different between samples (e.g., due to copy number variations).
Finally, I guess that, even if you did have counts, you would have a 2D interaction count matrix from each sample (where each row and column represents a genomic region) rather than the matrix of counts per bin pair per sample that edgeR expects. To get from one to another, you can easily convert each of your 2D count matrices to ContactMatrix
objects, and then do something like the below to obtain a single matrix for all bin pairs/samples. Assuming you have two libraries (represented by cm1
and cm2
):
# Select interesting interactions; in this case, non-zero in at least one library
to.keep <- as.matrix(cm1)!=0 | as.matrix(cm2)!=0
iset1 <- deflate(cm1, extract=to.keep)
iset2 <- deflate(cm2, extract=to.keep)
data <- cbind(iset1, iset2)
interactions(data) <- as(interactions(data), "ReverseStrictGInteractions")
The data
object can then be run through normOffsets
and asDGEList
- read the diffHic user's guide for more details.