diffHic: alternative input?
2
0
Entering edit mode
@chiaraanicoletti-11173
Last seen 8.4 years ago

Dear Aaron,

I wanted to use diffHic just to perform peak calling on Hi-C data. Do I necessarily have to start from fastq alignment  with the script included in the package or I can use an alternative input (e.g. HiC-Pro filtered read pairs)? If so, which function can I use and how have I to format the input file(s)? Thanks in advance.

Best,

Chiara

diffhic • 1.3k views
ADD COMMENT
0
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 25 minutes ago
The city by the bay

There are several points at which you can enter the diffHic analysis:

  • From the FASTQ files, as you've mentioned. This involves aligning the files to the reference genome with presplit_map.py, and then proceeding as described in the user's guide.
  • From name-sorted paired-end BAM files. These can be directly used in preparePairs.
  • From restriction fragment assignments. If you have a data frame where each row corresponds to a read pair and specifies the indices of the restriction fragments to which each read in the pair was mapped (and you also have a GRanges object containing the coordinates of those restriction fragments), you can use savePairs to save this information into a HDF5 object for further analysis.
  • From interaction matrices containing read pair counts. These can be converted into InteractionSet objects for hypothesis testing with edgeR, see A: Loading a precomputed interaction matrix into diffHiC.

Between these options, you should find something that is more-or-less satisfactory.

ADD COMMENT
0
Entering edit mode
@chiaraanicoletti-11173
Last seen 8.4 years ago

Thank you very much! Is there any chance to obtain an InteractionSet from an interaction matrix (containing read pair counts from merged replicates) to be used as input for the enrichedPairs function? Because I tried to convert it to an InteractionSet object but when I run enrichedPairs I get the following warning:

Warning messages:

1: In .local(object, ...) :

  library sizes not found in 'totals', setting to NULL

2: In mean.default(data$totals) :

  argument is not numeric or logical: returning NA

Thanks again!

ADD COMMENT
0
Entering edit mode

You need to set the total number of read pair counts in each sample. This would automatically be done for you if you used the output from squareCounts, but you'll have to do it yourself if you make your own InteractionSet. Something like:

iset$totals <- colSums(assay(iset))

... should be sufficient, provided that every interaction with non-zero counts is present in iset (and you didn't count read pairs into more than one interaction). Or, if you already know your sequencing depth from some other source, you can set it to that.

ADD REPLY

Login before adding your answer.

Traffic: 672 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6