Entering edit mode
Hello,
I am running HiCDCPlus on valid pairs obtained from HiCPro. I have created an R script to do this and then I am running the script from the commandline on my university's HPC. When I do this, 3 of my 6 samples fail with the follow error:
Loading required package: rtracklayer
Chromosome chr1 intrachromosomal counts processed.
Chromosome chr10 intrachromosomal counts processed.
Chromosome chr11 intrachromosomal counts processed.
Chromosome chr12 intrachromosomal counts processed.
Chromosome chr13 intrachromosomal counts processed.
Chromosome chr14 intrachromosomal counts processed.
Chromosome chr15 intrachromosomal counts processed.
Chromosome chr16 intrachromosomal counts processed.
Chromosome chr17 intrachromosomal counts processed.
Chromosome chr18 intrachromosomal counts processed.
Chromosome chr19 intrachromosomal counts processed.
Chromosome chr2 intrachromosomal counts processed.
Chromosome chr20 intrachromosomal counts processed.
Chromosome chr21 intrachromosomal counts processed.
Chromosome chr22 intrachromosomal counts processed.
Chromosome chr23 intrachromosomal counts processed.
Chromosome chr24 intrachromosomal counts processed.
Chromosome chr25 intrachromosomal counts processed.
Chromosome chr26 intrachromosomal counts processed.
Chromosome chr27 intrachromosomal counts processed.
Chromosome chr28 intrachromosomal counts processed.
Chromosome chr3 intrachromosomal counts processed.
Chromosome chr30 intrachromosomal counts processed.
Chromosome chr31 intrachromosomal counts processed.
Chromosome chr32 intrachromosomal counts processed.
Chromosome chr33 intrachromosomal counts processed.
Chromosome chr4 intrachromosomal counts processed.
Chromosome chr5 intrachromosomal counts processed.
Chromosome chr6 intrachromosomal counts processed.
Chromosome chr7 intrachromosomal counts processed.
Chromosome chr8 intrachromosomal counts processed.
Chromosome chr9 intrachromosomal counts processed.
Chromosome chrW intrachromosomal counts processed.
Chromosome chrZ intrachromosomal counts processed.
Chromosome chr1 complete.
Chromosome chr10 complete.
Chromosome chr11 complete.
Chromosome chr12 complete.
Chromosome chr13 complete.
Chromosome chr14 complete.
Chromosome chr15 complete.
Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, :
NA/NaN/Inf in 'x'
Calls: HiCDCPlus -> HiCDCPlus_chr
Execution halted
When I run the code in Rstudio I don't get this error. I also don't get this error if I just run HiCDCPlus on 2 chromosomes on the command line. I've tried a couple of iterations and haven't been able to identify anything obviously wrong the the input ValidPairs data. Any suggestions to how I can fix this error welcome!
Here is the relevant part of the R script:
# Read in bins
gi_list<-generate_bintolen_gi_list(
bintolen_path=paste0(data_path, "rds_files/bins_bintolen.txt.gz"),
gen = "Ggallus", gen_ver = "galGal6")
# Check gi_list
gi_list_validate(gi_list) # passes without errors
head(gi_list)
# GInteractions object with 1457234 interactions and 1 metadata column:
# seqnames1 ranges1 seqnames2 ranges2 | D
# <Rle> <IRanges> <Rle> <IRanges> | <integer>
# [1] chr13 0-5000 --- chr13 0-5000 | 0
# [2] chr13 0-5000 --- chr13 5000-10000 | 5000
# [3] chr13 0-5000 --- chr13 10000-15000 | 10000
# [4] chr13 0-5000 --- chr13 15000-20000 | 15000
# [5] chr13 0-5000 --- chr13 20000-25000 | 20000
# Extract valid pair path (output from HiCPro)
valid_pair_path <- list.files(path = data_path, pattern = "*.allValidPairs", full.names = TRUE)
# Add Validpairs to gi_list
gi_list_with_valid_pairs <- add_hicpro_allvalidpairs_counts(gi_list, allvalidpairs_path = valid_pair_path)
# Check file now
gi_list_validate(gi_list_with_valid_pairs)
head(gi_list_with_valid_pairs)
# $chr1
# GInteractions object with 15768122 interactions and 2 metadata columns:
# seqnames1 ranges1 seqnames2 ranges2 | D counts
# <Rle> <IRanges> <Rle> <IRanges> | <integer> <numeric>
# [1] chr1 0-5000 --- chr1 0-5000 | 0 0
# [2] chr1 0-5000 --- chr1 5000-10000 | 5000 0
# [3] chr1 0-5000 --- chr1 10000-15000 | 10000 0
# [4] chr1 0-5000 --- chr1 15000-20000 | 15000 0
# [5] chr1 0-5000 --- chr1 20000-25000 | 20000 0
# ... ... ... ... ... ... . ... ...
# [15768118] chr1 197595000-197600000 --- chr1 197600000-197605000 | 5000 0
# [15768119] chr1 197595000-197600000 --- chr1 197605000-197608386 | 9193 0
# [15768120] chr1 197600000-197605000 --- chr1 197600000-197605000 | 0 4
# [15768121] chr1 197600000-197605000 --- chr1 197605000-197608386 | 4193 2
# [15768122] chr1 197605000-197608386 --- chr1 197605000-197608386 | 0 0
# -------
# regions: 39522 ranges and 2 metadata columns
# seqinfo: 1 sequence from an unspecified genome; no seqlengths
# Expand features for modeling
expanded_gi_list_with_valid_pairs <- expand_1D_features(gi_list_with_valid_pairs)
# Run HiC-DC+
set.seed(1010)
expanded_gi_list_with_valid_pairs_HiCDC <- HiCDCPlus(expanded_gi_list_with_valid_pairs,
covariates = NULL,
distance_type = "spline",
model_distribution = "nb",
df = 6,
ssize = 0.01,
splineknotting = "uniform",
binned = TRUE,
Dmin = 10000,
Dmax = 1000000,
chrs = NULL
)
Hello Eva,
Looks like there is a convergence issue for the underlying chromosome: the spline fit for D is too noisy. I would recommend one of the following in your call to
HiCDCPlus(.)
in my order of preference:ssize
) by 50%, e.g., 0.01 to 0.015, more as neededdf
argument in your call toHiCDCPlus
),spline
tolog
Each of these methods seems to mitigate the error in the minimally replicable example you have provided me offline (
HiCDC_debugging_NFr2.txt
). Here is the code I've used to replicate the error on your example:Hope this helps!