HiCDCPlus glm.fit error
0
0
Entering edit mode
ehamrud • 0
@b020a3bb
Last seen 18 months ago
United Kingdom

Hello,

I am running HiCDCPlus on valid pairs obtained from HiCPro. I have created an R script to do this and then I am running the script from the commandline on my university's HPC. When I do this, 3 of my 6 samples fail with the follow error:

Loading required package: rtracklayer
  Chromosome chr1 intrachromosomal counts processed.
  Chromosome chr10 intrachromosomal counts processed.
  Chromosome chr11 intrachromosomal counts processed.
  Chromosome chr12 intrachromosomal counts processed.
  Chromosome chr13 intrachromosomal counts processed.
  Chromosome chr14 intrachromosomal counts processed.
  Chromosome chr15 intrachromosomal counts processed.
  Chromosome chr16 intrachromosomal counts processed.
  Chromosome chr17 intrachromosomal counts processed.
  Chromosome chr18 intrachromosomal counts processed.
  Chromosome chr19 intrachromosomal counts processed.
  Chromosome chr2 intrachromosomal counts processed.
  Chromosome chr20 intrachromosomal counts processed.
  Chromosome chr21 intrachromosomal counts processed.
  Chromosome chr22 intrachromosomal counts processed.
  Chromosome chr23 intrachromosomal counts processed.
  Chromosome chr24 intrachromosomal counts processed.
  Chromosome chr25 intrachromosomal counts processed.
  Chromosome chr26 intrachromosomal counts processed.
  Chromosome chr27 intrachromosomal counts processed.
  Chromosome chr28 intrachromosomal counts processed.
  Chromosome chr3 intrachromosomal counts processed.
  Chromosome chr30 intrachromosomal counts processed.
  Chromosome chr31 intrachromosomal counts processed.
  Chromosome chr32 intrachromosomal counts processed.
  Chromosome chr33 intrachromosomal counts processed.
  Chromosome chr4 intrachromosomal counts processed.
  Chromosome chr5 intrachromosomal counts processed.
  Chromosome chr6 intrachromosomal counts processed.
  Chromosome chr7 intrachromosomal counts processed.
  Chromosome chr8 intrachromosomal counts processed.
  Chromosome chr9 intrachromosomal counts processed.
  Chromosome chrW intrachromosomal counts processed.
  Chromosome chrZ intrachromosomal counts processed.
  Chromosome chr1 complete.
  Chromosome chr10 complete.
  Chromosome chr11 complete.
  Chromosome chr12 complete.
  Chromosome chr13 complete.
  Chromosome chr14 complete.
  Chromosome chr15 complete.
  Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  : 
    NA/NaN/Inf in 'x'
  Calls: HiCDCPlus -> HiCDCPlus_chr
  Execution halted

When I run the code in Rstudio I don't get this error. I also don't get this error if I just run HiCDCPlus on 2 chromosomes on the command line. I've tried a couple of iterations and haven't been able to identify anything obviously wrong the the input ValidPairs data. Any suggestions to how I can fix this error welcome!

Here is the relevant part of the R script:


# Read in bins
gi_list<-generate_bintolen_gi_list(
  bintolen_path=paste0(data_path, "rds_files/bins_bintolen.txt.gz"),
  gen = "Ggallus", gen_ver = "galGal6")

# Check gi_list
gi_list_validate(gi_list) # passes without errors
head(gi_list)
# GInteractions object with 1457234 interactions and 1 metadata column:
#   seqnames1           ranges1     seqnames2           ranges2 |         D
# <Rle>         <IRanges>         <Rle>         <IRanges> | <integer>
#   [1]     chr13            0-5000 ---     chr13            0-5000 |         0
# [2]     chr13            0-5000 ---     chr13        5000-10000 |      5000
# [3]     chr13            0-5000 ---     chr13       10000-15000 |     10000
# [4]     chr13            0-5000 ---     chr13       15000-20000 |     15000
# [5]     chr13            0-5000 ---     chr13       20000-25000 |     20000

# Extract valid pair path (output from HiCPro)
valid_pair_path <- list.files(path = data_path, pattern = "*.allValidPairs", full.names = TRUE)

# Add Validpairs to gi_list
gi_list_with_valid_pairs <- add_hicpro_allvalidpairs_counts(gi_list, allvalidpairs_path = valid_pair_path)

# Check file now
gi_list_validate(gi_list_with_valid_pairs)
head(gi_list_with_valid_pairs)
# $chr1
# GInteractions object with 15768122 interactions and 2 metadata columns:
#   seqnames1             ranges1     seqnames2             ranges2 |         D    counts
# <Rle>           <IRanges>         <Rle>           <IRanges> | <integer> <numeric>
#   [1]      chr1              0-5000 ---      chr1              0-5000 |         0         0
# [2]      chr1              0-5000 ---      chr1          5000-10000 |      5000         0
# [3]      chr1              0-5000 ---      chr1         10000-15000 |     10000         0
# [4]      chr1              0-5000 ---      chr1         15000-20000 |     15000         0
# [5]      chr1              0-5000 ---      chr1         20000-25000 |     20000         0
# ...       ...                 ... ...       ...                 ... .       ...       ...
# [15768118]      chr1 197595000-197600000 ---      chr1 197600000-197605000 |      5000         0
# [15768119]      chr1 197595000-197600000 ---      chr1 197605000-197608386 |      9193         0
# [15768120]      chr1 197600000-197605000 ---      chr1 197600000-197605000 |         0         4
# [15768121]      chr1 197600000-197605000 ---      chr1 197605000-197608386 |      4193         2
# [15768122]      chr1 197605000-197608386 ---      chr1 197605000-197608386 |         0         0
# -------
#   regions: 39522 ranges and 2 metadata columns
# seqinfo: 1 sequence from an unspecified genome; no seqlengths

# Expand features for modeling
expanded_gi_list_with_valid_pairs <- expand_1D_features(gi_list_with_valid_pairs)

# Run HiC-DC+
set.seed(1010) 

expanded_gi_list_with_valid_pairs_HiCDC <- HiCDCPlus(expanded_gi_list_with_valid_pairs,
                                                     covariates = NULL,
                                                     distance_type = "spline",
                                                     model_distribution = "nb",
                                                     df = 6,
                                                     ssize = 0.01,
                                                     splineknotting = "uniform",
                                                     binned = TRUE,
                                                     Dmin = 10000,
                                                     Dmax = 1000000,
                                                     chrs = NULL
                                                     )
HiCDCPlus • 666 views
ADD COMMENT
0
Entering edit mode

Hello Eva,

Looks like there is a convergence issue for the underlying chromosome: the spline fit for D is too noisy. I would recommend one of the following in your call to HiCDCPlus(.) in my order of preference:

  1. increasing the sampling ratio (ssize) by 50%, e.g., 0.01 to 0.015, more as needed
  2. reducing the degrees of freedom in the spline (the df argument in your call to HiCDCPlus),
  3. changing distance type from spline to log

Each of these methods seems to mitigate the error in the minimally replicable example you have provided me offline (HiCDC_debugging_NFr2.txt). Here is the code I've used to replicate the error on your example:

require(HiCDCPlus)
require(BSgenome.Ggallus.UCSC.galGal6)
require(dplyr)
construct_features(output_path="galGal6",
                   gen = "Ggallus", gen_ver = "galGal6",
                   sig=c("GATC"),bin_type="Bins-uniform",
                   binsize=5000,
                   wg_file=NULL, 
                   chrs=c("chr30"))
gi_list<-generate_bintolen_gi_list(
  bintolen_path="galGal6_bintolen.txt.gz",
  gen = "Ggallus", gen_ver = "galGal6")
gi_list<-expand_1D_features(gi_list)
df<-read.csv('HiCDC_debugging_NFr2.txt',sep='\t')
df['chr']<-"chr30"
gi_list[['chr30']]<-add_2D_features(gi_list[['chr30']],df[c('chr','startI','startJ','counts')],features='counts')
set.seed(1010)
gi_HiCDC<-HiCDCPlus_chr(gi_list[['chr30']],
              covariates = NULL,
              distance_type = "spline", #You can change this to 'log'
              model_distribution = "nb",
              df = 6, #You can decrease this to 5
              ssize = 0.01, #You can increase here to, say, 0.015
              splineknotting = "uniform",
              binned = TRUE,
              Dmin = 10000,
              Dmax = 1000000)

Hope this helps!

ADD REPLY

Login before adding your answer.

Traffic: 615 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6