Question

ChIPQC Error in .Call2 NCList object is too deep

1

Entering edit mode

Victor Goitea ▴ 10

@victor-goitea-14304

Last seen 7.2 years ago

United_States/Oklahoma_City/OMRF

Dear all,

I need help to solve this error, using the latest version of ChIPQC1.15. I am not bioinformatician. I would like to give the bam file to you can reproduce the problem but I don't know how to do with such a big file.

> samples = read.csv("samples_Ours_test0.csv")

SampleID Tissue Factor Treatment Replicate bamReads Peaks
1 File_ours D A T 1 File.bam NA

> Test0 = ChIPQC(samples,annotation="hg19")

Compiling annotation...
Using default blacklist for hg19...
Computing metrics for 1 samples...
list
Bam file has 84 contigs
Calculating coverage histogram for 1

Calculating SSD for 1

Calculating unique positions per strand for 1

Calculating shift for 1

300 / 300
Counting reads in features for 1

Error in .Call2("new_NCListAsINTSXP_from_NCList", nclist_xp, PACKAGE = "IRanges") :
compute_NCListAsINTSXP_length: NCList object is too deep (has more than
100000 levels of nested ranges)
In addition: Warning messages:
1: In ChIPQC(samples, annotation = "hg19") :
No chromosomes specified in peaks, using all.
2: In if is.na(blacklist)) blacklist = NULL :
the condition has length > 1 and only the first element will be used

If I run like this

> Test0 = ChIPQCsample(bamFile,annotation="hg19",peaks=NULL,chromosomes="chr11")

The report change to

Error in checkSlotAssignment(object, name, value) :
assignment of an object of class “IRanges” is not valid for slot ‘ranges’ in an object of class “ChIPQCsample”; is(value, "IRanges_OR_IPos") is not TRUE

I have some test samples where CHIPQC run OK but with my samples I got these errors. So I check for differences in the bam files and I found: success CHIPQC success bam files (align with bowtie, chromosome list "chr1"- "chr22", "chrX", "chrY", "chrMT" ), CHIPQC fail bam files (align with bwa, chromosome list 1-22", X, Y, MT, and several GL##). So the first thing I tried was to edit and filter my bam files, so they list the chromosome name as chr#, I also tried to get rid of the GL## by different methods, but I only could eliminate the reads information in that location( I tried a lot of way and all give me the same result). The header remain listing the 84 contig, I couldn't find a successful method on this and I don't know if it is the solution or the error is unrelated to this. I also tried to follow the custom genome annotation A: Unable to perform analysis on all the chromosomes with the latest version of ChI but it didn't work on my hands (maybe I'm doing a mistake listing all the chr and their ranges)

Here is the list of my original contigs obtained by:

awk '{print $1 "\t0\t" $2}' human_hg19.fa.fai > human_hg19.bed

1 0 249250621

2 0 243199373

3 0 198022430

4 0 191154276

5 0 180915260

6 0 171115067

7 0 159138663

8 0 146364022

9 0 141213431

10 0 135534747

11 0 135006516

12 0 133851895

13 0 115169878

14 0 107349540

15 0 102531392

16 0 90354753

17 0 81195210

18 0 78077248

19 0 59128983

20 0 63025520

21 0 48129895

22 0 51304566

X 0 155270560

Y 0 59373566

MT 0 16569

GL000207.1 0 4262

GL000226.1 0 15008

GL000229.1 0 19913

GL000231.1 0 27386

GL000210.1 0 27682

GL000239.1 0 33824

GL000235.1 0 34474

GL000201.1 0 36148

GL000247.1 0 36422

GL000245.1 0 36651

GL000197.1 0 37175

GL000203.1 0 37498

GL000246.1 0 38154

GL000249.1 0 38502

GL000196.1 0 38914

GL000248.1 0 39786

GL000244.1 0 39929

GL000238.1 0 39939

GL000202.1 0 40103

GL000234.1 0 40531

GL000232.1 0 40652

GL000206.1 0 41001

GL000240.1 0 41933

GL000236.1 0 41934

GL000241.1 0 42152

GL000243.1 0 43341

GL000242.1 0 43523

GL000230.1 0 43691

GL000237.1 0 45867

GL000233.1 0 45941

GL000204.1 0 81310

GL000198.1 0 90085

GL000208.1 0 92689

GL000191.1 0 106433

GL000227.1 0 128374

GL000228.1 0 129120

GL000214.1 0 137718

GL000221.1 0 155397

GL000209.1 0 159169

GL000218.1 0 161147

GL000220.1 0 161802

GL000213.1 0 164239

GL000211.1 0 166566

GL000199.1 0 169874

GL000217.1 0 172149

GL000216.1 0 172294

GL000215.1 0 172545

GL000205.1 0 174588

GL000219.1 0 179198

GL000224.1 0 179693

GL000223.1 0 180455

GL000195.1 0 182896

GL000212.1 0 186858

GL000222.1 0 186861

GL000200.1 0 187035

GL000193.1 0 189789

GL000194.1 0 191469

GL000225.1 0 211173

GL000192.1 0 547496

I also tried this:

> chromosomesToTest <- c(paste0("chr",c(1:22,"X","Y")))

> chromosomesToTest

Test0 = ChIPQC(samples,annotation="hg19",chromosomes=chromosomesToTest)

And still the error remains the same.

> traceback()
15: stop(e)
14: value[[3L]](cond)
13: tryCatchOne(expr, names, parentenv, handlers[[1L]])
12: tryCatchList(expr, classes, parentenv, handlers)
11: tryCatch({
FUN(...)
}, error = handle_error)
10: withCallingHandlers({
tryCatch({
FUN(...)
}, error = handle_error)
}, warning = handle_warning)
9: FUN(X[[i]], ...)
8: lapply(X, FUN, ...)
7: bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = param)
6: bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = param)
5: bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = BPPARAM)
4: bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = BPPARAM)
3: bplapply(samplelist, doChIPQCsample, experiment, chromosomes,
annotation, mapQCth, blacklist, profileWin, fragmentLength,
shifts)
2: bplapply(samplelist, doChIPQCsample, experiment, chromosomes,
annotation, mapQCth, blacklist, profileWin, fragmentLength,
shifts)
1: ChIPQC(samples, annotation = "hg19")
> Test0 = ChIPQC(samples,annotation="hg

ChIPQC • 2.5k views

ADD COMMENT • link updated 7.2 years ago by Thomas Carroll ▴ 420 • written 7.2 years ago by Victor Goitea ▴ 10

0

Entering edit mode

Please add the output of sessionInfo(), and be sure to use up-to-date versions of R (3.4.2) and Bioconductor (3.6) with no package conflicts (BiocInstaller::biocValid() returns TRUE).

ADD REPLY • link 7.2 years ago Martin Morgan 25k

0

Entering edit mode

sessionInfo()

R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] ChIPQC_1.15.0 DiffBind_2.6.0 SummarizedExperiment_1.8.0
[4] DelayedArray_0.4.0 matrixStats_0.52.2 Biobase_2.38.0
[7] GenomicRanges_1.30.0 GenomeInfoDb_1.14.0 IRanges_2.12.0
[10] S4Vectors_0.16.0 BiocGenerics_0.24.0 ggplot2_2.2.1

* Packages too new for Bioconductor version '3.6'

Version LibPath
ChIPQC "1.15.0" "/Library/Frameworks/R.framework/Versions/3.4/Resources/library"

downgrade with biocLite("ChIPQC")

Error: 1 package(s) too new

After downgrade biocLite("ChIPQC")

> BiocInstaller::biocValid()
[1] TRUE

The error still remain.

I found out that the program run well if I specify to run just in one chromosome. Also, it doesn't matter how is the name of the chromosomes as much it match in the specification of the command or values definition. Also I could get rid of the other contigs and only keep chr 1-22,X,Y. But the error remains being the same

> chromosomesToTest1 <- c(paste0(c(1)))

> exampleExp1 = ChIPQC(samples2,annotation="hg19",chromosomes=chromosomesToTest1)

Compiling annotation...
Using default blacklist for hg19...
Computing metrics for 1 samples...
list
Bam file has 24 contigs
Calculating coverage histogram for 1

Calculating SSD for 1

Calculating unique positions per strand for 1

Calculating shift for 1

300 / 300
Counting reads in features for 1

[1] 1
Warning message:
In if is.na(blacklist)) blacklist = NULL :
the condition has length > 1 and only the first element will be used

But, here is with two chromosomes and starts to fail:

> chromosomesToTest2 <- c(paste0(c(1:2)))

> exampleExp2 = ChIPQC(samples2,annotation="hg19",chromosomes=chromosomesToTest2)

Compiling annotation...
Using default blacklist for hg19...
Computing metrics for 1 samples...
list
Bam file has 24 contigs
Calculating coverage histogram for 1

Calculating SSD for 1

Calculating unique positions per strand for 1

Calculating shift for 1

300 / 300
Counting reads in features for 1

Error in .Call2("new_NCListAsINTSXP_from_NCList", nclist_xp, PACKAGE = "IRanges") :
compute_NCListAsINTSXP_length: NCList object is too deep (has more than
100000 levels of nested ranges)
In addition: Warning message:
In if is.na(blacklist)) blacklist = NULL :
the condition has length > 1 and only the first element will be used

ADD REPLY • link 7.2 years ago Victor Goitea ▴ 10

0

Entering edit mode

Thomas Carroll ▴ 420

@thomas-carroll-7019

Last seen 2.2 years ago

United States/New York/The Rockefeller …

hi,

I can see a few problems here.

>* Packages too new for Bioconductor version '3.6'

> Version LibPath

> ChIPQC "1.15.0" "/Library/Frameworks/R.framework/Versions/3.4/Resources/library"

> downgrade with biocLite("ChIPQC")

The output from sessionInfo appears to be telling us that ChIPQC version is out of line with your version of Bioconductor. Mixing up packages from different Bioconductor versions may cause some problems here so I recommend using an appropriate ChIPQC version. It tells you as much in sessionInfo() output.

Your command to output chromosome names seems to be show your chromosomes are numbers 1-22 and X,Y but the R command you use will look for chr1-chr22 and chrX,chrY. This may explain different performance with Bowtie aligned working and BWA aligned not, although the troubles looks to do with the different genome build and its contig names rather than the choice of aligner.

You could try below on bwa files.

> chromosomesToTest <- c(1:22,"X","Y")

Also to get a similar output of chromosome names and lengths from fasta in R, and avoid using awk, you could try

> seqinfo(FaFile("hg19.fasta"))
Seqinfo object with 25 sequences from an unspecified genome:
seqnames seqlengths isCircular genome
chr1 249250621 <NA> <NA>
chr2 243199373 <NA> <NA>
chr3 198022430 <NA> <NA>
chr4 191154276 <NA> <NA>
chr5 180915260 <NA> <NA>
... ... ... ...
chr21 48129895 <NA> <NA>
chr22 51304566 <NA> <NA>
chrX 155270560 <NA> <NA>
chrY 59373566 <NA> <NA>
chrM 16571 <NA> <NA>

If you are still seeing problems after fixing package version issues and chromosome names I can look again.

tom

ADD COMMENT • link 7.2 years ago Thomas Carroll ▴ 420

0

Entering edit mode

Victor Goitea ▴ 10

@victor-goitea-14304

Last seen 7.2 years ago

United_States/Oklahoma_City/OMRF

Hi Thomas,

I downgrade ChIPQC with biocLite("ChIPQC"). See the sessionInfo() and BiocInstaller::biocValid() . However, it doesn't change the behavior respect to the error.

> BiocInstaller::biocValid()
[1] TRUE

> sessionInfo()

R version 3.4.2 (2017-09-28)

Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] Rsamtools_1.30.0 Biostrings_2.46.0 XVector_0.18.0
[4] BiocInstaller_1.28.0 ChIPQC_1.14.0 DiffBind_2.6.0
[7] SummarizedExperiment_1.8.0 DelayedArray_0.4.0 matrixStats_0.52.2
[10] Biobase_2.38.0 GenomicRanges_1.30.0 GenomeInfoDb_1.14.0
[13] IRanges_2.12.0 S4Vectors_0.16.0 BiocGenerics_0.24.0
[16] ggplot2_2.2.1

ADD COMMENT • link 7.2 years ago Victor Goitea ▴ 10

0

Entering edit mode

I show here how the program still fail over ALL chromosomes (1:22,"X","Y") but it works running in only one chromosome. When I try two chromosomes, it depends on the chromosomes that I choose. For example, with chromosomes 21 and 22 it works but using chromosomes 1 and 2 it fails with the same error message posted at the beginning.

> seqinfo(FaFile("human_g1k_v37.fasta"))
Seqinfo object with 84 sequences from an unspecified genome:
seqnames seqlengths isCircular genome
1 249250621 <NA> <NA>
2 243199373 <NA> <NA>
3 198022430 <NA> <NA>
4 191154276 <NA> <NA>
5 180915260 <NA> <NA>
... ... ... ...
GL000200.1 187035 <NA> <NA>
GL000193.1 189789 <NA> <NA>
GL000194.1 191469 <NA> <NA>
GL000225.1 211173 <NA> <NA>
GL000192.1 547496 <NA> <NA>

>samples = read.csv("samples_Ours_test0.csv")
> samples
SampleID Tissue Factor Treatment Replicate bamReads Peaks
1 A2_D D A2 T 1 File.bam NA
> chromosomesToTest <- c(1:22,"X","Y")

> Test1 = ChIPQC(samples,annotation="hg19",chromosomes=chromosomesToTest)

Compiling annotation...
Using default blacklist for hg19...
Computing metrics for 1 samples...
list
Bam file has 84 contigs
Calculating coverage histogram for 1

Calculating SSD for 1

Calculating unique positions per strand for 1

Calculating shift for 1

300 / 300
Counting reads in features for 1

Error in .Call2("new_NCListAsINTSXP_from_NCList", nclist_xp, PACKAGE = "IRanges") :
compute_NCListAsINTSXP_length: NCList object is too deep (has more than
100000 levels of nested ranges)
In addition: Warning message:
In if is.na(blacklist)) blacklist = NULL :
the condition has length > 1 and only the first element will be used

Limiting the analysis to chr1

> Test2 = ChIPQC(samples,annotation="hg19",chromosomes="1")

Compiling annotation...
Using default blacklist for hg19...
Computing metrics for 1 samples...
list
Bam file has 84 contigs
Calculating coverage histogram for 1

Calculating SSD for 1

Calculating unique positions per strand for 1

Calculating shift for 1

300 / 300
Counting reads in features for 1

[1] 1
Warning message:
In if is.na(blacklist)) blacklist = NULL :
the condition has length > 1 and only the first element will be used

ADD REPLY • link 7.2 years ago Victor Goitea ▴ 10

0

Entering edit mode

Two chromosomes "21"and "22"

> Test2
Samples: 1 : A2_D
Tissue Factor Treatment Replicate
D A2 T 1
Reads Map% Filt% Dup% ReadL FragL RelCC SSD RiP% RiBL%
A2_D 8856458 100 10.3 0 47 95 1.12 24.9 NA 0

> chromosomesToTest2 <- c("21","22")
> Test3 = ChIPQC(samples,annotation="hg19",chromosomes=chromosomesToTest2)

G6-1_Arid2 DIVA_Brg1KO Arid2 Tamoxifen 1 bed
Compiling annotation...
Using default blacklist for hg19...
Computing metrics for 1 samples...
list
Bam file has 84 contigs
Calculating coverage histogram for 21

Calculating SSD for 21

Calculating unique positions per strand for 21

Calculating shift for 21

300 / 300
Counting reads in features for 21

Calculating coverage histogram for 22

Calculating SSD for 22

Calculating unique positions per strand for 22

Calculating shift for 22

300 / 300
Counting reads in features for 22

[1] 1
[1] 1
Warning message:
In if is.na(blacklist)) blacklist = NULL :
the condition has length > 1 and only the first element will be used

Two chromosomes "1"and "2"

> chromosomesToTest2 <- c("1","2")

> Test4 = ChIPQC(samples,annotation="hg19",chromosomes=chromosomesToTest2)

....

Error in .Call2("new_NCListAsINTSXP_from_NCList", nclist_xp, PACKAGE = "IRanges") :
compute_NCListAsINTSXP_length: NCList object is too deep (has more than
100000 levels of nested ranges)
In addition: Warning message:
In if is.na(blacklist)) blacklist = NULL :
the condition has length > 1 and only the first element will be used

> traceback()
15: stop(e)
14: value[[3L]](cond)
13: tryCatchOne(expr, names, parentenv, handlers[[1L]])
12: tryCatchList(expr, classes, parentenv, handlers)
11: tryCatch({
FUN(...)
}, error = handle_error)
10: withCallingHandlers({
tryCatch({
FUN(...)
}, error = handle_error)
}, warning = handle_warning)
9: FUN(X[[i]], ...)
8: lapply(X, FUN, ...)
7: bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = param)
6: bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = param)
5: bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = BPPARAM)
4: bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = BPPARAM)
3: bplapply(samplelist, doChIPQCsample, experiment, chromosomes,
annotation, mapQCth, blacklist, profileWin, fragmentLength,
shifts)
2: bplapply(samplelist, doChIPQCsample, experiment, chromosomes,
annotation, mapQCth, blacklist, profileWin, fragmentLength,
shifts)
1: ChIPQC(samples, annotation = "hg19", chromosomes = chromosomesToTest2)

ADD REPLY • link 7.2 years ago Victor Goitea ▴ 10

0

Entering edit mode

Thomas Carroll ▴ 420

@thomas-carroll-7019

Last seen 2.2 years ago

United States/New York/The Rockefeller …

Okay, I think I will need to take a look at Bam file to debug.

Could you share the Bam file with me by Dropbox or something similar? You can send me link at tc.infomatics@gmail.com and we can post solution here if it is useful to others.

Vb,

tom

ADD COMMENT • link 7.2 years ago Thomas Carroll ▴ 420

0

Entering edit mode

Victor Goitea ▴ 10

@victor-goitea-14304

Last seen 7.2 years ago

United_States/Oklahoma_City/OMRF

I shared the files by dropbox. Did you get the link? Let me know if you need something else or if there is some problem with the Dropbox. Thank you for take a look.

Best,

Victor.

ADD COMMENT • link 7.2 years ago Victor Goitea ▴ 10

score 2 · Accepted Answer · 2017-11-07

Hi Thomas,

The current implementation of NCList and GNCList objects imposes a hard-coded limit of 100000 nested ranges. Note that you hit this limit only when you build an NCList or GNCList object by explicitly calling the NCList() or GNCList() constructor. Most of the time this is not necessary though, and is actually not recommended (see ?NCList and ?GNCList for some details about this).

So in ChIPQC/R/sampleQC.R, replacing line 111:

Sample_GIT <- GNCList(GRanges(seqnames=seqnames(temp),ranges=ranges(temp),strand=strand(temp),elementMetadata(temp)))

with:

Sample_GIT <- GRanges(seqnames=seqnames(temp),ranges=ranges(temp),strand=strand(temp),elementMetadata(temp))

should make the problem go away.

Cheers,

H.

score 2 · Accepted Answer · 2017-11-07

2

Entering edit mode

Thomas Carroll ▴ 420

@thomas-carroll-7019

Last seen 2.2 years ago

United States/New York/The Rockefeller …

hi Victor, hi Herve,

Yes, it looks like on chromosome 2 you have an area over 2 millions reads deep (overlapping reads) which means ChIPQC breaches the limit of 100000 set in the way ChIPQC creates the GNCList object.

I will adopt the change suggested by Herve tonight and it should be in the release version of ChIPQC soon.

best,

tom

ADD COMMENT • link 7.2 years ago Thomas Carroll ▴ 420

0

Entering edit mode

Hi Tom, hi Herve,

Thank you very much for solve this problem and for the explanation.

Best,

Victor

ADD REPLY • link 7.2 years ago Victor Goitea ▴ 10