FourCseq - error countFragmentOverlaps()
4
0
Entering edit mode
Theo ▴ 10
@theodoregeorgomanolis-7993
Last seen 5 months ago
Germany

Dear Felix

I got the following problem

> fc <- countFragmentOverlaps(fc)
reading bam files
calculating overlaps
Error in .Call2("NCList_find_overlaps_in_groups", start(q), end(q), q_space,  :
  extend_stack: cannot build an NCList object of depth >= 25000


I have read the answered question of another user (12 weeks ago) I included the minMapq=-1 error but still I got the same error
 

> fc <- countFragmentOverlaps(fc,minMapq=-1)
reading bam files
calculating overlaps
Error in .Call2("NCList_find_overlaps_in_groups", start(q), end(q), q_space,  :
  extend_stack: cannot build an NCList object of depth >= 25000

I am using R 3.2 bioconductor devel version
I am mapping with bowtie mem and using samtools and using the following command to extract uniqly mapped sequences:
samtools view -h -q 1 $1/$NOEXT1".bam" | grep -v "XA:Z" > $1/$NOEXT1"_uniq.sam"

I am posting a detailed walkthrough"

 

> referenceGenomeFile<-"/media/trotos/4TB/WORK/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa"

> bamFilePath<-"/media/trotos/4TB/WORK/archive/to_be_mapped"

> primerFile<-"/media/trotos/4TB/WORK/archive/to_be_mapped/primers"

> metaData19 <- list(projectPath = "/media/trotos/4TB/WORK/archive/fastq_files/bwa_mem_Q30", fragmentDir = "/media/trotos/4TB/WORK/archive/fastq_files/bwa_mem_Q30", referenceGenomeFile=referenceGenomeFile19, reSequence1 = "RAATTY",reSequence2 = "GATC",primerFile = primerFile,bamFilePath = bamFilePath)

> colData <- DataFrame(viewpoint = "Smad4a",condition = factor(rep(c("Q30","Q30_c1","Q30_uniq_srt"),each=2),levels = c("Q30","Q30_c1","Q30_uniq_srt")), replicate=rep(c(1,2),3), bamFile = c("SAMD4a_0min_rep2_mem_sorted_Q30.bam","SAMD4a_0min_rep3_mem_sorted_Q30.bam","SAMD4a_0min_rep2_mem_sorted_Q30_c1.bam","SAMD4a_0min_rep3_mem_sorted_Q30_c1.bam","SAMD4a_0min_rep2_mem_sorted_Q30_uniq_srt.bam","SAMD4a_0min_rep3_mem_sorted_Q30_uniq_srt.bam"),sequencingPrimer="first")

> colData
DataFrame with 6 rows and 5 columns
    viewpoint    condition replicate
  <character>     <factor> <numeric>
1      Smad4a          Q30         1
2      Smad4a          Q30         2
3      Smad4a       Q30_c1         1
4      Smad4a       Q30_c1         2
5      Smad4a Q30_uniq_srt         1
6      Smad4a Q30_uniq_srt         2
                                       bamFile sequencingPrimer
                                   <character>      <character>
1          SAMD4a_0min_rep2_mem_sorted_Q30.bam            first
2          SAMD4a_0min_rep3_mem_sorted_Q30.bam            first
3       SAMD4a_0min_rep2_mem_sorted_Q30_c1.bam            first
4       SAMD4a_0min_rep3_mem_sorted_Q30_c1.bam            first
5 SAMD4a_0min_rep2_mem_sorted_Q30_uniq_srt.bam            first
6 SAMD4a_0min_rep3_mem_sorted_Q30_uniq_srt.bam            first

> fc<-FourC(colData,metaData19)
> fc
class: FourC
dim: 0 6
metadata(7): projectPath fragmentDir ... primerFile bamFilePath
assays(0):
rownames: NULL
rowRanges metadata column names(0):
colnames(6): Smad4a_Q30_1 Smad4a_Q30_2 ... Smad4a_Q30_uniq_srt_1
  Smad4a_Q30_uniq_srt_2
colData names(5): viewpoint condition replicate bamFile
  sequencingPrimer

fc<-addFragments(fc)

> rowRanges(fc)
GRanges object with 2935389 ranges and 4 metadata columns:
            seqnames               ranges strand   |  leftSize rightSize
               <Rle>            <IRanges>  <Rle>   | <numeric> <numeric>
        [1]     chrM         [   1,  288]      *   |         0       284
        [2]     chrM         [ 403, 2051]      *   |       339       819
        [3]     chrM         [2162, 2804]      *   |       189       450
        [4]     chrM         [2811, 3111]      *   |        87        43
        [5]     chrM         [3390, 4121]      *   |       270       424
        ...      ...                  ...    ... ...       ...       ...
  [2935385]     chrY [59351339, 59352173]      *   |       440       253
  [2935386]     chrY [59352180, 59353908]      *   |       110      1403
  [2935387]     chrY [59353915, 59356216]      *   |       301      1439
  [2935388]     chrY [59356223, 59360938]      *   |       763       211
  [2935389]     chrY [59360996, 59373566]      *   |       570     11997
            leftValid rightValid
            <logical>  <logical>
        [1]         0          1
        [2]         1          1
        [3]         1          1
        [4]         1          1
        [5]         1          1
        ...       ...        ...
  [2935385]         1          1
  [2935386]         1          1
  [2935387]         1          1
  [2935388]         1          1
  [2935389]         1          1
  -------
  seqinfo: 25 sequences from an unspecified genome #is that a problem????

> colData(fc)$chr = "chr14"
> colData(fc)$start =54565826
> colData(fc)$end =54566537
> fc <- countFragmentOverlaps(fc)

reading bam files
calculating overlaps
Error in .Call2("NCList_find_overlaps_in_groups", start(q), end(q), q_space,  :
  extend_stack: cannot build an NCList object of depth >= 25000

> traceback()
11: .Call(.NAME, ..., PACKAGE = PACKAGE)
10: .Call2("NCList_find_overlaps_in_groups", start(q), end(q), q_space,
        q_groups, start(s), end(s), s_space, s_groups, nclists, nclist_is_q,
        min.score, type, select, circle.length, PACKAGE = "IRanges")
9: IRanges:::NCList_find_overlaps_in_groups(ranges(query), q_space,
       q_groups, ranges(subject), s_space, s_groups, nclists, nclist_is_q,
       min.score, type, select, circle_length)
8: findOverlaps_GNCList(query, subject, min.score = min.score, type = type,
       select = "count", ignore.strand = ignore.strand)
7: countOverlaps_GenomicRanges(query, subject, maxgap = maxgap,
       minoverlap = minoverlap, type = type, ignore.strand = ignore.strand)
6: .local(query, subject, maxgap, minoverlap, type, algorithm, ...)
5: FUN(X[[i]], ...)
4: FUN(X[[i]], ...)
3: lapply(X = X, FUN = FUN, ...)
2: sapply(reads, countOverlaps, query = frag, type = c("start"),
       maxgap = shift)
1: countFragmentOverlaps(fc)

> fc <- countFragmentOverlaps(fc,minMapq=-1)
reading bam files
calculating overlaps
Error in .Call2("NCList_find_overlaps_in_groups", start(q), end(q), q_space,  :
  extend_stack: cannot build an NCList object of depth >= 25000
FourCseq fourcseq • 4.3k views
ADD COMMENT
0
Entering edit mode

More details on the error

> debug(countFragmentOverlaps)
> countFragmentOverlaps(fc)
debugging in: countFragmentOverlaps(fc)
debug: {
    stopifnot(class(object) == "FourC")
    if (length(rowRanges(object)) == 0)
        stop("Add fragments before calling 'findViewpointFragments'")
    cat("reading bam files\n")
    bamFiles = file.path(metadata(object)$bamFilePath, colData(object)$bamFile)
    colData(object)$originalReads = sapply(bamFiles, function(bamFile) countBam(bamFile)$records)
    reads = lapply(bamFiles, function(bamfile) {
        what <- c("mapq")
        flag <- scanBamFlag(isUnmappedQuery = FALSE)
        param <- ScanBamParam(what = what, flag = flag)
        readGAlignments(bamfile, param = param)
    })
    colData(object)$rawReads = sapply(reads, length)
    if (minMapq >= 0) {
        reads = lapply(reads, function(rga, minMapq) {
            if (!any(is.na(mcols(rga)$mapq))) {
                return(rga[mcols(rga)$mapq > minMapq])
            }
            warning("mapq quality filter could not be applied due to NA values in the mapq quality scores. All reads are used for counting overlaps.")
            rga
        }, minMapq = minMapq)
        colData(object)$lowQualityReads = colData(object)$rawReads -
            sapply(reads, length)
    }
    reads = lapply(reads, granges)
    if (trim > 0) {
        reads = lapply(reads, function(gr, trim) {
            start(gr[strand(gr) == "+"]) = start(gr[strand(gr) ==
                "+"]) + trim
            end(gr[strand(gr) == "-"]) = end(gr[strand(gr) ==
                "-"]) - trim
            gr
        }, trim = trim)
    }
    cat("calculating overlaps\n")
    frag <- rowRanges(object)
    strand(frag) <- "+"
    countsLeftFragmentEnd <- sapply(reads, countOverlaps, query = frag,
        type = c("start"), maxgap = shift)
    strand(frag) <- "-"
    countsRightFragmentEnd <- sapply(reads, countOverlaps, query = frag,
        type = c("end"), maxgap = shift)
    colData(object)$mappedReads = apply(countsLeftFragmentEnd,
        2, sum) + apply(countsRightFragmentEnd, 2, sum)
    colData(object)$mappingRatio = colData(object)$mappedReads/(colData(object)$rawReads -
        colData(object)$lowQualityReads)
    metaDataFrame <- DataFrame(type = rep("countInfo", 5), description = rep("",
        5))
    idx <- colnames(colData(object)) %in% c("originalReads",
        "rawReads", "lowQualityReads", "mappedReads", "mappingRatio")
    mcols(colData(object))[idx, ] <- metaDataFrame
    assays(object) <- SimpleList(countsLeftFragmentEnd = countsLeftFragmentEnd,
        countsRightFragmentEnd = countsRightFragmentEnd)
    assays(object) <- SimpleList(countsLeftFragmentEnd = countsLeftFragmentEnd,
        countsRightFragmentEnd = countsRightFragmentEnd)
    object
}
Browse[2]>
debug: stopifnot(class(object) == "FourC")
Browse[2]>
debug: if (length(rowRanges(object)) == 0) stop("Add fragments before calling 'findViewpointFragments'")
Browse[2]>
debug: cat("reading bam files\n")
Browse[2]>
reading bam files
debug: bamFiles = file.path(metadata(object)$bamFilePath, colData(object)$bamFile)
Browse[2]>
debug: colData(object)$originalReads = sapply(bamFiles, function(bamFile) countBam(bamFile)$records)
Browse[2]>

debug: reads = lapply(bamFiles, function(bamfile) {
    what <- c("mapq")
    flag <- scanBamFlag(isUnmappedQuery = FALSE)
    param <- ScanBamParam(what = what, flag = flag)
    readGAlignments(bamfile, param = param)
})
Browse[2]>
debug: colData(object)$rawReads = sapply(reads, length)
Browse[2]>
debug: if (minMapq >= 0) {
    reads = lapply(reads, function(rga, minMapq) {
        if (!any(is.na(mcols(rga)$mapq))) {
            return(rga[mcols(rga)$mapq > minMapq])
        }
        warning("mapq quality filter could not be applied due to NA values in the mapq quality scores. All reads are used for counting overlaps.")
        rga
    }, minMapq = minMapq)
    colData(object)$lowQualityReads = colData(object)$rawReads -
        sapply(reads, length)
}
Browse[2]>
debug: reads = lapply(reads, function(rga, minMapq) {
    if (!any(is.na(mcols(rga)$mapq))) {
        return(rga[mcols(rga)$mapq > minMapq])
    }
    warning("mapq quality filter could not be applied due to NA values in the mapq quality scores. All reads are used for counting overlaps.")
    rga
}, minMapq = minMapq)
Browse[2]>
debug: colData(object)$lowQualityReads = colData(object)$rawReads -
    sapply(reads, length)
Browse[2]>
debug: reads = lapply(reads, granges)
Browse[2]>
debug: if (trim > 0) {
    reads = lapply(reads, function(gr, trim) {
        start(gr[strand(gr) == "+"]) = start(gr[strand(gr) ==
            "+"]) + trim
        end(gr[strand(gr) == "-"]) = end(gr[strand(gr) == "-"]) -
            trim
        gr
    }, trim = trim)
}
Browse[2]>
debug: cat("calculating overlaps\n")
Browse[2]>
calculating overlaps
debug: frag <- rowRanges(object)
Browse[2]>
debug: strand(frag) <- "+"
Browse[2]>
debug: countsLeftFragmentEnd <- sapply(reads, countOverlaps, query = frag,
    type = c("start"), maxgap = shift)
Browse[2]>
Error in .Call2("NCList_find_overlaps_in_groups", start(q), end(q), q_space,  :
  extend_stack: cannot build an NCList object of depth >= 25000

 

And My SessionInfo


> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] splines   stats4    parallel  stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] FourCSeq_1.3.2             LSD_3.0                   
 [3] DESeq2_1.9.9               RcppArmadillo_0.5.100.1.0
 [5] Rcpp_0.11.6                SummarizedExperiment_0.1.5
 [7] Biobase_2.29.1             ggplot2_1.0.1             
 [9] GenomicRanges_1.21.12      GenomeInfoDb_1.5.6        
[11] IRanges_2.3.8              S4Vectors_0.7.3           
[13] BiocGenerics_0.15.1       

loaded via a namespace (and not attached):
 [1] locfit_1.5-9.1            biovizBase_1.17.1        
 [3] lattice_0.20-31           gtools_3.4.2             
 [5] Rsamtools_1.21.5          Biostrings_2.37.2        
 [7] digest_0.6.8              plyr_1.8.2               
 [9] futile.options_1.0.0      acepack_1.3-3.3          
[11] RSQLite_1.0.0             BiocInstaller_1.19.5     
[13] zlibbioc_1.15.0           GenomicFeatures_1.21.7   
[15] annotate_1.47.0           rpart_4.1-9              
[17] Matrix_1.2-0              ggbio_1.17.1             
[19] proto_0.3-10              BiocParallel_1.3.13      
[21] geneplotter_1.47.0        stringr_1.0.0            
[23] foreign_0.8-63            RCurl_1.95-4.6           
[25] biomaRt_2.25.1            munsell_0.4.2            
[27] rtracklayer_1.29.6        nnet_7.3-9               
[29] gridExtra_0.9.1           Hmisc_3.16-0             
[31] XML_3.98-1.1              reshape_0.8.5            
[33] GenomicAlignments_1.5.9   MASS_7.3-40              
[35] bitops_1.0-6              RBGL_1.45.1              
[37] grid_3.2.0                xtable_1.7-4             
[39] GGally_0.5.0              gtable_0.1.2             
[41] DBI_0.3.1                 magrittr_1.5             
[43] scales_0.2.4              graph_1.47.0             
[45] stringi_0.4-1             XVector_0.9.1            
[47] reshape2_1.4.1            genefilter_1.51.0        
[49] latticeExtra_0.6-26       futile.logger_1.4.1      
[51] Formula_1.2-1             lambda.r_1.1.7           
[53] RColorBrewer_1.1-2        tools_3.2.0              
[55] dichromat_2.0-0           OrganismDbi_1.11.17      
[57] BSgenome_1.37.1           survival_2.38-1          
[59] AnnotationDbi_1.31.9      colorspace_1.2-6         
[61] cluster_2.0.1             fda_2.4.4                
[63] VariantAnnotation_1.15.10
ADD REPLY
0
Entering edit mode

and final my bam files

 

I also pasting the first 30 lines of the sam files

trotos@stupitMint:~$ samtools view -h '/media/trotos/4TB/WORK/archive/fastq_files/bwa_mem_Q30/SAMD4a_0min_rep2_mem_sorted_Q30.bam' | head -n 30
@HD    VN:1.3    SO:coordinate
@SQ    SN:chrM    LN:16571
@SQ    SN:chr1    LN:249250621
@SQ    SN:chr2    LN:243199373
@SQ    SN:chr3    LN:198022430
@SQ    SN:chr4    LN:191154276
@SQ    SN:chr5    LN:180915260
@SQ    SN:chr6    LN:171115067
@SQ    SN:chr7    LN:159138663
@SQ    SN:chr8    LN:146364022
@SQ    SN:chr9    LN:141213431
@SQ    SN:chr10    LN:135534747
@SQ    SN:chr11    LN:135006516
@SQ    SN:chr12    LN:133851895
@SQ    SN:chr13    LN:115169878
@SQ    SN:chr14    LN:107349540
@SQ    SN:chr15    LN:102531392
@SQ    SN:chr16    LN:90354753
@SQ    SN:chr17    LN:81195210
@SQ    SN:chr18    LN:78077248
@SQ    SN:chr19    LN:59128983
@SQ    SN:chr20    LN:63025520
@SQ    SN:chr21    LN:48129895
@SQ    SN:chr22    LN:51304566
@SQ    SN:chrX    LN:155270560
@SQ    SN:chrY    LN:59373566
@PG    ID:bwa    PN:bwa    VN:0.7.12-r1039    CL:bwa mem -t 8 -M -k 19 -r 1 /media/trotos/4TB/WORK/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa /media/trotos/4TB/WORK/archive/fastq_files/I15-1087-01-0_SAMD4A_rep2.fastq.trimmed
BIOMICS-HISEQHI:522:HCYM5ADXX:2:2101:14356:88606    0    chrM    1435    15    51M    *    0    0    AAACTGAGAGTAGAGTGCTTAGTTGAACAGGGCCCTGAAGCGCGTACACAC    JJJJJJJDGHHD?FHFGHIIJJHIIIEHIJIIJGIHHHGFFFDDDDDDDDB    NM:i:0    MD:Z:51    AS:i:51    XS:i:46    XA:Z:chr11,-10530923,51M,1;chr5,-79947054,51M,2;
BIOMICS-HISEQHI:522:HCYM5ADXX:2:2114:11502:15382    0    chr1    74785    051M    *    0    0    ATTTTCTGGGTTGCCTGTTCACTCTGATGGTAGTTTCTTTTGCTGTGCAGA    FHJJJIIHIJIIJIIHIJJJJIIIHIBFHG@FHGIIIJJJHHHHHHF@CEF    NM:i:0    MD:Z:51AS:i:51    XS:i:51
BIOMICS-HISEQHI:522:HCYM5ADXX:2:1106:7766:9986    16    chr1    75461    0    49M    *    0    0    GCCAAAGACAAAAACCACATGATTATCTCAATAGATGCAGAAAAGGCCT    DEEDEEEEFFEEE?ED@EHHIIIIIIIIIIIIIIIIIIIIIIIGIIGGH    NM:i:0    MD:Z:49AS:i:49    XS:i:49

samtools view -h '/media/trotos/4TB/WORK/archive/fastq_files/bwa_mem_Q30/SAMD4a_0min_rep2_mem_sorted_Q30_c1.bam' | head -n 30
@HD    VN:1.3    SO:coordinate
@SQ    SN:chrM    LN:16571
@SQ    SN:chr1    LN:249250621
@SQ    SN:chr2    LN:243199373
@SQ    SN:chr3    LN:198022430
@SQ    SN:chr4    LN:191154276
@SQ    SN:chr5    LN:180915260
@SQ    SN:chr6    LN:171115067
@SQ    SN:chr7    LN:159138663
@SQ    SN:chr8    LN:146364022
@SQ    SN:chr9    LN:141213431
@SQ    SN:chr10    LN:135534747
@SQ    SN:chr11    LN:135006516
@SQ    SN:chr12    LN:133851895
@SQ    SN:chr13    LN:115169878
@SQ    SN:chr14    LN:107349540
@SQ    SN:chr15    LN:102531392
@SQ    SN:chr16    LN:90354753
@SQ    SN:chr17    LN:81195210
@SQ    SN:chr18    LN:78077248
@SQ    SN:chr19    LN:59128983
@SQ    SN:chr20    LN:63025520
@SQ    SN:chr21    LN:48129895
@SQ    SN:chr22    LN:51304566
@SQ    SN:chrX    LN:155270560
@SQ    SN:chrY    LN:59373566
@PG    ID:bwa    PN:bwa    VN:0.7.12-r1039    CL:bwa mem -t 8 -M -k 19 -r 1 -c 1 /media/trotos/4TB/WORK/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa /media/trotos/4TB/WORK/archive/fastq_files/I15-1087-01-0_SAMD4A_rep2.fastq.trimmed
BIOMICS-HISEQHI:522:HCYM5ADXX:2:2101:14356:88606    0    chrM    1435    251M    *    0    0    AAACTGAGAGTAGAGTGCTTAGTTGAACAGGGCCCTGAAGCGCGTACACAC    JJJJJJJDGHHD?FHFGHIIJJHIIIEHIJIIJGIHHHGFFFDDDDDDDDB    NM:i:0    MD:Z:51AS:i:51    XS:i:46    XA:Z:chr11,-10530923,51M,1;
BIOMICS-HISEQHI:522:HCYM5ADXX:2:1110:5813:10384    16    chr1    139942    0    33M    *    0    0    GGTGCGTTCTCCTATGAGAATCTAATGCTGCTG    ?????????????????????????????????    NM:i:0    MD:Z:33    AS:i:33    XS:i:28    XA:Z:chrY,+17045748,28M5S,0;
BIOMICS-HISEQHI:522:HCYM5ADXX:2:2207:7870:48784    16    chr1    726108    0    34M    *    0    0    TAATGCAATGGAATGGAATGGAATGGAATGGAAT    GH@GCHF>DHEIGDB?BGFDDD?4DGIHFEFBBH    NM:i:1    MD:Z:0G33    AS:i:33    XS:i:32    XA:Z:chr10,-38777422,34M,1;chr10,+42358545,7S27M,0;

samtools view -h '/media/trotos/4TB/WORK/archive/fastq_files/bwa_mem_Q30/SAMD4a_0min_rep2_mem_sorted_Q30_uniq_srt' | head -n 30
[W::sam_read1] parse error at line 1
[main_samview] truncated file.

trotos@stupitMint:~$ samtools view -h '/media/trotos/4TB/WORK/archive/fastq_files/bwa_mem_Q30/SAMD4a_0min_rep2_mem_sorted_Q30_uniq_srt.bam' | head -n 30
@HD    VN:1.3    SO:coordinate
@SQ    SN:chrM    LN:16571
@SQ    SN:chr1    LN:249250621
@SQ    SN:chr2    LN:243199373
@SQ    SN:chr3    LN:198022430
@SQ    SN:chr4    LN:191154276
@SQ    SN:chr5    LN:180915260
@SQ    SN:chr6    LN:171115067
@SQ    SN:chr7    LN:159138663
@SQ    SN:chr8    LN:146364022
@SQ    SN:chr9    LN:141213431
@SQ    SN:chr10    LN:135534747
@SQ    SN:chr11    LN:135006516
@SQ    SN:chr12    LN:133851895
@SQ    SN:chr13    LN:115169878
@SQ    SN:chr14    LN:107349540
@SQ    SN:chr15    LN:102531392
@SQ    SN:chr16    LN:90354753
@SQ    SN:chr17    LN:81195210
@SQ    SN:chr18    LN:78077248
@SQ    SN:chr19    LN:59128983
@SQ    SN:chr20    LN:63025520
@SQ    SN:chr21    LN:48129895
@SQ    SN:chr22    LN:51304566
@SQ    SN:chrX    LN:155270560
@SQ    SN:chrY    LN:59373566
@PG    ID:bwa    PN:bwa    VN:0.7.12-r1039    CL:bwa mem -t 8 -M -k 19 -r 1 /media/trotos/4TB/WORK/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa /media/trotos/4TB/WORK/archive/fastq_files/I15-1087-01-0_SAMD4A_rep2.fastq.trimmed
BIOMICS-HISEQHI:522:HCYM5ADXX:2:1209:14169:15550    0    chr1    130304360    45M6S    *    0    0    GTGAACTGGTGAGTGTGAATTAGTGTAAATTCAGTTCAGTGGATCCAAAGA    C1?D:DG?BBFFBDFHGIGIIIIGHIIIIICGHHHIIGEEHEFHHEEEEEE    NM:i:0    MD:Z:45    AS:i:45    XS:i:21
BIOMICS-HISEQHI:522:HCYM5ADXX:2:2206:9020:87155    0    chr1    1303043    60    45M6S    *    0    0    GTGAACTGGTGAGTGTGAATTAGTGTAAATTCAGTTCAGTGGATCCAAAGA    JDHIJJJJJ?FFHGGHIIIIIIJHHIJJIHIJJHHJIIIIJHHHHHFFFFF    NM:i:0    MD:Z:45AS:i:45    XS:i:21
BIOMICS-HISEQHI:522:HCYM5ADXX:2:1107:17925:42658    0    chr1    1841510334M    *    0    0    TTGTTGTTTGTTTTTGAGACAGGGTCTGGCTCTG    JJJJJIGHGIHGIJJHAHIIJIGGH?=EHDEFFF    NM:i:0    MD:Z:34    AS:i:34    XS:i:31
ADD REPLY
0
Entering edit mode
@herve-pages-1542
Last seen 8 hours ago
Seattle, WA, United States

Hi Theodore,

I've changed that limit from 25000 to 100000 in the release and devel versions of IRanges (versions 2.2.3 and 2.3.10, respectively), which will become available via biocLite() in the next 24 hours or so. Hopefully that will be enough for your data set.

Anyway, increasing this hard limit every time the need arises is probably not the right thing to do because then people will start running into stack overflow problems when their data contains high piles of nested intervals like this:

1  -----------------------------------------------------
2    -----------------------------------------------
3       --------------------------------------------
4            --------------------------------
5              --------------------------
...                      etc
99997              --------------
99998                 ----------
99999                 ----------
100000                   ------
100001                    ---

So I guess at some point I'll have to bite the bullet and de-recursify my code, which I've postponed so far because I didn't anticipate real data sets would contain such high piles of nested intervals.

Cheers,

H.

ADD COMMENT
0
Entering edit mode

Do you remove PCR duplicates at your 4C analysis?
 

ADD REPLY
0
Entering edit mode

Hi Theodore,

That's a question to ask to Felix, the author and maintainer of the FourCSeq package. I should have clarified in my previous post that the error you got when calling countFragmentOverlaps() was not happening in the countFragmentOverlaps() function itself (which is defined in the FourCSeq package) but in the countOverlaps() function (which is defined in the IRanges package) that countFragmentOverlaps() calls internally. Since I introduced that limitation (the hard limit of 25000 for the height of a pile of nested intervals) in the findOverlaps/countOverlaps code recently, I chimed in here yesterday to discuss the problem and announce the fix.

Anyway your question about PCR duplicates seems to make a lot of sense to me. Do you typically expect to see a lot of PCR duplicates in a 4C analysis? If yes, that would explain why you have such high piles of nested intervals in your data. Here is the code that countFragmentOverlaps() uses to load the reads:

    reads = lapply(bamFiles, function(bamfile) {
        what <- c("mapq")
        flag <- scanBamFlag(isUnmappedQuery=FALSE)
        param <- ScanBamParam(what=what, flag=flag)
        readGAlignments(bamfile, param=param)
    })

As you can see, it removes unmapped reads only. To also remove PCR duplicates, it would need to prepare the flag param like this:

  flag <- scanBamFlag(isUnmappedQuery=FALSE, isDuplicate=FALSE)

@Felix: Don't know if it would make sense for countFragmentOverlaps() to do this by default. Maybe an argument could be added to countFragmentOverlaps() to let the user control this. What do you think?

H.

ADD REPLY
0
Entering edit mode

Hello Theodore and Hervé,

in the 4C protocol you cannot remove PCR duplicates because this basically would remove your complete signal, which is PCR amplified restriction fragment ends. Therefore in deep sequencing runs many reads (>10e+6)  will fall to exactly the same restriction fragment ends of the viewpoint fragment. It would therefore be good if such cases could be handled by the GRanges framework.

Best regards,

Felix

ADD REPLY
0
Entering edit mode

OK. Looks like we have a valid use case for handling these huge piles of nested intervals. Since the piles are mostly made of duplicated ranges, there is maybe an opportunity to tweak the findOverlaps/countOverlaps code to handle that particular situation in an efficient way. This would be an alternative to trying to support arbitrary huge piles of nested intervals. I'll investigate this. Thanks Felix for the feedback.

H.

ADD REPLY
0
Entering edit mode

hello all,

Sorry for being able to respond earlier, since I still am on vacation!

as correctly stated by Fellix removing  PCR dublicates for 4C is not an option.
Sorry for not being able to provide more information, apart from the testing of the script
By the way since today there is no update on the script. Any estimate on delivering a working script anytime soon?

T

ADD REPLY
1
Entering edit mode

Hi Theodore,

Limit was changed from 25000 to 100000 in versions 2.2.3 (release) and 2.3.10 (devel) of the IRanges package. Both versions are available since yesterday via biocLite(), so please make sure you update your installation with

library(BiocInstaller)
biocLite()

Let us know how it goes. If the new limit of 100000 is still not enough, I'll come up with something else.

Also I realize now that you'll get a warning (pretty obscure for the end-user I must admit, it will say something about the special meaning of maxgap) if you try to specify a non-zero shift when calling countFragmentOverlaps(). Again, this warning is not coming from Felix's code, but from the new findOverlaps implementation (based on Nested Containment Lists) in IRanges/GenomicRanges. That's also something I need to work on.

Sorry for these inconveniences.

H.

ADD REPLY
1
Entering edit mode
Hello Theodore, one quick fix would be to check how many reads you expect for the viewpoint fragment (eg. in igv, ...) and then split up the initial bam files so that the number of counts stay below the threshold Hervé mentioned. After counting and creating the count matrix you can sum up the columns that belong to the large bam file. Cheers, Felix
ADD REPLY
0
Entering edit mode

Hi all,

 

> source("http://bioconductor.org/biocLite.R")
Bioconductor version 3.2 (BiocInstaller 1.19.5), ?biocLite for help
>
> library(BiocInstaller)
> biocLite()
BioC_mirror: http://bioconductor.org
Using Bioconductor 3.2 (BiocInstaller 1.19.5), R 3.2.0 (2015-04-16).
Old packages: 'AnnotationDbi', 'BiocParallel', 'GenomeInfoDb',
  'GenomicFeatures', 'IRanges', 'Matrix', 'OrganismDbi', 'Rsamtools'

 

Still the update for the FourCSeq is still not available, maybe I should try another mirror?

 

ADD REPLY
0
Entering edit mode

Sorry, my mistake it was on IRanges, not on FourCSeq package the changes. when I get home I will try it and sent a full report

ADD REPLY
0
Entering edit mode

I can count appr. 11000 reads at the viewpoint

ADD REPLY
0
Entering edit mode

Hi Felix, 

I know this is an older thread, but I am also running into the same error even when the limit is 100000:

>fc.count = countFragmentOverlaps(fc, minMapq=30)
reading bam files
calculating overlaps
Error in .Call2("NCList_find_overlaps_in_groups", start(q), end(q), q_space,  : 
  extend_stack: cannot build an NCList object of depth >= 100000

You mentioned splitting up the initial bam files to lower the count numbers below the threshold of 100000 -- could you explain a little more on how you would do that? Would you treat the split bam files as individual replicates when counting the fragment overlaps? 

If the problematic nested intervals are all in the viewpoint fragment, can I theoretically remove only those reads or would it problematic when doing the normalization and Z-score model fitting?

Thanks for your help!

Fides

ADD REPLY
0
Entering edit mode

use the devel bioconductor version, it has been extended to process more than 100000 reads

ADD REPLY
0
Entering edit mode

No need to use the devel version of Bioconductor. The 100000 limit has been removed before the spring release (BioC 3.3). So updating to the current release (which requires R 3.3) should do the trick.

Cheers,

H.

ADD REPLY
0
Entering edit mode

So, I'm running into this error with my data when trying to use ChIPQC: extend_stack: cannot build an NCList object of depth >= 100000. I'm not sure what about the code or my data would be causing such a degree of nesting. This data is 100-bp ChIP-seq reads

Traceback:

20: .Call(.NAME, ..., PACKAGE = PACKAGE)
19: .Call2("NCList_build", ans, x_start, x_end, x_subset, PACKAGE = "IRanges")
18: .NCList_xp(x_start, x_end, x_subset)
17: .nclist(x_start, x_end, x_subset = group)
16: FUN(X[[i]], ...)
15: lapply(as.list(X), FUN, ...)
14: lapply(as.list(X), FUN, ...)
13: lapply(x_groups, function(group) .nclist(x_start, x_end, x_subset = group))
12: lapply(x_groups, function(group) .nclist(x_start, x_end, x_subset = group))
11: IRanges:::.nclists(x_ranges, x_groups)
10: GNCList(GRanges(seqnames = seqnames(temp), ranges = ranges(temp),
        strand = strand(temp), elementMetadata(temp)))
9: sampleQC(bamFile = reads, bedFile = peaks, GeneAnnotation = annotation,
       blklist = blacklist, ChrOfInterest = chromosomes, Window = profileWin,
       FragmentLength = fragmentLength, shiftWindowStart = min(shifts),
       shiftWindowEnd = max(shifts), runCrossCor = runCrossCor)
8: ChIPQCsample(reads = sample$bam, peaks = sample$peaks, chromosomes = chromosomes,
       annotation = annotation, mapQCth = mapQCth, blacklist = blacklist,
       profileWin = profileWin, fragmentLength = fragmentLength,
       shifts = shifts)
7: FUN(X[[i]], ...)
6: lapply(X, FUN, ...)
5: bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = BPPARAM)
4: bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = BPPARAM)
3: bplapply(samplelist, doChIPQCsample, experiment, chromosomes,
       annotation, mapQCth, blacklist, profileWin, fragmentLength,
       shifts)
2: bplapply(samplelist, doChIPQCsample, experiment, chromosomes,
       annotation, mapQCth, blacklist, profileWin, fragmentLength,
       shifts)
1: ChIPQC(experiment = chipqc.exp.df[1, ], annotation = "hg19",
       chromosomes = extractSeqlevels("Homo sapiens", "UCSC") %>%
           setdiff("chrM"), profileWin = 600)

If you want to reproduce the error: download this file and then run the code below:

https://www.dropbox.com/s/bu4ss281u2g0zz1/gnclist-fail.RDS?dl=0

temp <- readRDS("gnclist-fail.RDS")
Sample_GIT <- GNCList(GRanges(seqnames = seqnames(temp),
                              ranges = ranges(temp), strand = strand(temp),
                              elementMetadata(temp)))
ADD REPLY
0
Entering edit mode

Hi Ryan,

Will look into this. Thanks for providing the data.

H.

ADD REPLY
0
Entering edit mode

Hi Ryan,

You have a big pile of 100907 identical alignments at chr1:145277389-145277488. You can see it with:

gal <- readRDS("gnclist-fail.RDS")
gal[start(gal) == 145277389 & end(gal) == 145277488]  # 100907 reads

In IRanges 2.5.43 there is no more limit on the max depth of on-the-fly NCList objects (on-the-fly NCList objects are automatically created by findOverlaps() and countOverlaps() as needed). The limit remains (and is still 100000) for explicit NCList objects i.e. when the user explicitly calls NCList() or GNCList() for upfront preprocessing. However please note that, for a one-time use, upfront preprocessing is not advised (see ?NCList). So with IRanges 2.5.43:

query <- IRanges(1:500000, width=1)
subject <- IRanges(rep(50, 200000), width=1)
NCList(subject)               # still fails
findOverlaps(query, subject)  # works

The reason the limit remains for explicit NCList objects is because they use a different internal representation than on-the-fly NCList objects (an explicit NCList object is stored in an ordinary integer vector which makes it serializable), so the C code to walk on them is also different. The code for walking on explicit NCList objects would need to be modified in the same manner I modified the code for walking on on-the-fly NCList objects. Which I'll do at some point...

Cheers,

H.

ADD REPLY
0
Entering edit mode

Dear Hervé

I face the same problem with another R package (MEDIPS). I get the same error because of the limit set to 100000. Is it something that I can fix by myself in the R code? Where?

Thanks,
Moreno

ADD REPLY
0
Entering edit mode
Theo ▴ 10
@theodoregeorgomanolis-7993
Last seen 5 months ago
Germany

There you go, I think that it solved the major error, there are a few warnings that need to be checked (I have not the skills to interpret them.
Please share your opinion, I will work my way to the rest of the walkthrough and let you know if it worked.

 

> bamFilePath<-"/media/trotos/4TB/WORK/archive/fastq_files/bwa_mem_Q30"
> primerFile<-"/media/trotos/4TB/WORK/archive/fastq_files/bwa_mem_Q30/primers"
> metaData19 <- list(projectPath = "/media/trotos/4TB/WORK/archive/fastq_files/bwa_mem_Q30", fragmentDir = "/media/trotos/4TB/WORK/archive/fastq_files/bwa_mem_Q30", referenceGenomeFile=referenceGenomeFile19, reSequence1 = "RAATTY",reSequence2 = "GATC",primerFile = primerFile,bamFilePath = bamFilePath)
> colData <- DataFrame(viewpoint = "Smad4a",condition = factor(rep(c("Q30","Q30_c1","Q30_uniq_srt"),each=2),levels = c("Q30","Q30_c1","Q30_uniq_srt")), replicate=rep(c(1,2),3), bamFile = c("SAMD4a_0min_rep2_mem_sorted_Q30.bam","SAMD4a_0min_rep3_mem_sorted_Q30.bam","SAMD4a_0min_rep2_mem_sorted_Q30_c1.bam","SAMD4a_0min_rep3_mem_sorted_Q30_c1.bam","SAMD4a_0min_rep2_mem_sorted_Q30_uniq_srt.bam","SAMD4a_0min_rep3_mem_sorted_Q30_uniq_srt.bam"),sequencingPrimer="first")
> fc<-FourC(colData,metaData19)
>
> fc<-addFragments(fc)
> rowRanges(fc)
GRanges object with 2935389 ranges and 4 metadata columns:
            seqnames               ranges strand   |  leftSize rightSize
               <Rle>            <IRanges>  <Rle>   | <numeric> <numeric>
        [1]     chrM         [   1,  288]      *   |         0       284
        [2]     chrM         [ 403, 2051]      *   |       339       819
        [3]     chrM         [2162, 2804]      *   |       189       450
        [4]     chrM         [2811, 3111]      *   |        87        43
        [5]     chrM         [3390, 4121]      *   |       270       424
        ...      ...                  ...    ... ...       ...       ...
  [2935385]     chrY [59351339, 59352173]      *   |       440       253
  [2935386]     chrY [59352180, 59353908]      *   |       110      1403
  [2935387]     chrY [59353915, 59356216]      *   |       301      1439
  [2935388]     chrY [59356223, 59360938]      *   |       763       211
  [2935389]     chrY [59360996, 59373566]      *   |       570     11997
            leftValid rightValid
            <logical>  <logical>
        [1]         0          1
        [2]         1          1
        [3]         1          1
        [4]         1          1
        [5]         1          1
        ...       ...        ...
  [2935385]         1          1
  [2935386]         1          1
  [2935387]         1          1
  [2935388]         1          1
  [2935389]         1          1
  -------
  seqinfo: 25 sequences from an unspecified genome
> colData(fc18_sam)$chr = "chr14"
Error in colData(fc18_sam)$chr = "chr14" :
  object 'fc18_sam' not found
> colData(fc)$chr = "chr14"
> colData(fc)$start =54565826
> colData(fc)$end =54566537
> traceback()
No traceback available
> debug(countFragmentOverlaps)
> countFragmentOverlaps(fc)
debugging in: countFragmentOverlaps(fc)
debug: {
    stopifnot(class(object) == "FourC")
    if (length(rowRanges(object)) == 0)
        stop("Add fragments before calling 'findViewpointFragments'")
    cat("reading bam files\n")
    bamFiles = file.path(metadata(object)$bamFilePath, colData(object)$bamFile)
    colData(object)$originalReads = sapply(bamFiles, function(bamFile) countBam(bamFile)$records)
    reads = lapply(bamFiles, function(bamfile) {
        what <- c("mapq")
        flag <- scanBamFlag(isUnmappedQuery = FALSE)
        param <- ScanBamParam(what = what, flag = flag)
        readGAlignments(bamfile, param = param)
    })
    colData(object)$rawReads = sapply(reads, length)
    if (minMapq >= 0) {
        reads = lapply(reads, function(rga, minMapq) {
            if (!any(is.na(mcols(rga)$mapq))) {
                return(rga[mcols(rga)$mapq > minMapq])
            }
            warning("mapq quality filter could not be applied due to NA values in the mapq quality scores. All reads are used for counting overlaps.")
            rga
        }, minMapq = minMapq)
        colData(object)$lowQualityReads = colData(object)$rawReads -
            sapply(reads, length)
    }
    reads = lapply(reads, granges)
    if (trim > 0) {
        reads = lapply(reads, function(gr, trim) {
            start(gr[strand(gr) == "+"]) = start(gr[strand(gr) ==
                "+"]) + trim
            end(gr[strand(gr) == "-"]) = end(gr[strand(gr) ==
                "-"]) - trim
            gr
        }, trim = trim)
    }
    cat("calculating overlaps\n")
    frag <- rowRanges(object)
    strand(frag) <- "+"
    countsLeftFragmentEnd <- sapply(reads, countOverlaps, query = frag,
        type = c("start"), maxgap = shift)
    strand(frag) <- "-"
    countsRightFragmentEnd <- sapply(reads, countOverlaps, query = frag,
        type = c("end"), maxgap = shift)
    colData(object)$mappedReads = apply(countsLeftFragmentEnd,
        2, sum) + apply(countsRightFragmentEnd, 2, sum)
    colData(object)$mappingRatio = colData(object)$mappedReads/(colData(object)$rawReads -
        colData(object)$lowQualityReads)
    metaDataFrame <- DataFrame(type = rep("countInfo", 5), description = rep("",
        5))
    idx <- colnames(colData(object)) %in% c("originalReads",
        "rawReads", "lowQualityReads", "mappedReads", "mappingRatio")
    mcols(colData(object))[idx, ] <- metaDataFrame
    assays(object) <- SimpleList(countsLeftFragmentEnd = countsLeftFragmentEnd,
        countsRightFragmentEnd = countsRightFragmentEnd)
    assays(object) <- SimpleList(countsLeftFragmentEnd = countsLeftFragmentEnd,
        countsRightFragmentEnd = countsRightFragmentEnd)
    object
}
Browse[2]>
debug: stopifnot(class(object) == "FourC")
Browse[2]>
debug: if (length(rowRanges(object)) == 0) stop("Add fragments before calling 'findViewpointFragments'")
Browse[2]>
debug: cat("reading bam files\n")
Browse[2]>
reading bam files
debug: bamFiles = file.path(metadata(object)$bamFilePath, colData(object)$bamFile)
Browse[2]>
debug: colData(object)$originalReads = sapply(bamFiles, function(bamFile) countBam(bamFile)$records)
Browse[2]>


debug: reads = lapply(bamFiles, function(bamfile) {
    what <- c("mapq")
    flag <- scanBamFlag(isUnmappedQuery = FALSE)
    param <- ScanBamParam(what = what, flag = flag)
    readGAlignments(bamfile, param = param)
})
Browse[2]>


debug: colData(object)$rawReads = sapply(reads, length)
Browse[2]>
debug: if (minMapq >= 0) {
    reads = lapply(reads, function(rga, minMapq) {
        if (!any(is.na(mcols(rga)$mapq))) {
            return(rga[mcols(rga)$mapq > minMapq])
        }
        warning("mapq quality filter could not be applied due to NA values in the mapq quality scores. All reads are used for counting overlaps.")
        rga
    }, minMapq = minMapq)
    colData(object)$lowQualityReads = colData(object)$rawReads -
        sapply(reads, length)
}
Browse[2]>
debug: reads = lapply(reads, function(rga, minMapq) {
    if (!any(is.na(mcols(rga)$mapq))) {
        return(rga[mcols(rga)$mapq > minMapq])
    }
    warning("mapq quality filter could not be applied due to NA values in the mapq quality scores. All reads are used for counting overlaps.")
    rga
}, minMapq = minMapq)
Browse[2]>
debug: colData(object)$lowQualityReads = colData(object)$rawReads -
    sapply(reads, length)
Browse[2]>
debug: reads = lapply(reads, granges)
Browse[2]>
debug: if (trim > 0) {
    reads = lapply(reads, function(gr, trim) {
        start(gr[strand(gr) == "+"]) = start(gr[strand(gr) ==
            "+"]) + trim
        end(gr[strand(gr) == "-"]) = end(gr[strand(gr) == "-"]) -
            trim
        gr
    }, trim = trim)
}
Browse[2]>
debug: cat("calculating overlaps\n")
Browse[2]>
calculating overlaps
debug: frag <- rowRanges(object)
Browse[2]>
debug: strand(frag) <- "+"
Browse[2]>
debug: countsLeftFragmentEnd <- sapply(reads, countOverlaps, query = frag,
    type = c("start"), maxgap = shift)
Browse[2]>
debug: strand(frag) <- "-"
Browse[2]>
debug: countsRightFragmentEnd <- sapply(reads, countOverlaps, query = frag,
    type = c("end"), maxgap = shift)
Browse[2]>
debug: colData(object)$mappedReads = apply(countsLeftFragmentEnd, 2,
    sum) + apply(countsRightFragmentEnd, 2, sum)
Browse[2]>
debug: colData(object)$mappingRatio = colData(object)$mappedReads/(colData(object)$rawReads -
    colData(object)$lowQualityReads)
Browse[2]>
debug: metaDataFrame <- DataFrame(type = rep("countInfo", 5), description = rep("",
    5))
Browse[2]>
debug: idx <- colnames(colData(object)) %in% c("originalReads", "rawReads",
    "lowQualityReads", "mappedReads", "mappingRatio")
Browse[2]>
debug: mcols(colData(object))[idx, ] <- metaDataFrame
Browse[2]>
debug: assays(object) <- SimpleList(countsLeftFragmentEnd = countsLeftFragmentEnd,
    countsRightFragmentEnd = countsRightFragmentEnd)
Browse[2]>
debug: assays(object) <- SimpleList(countsLeftFragmentEnd = countsLeftFragmentEnd,
    countsRightFragmentEnd = countsRightFragmentEnd)
Browse[2]>
debug: object
Browse[2]>
exiting from: countFragmentOverlaps(fc)
class: FourC
dim: 2935389 6
metadata(7): projectPath fragmentDir ... primerFile bamFilePath
assays(2): countsLeftFragmentEnd countsRightFragmentEnd
rownames: NULL
rowRanges metadata column names(4): leftSize rightSize leftValid
  rightValid
colnames(6): Smad4a_Q30_1 Smad4a_Q30_2 ... Smad4a_Q30_uniq_srt_1
  Smad4a_Q30_uniq_srt_2
colData names(13): viewpoint condition ... mappedReads mappingRatio
There were 12 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In countOverlaps_GenomicRanges(query, subject, maxgap = maxgap,  ... :
  With the new findOverlaps implementation based on Nested Containment
  Lists, 'maxgap' has no special meaning when 'type' is "start", "end",
  "within", or "equal". Note that this is a change with respect to the
  old findOverlaps implementation based on Interval Trees. See ?NCList
  for more information about this change and other differences between
  the new and old algorithms.
2: In countOverlaps_GenomicRanges(query, subject, maxgap = maxgap,  ... :
  With the new findOverlaps implementation based on Nested Containment
  Lists, 'maxgap' has no special meaning when 'type' is "start", "end",
  "within", or "equal". Note that this is a change with respect to the
  old findOverlaps implementation based on Interval Trees. See ?NCList
  for more information about this change and other differences between
  the new and old algorithms.
3: In countOverlaps_GenomicRanges(query, subject, maxgap = maxgap,  ... :
  With the new findOverlaps implementation based on Nested Containment
  Lists, 'maxgap' has no special meaning when 'type' is "start", "end",
  "within", or "equal". Note that this is a change with respect to the
  old findOverlaps implementation based on Interval Trees. See ?NCList
  for more information about this change and other differences between
  the new and old algorithms.
4: In countOverlaps_GenomicRanges(query, subject, maxgap = maxgap,  ... :
  With the new findOverlaps implementation based on Nested Containment
  Lists, 'maxgap' has no special meaning when 'type' is "start", "end",
  "within", or "equal". Note that this is a change with respect to the
  old findOverlaps implementation based on Interval Trees. See ?NCList
  for more information about this change and other differences between
  the new and old algorithms.
5: In countOverlaps_GenomicRanges(query, subject, maxgap = maxgap,  ... :
  With the new findOverlaps implementation based on Nested Containment
  Lists, 'maxgap' has no special meaning when 'type' is "start", "end",
  "within", or "equal". Note that this is a change with respect to the
  old findOverlaps implementation based on Interval Trees. See ?NCList
  for more information about this change and other differences between
  the new and old algorithms.
6: In countOverlaps_GenomicRanges(query, subject, maxgap = maxgap,  ... :
  With the new findOverlaps implementation based on Nested Containment
  Lists, 'maxgap' has no special meaning when 'type' is "start", "end",
  "within", or "equal". Note that this is a change with respect to the
  old findOverlaps implementation based on Interval Trees. See ?NCList
  for more information about this change and other differences between
  the new and old algorithms.
7: In countOverlaps_GenomicRanges(query, subject, maxgap = maxgap,  ... :
  With the new findOverlaps implementation based on Nested Containment
  Lists, 'maxgap' has no special meaning when 'type' is "start", "end",
  "within", or "equal". Note that this is a change with respect to the
  old findOverlaps implementation based on Interval Trees. See ?NCList
  for more information about this change and other differences between
  the new and old algorithms.
8: In countOverlaps_GenomicRanges(query, subject, maxgap = maxgap,  ... :
  With the new findOverlaps implementation based on Nested Containment
  Lists, 'maxgap' has no special meaning when 'type' is "start", "end",
  "within", or "equal". Note that this is a change with respect to the
  old findOverlaps implementation based on Interval Trees. See ?NCList
  for more information about this change and other differences between
  the new and old algorithms.
9: In countOverlaps_GenomicRanges(query, subject, maxgap = maxgap,  ... :
  With the new findOverlaps implementation based on Nested Containment
  Lists, 'maxgap' has no special meaning when 'type' is "start", "end",
  "within", or "equal". Note that this is a change with respect to the
  old findOverlaps implementation based on Interval Trees. See ?NCList
  for more information about this change and other differences between
  the new and old algorithms.
10: In countOverlaps_GenomicRanges(query, subject, maxgap = maxgap,  ... :
  With the new findOverlaps implementation based on Nested Containment
  Lists, 'maxgap' has no special meaning when 'type' is "start", "end",
  "within", or "equal". Note that this is a change with respect to the
  old findOverlaps implementation based on Interval Trees. See ?NCList
  for more information about this change and other differences between
  the new and old algorithms.
11: In countOverlaps_GenomicRanges(query, subject, maxgap = maxgap,  ... :
  With the new findOverlaps implementation based on Nested Containment
  Lists, 'maxgap' has no special meaning when 'type' is "start", "end",
  "within", or "equal". Note that this is a change with respect to the
  old findOverlaps implementation based on Interval Trees. See ?NCList
  for more information about this change and other differences between
  the new and old algorithms.
12: In countOverlaps_GenomicRanges(query, subject, maxgap = maxgap,  ... :
  With the new findOverlaps implementation based on Nested Containment
  Lists, 'maxgap' has no special meaning when 'type' is "start", "end",
  "within", or "equal". Note that this is a change with respect to the
  old findOverlaps implementation based on Interval Trees. See ?NCList
  for more information about this change and other differences between
  the new and old algorithms.
>

 

ADD COMMENT
0
Entering edit mode
Theo ▴ 10
@theodoregeorgomanolis-7993
Last seen 5 months ago
Germany

Dear Felix, Herve

Although your hack on iRanges seems to fix the fragment counting and the combinefragends
I stepped into the following while trying to plot:

> fc<-smoothCounts(fc)
5 chrM
5 chr1
5 chr2
5 chr3
5 chr4
5 chr5
5 chr6
5 chr7
5 chr8
5 chr9
5 chr10
5 chr11
5 chr12
5 chr13
5 chr14
5 chr15
5 chr16
5 chr17
5 chr18
5 chr19
5 chr20
5 chr21
5 chr22
5 chrX
5 chrY
> fc
class: FourC
dim: 2935389 3
metadata(7): projectPath fragmentDir ... primerFile bamFilePath
assays(4): counts countsLeftFragmentEnd countsRightFragmentEnd counts_5
rownames: NULL
rowRanges metadata column names(4): leftSize rightSize leftValid
  rightValid
colnames(3): Smad4a_rep2_aln_1 Smad4a_rep2_bow_1 Smad4a_mem_sorted_1
colData names(13): viewpoint condition ... mappedReads mappingRatio
> plotScatter(fc[,c("rep2_aln","rep2_bow")],xlab="rep2_aln",ylab="rep2_bow",asp=1)
Error in .SummarizedExperiment.charbound(j, colnames(x), fmt) :
  <FourC>[,j] index out of bounds: rep2_aln rep2_bow

Maybe it has to do with the warnings I encounter while countfragmentoverlaps

 

Any recommendations for the new error?

ADD COMMENT
1
Entering edit mode

Hi Theodore,

It seems that changing the limit from 25000 to 100000 for the maximum depth of an NCList object did the trick for your data. You can safely ignore the warnings about the special meaning of maxgap because you're using shift=0 (the default) when calling countFragmentOverlaps(). Anyway, I've just restored the special meaning of maxgap in IRanges and GenomicRanges when calling findOverlaps() or countOverlaps() with type="start" or type="end", so this warning should go away when you update your installation in about 24 hours.

It looks to me that the error you get while trying to plot is because you're using invalid column names when subsetting fc. Please try again with valid column names:

fc[ , c("Smad4a_rep2_aln_1", "Smad4a_rep2_bow_1")]  # call plotScatter()
                                                    # on this

Cheers,

H.

ADD REPLY
0
Entering edit mode
@ericaenjoy3-8105
Last seen 9.5 years ago
United States

Hi, I would recommend using "0L" instead of "0" for the "shift" parameter for the "countFragmentOverlaps" function. Give it a try?

ADD COMMENT
0
Entering edit mode

Could you please elaborate?  "0L" is not an integer.

ADD REPLY
1
Entering edit mode
class(0)
# [1] "numeric"
class(0L)
# [1] "integer"

 

ADD REPLY
0
Entering edit mode

weird R,

thank you, I will check it
EDIT: it produced no errors with the 0L usage!!!

ADD REPLY

Login before adding your answer.

Traffic: 597 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6