FourCseq - error ountFragmentOverlaps()
2
0
Entering edit mode
@samuel-collombet-6574
Last seen 7.6 years ago
France

Hi,

I have an error using FourCseq, which I don't really understand as I was usign the package before without problem.

Here is my code:

> 4C_exptData <- SimpleList(projectPath="myPath",fragmentDir="fragmentDir",referenceGenomeFile ="chr.fa",primerFile="primer.fa",bamFilePath="bam",reSequence1="GTAC",reSequence2="GATC")

> 4C_colData  <- DataFrame(viewpoint="vp",condition=factor(rep(c("cell1", "cell2"),each=2),levels=c("cell1", "cell2")),replicate=c(1,2)),bamFile=c("cell1_rep1.bam","cell1_rep2.bam","cell2_rep1.bam","cell2_rep2.bam"),sequencingPrimer="first")

> 4C_fc <- FourC(4C_colData, 4C_exptData)
> 4C_fc <- addFragments(4C_fc,save=F)
> 4C_fc <- addViewpointFrags(4C_fc, primerFragFile="fragmentDir/primerFragments.rda")
> 4C_fc <- countFragmentOverlaps(4C_fc)

And here I got :

reading bam files
Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) : 
  subscript contains NAs

Here is my sessionInfo:

> sessionInfo()
R version 3.1.0 RC (2014-04-05 r65382)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] splines   stats4    parallel  stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] TxDb.Mmusculus.UCSC.mm10.knownGene_3.0.0
 [2] GenomicFeatures_1.18.3                  
 [3] AnnotationDbi_1.28.1                    
 [4] Biobase_2.26.0                          
 [5] FourCSeq_1.0.0                          
 [6] DESeq2_1.6.3                            
 [7] RcppArmadillo_0.4.650.1.1               
 [8] Rcpp_0.11.4                             
 [9] ggplot2_1.0.0                           
[10] GenomicRanges_1.18.4                    
[11] GenomeInfoDb_1.2.4                      
[12] IRanges_2.0.1                           
[13] S4Vectors_0.4.0                         
[14] BiocGenerics_0.12.1                     

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3          annotate_1.44.0          base64enc_0.1-2         
 [4] BatchJobs_1.5            BBmisc_1.9               BiocParallel_1.0.3      
 [7] biomaRt_2.22.0           Biostrings_2.34.1        biovizBase_1.14.1       
[10] bitops_1.0-6             brew_1.0-6               BSgenome_1.34.1         
[13] checkmate_1.5.1          cluster_2.0.1            codetools_0.2-10        
[16] colorspace_1.2-5         DBI_0.3.1                dichromat_2.0-0         
[19] digest_0.6.8             fail_1.2                 fda_2.4.4               
[22] foreach_1.4.2            foreign_0.8-63           Formula_1.2-0           
[25] genefilter_1.48.1        geneplotter_1.44.0       GenomicAlignments_1.2.2 
[28] GGally_0.5.0             ggbio_1.14.0             graph_1.44.1            
[31] grid_3.1.0               gridExtra_0.9.1          gtable_0.1.2            
[34] gtools_3.4.1             Hmisc_3.15-0             iterators_1.0.7         
[37] lattice_0.20-30          latticeExtra_0.6-26      locfit_1.5-9.1          
[40] MASS_7.3-39              Matrix_1.1-5             munsell_0.4.2           
[43] nnet_7.3-9               OrganismDbi_1.8.0        plyr_1.8.1              
[46] proto_0.3-10             RBGL_1.42.0              RColorBrewer_1.1-2      
[49] RCurl_1.95-4.5           reshape_0.8.5            reshape2_1.4.1          
[52] rpart_4.1-9              Rsamtools_1.18.3         RSQLite_1.0.0           
[55] rtracklayer_1.26.2       scales_0.2.4             sendmailR_1.2-1         
[58] stringr_0.6.2            survival_2.38-1          tcltk_3.1.0             
[61] tools_3.1.0              VariantAnnotation_1.12.9 XML_3.98-1.1            
[64] xtable_1.7-4             XVector_0.6.0            zlibbioc_1.12.0   

 

fourcseq • 2.2k views
ADD COMMENT
0
Entering edit mode
felix.klein ▴ 150
@felixklein-6900
Last seen 6.5 years ago
Germany
Hello Samuel, the BiocTeam did some changes to the countOverlaps function. Could you please run traceback() to figure out where the error occurs exactly? I will look into this tomorrow in more detail. Cheers, Felix On 03/04/2015 11:12 PM, samuel collombet [bioc] wrote: > Activity on a post you are following on support.bioconductor.org > <https: support.bioconductor.org=""> > > User samuel collombet <https: support.bioconductor.org="" u="" 6574=""/> wrote > Question: FourCseq - error ountFragmentOverlaps() > <https: support.bioconductor.org="" p="" 65390=""/>: > > Hi, > > I have an error using FourCseq, which I don't really understand as I > was usign the package before without problem. > > Here is my code: > > > 4C_exptData <- > SimpleList(projectPath="myPath",fragmentDir="fragmentDir",referenceGenomeFile > ="chr.fa",primerFile="primer.fa",bamFilePath="bam",reSequence1="GTAC",reSequence2="GATC") > > > 4C_colData <- > DataFrame(viewpoint="vp",condition=factor(rep(c("cell1", > "cell2"),each=2),levels=c("cell1", > "cell2")),replicate=c(1,2)),bamFile=c("cell1_rep1.bam","cell1_rep2.bam","cell2_rep1.bam","cell2_rep2.bam"),sequencingPrimer="first") > > > 4C_fc <- FourC(4C_colData, 4C_exptData) > > 4C_fc <- addFragments(4C_fc,save=F) > > 4C_fc <- addViewpointFrags(4C_fc, primerFragFile="fragmentDir/primerFragments.rda") > > 4C_fc <- countFragmentOverlaps(4C_fc) > > And here I got : > > reading bam files > Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) : > subscript contains NAs > > Here is my sessionInfo: > > > sessionInfo() > R version 3.1.0 RC (2014-04-05 r65382) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] splines stats4 parallel stats graphics grDevices utils > [8] datasets methods base > > other attached packages: > [1] TxDb.Mmusculus.UCSC.mm10.knownGene_3.0.0 > [2] GenomicFeatures_1.18.3 > [3] AnnotationDbi_1.28.1 > [4] Biobase_2.26.0 > [5] FourCSeq_1.0.0 > [6] DESeq2_1.6.3 > [7] RcppArmadillo_0.4.650.1.1 > [8] Rcpp_0.11.4 > [9] ggplot2_1.0.0 > [10] GenomicRanges_1.18.4 > [11] GenomeInfoDb_1.2.4 > [12] IRanges_2.0.1 > [13] S4Vectors_0.4.0 > [14] BiocGenerics_0.12.1 > > loaded via a namespace (and not attached): > [1] acepack_1.3-3.3 annotate_1.44.0 base64enc_0.1-2 > [4] BatchJobs_1.5 BBmisc_1.9 BiocParallel_1.0.3 > [7] biomaRt_2.22.0 Biostrings_2.34.1 biovizBase_1.14.1 > [10] bitops_1.0-6 brew_1.0-6 BSgenome_1.34.1 > [13] checkmate_1.5.1 cluster_2.0.1 codetools_0.2-10 > [16] colorspace_1.2-5 DBI_0.3.1 dichromat_2.0-0 > [19] digest_0.6.8 fail_1.2 fda_2.4.4 > [22] foreach_1.4.2 foreign_0.8-63 Formula_1.2-0 > [25] genefilter_1.48.1 geneplotter_1.44.0 GenomicAlignments_1.2.2 > [28] GGally_0.5.0 ggbio_1.14.0 graph_1.44.1 > [31] grid_3.1.0 gridExtra_0.9.1 gtable_0.1.2 > [34] gtools_3.4.1 Hmisc_3.15-0 iterators_1.0.7 > [37] lattice_0.20-30 latticeExtra_0.6-26 locfit_1.5-9.1 > [40] MASS_7.3-39 Matrix_1.1-5 munsell_0.4.2 > [43] nnet_7.3-9 OrganismDbi_1.8.0 plyr_1.8.1 > [46] proto_0.3-10 RBGL_1.42.0 RColorBrewer_1.1-2 > [49] RCurl_1.95-4.5 reshape_0.8.5 reshape2_1.4.1 > [52] rpart_4.1-9 Rsamtools_1.18.3 RSQLite_1.0.0 > [55] rtracklayer_1.26.2 scales_0.2.4 sendmailR_1.2-1 > [58] stringr_0.6.2 survival_2.38-1 tcltk_3.1.0 > [61] tools_3.1.0 VariantAnnotation_1.12.9 XML_3.98-1.1 > [64] xtable_1.7-4 XVector_0.6.0 zlibbioc_1.12.0 > > ------------------------------------------------------------------------ > > You may reply via email or visit > FourCseq - error ountFragmentOverlaps() >
ADD COMMENT
0
Entering edit mode

Hi Felix,

The changes to the implementation of findOverlaps()/countOverlaps() and family are in devel (BioC 3.1) but it looks like Samuel is using release (BioC 3.0).  The details of these changes were announced on the bioc-devel mailing list in December:

https://stat.ethz.ch/pipermail/bioc-devel/2014-December/006749.html

If you find problems or have questions about these changes, please use bioc-devel.

Thanks,

H.

ADD REPLY
0
Entering edit mode
@samuel-collombet-6574
Last seen 7.6 years ago
France

> traceback()
11: stop("subscript contains NAs")                                                                                                                              
10: NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append)                                                                                               
9: NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append)                                                                                                
8: normalizeSingleBracketSubscript(i, x, as.NSBS = TRUE)                                                                                                        
7: extractROWS(x, i)                                                                                                                                            
6: extractROWS(x, i)                                                                                                                                            
5: rga[mcols(rga)$mapq > minMapq]                                                                                                                               
4: rga[mcols(rga)$mapq > minMapq]                                                                                                                               
3: FUN(X[[1L]], ...)                                                                                                                                            
2: lapply(reads, function(rga, minMapq) rga[mcols(rga)$mapq > minMapq],                                                                                         
       minMapq = minMapq)                                                                                                                                       
1: countFragmentOverlaps(4C_fc)

is it ok? 
Thanks!

ADD COMMENT
0
Entering edit mode
Hello Samuel, I could reproduce the error by setting the "mapq" to NA. This means that the reads that you are using probably do not have mapq scores. So at this position the error occurs: 5: rga[mcols(rga)$mapq > minMapq] 4: rga[mcols(rga)$mapq > minMapq] Since there are no scores this results in NAs and raises the reported error. Please check what the mapq scores are in your bam files and adjust the filter accordingly. To completely skip the filtering step you can provide a negative number for the minimum mapq score: countFragmentOverlaps(fc, minMapq=-1) This should solve the problem. Best regards, Felix
ADD REPLY
0
Entering edit mode

This is quite strange as all my reads have a MAPQ of 255 (mapping with STAR):
$ samtools view reads.bam | cut -f 5 | sort -u
> 255

However putting minMapq=-1 make it work! Do you have any idea where it comes form?

Many thanks!

 

ADD REPLY
0
Entering edit mode
Hello Samuel, I would guess that somehow the quality scores are not properly read resulting in NA values. If you run countFragmentOverlaps in debugging mode you could check if the mapq are properly read or if they are read as NA. It would be great if you could check this on your files or provide me with a short file (maybe 100 lines of your bam files) that I can test it. Best regards, Felix
ADD REPLY
0
Entering edit mode

sure, but I do not know how to run debugging mode, how should I do?

ADD REPLY
0
Entering edit mode
Hello Samuel, just type: debug(countFragmentOverlaps) countFragmentOverlaps(fc)  you can walk through the lines of the function by pressing enter for each line until the files bam files are read in. Then have a look at reads[[1]]. For the example data in the package it looks like this: GAlignments object with 13 alignments and 1 metadata column: seqnames strand cigar qwidth start end width njunc | mapq <rle> <rle> <character> <integer> <integer> <integer> <integer> <integer> | <integer> [1] chr2L + 80M3S 83 5152 5231 80 0 | 70 [2] chr2L + 71M12S 83 5152 5222 71 0 | 70 [3] chr2L + 78M 78 5152 5229 78 0 | 70 [4] chr2L - 83M 83 5222 5304 83 0 | 70 [5] chr2L - 78M 78 5227 5304 78 0 | 70 ... ... ... ... ... ... ... ... ... ... ... [9] chr2L + 83M 83 6023 6105 83 0 | 70 [10] chr2L + 83M 83 6023 6105 83 0 | 70 [11] chr2L + 83M 83 6023 6105 83 0 | 70 [12] chr2L + 78M 78 6023 6100 78 0 | 70 [13] chr2L - 83M 83 6800 6882 83 0 | 70 ------- seqinfo: 1 sequence from an unspecified genome So the mapq is 70 in this case. Best regards, Felix
ADD REPLY
0
Entering edit mode

I am sorry I do not really understand it, but it looks like I do not get even to reading the bam file :/
 

Browse[2]> 
debug: stopifnot(class(object) == "FourC")
Browse[2]> 
debug: if (length(rowData(object)) == 0) stop("Add fragments before calling 'findViewpointFragments'")
Browse[2]> 
debug: cat("reading bam files\n")
Browse[2]> 
reading bam files
debug: bamFiles = file.path(exptData(object)$bamFilePath, colData(object)$bamFile)
Browse[2]> 
debug: colData(object)$orignialReads = sapply(bamFiles, function(bamFile) countBam(bamFile)$records)
Browse[2]> 

debug: reads = lapply(bamFiles, function(bamfile) {
    what <- c("mapq")
    flag <- scanBamFlag(isUnmappedQuery = FALSE)
    param <- ScanBamParam(what = what, flag = flag)
    readGAlignments(bamfile, param = param)
})
Browse[2]> 

debug: colData(object)$rawReads = sapply(reads, length)

Browse[2]> 
debug: if (minMapq >= 0) reads = lapply(reads, function(rga, minMapq) rga[mcols(rga)$mapq > 
    minMapq], minMapq = minMapq)
Browse[2]> 
debug: reads = lapply(reads, function(rga, minMapq) rga[mcols(rga)$mapq > 
    minMapq], minMapq = minMapq)
Browse[2]> 
Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) : 
  subscript contains NAs

Does it helps? otherwise I can send you part of my bam by email

ADD REPLY
0
Entering edit mode
Hello Samuel, you read in the bam files. In the debugging mode you can manually interact at each point. Just before the steps below you should type reads[[1]] manually into the console. > > Browse[2]> > debug:colData(object)$rawReads = sapply(reads, length) > > Browse[2]> > debug: if (minMapq >= 0) reads = lapply(reads, function(rga, minMapq) > rga[mcols(rga)$mapq > > minMapq], minMapq = minMapq) > Cheers, Felix PS: Otherwise send a small example file to felix.klein@embl.de including all information needed to create the FourC object until the countFragmentOverlaps call.
ADD REPLY
0
Entering edit mode

Got it :) 

indeed all the reads are reads with NA as MAPQ :/ I sent you 100 line of sam if you want to check it.

Thanks!

ADD REPLY

Login before adding your answer.

Traffic: 560 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6