readGAlignmentPairsFromBam with yieldSize & which
1
0
Entering edit mode
@stefano-calza-5062
Last seen 10.2 years ago
Italy
Hi! I am experiencing this weird (to me...) behaviour: bfl = BamFile(some/bam/file,yieldSize=100000L) param=ScanBamParam(which=RangesList(chr22=IRanges(1,51304566))) ## 1 readGAlignmentsFromBam(bfl) ## 2 readGAlignmentsFromBam(bfl,param=param) Now 1) is much faster than 2). Indeed 1) takes ~ 6 secs while 2) at least a couple of minutes till I killed it... Am I doing something wrong or missing something here? Stefano
• 1.1k views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 4 months ago
United States
On 06/17/2014 05:39 AM, Stefano Calza wrote: > Hi! > > I am experiencing this weird (to me...) behaviour: > > bfl = BamFile(some/bam/file,yieldSize=100000L) > > param=ScanBamParam(which=RangesList(chr22=IRanges(1,51304566))) > > ## 1 > readGAlignmentsFromBam(bfl) > > ## 2 > > readGAlignmentsFromBam(bfl,param=param) > > > Now 1) is much faster than 2). Indeed 1) takes ~ 6 secs while 2) at least a > couple of minutes till I killed it... > > Am I doing something wrong or missing something here? Hi Stefano -- Does the description on ?BamFile help? yieldSize: Number of records to yield each time the file is read from using 'scanBam' or, when 'length(bamWhich()) != 0', a threshold which yields records in complete ranges whose sum first exceeds 'yieldSize'. you're getting the complete range chr22=IRanges(1,51304566), because that will be the first complete range for which yieldSize is satsified. The current implementation allows you to iterate through an entire file, or iterate through many small ranges, but it does not really work well if the goal is to subset the bam file (e.g., by chromsome) and iterate through the subset; you would then filterBam and iterate through that. It could also be that there is a bug in the code; it would then help to have the output of sessionInfo() and a reproducible example, e.g., a subset of some/bam/file or use of the files in the experiment data package RNAseqData.HNRNPC.bam.chr14 Martin > > Stefano > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENT
0
Entering edit mode
Thanks Martin, I didn't quite understand that sentence: now it clear. My hope was indeed to be able to iterate by yieldSize within a prespecified range, without writing a new file to the disk (like filterBAM does...right?) Thanks Stefano On 06/17/2014 06:31 PM, Martin Morgan wrote: > On 06/17/2014 05:39 AM, Stefano Calza wrote: >> Hi! >> >> I am experiencing this weird (to me...) behaviour: >> >> bfl = BamFile(some/bam/file,yieldSize=100000L) >> >> param=ScanBamParam(which=RangesList(chr22=IRanges(1,51304566))) >> >> ## 1 >> readGAlignmentsFromBam(bfl) >> >> ## 2 >> >> readGAlignmentsFromBam(bfl,param=param) >> >> >> Now 1) is much faster than 2). Indeed 1) takes ~ 6 secs while 2) at >> least a >> couple of minutes till I killed it... >> >> Am I doing something wrong or missing something here? > > Hi Stefano -- > > Does the description on ?BamFile help? > > yieldSize: Number of records to yield each time the file is read > from using 'scanBam' or, when 'length(bamWhich()) != 0', a > threshold which yields records in complete ranges whose sum > first exceeds 'yieldSize'. > > you're getting the complete range chr22=IRanges(1,51304566), because > that will be the first complete range for which yieldSize is satsified. > > The current implementation allows you to iterate through an entire > file, or iterate through many small ranges, but it does not really > work well if the goal is to subset the bam file (e.g., by chromsome) > and iterate through the subset; you would then filterBam and iterate > through that. > > It could also be that there is a bug in the code; it would then help > to have the output of sessionInfo() and a reproducible example, e.g., > a subset of some/bam/file or use of the files in the experiment data > package RNAseqData.HNRNPC.bam.chr14 > > Martin > > >> >> Stefano >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >
ADD REPLY

Login before adding your answer.

Traffic: 645 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6