base-specific read counts
1
0
Entering edit mode
Yu Chuan Tai ▴ 440
@yu-chuan-tai-1534
Last seen 10.2 years ago
Hi, Is there any way to calculate base-specific read counts for a given genomic interval (including 1-base interval), for paired-end data aligned by Bowtie2 in BAM format? Thanks! Best, Yu Chuan
• 1.9k views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 4 months ago
United States
On 06/06/2012 10:43 PM, Yu Chuan Tai wrote: > Hi, > > Is there any way to calculate base-specific read counts for a given > genomic interval (including 1-base interval), for paired-end data > aligned by Bowtie2 in BAM format? Thanks for posting to the Boic mailing list! Functions like readGappedAlignments, scanBam, etc. take an argument ScanBamParam that in turn has an argument 'which' to specify, using GRanges, the regions of a bam file you want to query gwhich <- GRanges("chr1", IRanges(c(1000, 2000, 3000), width=100)), c("+", "+", "-")) param <- ScanBamParam(which=gwhich) scanBam("my.bam", param=param) Base-level coverage is also available with ?applyPileups, see example(applyPileups). Martin > Thanks! > > Best, > Yu Chuan > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENT
0
Entering edit mode
Thanks! In your code below, to take care of the paired-end reads, is it correct that at least I need to set isPaired=TRUE in scanBamFlag()? Best, Yu Chuan On Thu, 7 Jun 2012, Martin Morgan wrote: > On 06/06/2012 10:43 PM, Yu Chuan Tai wrote: >> Hi, >> >> Is there any way to calculate base-specific read counts for a given >> genomic interval (including 1-base interval), for paired-end data >> aligned by Bowtie2 in BAM format? > > Thanks for posting to the Boic mailing list! Functions like > readGappedAlignments, scanBam, etc. take an argument ScanBamParam that in > turn has an argument 'which' to specify, using GRanges, the regions of a bam > file you want to query > > gwhich <- GRanges("chr1", IRanges(c(1000, 2000, 3000), width=100)), > c("+", "+", "-")) > param <- ScanBamParam(which=gwhich) > scanBam("my.bam", param=param) > > Base-level coverage is also available with ?applyPileups, see > example(applyPileups). > > Martin > >> Thanks! >> >> Best, >> Yu Chuan >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > -- > Computational Biology / Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. > PO Box 19024 Seattle, WA 98109 > > Location: Arnold Building M1 B861 > Phone: (206) 667-2793 >
ADD REPLY
0
Entering edit mode
On Thu, Jun 7, 2012 at 11:03 AM, Yu Chuan Tai <yuchuan at="" stat.berkeley.edu=""> wrote: > Thanks! In your code below, to take care of the paired-end reads, is it > correct that at least I need to set isPaired=TRUE in scanBamFlag()? No. The "isXXX" stuff is for filtering the data. Assuming that you want all your reads to be included (and not just paired reads), you do not need to set isPaired. Sean > Best, > Yu Chuan > > On Thu, 7 Jun 2012, Martin Morgan wrote: > >> On 06/06/2012 10:43 PM, Yu Chuan Tai wrote: >>> >>> Hi, >>> >>> Is there any way to calculate base-specific read counts for a given >>> genomic interval (including 1-base interval), for paired-end data >>> aligned by Bowtie2 in BAM format? >> >> >> Thanks for posting to the Boic mailing list! Functions like >> readGappedAlignments, scanBam, etc. take an argument ScanBamParam that in >> turn has an argument 'which' to specify, using GRanges, the regions of a bam >> file you want to query >> >> ?gwhich <- GRanges("chr1", IRanges(c(1000, 2000, 3000), width=100)), >> ? ? c("+", "+", "-")) >> ?param <- ScanBamParam(which=gwhich) >> ?scanBam("my.bam", param=param) >> >> Base-level coverage is also available with ?applyPileups, see >> example(applyPileups). >> >> Martin >> >>> Thanks! >>> >>> Best, >>> Yu Chuan >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> >> -- >> Computational Biology / Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N. >> PO Box 19024 Seattle, WA 98109 >> >> Location: Arnold Building M1 B861 >> Phone: (206) 667-2793 >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY
0
Entering edit mode
I see. So, which input arguments of scanBamFlag() or ScanBamParam() take care of paired-end reads? Or should I even worry about the paired-end natural when calculating coverage? Thanks! Yu Chuan On Thu, 7 Jun 2012, Sean Davis wrote: > On Thu, Jun 7, 2012 at 11:03 AM, Yu Chuan Tai <yuchuan at="" stat.berkeley.edu=""> wrote: >> Thanks! In your code below, to take care of the paired-end reads, is it >> correct that at least I need to set isPaired=TRUE in scanBamFlag()? > > No. The "isXXX" stuff is for filtering the data. Assuming that you > want all your reads to be included (and not just paired reads), you do > not need to set isPaired. > > Sean > >> Best, >> Yu Chuan >> >> On Thu, 7 Jun 2012, Martin Morgan wrote: >> >>> On 06/06/2012 10:43 PM, Yu Chuan Tai wrote: >>>> >>>> Hi, >>>> >>>> Is there any way to calculate base-specific read counts for a given >>>> genomic interval (including 1-base interval), for paired-end data >>>> aligned by Bowtie2 in BAM format? >>> >>> >>> Thanks for posting to the Boic mailing list! Functions like >>> readGappedAlignments, scanBam, etc. take an argument ScanBamParam that in >>> turn has an argument 'which' to specify, using GRanges, the regions of a bam >>> file you want to query >>> >>> ?gwhich <- GRanges("chr1", IRanges(c(1000, 2000, 3000), width=100)), >>> ? ? c("+", "+", "-")) >>> ?param <- ScanBamParam(which=gwhich) >>> ?scanBam("my.bam", param=param) >>> >>> Base-level coverage is also available with ?applyPileups, see >>> example(applyPileups). >>> >>> Martin >>> >>>> Thanks! >>>> >>>> Best, >>>> Yu Chuan >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> >>> >>> -- >>> Computational Biology / Fred Hutchinson Cancer Research Center >>> 1100 Fairview Ave. N. >>> PO Box 19024 Seattle, WA 98109 >>> >>> Location: Arnold Building M1 B861 >>> Phone: (206) 667-2793 >>> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD REPLY
0
Entering edit mode
On Thu, Jun 7, 2012 at 11:24 AM, Yu Chuan Tai <yuchuan at="" stat.berkeley.edu=""> wrote: > I see. So, which input arguments of scanBamFlag() or ScanBamParam() take > care of paired-end reads? Or should I even worry about the paired- end > natural when calculating coverage? There are situations when you want to calculate coverage based on the extreme ends of pairs, but that would be for finding structural variants and the like and not for determining base-level coverage. You do not need to do anything special to read in paired-end data. As Martin mentioned, there is more complete handling of paired-end data in development, but that is really orthogonal to the scanBamParam() isPaired flag. Sean > > Thanks! > Yu Chuan > > On Thu, 7 Jun 2012, Sean Davis wrote: > >> On Thu, Jun 7, 2012 at 11:03 AM, Yu Chuan Tai <yuchuan at="" stat.berkeley.edu=""> wrote: >>> Thanks! In your code below, to take care of the paired-end reads, is it >>> correct that at least I need to set isPaired=TRUE in scanBamFlag()? >> >> No. ?The "isXXX" stuff is for filtering the data. ?Assuming that you >> want all your reads to be included (and not just paired reads), you do >> not need to set isPaired. >> >> Sean >> >>> Best, >>> Yu Chuan >>> >>> On Thu, 7 Jun 2012, Martin Morgan wrote: >>> >>>> On 06/06/2012 10:43 PM, Yu Chuan Tai wrote: >>>>> >>>>> Hi, >>>>> >>>>> Is there any way to calculate base-specific read counts for a given >>>>> genomic interval (including 1-base interval), for paired-end data >>>>> aligned by Bowtie2 in BAM format? >>>> >>>> >>>> Thanks for posting to the Boic mailing list! Functions like >>>> readGappedAlignments, scanBam, etc. take an argument ScanBamParam that in >>>> turn has an argument 'which' to specify, using GRanges, the regions of a bam >>>> file you want to query >>>> >>>> ?gwhich <- GRanges("chr1", IRanges(c(1000, 2000, 3000), width=100)), >>>> ? ? c("+", "+", "-")) >>>> ?param <- ScanBamParam(which=gwhich) >>>> ?scanBam("my.bam", param=param) >>>> >>>> Base-level coverage is also available with ?applyPileups, see >>>> example(applyPileups). >>>> >>>> Martin >>>> >>>>> Thanks! >>>>> >>>>> Best, >>>>> Yu Chuan >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at r-project.org >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>>> >>>> >>>> -- >>>> Computational Biology / Fred Hutchinson Cancer Research Center >>>> 1100 Fairview Ave. N. >>>> PO Box 19024 Seattle, WA 98109 >>>> >>>> Location: Arnold Building M1 B861 >>>> Phone: (206) 667-2793 >>>> >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>
ADD REPLY
0
Entering edit mode
Hi Sean, I see. Thanks for your clarification! Best, Yu Chuan On Thu, 7 Jun 2012, Sean Davis wrote: > On Thu, Jun 7, 2012 at 11:24 AM, Yu Chuan Tai <yuchuan at="" stat.berkeley.edu=""> wrote: >> I see. So, which input arguments of scanBamFlag() or ScanBamParam() take >> care of paired-end reads? Or should I even worry about the paired- end >> natural when calculating coverage? > > There are situations when you want to calculate coverage based on the > extreme ends of pairs, but that would be for finding structural > variants and the like and not for determining base-level coverage. > You do not need to do anything special to read in paired-end data. As > Martin mentioned, there is more complete handling of paired-end data > in development, but that is really orthogonal to the scanBamParam() > isPaired flag. > > Sean > >> >> Thanks! >> Yu Chuan >> >> On Thu, 7 Jun 2012, Sean Davis wrote: >> >>> On Thu, Jun 7, 2012 at 11:03 AM, Yu Chuan Tai <yuchuan at="" stat.berkeley.edu=""> wrote: >>>> Thanks! In your code below, to take care of the paired-end reads, is it >>>> correct that at least I need to set isPaired=TRUE in scanBamFlag()? >>> >>> No. ?The "isXXX" stuff is for filtering the data. ?Assuming that you >>> want all your reads to be included (and not just paired reads), you do >>> not need to set isPaired. >>> >>> Sean >>> >>>> Best, >>>> Yu Chuan >>>> >>>> On Thu, 7 Jun 2012, Martin Morgan wrote: >>>> >>>>> On 06/06/2012 10:43 PM, Yu Chuan Tai wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> Is there any way to calculate base-specific read counts for a given >>>>>> genomic interval (including 1-base interval), for paired-end data >>>>>> aligned by Bowtie2 in BAM format? >>>>> >>>>> >>>>> Thanks for posting to the Boic mailing list! Functions like >>>>> readGappedAlignments, scanBam, etc. take an argument ScanBamParam that in >>>>> turn has an argument 'which' to specify, using GRanges, the regions of a bam >>>>> file you want to query >>>>> >>>>> ?gwhich <- GRanges("chr1", IRanges(c(1000, 2000, 3000), width=100)), >>>>> ? ? c("+", "+", "-")) >>>>> ?param <- ScanBamParam(which=gwhich) >>>>> ?scanBam("my.bam", param=param) >>>>> >>>>> Base-level coverage is also available with ?applyPileups, see >>>>> example(applyPileups). >>>>> >>>>> Martin >>>>> >>>>>> Thanks! >>>>>> >>>>>> Best, >>>>>> Yu Chuan >>>>>> >>>>>> _______________________________________________ >>>>>> Bioconductor mailing list >>>>>> Bioconductor at r-project.org >>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>> Search the archives: >>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> >>>>> >>>>> >>>>> -- >>>>> Computational Biology / Fred Hutchinson Cancer Research Center >>>>> 1100 Fairview Ave. N. >>>>> PO Box 19024 Seattle, WA 98109 >>>>> >>>>> Location: Arnold Building M1 B861 >>>>> Phone: (206) 667-2793 >>>>> >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >
ADD REPLY
0
Entering edit mode
Hi Martin, One more question. Is there any way in Rsamtools to calculate SNVs/INDELS frequency directly using the output file from samtools? Thanks! Best, Yu Chuan On Thu, 7 Jun 2012, Martin Morgan wrote: > On 06/06/2012 10:43 PM, Yu Chuan Tai wrote: >> Hi, >> >> Is there any way to calculate base-specific read counts for a given >> genomic interval (including 1-base interval), for paired-end data >> aligned by Bowtie2 in BAM format? > > Thanks for posting to the Boic mailing list! Functions like > readGappedAlignments, scanBam, etc. take an argument ScanBamParam that in > turn has an argument 'which' to specify, using GRanges, the regions of a bam > file you want to query > > gwhich <- GRanges("chr1", IRanges(c(1000, 2000, 3000), width=100)), > c("+", "+", "-")) > param <- ScanBamParam(which=gwhich) > scanBam("my.bam", param=param) > > Base-level coverage is also available with ?applyPileups, see > example(applyPileups). > > Martin > >> Thanks! >> >> Best, >> Yu Chuan >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > -- > Computational Biology / Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. > PO Box 19024 Seattle, WA 98109 > > Location: Arnold Building M1 B861 > Phone: (206) 667-2793 >
ADD REPLY
0
Entering edit mode
On Fri, Jun 8, 2012 at 2:06 AM, Yu Chuan Tai <yuchuan at="" stat.berkeley.edu=""> wrote: > Hi Martin, > > One more question. Is there any way in Rsamtools to calculate SNVs/INDELS > frequency directly using the output file from samtools? By "output file from samtools", I assume you mean a VCF file. If so, take a look a the VariantAnnotation package and readVcf(). From there, you'll need to do the calculation yourself, but that would be a step on the way to accomplishing your task. Sean
ADD REPLY
0
Entering edit mode
Hi Sean, I didn't find any function in the VariantAnnotation package that can calculate mutant freq. Do you mean after reading in a VCF file using readVcf(), I need to calculate the base-level coverage first (for example, using the way Martin had suggested), and convert coverage to frequency myself? Then why do I need to use VariantAnnotation package for this purpose, given the fact that I already have a text file with all the SNVs/INDELs with their genomic coordinates? Best, Yu Chuan On Fri, 8 Jun 2012, Sean Davis wrote: > On Fri, Jun 8, 2012 at 2:06 AM, Yu Chuan Tai <yuchuan at="" stat.berkeley.edu=""> wrote: >> Hi Martin, >> >> One more question. Is there any way in Rsamtools to calculate SNVs/INDELS >> frequency directly using the output file from samtools? > > By "output file from samtools", I assume you mean a VCF file. If so, > take a look a the VariantAnnotation package and readVcf(). From > there, you'll need to do the calculation yourself, but that would be a > step on the way to accomplishing your task. > > Sean >
ADD REPLY
0
Entering edit mode
On Fri, Jun 8, 2012 at 10:15 PM, Yu Chuan Tai <yuchuan at="" stat.berkeley.edu=""> wrote: > Hi Sean, > > I didn't find any function in ?the VariantAnnotation package that can > calculate mutant freq. Do you mean after reading in a VCF file using > readVcf(), I need to calculate the base-level coverage first (for example, > using the way Martin had suggested), and convert coverage to frequency > myself? Then why do I need to use VariantAnnotation package for this > purpose, given the fact that I already have a text file with all the > SNVs/INDELs with their genomic coordinates? My mistake. I thought you meant the frequency of the variant in your samples. You are talking about allele counts? If so, you'll need the bam files, as Martin has suggested. Sorry to mislead you. Sean > On Fri, 8 Jun 2012, Sean Davis wrote: > >> On Fri, Jun 8, 2012 at 2:06 AM, Yu Chuan Tai <yuchuan at="" stat.berkeley.edu=""> >> wrote: >>> >>> Hi Martin, >>> >>> One more question. Is there any way in Rsamtools to calculate SNVs/INDELS >>> frequency directly using the output file from samtools? >> >> >> By "output file from samtools", I assume you mean a VCF file. ?If so, >> take a look a the VariantAnnotation package and readVcf(). ?From >> there, you'll need to do the calculation yourself, but that would be a >> step on the way to accomplishing your task. >> >> Sean >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY
0
Entering edit mode
Hi Sean, No worries. I actually want mutant frequencies for each sample, but I didn't see any fucntion in VariantAnnotation for that. Anyway, I just found that samtools/bcftools may calculate that directly. Thanks for your help! Best, Yu Chuan On Sat, 9 Jun 2012, Sean Davis wrote: > On Fri, Jun 8, 2012 at 10:15 PM, Yu Chuan Tai <yuchuan at="" stat.berkeley.edu=""> wrote: >> Hi Sean, >> >> I didn't find any function in ?the VariantAnnotation package that can >> calculate mutant freq. Do you mean after reading in a VCF file using >> readVcf(), I need to calculate the base-level coverage first (for example, >> using the way Martin had suggested), and convert coverage to frequency >> myself? Then why do I need to use VariantAnnotation package for this >> purpose, given the fact that I already have a text file with all the >> SNVs/INDELs with their genomic coordinates? > > My mistake. I thought you meant the frequency of the variant in your > samples. You are talking about allele counts? If so, you'll need the > bam files, as Martin has suggested. Sorry to mislead you. > > Sean > > >> On Fri, 8 Jun 2012, Sean Davis wrote: >> >>> On Fri, Jun 8, 2012 at 2:06 AM, Yu Chuan Tai <yuchuan at="" stat.berkeley.edu=""> >>> wrote: >>>> >>>> Hi Martin, >>>> >>>> One more question. Is there any way in Rsamtools to calculate SNVs/INDELS >>>> frequency directly using the output file from samtools? >>> >>> >>> By "output file from samtools", I assume you mean a VCF file. ?If so, >>> take a look a the VariantAnnotation package and readVcf(). ?From >>> there, you'll need to do the calculation yourself, but that would be a >>> step on the way to accomplishing your task. >>> >>> Sean >>> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD REPLY

Login before adding your answer.

Traffic: 927 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6