Rsamtools question
1
0
Entering edit mode
Alpesh Querer ▴ 220
@alpesh-querer-4895
Last seen 10.1 years ago
Hi All, I am a newbie with R and Bioconductor. I have .sam files and I want to count the number of reads mapping to particular SNP locations on the .fa gene. Also I want to avoid double counting the SNPs which are covered by the same 36bp read. Any help is highly appreciated. Thanks, Alpesh [[alternative HTML version deleted]]
SNP SNP • 831 views
ADD COMMENT
0
Entering edit mode
@valerie-obenchain-4275
Last seen 2.7 years ago
United States
Hi Alpesh, One approach would be to I. Read in snps with readGappedAlignments : Convert your sam file to a bam using samtools from the command line, samtools view -bS -o yourfile.bam yourfile.sam and read into R with readGappedAlignments library(GenomicFeatrures) ?readGappedAlignments II. Get gene ID : Use an org annotation package to get the gene id you are interested in. There are many ID types available (Entrez, Ensembl, RefGene, etc.). You will use this ID to identify the gene of interest in the TxDb package (below). Many org packages are available, see all starting with "org", http://bioconductor.org/packages/devel/data/annotation/ For the human package, library(org.Hs.eg.db) ?org.Hs.eg.db III. Get the genomic coordinates for the gene of interest : The TxDb packages have genomic coordinates as well as transcript, exon and gene ID's. See packages starting with "TxDb", http://bioconductor.org/packages/devel/data/annotation/ For example, the human hg19 known genes from UCSC are in the following package. Since this is a UCSC package the geneID's will be RefGene ID's. library(GenomicFeatures) library(TxDb.Hsapiens.UCSC.hg19.knownGene) ls(2) # to see the name of the txdb object If you want to make your own txdb see ?makeTranscriptDb There are several functions available for organizing you data by exons, transcripts or gene, see ?transcriptsBy IV. Counting You now have the genomic coordinates of your gene (from txdb package) and your snps (read in with readGappedAlignments). Several counting options are available, library(GenomicRanges) ?summarizeOverlaps # avoids double counting, available in R-devel version ?countOverlaps ?findOverlaps Valerie On 10/02/2011 08:28 AM, Alpesh Querer wrote: > Hi All, > > I am a newbie with R and Bioconductor. I have .sam files and I want to count > the number of reads mapping to particular SNP locations on the .fa gene. > Also I want to avoid double counting the SNPs which are covered by the same > 36bp read. > Any help is highly appreciated. > > Thanks, > Alpesh > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT
0
Entering edit mode
On 10/04/2011 08:58 AM, Valerie Obenchain wrote: > Hi Alpesh, > > One approach would be to > > I. Read in snps with readGappedAlignments : > > Convert your sam file to a bam using samtools from the command line, > > samtools view -bS -o yourfile.bam yourfile.sam or from within R -- Rsamtools::asBam > > and read into R with readGappedAlignments > > library(GenomicFeatrures) > ?readGappedAlignments > > > II. Get gene ID : > > Use an org annotation package to get the gene id you are interested > in. There are many ID types available (Entrez, Ensembl, RefGene, > etc.). You will use this ID to identify the gene of interest in the > TxDb package (below). > Many org packages are available, see all starting with "org", > > http://bioconductor.org/packages/devel/data/annotation/ > > For the human package, > > library(org.Hs.eg.db) > ?org.Hs.eg.db > > > III. Get the genomic coordinates for the gene of interest : > The TxDb packages have genomic coordinates as well as transcript, exon > and gene ID's. See packages starting with "TxDb", > > http://bioconductor.org/packages/devel/data/annotation/ > > For example, the human hg19 known genes from UCSC are in the following > package. Since this is a UCSC package the geneID's will be RefGene ID's. > > library(GenomicFeatures) > library(TxDb.Hsapiens.UCSC.hg19.knownGene) > ls(2) # to see the name of the txdb object > > If you want to make your own txdb see > > ?makeTranscriptDb > > There are several functions available for organizing you data by > exons, transcripts or gene, see > > ?transcriptsBy > > > IV. Counting > > You now have the genomic coordinates of your gene (from txdb package) > and your snps (read in with readGappedAlignments). > Several counting options are available, > > library(GenomicRanges) > ?summarizeOverlaps # avoids double counting, available in R-devel > version > ?countOverlaps > ?findOverlaps also Rsamtools::applyPileups which gives bases over genomic coordinates Valerie > > > Valerie > > > > On 10/02/2011 08:28 AM, Alpesh Querer wrote: >> Hi All, >> >> I am a newbie with R and Bioconductor. I have .sam files and I want >> to count >> the number of reads mapping to particular SNP locations on the .fa gene. >> Also I want to avoid double counting the SNPs which are covered by >> the same >> 36bp read. >> Any help is highly appreciated. >> >> Thanks, >> Alpesh >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY

Login before adding your answer.

Traffic: 420 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6