Visualization of alignments with mismatch bases

0

Entering edit mode

Michael Lawrence ★ 11k

@michael-lawrence-3846

Last seen 3.3 years ago

United States

On Thu, Feb 28, 2013 at 11:46 AM, Tengfei Yin <yintengfei@gmail.com> wrote: > On Thu, Feb 28, 2013 at 11:46 AM, Julian Gehring > <julian.gehring@gmail.com>wrote: > > > Hi, > > > > Is there a good way to visualize aligned reads with mismatch bases (SNPs) > > along the genome (similar to what one knows from the standard genome > > browsers)? > > > > 'ggbio' came to my mind with which plotting a pile of reads is straight > > forward. However, overlaying the mismatched bases for each reads seems > not > > that easy any more. > > > > Hi Julian, > > You are right, currently ggbio only supports summary of mismatch showing as > coverage plot and barchart(?stat_mismatch), but looks like what you want is > detailed short reads alignments visualization with mismatch bases showing > right on the reads, . It's possible, but not easy to do it manually...you > have to have two GRanges objects, one for alignment one for SNP, and plot > them layer by layer, the tricky part is assigning each reads fixed stepping > level, so snp can be plotted on the right position. I will NOT recommend > you to do this, it's probably not worth taking time doing it. I need to > implement this features in some easy way. > > The tricky part is that there are different modes, 1. show reads as gray > rectangle, and color mismatch as segment Are you talking about single-read detail for #1 above or what we get from stat_mismatch? > 2. show SNP as nucleotide text, > A/C/T/G.., 3. show sequence detail for each alignment. those depends on > zoomed level and even coverage, and I guess most time you don't want to see > bases for every reads... > > I think #3 is really important for any sort of diagnosis of alignment issues, and I would find this very useful. Right now, I'm using IGV for this, but ggbio would be a lot more flexible. This is probably my #1 top missing feature from ggbio. > Just curious for future ggbio development, are those modes want you want? > are you just using bam files here? no VCF files involved right? Because > you mentioned 'snp', I think what you mean is mismatch? > > ps: I cannot speak for other tools, and only thing I know, in SRAdb > package, looks like it could fire your data in IGV.. > > Thanks > > Tengfei > > > > Does someone of you have a good way to do this or found another solution > > that works for R/bioc? > > > > Best wishes > > Julian > > > > ______________________________**_________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/**listinfo/bioconductor< > https://stat.ethz.ch/mailman/listinfo/bioconductor> > > Search the archives: http://news.gmane.org/gmane.** > > science.biology.informatics.**conductor< > http://news.gmane.org/gmane.science.biology.informatics.conductor> > > > > > > -- > Tengfei Yin > MCDB PhD student > 1620 Howe Hall, 2274, > Iowa State University > Ames, IA,50011-2274 > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

SNP Coverage Alignment Visualization ggbio SNP Coverage Alignment Visualization ggbio • 2.0k views

ADD COMMENT • link updated 12.1 years ago by Tengfei Yin ▴ 420 • written 12.1 years ago by Michael Lawrence ★ 11k

0

Entering edit mode

Tengfei Yin ▴ 420

@tengfei-yin-4323

Last seen 8.9 years ago

On Thu, Feb 28, 2013 at 11:08 PM, Michael Lawrence < lawrence.michael@gene.com> wrote: > > > > On Thu, Feb 28, 2013 at 11:46 AM, Tengfei Yin <yintengfei@gmail.com>wrote: > >> On Thu, Feb 28, 2013 at 11:46 AM, Julian Gehring >> <julian.gehring@gmail.com>wrote: >> >> > Hi, >> > >> > Is there a good way to visualize aligned reads with mismatch bases >> (SNPs) >> > along the genome (similar to what one knows from the standard genome >> > browsers)? >> > >> > 'ggbio' came to my mind with which plotting a pile of reads is straight >> > forward. However, overlaying the mismatched bases for each reads seems >> not >> > that easy any more. >> > >> >> Hi Julian, >> >> You are right, currently ggbio only supports summary of mismatch showing >> as >> coverage plot and barchart(?stat_mismatch), but looks like what you want >> is >> detailed short reads alignments visualization with mismatch bases showing >> right on the reads, . It's possible, but not easy to do it manually...you >> have to have two GRanges objects, one for alignment one for SNP, and plot >> them layer by layer, the tricky part is assigning each reads fixed >> stepping >> level, so snp can be plotted on the right position. I will NOT recommend >> you to do this, it's probably not worth taking time doing it. I need to >> implement this features in some easy way. >> >> The tricky part is that there are different modes, 1. show reads as gray >> rectangle, and color mismatch as segment > > > Are you talking about single-read detail for #1 above or what we get from > stat_mismatch? > The single-read detail, not summary mismatch from stat_mismatch(). so for example, one 75bp reads shown as one rectangle filled with gray color, and use vertical colored segment indicated mismatched position on this reads if any. > > >> 2. show SNP as nucleotide text, >> A/C/T/G.., 3. show sequence detail for each alignment. those depends on >> zoomed level and even coverage, and I guess most time you don't want to >> see >> bases for every reads... >> >> > I think #3 is really important for any sort of diagnosis of alignment > issues, and I would find this very useful. Right now, I'm using IGV for > this, but ggbio would be a lot more flexible. This is probably my #1 top > missing feature from ggbio. > Got it, those features are indeed important but missing, I will keep this in mind. I was thinking about generalize it into visualization of DNAStringSet first, but mismatch is not that general, require BSgenome and raw bam files at the same time to get mismatched information. So we may extend the list returned by scanBam(), and create a GRanges or vector with equal length of short reads numbers for particular region which storing mismatch position and nucleotide. I am trying to think about a more general data structure, not only for visualization purpose, but also for command line analysis, for example, people can simply retrieve mismatch information for any single short read from specific data structure. and Julian, sorry that I don't have any working example right now, I will let you know when this feature is ready to test. Tengfei > > > >> Just curious for future ggbio development, are those modes want you want? >> are you just using bam files here? no VCF files involved right? Because >> you mentioned 'snp', I think what you mean is mismatch? >> >> ps: I cannot speak for other tools, and only thing I know, in SRAdb >> package, looks like it could fire your data in IGV.. >> >> Thanks >> >> Tengfei >> >> >> > Does someone of you have a good way to do this or found another solution >> > that works for R/bioc? >> > >> > Best wishes >> > Julian >> > >> > ______________________________**_________________ >> > Bioconductor mailing list >> > Bioconductor@r-project.org >> > https://stat.ethz.ch/mailman/**listinfo/bioconductor< >> https://stat.ethz.ch/mailman/listinfo/bioconductor> >> > Search the archives: http://news.gmane.org/gmane.** >> > science.biology.informatics.**conductor< >> http://news.gmane.org/gmane.science.biology.informatics.conductor> >> > >> >> >> >> -- >> Tengfei Yin >> MCDB PhD student >> 1620 Howe Hall, 2274, >> Iowa State University >> Ames, IA,50011-2274 >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > -- Tengfei Yin MCDB PhD student 1620 Howe Hall, 2274, Iowa State University Ames, IA,50011-2274 [[alternative HTML version deleted]]

ADD COMMENT • link 12.1 years ago Tengfei Yin ▴ 420

0

Entering edit mode

You should check out the GappedReads object from ShortRead. It extends GappedAlignments to hold the read sequence information in a DNAStringSet. You could then take the genome() identifier and auto-lookup the BSgenome. Or if the SAM MD tag is present in the GappedAlignments, you could use that for the mismatches. Michael On Thu, Feb 28, 2013 at 9:43 PM, Tengfei Yin <yintengfei@gmail.com> wrote: > > > On Thu, Feb 28, 2013 at 11:08 PM, Michael Lawrence < > lawrence.michael@gene.com> wrote: > >> >> >> >> On Thu, Feb 28, 2013 at 11:46 AM, Tengfei Yin <yintengfei@gmail.com>wrote: >> >>> On Thu, Feb 28, 2013 at 11:46 AM, Julian Gehring >>> <julian.gehring@gmail.com>wrote: >>> >>> > Hi, >>> > >>> > Is there a good way to visualize aligned reads with mismatch bases >>> (SNPs) >>> > along the genome (similar to what one knows from the standard genome >>> > browsers)? >>> > >>> > 'ggbio' came to my mind with which plotting a pile of reads is straight >>> > forward. However, overlaying the mismatched bases for each reads >>> seems not >>> > that easy any more. >>> > >>> >>> Hi Julian, >>> >>> You are right, currently ggbio only supports summary of mismatch showing >>> as >>> coverage plot and barchart(?stat_mismatch), but looks like what you want >>> is >>> detailed short reads alignments visualization with mismatch bases showing >>> right on the reads, . It's possible, but not easy to do it manually...you >>> have to have two GRanges objects, one for alignment one for SNP, and plot >>> them layer by layer, the tricky part is assigning each reads fixed >>> stepping >>> level, so snp can be plotted on the right position. I will NOT recommend >>> you to do this, it's probably not worth taking time doing it. I need to >>> implement this features in some easy way. >>> >>> The tricky part is that there are different modes, 1. show reads as gray >>> rectangle, and color mismatch as segment >> >> >> Are you talking about single-read detail for #1 above or what we get from >> stat_mismatch? >> > > The single-read detail, not summary mismatch from stat_mismatch(). so for > example, one 75bp reads shown as one rectangle filled with gray color, and > use vertical colored segment indicated mismatched position on this reads if > any. > > >> >> >>> 2. show SNP as nucleotide text, >>> A/C/T/G.., 3. show sequence detail for each alignment. those depends on >>> zoomed level and even coverage, and I guess most time you don't want to >>> see >>> bases for every reads... >>> >>> >> I think #3 is really important for any sort of diagnosis of alignment >> issues, and I would find this very useful. Right now, I'm using IGV for >> this, but ggbio would be a lot more flexible. This is probably my #1 top >> missing feature from ggbio. >> > > Got it, those features are indeed important but missing, I will keep this > in mind. > > I was thinking about generalize it into visualization of DNAStringSet > first, but mismatch is not that general, require BSgenome and raw bam files > at the same time to get mismatched information. So we may extend the list > returned by scanBam(), and create a GRanges or vector with equal length of > short reads numbers for particular region which storing mismatch position > and nucleotide. I am trying to think about a more general data structure, > not only for visualization purpose, but also for command line analysis, for > example, people can simply retrieve mismatch information for any single > short read from specific data structure. > > and Julian, sorry that I don't have any working example right now, I will > let you know when this feature is ready to test. > > Tengfei > >> >> >> >>> Just curious for future ggbio development, are those modes want you want? >>> are you just using bam files here? no VCF files involved right? Because >>> you mentioned 'snp', I think what you mean is mismatch? >>> >>> ps: I cannot speak for other tools, and only thing I know, in SRAdb >>> package, looks like it could fire your data in IGV.. >>> >>> Thanks >>> >>> Tengfei >>> >>> >>> > Does someone of you have a good way to do this or found another >>> solution >>> > that works for R/bioc? >>> > >>> > Best wishes >>> > Julian >>> > >>> > ______________________________**_________________ >>> > Bioconductor mailing list >>> > Bioconductor@r-project.org >>> > https://stat.ethz.ch/mailman/**listinfo/bioconductor< >>> https://stat.ethz.ch/mailman/listinfo/bioconductor> >>> > Search the archives: http://news.gmane.org/gmane.** >>> > science.biology.informatics.**conductor< >>> http://news.gmane.org/gmane.science.biology.informatics.conductor> >>> > >>> >>> >>> >>> -- >>> Tengfei Yin >>> MCDB PhD student >>> 1620 Howe Hall, 2274, >>> Iowa State University >>> Ames, IA,50011-2274 >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> > > > -- > Tengfei Yin > MCDB PhD student > 1620 Howe Hall, 2274, > Iowa State University > Ames, IA,50011-2274 > > > [[alternative HTML version deleted]]

ADD REPLY • link 12.1 years ago Michael Lawrence ★ 11k

0

Entering edit mode

On 3/1/2013 5:03 AM, Michael Lawrence wrote: > You should check out the GappedReads object from ShortRead. It extends > GappedAlignments to hold the read sequence information in a DNAStringSet. > You could then take the genome() identifier and auto-lookup the BSgenome. > Or if the SAM MD tag is present in the GappedAlignments, you could use that > for the mismatches. I think it's better to use readGappedAlignments with ScanBamParam(what="seq") to get a GappedAlignments object with a 'seq' metadata column (also any tags in the bam file); the most interesting functionality that might come with a GappedReads (qnarrow, narrow, which alter the cigar and potentially underlying sequence) are not implemented on GappedReads (the class was introduced before GappedAlignments supported additional metadata columns). Probably live to regret it but there is which uses the cigar to make a view onto a DNAStringSet of aligned reads. There are some crude visualizations in there. It seems to have problems with reads with (long) N's. The current implementation collapses reads into single lines the way genome browsers do (losing information on individual reads), but this could be changed easily I think by making lvls in line 29 just a vector along the input ranges. One can also create a parallel view of 'qual', allowing overlaying of sequence and quality info. I played around last night with wrapping this in shiny (biocLite("shiny")) ============== cigarview/ui.R ============== library(shiny) bamFilepath <- file.choose() bamFile <- open(BamFile(bamFilepath)) seqnames <- seqlevels(bamFile) names(seqnames) <- seqnames start <- 1248187 # min(start(seqinfo)) end <- 1300264 # max(end(seqinfo)) shinyUI(pageWithSidebar( headerPanel("Cigar Viewer"), sidebarPanel( textInput("bamFile", "BAM file:", bamFilepath), selectInput("seqname", "Seqname:", seqnames, "5"), numericInput("start", "Global start:", start), numericInput("end", "Global end:", end), h4("View (relative to global coordinates)"), numericInput("viewStart", "Start:", 1), numericInput("viewWidth", "Width:", 1000) ), mainPanel( plotOutput("pileup") ) )) ================== cigarview/server.R ================== library(shiny) library(Rsamtools) library(RColorBrewer) source("../4067179/cigarAlign.R") shinyServer(function(input, output) { aln <- reactive({ which <- GRanges(input$seqname, IRanges(input$start, input$end)) param <- ScanBamParam(which=which, what="seq") cigarAlign(input$bamFile, param=param, pad=TRUE) }) output$pileup <- renderPlot({ variantplot2(subviews(aln(), input$viewStart, width=input$viewWidth)) }) }) and then shiny::runApp("./cigarview"); changing the 'view' parameters is basically usable for interactive exploration within the region of global start / end coordinates. Martin > > Michael > > > On Thu, Feb 28, 2013 at 9:43 PM, Tengfei Yin <yintengfei at="" gmail.com=""> wrote: > >> >> >> On Thu, Feb 28, 2013 at 11:08 PM, Michael Lawrence < >> lawrence.michael at gene.com> wrote: >> >>> >>> >>> >>> On Thu, Feb 28, 2013 at 11:46 AM, Tengfei Yin <yintengfei at="" gmail.com="">wrote: >>> >>>> On Thu, Feb 28, 2013 at 11:46 AM, Julian Gehring >>>> <julian.gehring at="" gmail.com="">wrote: >>>> >>>>> Hi, >>>>> >>>>> Is there a good way to visualize aligned reads with mismatch bases >>>> (SNPs) >>>>> along the genome (similar to what one knows from the standard genome >>>>> browsers)? >>>>> >>>>> 'ggbio' came to my mind with which plotting a pile of reads is straight >>>>> forward. However, overlaying the mismatched bases for each reads >>>> seems not >>>>> that easy any more. >>>>> >>>> >>>> Hi Julian, >>>> >>>> You are right, currently ggbio only supports summary of mismatch showing >>>> as >>>> coverage plot and barchart(?stat_mismatch), but looks like what you want >>>> is >>>> detailed short reads alignments visualization with mismatch bases showing >>>> right on the reads, . It's possible, but not easy to do it manually...you >>>> have to have two GRanges objects, one for alignment one for SNP, and plot >>>> them layer by layer, the tricky part is assigning each reads fixed >>>> stepping >>>> level, so snp can be plotted on the right position. I will NOT recommend >>>> you to do this, it's probably not worth taking time doing it. I need to >>>> implement this features in some easy way. >>>> >>>> The tricky part is that there are different modes, 1. show reads as gray >>>> rectangle, and color mismatch as segment >>> >>> >>> Are you talking about single-read detail for #1 above or what we get from >>> stat_mismatch? >>> >> >> The single-read detail, not summary mismatch from stat_mismatch(). so for >> example, one 75bp reads shown as one rectangle filled with gray color, and >> use vertical colored segment indicated mismatched position on this reads if >> any. >> >> >>> >>> >>>> 2. show SNP as nucleotide text, >>>> A/C/T/G.., 3. show sequence detail for each alignment. those depends on >>>> zoomed level and even coverage, and I guess most time you don't want to >>>> see >>>> bases for every reads... >>>> >>>> >>> I think #3 is really important for any sort of diagnosis of alignment >>> issues, and I would find this very useful. Right now, I'm using IGV for >>> this, but ggbio would be a lot more flexible. This is probably my #1 top >>> missing feature from ggbio. >>> >> >> Got it, those features are indeed important but missing, I will keep this >> in mind. >> >> I was thinking about generalize it into visualization of DNAStringSet >> first, but mismatch is not that general, require BSgenome and raw bam files >> at the same time to get mismatched information. So we may extend the list >> returned by scanBam(), and create a GRanges or vector with equal length of >> short reads numbers for particular region which storing mismatch position >> and nucleotide. I am trying to think about a more general data structure, >> not only for visualization purpose, but also for command line analysis, for >> example, people can simply retrieve mismatch information for any single >> short read from specific data structure. >> >> and Julian, sorry that I don't have any working example right now, I will >> let you know when this feature is ready to test. >> >> Tengfei >> >>> >>> >>> >>>> Just curious for future ggbio development, are those modes want you want? >>>> are you just using bam files here? no VCF files involved right? Because >>>> you mentioned 'snp', I think what you mean is mismatch? >>>> >>>> ps: I cannot speak for other tools, and only thing I know, in SRAdb >>>> package, looks like it could fire your data in IGV.. >>>> >>>> Thanks >>>> >>>> Tengfei >>>> >>>> >>>>> Does someone of you have a good way to do this or found another >>>> solution >>>>> that works for R/bioc? >>>>> >>>>> Best wishes >>>>> Julian >>>>> >>>>> ______________________________**_________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at r-project.org >>>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor< >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor> >>>>> Search the archives: http://news.gmane.org/gmane.** >>>>> science.biology.informatics.**conductor< >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor> >>>>> >>>> >>>> >>>> >>>> -- >>>> Tengfei Yin >>>> MCDB PhD student >>>> 1620 Howe Hall, 2274, >>>> Iowa State University >>>> Ames, IA,50011-2274 >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >>> >> >> >> -- >> Tengfei Yin >> MCDB PhD student >> 1620 Howe Hall, 2274, >> Iowa State University >> Ames, IA,50011-2274 >> >> >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Dr. Martin Morgan, PhD Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

ADD REPLY • link 12.1 years ago Martin Morgan 25k

0

Entering edit mode

Hi Martin, thanks for your suggestion, GappedAlignments looks promising and together with MD tag as Michael suggested, I should get what I want, I will go through your source code carefully later. I tried your sample code, that actually looks pretty cool with shiny and responsive too. Is that going to be in some of your packages? btw, I am also experimenting with shiny recently and writing a package focusing on quality control of variants and visualization of vcf files using ggbio, I guess I can learn and get some great ideas from your example code, thanks a lot! Tengfei On Fri, Mar 1, 2013 at 12:40 PM, Martin Morgan <mtmorgan@fhcrc.org> wrote: > On 3/1/2013 5:03 AM, Michael Lawrence wrote: > >> You should check out the GappedReads object from ShortRead. It extends >> GappedAlignments to hold the read sequence information in a DNAStringSet. >> You could then take the genome() identifier and auto-lookup the BSgenome. >> Or if the SAM MD tag is present in the GappedAlignments, you could use >> that >> for the mismatches. >> > > I think it's better to use readGappedAlignments with > ScanBamParam(what="seq") to get a GappedAlignments object with a 'seq' > metadata column (also any tags in the bam file); the most interesting > functionality that might come with a GappedReads (qnarrow, narrow, which > alter the cigar and potentially underlying sequence) are not implemented on > GappedReads (the class was introduced before GappedAlignments supported > additional metadata columns). > > Probably live to regret it but there is > > https://gist.github.com/**mtmorgan/4067179<https: gist.github.com="" mtmorgan="" 4067179=""> > > which uses the cigar to make a view onto a DNAStringSet of aligned reads. > There are some crude visualizations in there. It seems to have problems > with reads with (long) N's. The current implementation collapses reads into > single lines the way genome browsers do (losing information on individual > reads), but this could be changed easily I think by making lvls in line 29 > just a vector along the input ranges. One can also create a parallel view > of 'qual', allowing overlaying of sequence and quality info. > > I played around last night with wrapping this in shiny (biocLite("shiny")) > > ============== > cigarview/ui.R > ============== > library(shiny) > bamFilepath <- file.choose() > bamFile <- open(BamFile(bamFilepath)) > > seqnames <- seqlevels(bamFile) > names(seqnames) <- seqnames > start <- 1248187 # min(start(seqinfo)) > end <- 1300264 # max(end(seqinfo)) > > shinyUI(pageWithSidebar( > > headerPanel("Cigar Viewer"), > > sidebarPanel( > textInput("bamFile", "BAM file:", bamFilepath), > selectInput("seqname", "Seqname:", seqnames, "5"), > numericInput("start", "Global start:", start), > numericInput("end", "Global end:", end), > > h4("View (relative to global coordinates)"), > numericInput("viewStart", "Start:", 1), > numericInput("viewWidth", "Width:", 1000) > ), > > mainPanel( > plotOutput("pileup") > ) > > )) > > ================== > cigarview/server.R > ================== > library(shiny) > library(Rsamtools) > library(RColorBrewer) > source("../4067179/cigarAlign.**R") > > shinyServer(function(input, output) { > > aln <- reactive({ > which <- GRanges(input$seqname, > IRanges(input$start, input$end)) > param <- ScanBamParam(which=which, what="seq") > cigarAlign(input$bamFile, param=param, pad=TRUE) > }) > > output$pileup <- renderPlot({ > variantplot2(subviews(aln(), input$viewStart, > width=input$viewWidth)) > }) > > }) > > > and then shiny::runApp("./cigarview"); changing the 'view' parameters is > basically usable for interactive exploration within the region of global > start / end coordinates. > > Martin > > > >> Michael >> >> >> On Thu, Feb 28, 2013 at 9:43 PM, Tengfei Yin <yintengfei@gmail.com> >> wrote: >> >> >>> >>> On Thu, Feb 28, 2013 at 11:08 PM, Michael Lawrence < >>> lawrence.michael@gene.com> wrote: >>> >>> >>>> >>>> >>>> On Thu, Feb 28, 2013 at 11:46 AM, Tengfei Yin <yintengfei@gmail.com>>>> >wrote: >>>> >>>> On Thu, Feb 28, 2013 at 11:46 AM, Julian Gehring >>>>> <julian.gehring@gmail.com>**wrote: >>>>> >>>>> Hi, >>>>>> >>>>>> Is there a good way to visualize aligned reads with mismatch bases >>>>>> >>>>> (SNPs) >>>>> >>>>>> along the genome (similar to what one knows from the standard genome >>>>>> browsers)? >>>>>> >>>>>> 'ggbio' came to my mind with which plotting a pile of reads is >>>>>> straight >>>>>> forward. However, overlaying the mismatched bases for each reads >>>>>> >>>>> seems not >>>>> >>>>>> that easy any more. >>>>>> >>>>>> >>>>> Hi Julian, >>>>> >>>>> You are right, currently ggbio only supports summary of mismatch >>>>> showing >>>>> as >>>>> coverage plot and barchart(?stat_mismatch), but looks like what you >>>>> want >>>>> is >>>>> detailed short reads alignments visualization with mismatch bases >>>>> showing >>>>> right on the reads, . It's possible, but not easy to do it >>>>> manually...you >>>>> have to have two GRanges objects, one for alignment one for SNP, and >>>>> plot >>>>> them layer by layer, the tricky part is assigning each reads fixed >>>>> stepping >>>>> level, so snp can be plotted on the right position. I will NOT >>>>> recommend >>>>> you to do this, it's probably not worth taking time doing it. I need to >>>>> implement this features in some easy way. >>>>> >>>>> The tricky part is that there are different modes, 1. show reads as >>>>> gray >>>>> rectangle, and color mismatch as segment >>>>> >>>> >>>> >>>> Are you talking about single-read detail for #1 above or what we get >>>> from >>>> stat_mismatch? >>>> >>>> >>> The single-read detail, not summary mismatch from stat_mismatch(). so for >>> example, one 75bp reads shown as one rectangle filled with gray color, >>> and >>> use vertical colored segment indicated mismatched position on this reads >>> if >>> any. >>> >>> >>> >>>> >>>> 2. show SNP as nucleotide text, >>>>> A/C/T/G.., 3. show sequence detail for each alignment. those depends >>>>> on >>>>> zoomed level and even coverage, and I guess most time you don't want to >>>>> see >>>>> bases for every reads... >>>>> >>>>> >>>>> I think #3 is really important for any sort of diagnosis of alignment >>>> issues, and I would find this very useful. Right now, I'm using IGV for >>>> this, but ggbio would be a lot more flexible. This is probably my #1 top >>>> missing feature from ggbio. >>>> >>>> >>> Got it, those features are indeed important but missing, I will keep this >>> in mind. >>> >>> I was thinking about generalize it into visualization of DNAStringSet >>> first, but mismatch is not that general, require BSgenome and raw bam >>> files >>> at the same time to get mismatched information. So we may extend the list >>> returned by scanBam(), and create a GRanges or vector with equal length >>> of >>> short reads numbers for particular region which storing mismatch position >>> and nucleotide. I am trying to think about a more general data structure, >>> not only for visualization purpose, but also for command line analysis, >>> for >>> example, people can simply retrieve mismatch information for any single >>> short read from specific data structure. >>> >>> and Julian, sorry that I don't have any working example right now, I will >>> let you know when this feature is ready to test. >>> >>> Tengfei >>> >>> >>>> >>>> >>>> Just curious for future ggbio development, are those modes want you >>>>> want? >>>>> are you just using bam files here? no VCF files involved right? >>>>> Because >>>>> you mentioned 'snp', I think what you mean is mismatch? >>>>> >>>>> ps: I cannot speak for other tools, and only thing I know, in SRAdb >>>>> package, looks like it could fire your data in IGV.. >>>>> >>>>> Thanks >>>>> >>>>> Tengfei >>>>> >>>>> >>>>> Does someone of you have a good way to do this or found another >>>>>> >>>>> solution >>>>> >>>>>> that works for R/bioc? >>>>>> >>>>>> Best wishes >>>>>> Julian >>>>>> >>>>>> ______________________________****_________________ >>>>>> Bioconductor mailing list >>>>>> Bioconductor@r-project.org >>>>>> https://stat.ethz.ch/mailman/****listinfo/bioconductor<https: stat.ethz.ch="" mailman="" **listinfo="" bioconductor=""> >>>>>> < >>>>>> >>>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: sta="" t.ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>>> > >>>>> >>>>>> Search the archives: http://news.gmane.org/gmane.** >>>>>> science.biology.informatics.****conductor< >>>>>> >>>>> http://news.gmane.org/gmane.**science.biology.informatics.**cond uctor<http: news.gmane.org="" gmane.science.biology.informatics.conducto="" r=""> >>>>> > >>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Tengfei Yin >>>>> MCDB PhD student >>>>> 1620 Howe Hall, 2274, >>>>> Iowa State University >>>>> Ames, IA,50011-2274 >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> ______________________________**_________________ >>>>> Bioconductor mailing list >>>>> Bioconductor@r-project.org >>>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: sta="" t.ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.**science.biology.informatics.**cond uctor<http: news.gmane.org="" gmane.science.biology.informatics.conducto="" r=""> >>>>> >>>>> >>>> >>>> >>> >>> -- >>> Tengfei Yin >>> MCDB PhD student >>> 1620 Howe Hall, 2274, >>> Iowa State University >>> Ames, IA,50011-2274 >>> >>> >>> >>> >> [[alternative HTML version deleted]] >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives: http://news.gmane.org/gmane.** >> science.biology.informatics.**conductor<http: news.gmane.org="" gmane="" .science.biology.informatics.conductor=""> >> >> > > -- > Dr. Martin Morgan, PhD > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. > PO Box 19024 Seattle, WA 98109 > -- Tengfei Yin MCDB PhD student 1620 Howe Hall, 2274, Iowa State University Ames, IA,50011-2274 [[alternative HTML version deleted]]

ADD REPLY • link 12.1 years ago Tengfei Yin ▴ 420

0

Entering edit mode

Michael, Thanks a lot for you suggestion, I will think about it. Tengfei On Fri, Mar 1, 2013 at 7:03 AM, Michael Lawrence <lawrence.michael@gene.com>wrote: > You should check out the GappedReads object from ShortRead. It extends > GappedAlignments to hold the read sequence information in a DNAStringSet. > You could then take the genome() identifier and auto-lookup the BSgenome. > Or if the SAM MD tag is present in the GappedAlignments, you could use that > for the mismatches. > > Michael > > > On Thu, Feb 28, 2013 at 9:43 PM, Tengfei Yin <yintengfei@gmail.com> wrote: > >> >> >> On Thu, Feb 28, 2013 at 11:08 PM, Michael Lawrence < >> lawrence.michael@gene.com> wrote: >> >>> >>> >>> >>> On Thu, Feb 28, 2013 at 11:46 AM, Tengfei Yin <yintengfei@gmail.com>wrote: >>> >>>> On Thu, Feb 28, 2013 at 11:46 AM, Julian Gehring >>>> <julian.gehring@gmail.com>wrote: >>>> >>>> > Hi, >>>> > >>>> > Is there a good way to visualize aligned reads with mismatch bases >>>> (SNPs) >>>> > along the genome (similar to what one knows from the standard genome >>>> > browsers)? >>>> > >>>> > 'ggbio' came to my mind with which plotting a pile of reads is >>>> straight >>>> > forward. However, overlaying the mismatched bases for each reads >>>> seems not >>>> > that easy any more. >>>> > >>>> >>>> Hi Julian, >>>> >>>> You are right, currently ggbio only supports summary of mismatch >>>> showing as >>>> coverage plot and barchart(?stat_mismatch), but looks like what you >>>> want is >>>> detailed short reads alignments visualization with mismatch bases >>>> showing >>>> right on the reads, . It's possible, but not easy to do it >>>> manually...you >>>> have to have two GRanges objects, one for alignment one for SNP, and >>>> plot >>>> them layer by layer, the tricky part is assigning each reads fixed >>>> stepping >>>> level, so snp can be plotted on the right position. I will NOT recommend >>>> you to do this, it's probably not worth taking time doing it. I need to >>>> implement this features in some easy way. >>>> >>>> The tricky part is that there are different modes, 1. show reads as gray >>>> rectangle, and color mismatch as segment >>> >>> >>> Are you talking about single-read detail for #1 above or what we get >>> from stat_mismatch? >>> >> >> The single-read detail, not summary mismatch from stat_mismatch(). so for >> example, one 75bp reads shown as one rectangle filled with gray color, and >> use vertical colored segment indicated mismatched position on this reads if >> any. >> >> >>> >>> >>>> 2. show SNP as nucleotide text, >>>> A/C/T/G.., 3. show sequence detail for each alignment. those depends on >>>> zoomed level and even coverage, and I guess most time you don't want to >>>> see >>>> bases for every reads... >>>> >>>> >>> I think #3 is really important for any sort of diagnosis of alignment >>> issues, and I would find this very useful. Right now, I'm using IGV for >>> this, but ggbio would be a lot more flexible. This is probably my #1 top >>> missing feature from ggbio. >>> >> >> Got it, those features are indeed important but missing, I will keep this >> in mind. >> >> I was thinking about generalize it into visualization of DNAStringSet >> first, but mismatch is not that general, require BSgenome and raw bam files >> at the same time to get mismatched information. So we may extend the list >> returned by scanBam(), and create a GRanges or vector with equal length of >> short reads numbers for particular region which storing mismatch position >> and nucleotide. I am trying to think about a more general data structure, >> not only for visualization purpose, but also for command line analysis, for >> example, people can simply retrieve mismatch information for any single >> short read from specific data structure. >> >> and Julian, sorry that I don't have any working example right now, I will >> let you know when this feature is ready to test. >> >> Tengfei >> >>> >>> >>> >>>> Just curious for future ggbio development, are those modes want you >>>> want? >>>> are you just using bam files here? no VCF files involved right? Because >>>> you mentioned 'snp', I think what you mean is mismatch? >>>> >>>> ps: I cannot speak for other tools, and only thing I know, in SRAdb >>>> package, looks like it could fire your data in IGV.. >>>> >>>> Thanks >>>> >>>> Tengfei >>>> >>>> >>>> > Does someone of you have a good way to do this or found another >>>> solution >>>> > that works for R/bioc? >>>> > >>>> > Best wishes >>>> > Julian >>>> > >>>> > ______________________________**_________________ >>>> > Bioconductor mailing list >>>> > Bioconductor@r-project.org >>>> > https://stat.ethz.ch/mailman/**listinfo/bioconductor< >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor> >>>> > Search the archives: http://news.gmane.org/gmane.** >>>> > science.biology.informatics.**conductor< >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor> >>>> > >>>> >>>> >>>> >>>> -- >>>> Tengfei Yin >>>> MCDB PhD student >>>> 1620 Howe Hall, 2274, >>>> Iowa State University >>>> Ames, IA,50011-2274 >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor@r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >>> >> >> >> -- >> Tengfei Yin >> MCDB PhD student >> 1620 Howe Hall, 2274, >> Iowa State University >> Ames, IA,50011-2274 >> >> >> > -- Tengfei Yin MCDB PhD student 1620 Howe Hall, 2274, Iowa State University Ames, IA,50011-2274 [[alternative HTML version deleted]]

ADD REPLY • link 12.1 years ago Tengfei Yin ▴ 420

Login before adding your answer.