Question

Regarding multiple hits of the same read

deepika lakhwani

Hello,

I have been trying to find differentially expressed genes using different R packages. I have rice Illumina paired-end sequencing data with 100 bp reads. I mapped the data to the rice genome using TopHat and got an accepted_hits.bam file containing the mapping details.

Now I am confused, because a single read can align to multiple positions. When we count the reads for differential analysis, the same read is then present in two different genes. So my question is: is this correct or not?

I am also reading the genomic features R package for counting the reads in libraries. Can anyone explain the summarizeOverlaps function? I read the manual, but what is its basic function?

Thank you,
Regards,
Deepika


Answer · 2013-09-24

Steve Lianoglou

Hi,

On Tue, Sep 24, 2013 at 4:25 AM, deepika lakhwani <lakhwanideepika at gmail.com> wrote:
> Hello,
>
> I have been trying to find differentially expressed genes using different R packages. I have rice Illumina paired-end sequencing data with 100 bp reads. I mapped the data to the rice genome using TopHat and got an accepted_hits.bam file containing the mapping details.
>
> Now I am confused, because a single read can align to multiple positions.

One way to deal with reads that align to multiple (genomic) positions is to not deal with them at all. Many people only use reads that align uniquely to the genome.

> When we count the reads for differential analysis, the same read is then present in two different genes.

This is different from what you mention above. It is possible that:

(1) One read aligns to multiple places in the genome. These reads are often called "multimapped" reads (multimappers, etc.), and, as I mentioned above, it is rather common to ignore them and only count reads that align to a unique position in the genome.

(2) Two different genes share the same genomic locus, so even though a read maps to only one position in the genome, there is more than one gene it can be assigned to.

> So my question is: is this correct or not?

Can you clarify in greater detail what you are asking "correctness" for?

> I am also reading the genomic features R package for counting the reads in libraries. Can anyone explain the summarizeOverlaps function?

Please read through the copious documentation available in the GenomicRanges package:

http://bioconductor.org/packages/2.12/bioc/html/GenomicRanges.html

There are five PDF files available there under the "Documentation" section, and all of them are worth your close attention. If you still have more specific questions after reading through those, please ask them here. A generic question like "explain the summarizeOverlaps function" isn't helpful, as it is explained in multiple places in the documentation; if there is something specific about it that is confusing, we can help you address that.

> I read the manual, but what is its basic function?

So what part is unclear? You'll likely also want to read through the vignette for the parathyroidSE package:

http://bioconductor.org/packages/release/data/experiment/vignettes/parathyroidSE/inst/doc/parathyroidSE.pdf

It shows in great detail how to go from aligned reads to "counted" genes and exons.

HTH,
-steve

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech
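The two situations above, (1) multimapped reads and (2) a uniquely placed read that overlaps more than one gene, can be illustrated with a minimal Python sketch. It is a toy re-implementation, not the Bioconductor API: it assumes each alignment record carries the number of reported hits for its read (the SAM NH tag, which TopHat writes), treats reads as single-end, and uses hypothetical gene names and coordinates. Reads with NH > 1 are skipped, and unique reads touching more than one gene are skipped as ambiguous rather than counted twice.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """One alignment record (treated as single-end for simplicity)."""
    read_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    nh: int     # number of reported alignments for this read (SAM NH tag)


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlaps(aln: Alignment, gene: Gene) -> bool:
    """True if the alignment shares at least one base with the gene."""
    return aln.chrom == gene.chrom and aln.start <= gene.end and gene.start <= aln.end


def count_reads(alignments, genes):
    """Count uniquely mapped reads per gene, skipping ambiguous ones.

    Case (1): records with NH > 1 (multimappers) are skipped outright.
    Case (2): a unique read overlapping more than one gene is skipped
    as ambiguous, instead of being counted for every gene it touches.
    """
    counts = Counter({g.name: 0 for g in genes})
    skipped = Counter()
    for aln in alignments:
        if aln.nh > 1:
            skipped["multimapped"] += 1
            continue
        hits = {g.name for g in genes if overlaps(aln, g)}
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif hits:
            skipped["ambiguous"] += 1
        else:
            skipped["no_feature"] += 1
    return counts, skipped


# Hypothetical toy data: three genes; geneA and geneB overlap on chr1.
genes = [
    Gene("geneA", "chr1", 100, 500),
    Gene("geneB", "chr1", 400, 900),
    Gene("geneC", "chr2", 100, 300),
]
alignments = [
    Alignment("r1", "chr1", 120, 219, nh=1),  # inside geneA only
    Alignment("r2", "chr1", 420, 519, nh=1),  # overlaps geneA and geneB
    Alignment("r3", "chr2", 150, 249, nh=1),  # inside geneC only
    Alignment("r4", "chr1", 600, 699, nh=2),  # r4 has two hits ...
    Alignment("r4", "chr2", 150, 249, nh=2),  # ... so both records are skipped
    Alignment("r5", "chr3", 1, 100, nh=1),    # no annotated gene
]

counts, skipped = count_reads(alignments, genes)
print(dict(counts))
print(dict(skipped))
```

Here geneA and geneC each receive one read, geneB receives none, and the skipped tally records two multimapped alignment records, one ambiguous read and one read outside any gene. Dropping ambiguous reads is only one possible policy; the summarizeOverlaps documentation describes several counting modes (for example "Union"), and that is where the rules the real function applies are spelled out.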


Thank you, Steve, for answering.

OK, I am reading all the PDFs very carefully, and I am satisfied with your answer. But please clear up a basic question for me: which type of reads are selected for differential expression analysis? I understand that only uniquely mapped reads are used for differential expression analysis, but I think multimapped reads also have an important role in differential expression analysis, because the genome has so many duplicated/paralogous genes. If I am wrong, please tell me.

Regards,
Deepika

Reply by deepika lakhwani

Hi,

On Tue, Sep 24, 2013 at 11:04 AM, deepika lakhwani <lakhwanideepika at gmail.com> wrote:
> Thank you, Steve, for answering.
>
> OK, I am reading all the PDFs very carefully, and I am satisfied with your answer. But please clear up a basic question for me: which type of reads are selected for differential expression analysis? I understand that only uniquely mapped reads are used for differential expression analysis, but I think multimapped reads also have an important role in differential expression analysis, because the genome has so many duplicated/paralogous genes. If I am wrong, please tell me.

You are not wrong. Multimapped reads are obviously important since they are "real", i.e. they are transcribed from somewhere, and if you could know exactly where, that would be a good thing.

It is still an open question how best to use them. Although I haven't used it myself, RSEM comes up often enough that it seems many people think using it is a good idea, so you can start there:

http://www.biomedcentral.com/1471-2105/12/323

Using the references there as a seed for a more thorough literature search, and finding papers that cite RSEM, should be helpful, as well as your "normal" Google searching mojo.

HTH,
-steve

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech
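The idea that multimapped reads can still be used, rather than discarded, can be made concrete with a toy expectation-maximization (EM) loop, the general technique RSEM is built on. This Python sketch is a deliberately simplified illustration, not RSEM's model: it ignores gene/transcript length, fragment lengths, base qualities and positional bias, and it assumes each read's set of candidate genes is already known. The gene names and read counts are hypothetical.

```python
def em_allocate(read_hits, genes, n_iter=100):
    """Estimate per-gene read counts when some reads hit several genes.

    read_hits: one non-empty set of candidate gene names per read;
    every name must appear in `genes`.
    Returns expected read counts per gene after n_iter EM iterations.
    """
    n_reads = len(read_hits)
    # Start from equal abundances for all genes.
    theta = {g: 1.0 / len(genes) for g in genes}
    for _ in range(n_iter):
        expected = {g: 0.0 for g in genes}
        # E-step: split each read across its candidate genes in
        # proportion to the current abundance estimates.
        for hits in read_hits:
            total = sum(theta[g] for g in hits)
            for g in hits:
                share = theta[g] / total if total > 0 else 1.0 / len(hits)
                expected[g] += share
        # M-step: new abundances are the expected read fractions.
        theta = {g: expected[g] / n_reads for g in genes}
    return {g: theta[g] * n_reads for g in genes}


# Hypothetical toy data: geneA and geneB are paralogs.
read_hits = (
    [{"geneA"}] * 6             # unique to geneA
    + [{"geneB"}] * 2           # unique to geneB
    + [{"geneA", "geneB"}] * 4  # multimapped between the paralogs
    + [{"geneC"}] * 3           # unique to geneC
)
estimates = em_allocate(read_hits, ["geneA", "geneB", "geneC"])
print({g: round(c, 3) for g, c in estimates.items()})
```

Discarding the four multimapped reads would give 6, 2 and 3 reads; splitting them evenly would give 8, 4 and 3. The EM loop instead converges to splitting them 3:1, following the unique-read evidence, which gives roughly 9, 3 and 3. That is the sense in which reads from duplicated/paralogous genes can contribute information instead of being thrown away.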

Reply by Steve Lianoglou

Thank you, Steve...

Reply by deepika lakhwani