Retreving mRNA subsequence (GenomicFeatures, BSgenome)

0

Entering edit mode

Guest User ★ 13k

@guest-user-4897

Last seen 10.6 years ago

Hi! Problem summary: How to retrieve part of the sequence of mRNA around given location. I have the locations of the binding to mRNA events as GRanges (GRevents) and need to retrieve sequence for motif finding. The problem is that if I use getSeq(flank(GRevents, width=n)) then I get the genomic sequence not transcript sequence, i.e. not accounting for introns or mRNA border. I have tried solving it with exonsBy("transcriptDb object", "tx") function but without success. Question: Is there a bioconductor-supported way of getting resolving the problem? With CLIPseq being more and more popular this will be very demanded function. Thanks, Lukasz -- output of sessionInfo(): > sessionInfo() R version 2.15.0 (2012-03-30) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] rtracklayer_1.16.0 GenomicFeatures_1.8.1 [3] AnnotationDbi_1.18.4 Biobase_2.16.0 [5] BSgenome.Mmusculus.UCSC.mm9_1.3.17 BSgenome_1.24.0 [7] Biostrings_2.24.1 GenomicRanges_1.8.3 [9] IRanges_1.14.2 BiocGenerics_0.2.0 loaded via a namespace (and not attached): [1] biomaRt_2.12.0 bitops_1.0-4.1 DBI_0.2-5 RCurl_1.91-1 [5] Rsamtools_1.8.0 RSQLite_0.11.1 stats4_2.15.0 tools_2.15.0 [9] XML_3.9-4 zlibbioc_1.2.0 -- Sent via the guest posting facility at bioconductor.org.

BSgenome BSgenome BSgenome BSgenome • 1.9k views

ADD COMMENT • link updated 11.5 years ago by John linux-user ▴ 210 • written 11.5 years ago by Guest User ★ 13k

0

Entering edit mode

Michael Lawrence ★ 11k

@michael-lawrence-3846

Last seen 3.4 years ago

United States

Use map(gr, exonsByResult) to map your GRanges to transcript coordinates. Then, do any range transformations like flank(). Use GenomicFeatures::extractTranscriptsFromGenome to get the transcript sequences. To extract those ranges from each sequence, use [, with some repping. Should only be about 4 lines of code. Something like this: mapping <- map(motifsGR, exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene)) motifsTX <- flank(ranges(mapping), n) tx <- extractTranscriptsFromGenome(Hsapiens, TxDb.Hsapiens.UCSC.hg19.knownGene) motifSeqs <- tx[subjectHits(mapping)][motifsTX[queryHits(mapping)]] Untested... Good luck, Michael On Tue, Oct 15, 2013 at 9:36 AM, Lukasz [guest] <guest@bioconductor.org>wrote: > > Hi! > > Problem summary: How to retrieve part of the sequence of mRNA around given > location. > > I have the locations of the binding to mRNA events as GRanges (GRevents) > and need to retrieve sequence for motif finding. The problem is that if I > use getSeq(flank(GRevents, width=n)) then I get the genomic sequence not > transcript sequence, i.e. not accounting for introns or mRNA border. I have > tried solving it with exonsBy("transcriptDb object", "tx") function but > without success. > > Question: Is there a bioconductor-supported way of getting resolving the > problem? With CLIPseq being more and more popular this will be very > demanded function. > > Thanks, > Lukasz > > -- output of sessionInfo(): > > > sessionInfo() > R version 2.15.0 (2012-03-30) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] rtracklayer_1.16.0 GenomicFeatures_1.8.1 > [3] AnnotationDbi_1.18.4 Biobase_2.16.0 > [5] BSgenome.Mmusculus.UCSC.mm9_1.3.17 BSgenome_1.24.0 > [7] Biostrings_2.24.1 GenomicRanges_1.8.3 > [9] IRanges_1.14.2 BiocGenerics_0.2.0 > > loaded via a namespace (and not attached): > [1] biomaRt_2.12.0 bitops_1.0-4.1 DBI_0.2-5 RCurl_1.91-1 > [5] Rsamtools_1.8.0 RSQLite_0.11.1 stats4_2.15.0 tools_2.15.0 > [9] XML_3.9-4 zlibbioc_1.2.0 > > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD COMMENT • link 11.5 years ago Michael Lawrence ★ 11k

0

Entering edit mode

It worked, the 'map' function was very helpful! In the end I implemented as you said with modifications: a) reassigned "chrM" is_circular to FALSE for mapping to work (I still use R 2.15) my_exons@unlistData@seqinfo@is_circular <- rep(FALSE, 35) b) reassigned start positions of motifsTX to 1 is below 1: start(motifsTX)[start(motifsTX)<=1] <- 1 c) extracted sequences without ordering by mapping (motifsTX was already ordered) and using subseq function (" '[' subsetting by Ranges is defunct ") motifSeqs <- subseq(tx[subjectHits(mapping)], start=start(motifsTX), end=end(motifsTX)) Thanks a lot for the valuable hint! Lukasz On 15 October 2013 19:00, Michael Lawrence <lawrence.michael@gene.com>wrote: > Use map(gr, exonsByResult) to map your GRanges to transcript coordinates. > Then, do any range transformations like flank(). Use > GenomicFeatures::extractTranscriptsFromGenome to get the transcript > sequences. To extract those ranges from each sequence, use [, with some > repping. Should only be about 4 lines of code. Something like this: > > mapping <- map(motifsGR, exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene)) > motifsTX <- flank(ranges(mapping), n) > tx <- extractTranscriptsFromGenome(Hsapiens, > TxDb.Hsapiens.UCSC.hg19.knownGene) > motifSeqs <- tx[subjectHits(mapping)][motifsTX[queryHits(mapping)]] > > Untested... > > Good luck, > Michael > > > On Tue, Oct 15, 2013 at 9:36 AM, Lukasz [guest] <guest@bioconductor.org>wrote: > >> >> Hi! >> >> Problem summary: How to retrieve part of the sequence of mRNA around >> given location. >> >> I have the locations of the binding to mRNA events as GRanges (GRevents) >> and need to retrieve sequence for motif finding. The problem is that if I >> use getSeq(flank(GRevents, width=n)) then I get the genomic sequence not >> transcript sequence, i.e. not accounting for introns or mRNA border. I have >> tried solving it with exonsBy("transcriptDb object", "tx") function but >> without success. >> >> Question: Is there a bioconductor-supported way of getting resolving the >> problem? With CLIPseq being more and more popular this will be very >> demanded function. >> >> Thanks, >> Lukasz >> >> -- output of sessionInfo(): >> >> > sessionInfo() >> R version 2.15.0 (2012-03-30) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=C LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] rtracklayer_1.16.0 GenomicFeatures_1.8.1 >> [3] AnnotationDbi_1.18.4 Biobase_2.16.0 >> [5] BSgenome.Mmusculus.UCSC.mm9_1.3.17 BSgenome_1.24.0 >> [7] Biostrings_2.24.1 GenomicRanges_1.8.3 >> [9] IRanges_1.14.2 BiocGenerics_0.2.0 >> >> loaded via a namespace (and not attached): >> [1] biomaRt_2.12.0 bitops_1.0-4.1 DBI_0.2-5 RCurl_1.91-1 >> [5] Rsamtools_1.8.0 RSQLite_0.11.1 stats4_2.15.0 tools_2.15.0 >> [9] XML_3.9-4 zlibbioc_1.2.0 >> >> >> -- >> Sent via the guest posting facility at bioconductor.org. >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]]

ADD REPLY • link 11.5 years ago Lukasz Kielpinski ▴ 20

0

Entering edit mode

On Wed, Oct 16, 2013 at 1:38 AM, Lukasz Kielpinski <cosiarz@gmail.com>wrote: > It worked, the 'map' function was very helpful! > > In the end I implemented as you said with modifications: > a) reassigned "chrM" is_circular to FALSE for mapping to work (I still use > R 2.15) > > my_exons@unlistData@seqinfo@is_circular <- rep(FALSE, 35) > > Please refrain from direct slot access. Something like isCircular(seqinfo(my_exons)) <- FALSE > b) reassigned start positions of motifsTX to 1 is below 1: > > start(motifsTX)[start(motifsTX)<=1] <- 1 > > or: restrict(motifsTX, 1) > c) extracted sequences without ordering by mapping (motifsTX was already > ordered) and using subseq function (" '[' subsetting by Ranges is defunct ") > > motifSeqs <- subseq(tx[subjectHits(mapping)], start=start(motifsTX), > end=end(motifsTX)) > > Yea, the Ranges support for [ was re-incarnated last devel cycle. You really should update. Thanks a lot for the valuable hint! > > Lukasz > > > > > > On 15 October 2013 19:00, Michael Lawrence <lawrence.michael@gene.com>wrote: > >> Use map(gr, exonsByResult) to map your GRanges to transcript coordinates. >> Then, do any range transformations like flank(). Use >> GenomicFeatures::extractTranscriptsFromGenome to get the transcript >> sequences. To extract those ranges from each sequence, use [, with some >> repping. Should only be about 4 lines of code. Something like this: >> >> mapping <- map(motifsGR, exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene)) >> motifsTX <- flank(ranges(mapping), n) >> tx <- extractTranscriptsFromGenome(Hsapiens, >> TxDb.Hsapiens.UCSC.hg19.knownGene) >> motifSeqs <- tx[subjectHits(mapping)][motifsTX[queryHits(mapping)]] >> >> Untested... >> >> Good luck, >> Michael >> >> >> On Tue, Oct 15, 2013 at 9:36 AM, Lukasz [guest] <guest@bioconductor.org>wrote: >> >>> >>> Hi! >>> >>> Problem summary: How to retrieve part of the sequence of mRNA around >>> given location. >>> >>> I have the locations of the binding to mRNA events as GRanges (GRevents) >>> and need to retrieve sequence for motif finding. The problem is that if I >>> use getSeq(flank(GRevents, width=n)) then I get the genomic sequence not >>> transcript sequence, i.e. not accounting for introns or mRNA border. I have >>> tried solving it with exonsBy("transcriptDb object", "tx") function but >>> without success. >>> >>> Question: Is there a bioconductor-supported way of getting resolving the >>> problem? With CLIPseq being more and more popular this will be very >>> demanded function. >>> >>> Thanks, >>> Lukasz >>> >>> -- output of sessionInfo(): >>> >>> > sessionInfo() >>> R version 2.15.0 (2012-03-30) >>> Platform: x86_64-unknown-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=C LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] rtracklayer_1.16.0 GenomicFeatures_1.8.1 >>> [3] AnnotationDbi_1.18.4 Biobase_2.16.0 >>> [5] BSgenome.Mmusculus.UCSC.mm9_1.3.17 BSgenome_1.24.0 >>> [7] Biostrings_2.24.1 GenomicRanges_1.8.3 >>> [9] IRanges_1.14.2 BiocGenerics_0.2.0 >>> >>> loaded via a namespace (and not attached): >>> [1] biomaRt_2.12.0 bitops_1.0-4.1 DBI_0.2-5 RCurl_1.91-1 >>> [5] Rsamtools_1.8.0 RSQLite_0.11.1 stats4_2.15.0 tools_2.15.0 >>> [9] XML_3.9-4 zlibbioc_1.2.0 >>> >>> >>> -- >>> Sent via the guest posting facility at bioconductor.org. >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> > [[alternative HTML version deleted]]

ADD REPLY • link 11.5 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Useful tricks, Thanks! On 16 October 2013 15:37, Michael Lawrence <lawrence.michael@gene.com>wrote: > > > > On Wed, Oct 16, 2013 at 1:38 AM, Lukasz Kielpinski <cosiarz@gmail.com>wrote: > >> It worked, the 'map' function was very helpful! >> >> In the end I implemented as you said with modifications: >> a) reassigned "chrM" is_circular to FALSE for mapping to work (I still >> use R 2.15) >> >> my_exons@unlistData@seqinfo@is_circular <- rep(FALSE, 35) >> >> > Please refrain from direct slot access. Something like > isCircular(seqinfo(my_exons)) <- FALSE > > > >> b) reassigned start positions of motifsTX to 1 is below 1: >> >> start(motifsTX)[start(motifsTX)<=1] <- 1 >> >> > or: restrict(motifsTX, 1) > > >> c) extracted sequences without ordering by mapping (motifsTX was already >> ordered) and using subseq function (" '[' subsetting by Ranges is defunct ") >> >> motifSeqs <- subseq(tx[subjectHits(mapping)], start=start(motifsTX), >> end=end(motifsTX)) >> >> > > Yea, the Ranges support for [ was re-incarnated last devel cycle. You > really should update. > > > Thanks a lot for the valuable hint! >> >> Lukasz >> >> >> >> >> >> On 15 October 2013 19:00, Michael Lawrence <lawrence.michael@gene.com>wrote: >> >>> Use map(gr, exonsByResult) to map your GRanges to transcript >>> coordinates. Then, do any range transformations like flank(). Use >>> GenomicFeatures::extractTranscriptsFromGenome to get the transcript >>> sequences. To extract those ranges from each sequence, use [, with some >>> repping. Should only be about 4 lines of code. Something like this: >>> >>> mapping <- map(motifsGR, exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene)) >>> motifsTX <- flank(ranges(mapping), n) >>> tx <- extractTranscriptsFromGenome(Hsapiens, >>> TxDb.Hsapiens.UCSC.hg19.knownGene) >>> motifSeqs <- tx[subjectHits(mapping)][motifsTX[queryHits(mapping)]] >>> >>> Untested... >>> >>> Good luck, >>> Michael >>> >>> >>> On Tue, Oct 15, 2013 at 9:36 AM, Lukasz [guest] <guest@bioconductor.org>wrote: >>> >>>> >>>> Hi! >>>> >>>> Problem summary: How to retrieve part of the sequence of mRNA around >>>> given location. >>>> >>>> I have the locations of the binding to mRNA events as GRanges >>>> (GRevents) and need to retrieve sequence for motif finding. The problem is >>>> that if I use getSeq(flank(GRevents, width=n)) then I get the genomic >>>> sequence not transcript sequence, i.e. not accounting for introns or mRNA >>>> border. I have tried solving it with exonsBy("transcriptDb object", "tx") >>>> function but without success. >>>> >>>> Question: Is there a bioconductor-supported way of getting resolving >>>> the problem? With CLIPseq being more and more popular this will be very >>>> demanded function. >>>> >>>> Thanks, >>>> Lukasz >>>> >>>> -- output of sessionInfo(): >>>> >>>> > sessionInfo() >>>> R version 2.15.0 (2012-03-30) >>>> Platform: x86_64-unknown-linux-gnu (64-bit) >>>> >>>> locale: >>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>>> [7] LC_PAPER=C LC_NAME=C >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods base >>>> >>>> other attached packages: >>>> [1] rtracklayer_1.16.0 GenomicFeatures_1.8.1 >>>> [3] AnnotationDbi_1.18.4 Biobase_2.16.0 >>>> [5] BSgenome.Mmusculus.UCSC.mm9_1.3.17 BSgenome_1.24.0 >>>> [7] Biostrings_2.24.1 GenomicRanges_1.8.3 >>>> [9] IRanges_1.14.2 BiocGenerics_0.2.0 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] biomaRt_2.12.0 bitops_1.0-4.1 DBI_0.2-5 RCurl_1.91-1 >>>> [5] Rsamtools_1.8.0 RSQLite_0.11.1 stats4_2.15.0 tools_2.15.0 >>>> [9] XML_3.9-4 zlibbioc_1.2.0 >>>> >>>> >>>> -- >>>> Sent via the guest posting facility at bioconductor.org. >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor@r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >>> >> > [[alternative HTML version deleted]]

ADD REPLY • link 11.5 years ago Lukasz Kielpinski ▴ 20

0

Entering edit mode

John linux-user ▴ 210

@john-linux-user-4917

Last seen 9.4 years ago

United States

Hello everyone, I am wondering how to simply merge two GRanges objects by range field and add the value by additional vector. For example, I have two objects below obj1 seqnames ranges strand | Val <rle> <iranges> <rle> | <integer> [1] chr1_random [272531, 272571] + | 88 [2] chr1_random [272871, 272911] + | 45 obj2 seqnames ranges strand | Val <rle> <iranges> <rle> | <integer> [1] chr1_random [272531, 272581] + | 800 [2] chr1_random [272850, 272911] + | 450 after merged, it should be an object as the following mergedObject and it would concern the differences in IRANGE data (e.g. 581 and 850 in obj2 above were different from those of obj1, which were 571 and 871 respectively) mergedObject seqnames ranges strand | object2Val object1Val <rle> <iranges> <rle> | <integer> <integer> [1] chr1_random [272531, 272581] + | 800 88 [2] chr1_random [272850, 272911] + | 450 45 On Tuesday, October 15, 2013 12:36 PM, Lukasz [guest] <guest@bioconductor.org> wrote: Hi! Problem summary: How to retrieve part of the sequence of mRNA around given location. I have the locations of the binding to mRNA events as GRanges (GRevents) and need to retrieve sequence for motif finding. The problem is that if I use getSeq(flank(GRevents, width=n)) then I get the genomic sequence not transcript sequence, i.e. not accounting for introns or mRNA border. I have tried solving it with exonsBy("transcriptDb object", "tx") function but without success. Question: Is there a bioconductor-supported way of getting resolving the problem? With CLIPseq being more and more popular this will be very demanded function. Thanks, Lukasz -- output of sessionInfo(): > sessionInfo() R version 2.15.0 (2012-03-30) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] rtracklayer_1.16.0 GenomicFeatures_1.8.1 [3] AnnotationDbi_1.18.4 Biobase_2.16.0 [5] BSgenome.Mmusculus.UCSC.mm9_1.3.17 BSgenome_1.24.0 [7] Biostrings_2.24.1 GenomicRanges_1.8.3 [9] IRanges_1.14.2 BiocGenerics_0.2.0 loaded via a namespace (and not attached): [1] biomaRt_2.12.0 bitops_1.0-4.1 DBI_0.2-5 RCurl_1.91-1 [5] Rsamtools_1.8.0 RSQLite_0.11.1 stats4_2.15.0 tools_2.15.0 [9] XML_3.9-4 zlibbioc_1.2.0 -- Sent via the guest posting facility at bioconductor.org. _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]

ADD COMMENT • link 11.5 years ago John linux-user ▴ 210

0

Entering edit mode

If you just need to merge the Val from obj2 as a column in obj1, i.e., there is no need to add ranges from obj2 to obj1, then it's simple using match(). If you need the equivalent of merge(all=TRUE), then you need to convert to data frames, use merge() and convert back. There is a merge,VRanges method that does this. Nothing for GRanges yet though. On Wed, Oct 16, 2013 at 7:15 AM, John linux-user <johnlinuxuser@yahoo.com>wrote: > Hello everyone, > > I am wondering how to simply merge two GRanges objects by range field and > add the value by additional vector. For example, I have two objects below > > obj1 > > seqnames ranges strand | Val > <rle> <iranges> <rle> | <integer> > [1] chr1_random [272531, 272571] + | 88 > [2] chr1_random [272871, 272911] + | 45 > > obj2 > seqnames ranges strand | Val > <rle> <iranges> <rle> | <integer> > [1] chr1_random [272531, 272581] + | 800 > [2] chr1_random [272850, 272911] + | 450 > > after merged, it should be an object as the following mergedObject and it > would concern the differences in IRANGE data (e.g. 581 and 850 in obj2 > above were different from those of obj1, which were 571 and 871 > respectively) > > mergedObject > > seqnames ranges strand | object2Val > object1Val > <rle> <iranges> <rle> | <integer> > <integer> > [1] chr1_random [272531, 272581] + | 800 88 > [2] chr1_random [272850, 272911] + | 450 45 > > > > > On Tuesday, October 15, 2013 12:36 PM, Lukasz [guest] < > guest@bioconductor.org> wrote: > > > Hi! > > Problem summary: How to retrieve part of the sequence of mRNA around given > location. > > I have the locations of the binding to mRNA events as GRanges (GRevents) > and need to retrieve sequence for motif finding. The problem is that if I > use getSeq(flank(GRevents, width=n)) then I get the genomic sequence not > transcript sequence, i.e. not accounting for introns or mRNA border. I have > tried solving it with exonsBy("transcriptDb object", "tx") function but > without success. > > Question: Is there a bioconductor-supported way of getting resolving the > problem? With CLIPseq being more and more popular this will be very > demanded function. > > Thanks, > Lukasz > > -- output of sessionInfo(): > > > sessionInfo() > R version 2.15.0 (2012-03-30) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] rtracklayer_1.16.0 GenomicFeatures_1.8.1 > [3] AnnotationDbi_1.18.4 Biobase_2.16.0 > [5] BSgenome.Mmusculus.UCSC.mm9_1.3.17 BSgenome_1.24.0 > [7] Biostrings_2.24.1 GenomicRanges_1.8.3 > [9] IRanges_1.14.2 BiocGenerics_0.2.0 > > loaded via a namespace (and not attached): > [1] biomaRt_2.12.0 bitops_1.0-4.1 DBI_0.2-5 RCurl_1.91-1 > [5] Rsamtools_1.8.0 RSQLite_0.11.1 stats4_2.15.0 tools_2.15.0 > [9] XML_3.9-4 zlibbioc_1.2.0 > > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]] > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD REPLY • link 11.5 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Thanks for your reply, but both match and merge functions do not solve the problem because the range field might be different for two objects as exampled below. I made a program using Python to solve this problem but hopefully some one in R should write a simple function for this problem. On Wednesday, October 16, 2013 10:23 AM, Michael Lawrence <lawrence.michael@gene.com> wrote: If you just need to merge the Val from obj2 as a column in obj1, i.e., there is no need to add ranges from obj2 to obj1, then it's simple using match(). If you need the equivalent of merge(all=TRUE), then you need to convert to data frames, use merge() and convert back. There is a merge,VRanges method that does this. Nothing for GRanges yet though. On Wed, Oct 16, 2013 at 7:15 AM, John linux-user <johnlinuxuser@yahoo.com> wrote: Hello everyone, > >I am wondering how to simply merge two GRanges objects by range field and add the value by additional vector. For example, I have two objects below > >obj1 > >seqnames ranges strand | Val > <rle> <iranges> <rle> | <integer> > [1] chr1_random [272531, 272571] + | 88 > [2] chr1_random [272871, 272911] + | 45 > >obj2 > seqnames ranges strand | Val > <rle> <iranges> <rle> | <integer> > [1] chr1_random [272531, 272581] + | 800 > [2] chr1_random [272850, 272911] + | 450 > >after merged, it should be an object as the following mergedObject and it would concern the differences in IRANGE data (e.g. 581 and 850 in obj2 above were different from those of obj1, which were 571 and 871 respectively) > >mergedObject > > seqnames ranges strand | object2Val object1Val > <rle> <iranges> <rle> | <integer> <integer> > [1] chr1_random [272531, 272581] + | 800 88 > [2] chr1_random [272850, 272911] + | 450 45 > > > > >On Tuesday, October 15, 2013 12:36 PM, Lukasz [guest] <guest@bioconductor.org> wrote: > > [[alternative HTML version deleted]]

ADD REPLY • link 11.5 years ago John linux-user ▴ 210

0

Entering edit mode

Actually this might not be that hard. Do a union(x, y) to merge the ranges, then use findOverlaps(x/y, union, select="first") to get a match vector to merge the columns. On Wed, Oct 16, 2013 at 7:39 AM, John linux-user <johnlinuxuser@yahoo.com>wrote: > Thanks for your reply, but both match and merge functions do not solve the > problem because the range field might be different for two objects as > exampled below. I made a program using Python to solve this problem but > hopefully some one in R should write a simple function for this problem. > > > On Wednesday, October 16, 2013 10:23 AM, Michael Lawrence < > lawrence.michael@gene.com> wrote: > If you just need to merge the Val from obj2 as a column in obj1, i.e., > there is no need to add ranges from obj2 to obj1, then it's simple using > match(). If you need the equivalent of merge(all=TRUE), then you need to > convert to data frames, use merge() and convert back. There is a > merge,VRanges method that does this. Nothing for GRanges yet though. > > > On Wed, Oct 16, 2013 at 7:15 AM, John linux-user <johnlinuxuser@yahoo.com>wrote: > > Hello everyone, > > I am wondering how to simply merge two GRanges objects by range field and > add the value by additional vector. For example, I have two objects below > > obj1 > > seqnames ranges strand | Val > <rle> <iranges> <rle> | <integer> > [1] chr1_random [272531, 272571] + | 88 > [2] chr1_random [272871, 272911] + | 45 > > obj2 > seqnames ranges strand | Val > <rle> <iranges> <rle> | <integer> > [1] chr1_random [272531, 272581] + | 800 > [2] chr1_random [272850, 272911] + | 450 > > after merged, it should be an object as the following mergedObject and it > would concern the differences in IRANGE data (e.g. 581 and 850 in obj2 > above were different from those of obj1, which were 571 and 871 > respectively) > > mergedObject > > seqnames ranges strand | object2Val > object1Val > <rle> <iranges> <rle> | <integer> > <integer> > [1] chr1_random [272531, 272581] + | 800 88 > [2] chr1_random [272850, 272911] + | 450 45 > > > > > On Tuesday, October 15, 2013 12:36 PM, Lukasz [guest] < > guest@bioconductor.org> wrote: > > [[alternative HTML version deleted]]

ADD REPLY • link 11.5 years ago Michael Lawrence ★ 11k

0

Entering edit mode

but the union seems only pileup the range row and it seemed not concern the a given overlapped range field (e.g 10 bp) as shown in my example above. On Wednesday, October 16, 2013 10:45 AM, Michael Lawrence <lawrence.michael@gene.com> wrote: Actually this might not be that hard. Do a union(x, y) to merge the ranges, then use findOverlaps(x/y, union, select="first") to get a match vector to merge the columns. On Wed, Oct 16, 2013 at 7:39 AM, John linux-user <johnlinuxuser@yahoo.com> wrote: Thanks for your reply, but both match and merge functions do not solve the problem because the range field might be different for two objects as exampled below. I made a program using Python to solve this problem but hopefully some one in R should write a simple function for this problem. > > > >On Wednesday, October 16, 2013 10:23 AM, Michael Lawrence <lawrence.michael@gene.com> wrote: > >If you just need to merge the Val from obj2 as a column in obj1, i.e., there is no need to add ranges from obj2 to obj1, then it's simple using match(). If you need the equivalent of merge(all=TRUE), then you need to convert to data frames, use merge() and convert back. There is a merge,VRanges method that does this. Nothing for GRanges yet though. > > > > >On Wed, Oct 16, 2013 at 7:15 AM, John linux-user <johnlinuxuser@yahoo.com> wrote: > >Hello everyone, >> >>I am wondering how to simply merge two GRanges objects by range field and add the value by additional vector. For example, I have two objects below >> >>obj1 >> >>seqnames ranges strand | Val >> <rle> <iranges> <rle> | <integer> >> [1] chr1_random [272531, 272571] + | 88 >> [2] chr1_random [272871, 272911] + | 45 >> >>obj2 >> seqnames ranges strand | Val >> <rle> <iranges> <rle> | <integer> >> [1] chr1_random [272531, 272581] + | 800 >> [2] chr1_random [272850, 272911] + | 450 >> >>after merged, it should be an object as the following mergedObject and it would concern the differences in IRANGE data (e.g. 581 and 850 in obj2 above were different from those of obj1, which were 571 and 871 respectively) >> >>mergedObject >> >> seqnames ranges strand | object2Val object1Val >> <rle> <iranges> <rle> | <integer> <integer> >> [1] chr1_random [272531, 272581] + | 800 88 >> [2] chr1_random [272850, 272911] + | 450 45 >> >> >> >> >>On Tuesday, October 15, 2013 12:36 PM, Lukasz [guest] <guest@bioconductor.org> wrote: >> >> [[alternative HTML version deleted]]

ADD REPLY • link 11.5 years ago John linux-user ▴ 210

0

Entering edit mode

Here is the code: mergeAndAggregate <- function(x, y, mcol.name = "val", FUN = sum) { u <- union(x, y) aggregateViaOverlap <- function(r) { ur <- findOverlaps(u, r) FUN(splitAsList(mcols(r)[[mcol.name]][subjectHits(ur)], factor(queryHits(ur), seq_len(length(u))))) } u$valX <- aggregateViaOverlap(x) u$valY <- aggregateViaOverlap(y) u } x <- GRanges("1", IRanges(c(272531, 272871), c(272571, 272911)), val = c(88, 45)) y <- GRanges("1", IRanges(c(272531, 272850), c(272581, 272911)), val = c(800, 450)) mergeAndAggregate(x, y) GRanges with 2 ranges and 2 metadata columns: seqnames ranges strand | valX valY <rle> <iranges> <rle> | <numeric> <numeric> [1] 1 [272531, 272581] * | 88 800 [2] 1 [272850, 272911] * | 45 450 --- seqlengths: 1 NA Most of the complexity is due to the chance that multiple ranges in x/y could fall into one union range. The function will aggregate using a specified function, by default sum, for example: x <- GRanges("1", IRanges(c(272531, 272871), c(272971, 272911)), val = c(88, 45)) mergeAndAggregate(x, y) GRanges with 1 range and 2 metadata columns: seqnames ranges strand | valX valY <rle> <iranges> <rle> | <numeric> <numeric> [1] 1 [272531, 272971] * | 133 1250 --- seqlengths: 1 NA Hope that helps you stay in R. Michael On Wed, Oct 16, 2013 at 8:03 AM, John linux-user <johnlinuxuser@yahoo.com>wrote: > but the union seems only pileup the range row and it seemed not concern > the a given overlapped range field (e.g 10 bp) as shown in my example above. > > > On Wednesday, October 16, 2013 10:45 AM, Michael Lawrence < > lawrence.michael@gene.com> wrote: > Actually this might not be that hard. > > Do a union(x, y) to merge the ranges, then use findOverlaps(x/y, union, > select="first") to get a match vector to merge the columns. > > > On Wed, Oct 16, 2013 at 7:39 AM, John linux-user <johnlinuxuser@yahoo.com>wrote: > > Thanks for your reply, but both match and merge functions do not solve the > problem because the range field might be different for two objects as > exampled below. I made a program using Python to solve this problem but > hopefully some one in R should write a simple function for this problem. > > > On Wednesday, October 16, 2013 10:23 AM, Michael Lawrence < > lawrence.michael@gene.com> wrote: > If you just need to merge the Val from obj2 as a column in obj1, i.e., > there is no need to add ranges from obj2 to obj1, then it's simple using > match(). If you need the equivalent of merge(all=TRUE), then you need to > convert to data frames, use merge() and convert back. There is a > merge,VRanges method that does this. Nothing for GRanges yet though. > > > On Wed, Oct 16, 2013 at 7:15 AM, John linux-user <johnlinuxuser@yahoo.com>wrote: > > Hello everyone, > > I am wondering how to simply merge two GRanges objects by range field and > add the value by additional vector. For example, I have two objects below > > obj1 > > seqnames ranges strand | Val > <rle> <iranges> <rle> | <integer> > [1] chr1_random [272531, 272571] + | 88 > [2] chr1_random [272871, 272911] + | 45 > > obj2 > seqnames ranges strand | Val > <rle> <iranges> <rle> | <integer> > [1] chr1_random [272531, 272581] + | 800 > [2] chr1_random [272850, 272911] + | 450 > > after merged, it should be an object as the following mergedObject and it > would concern the differences in IRANGE data (e.g. 581 and 850 in obj2 > above were different from those of obj1, which were 571 and 871 > respectively) > > mergedObject > > seqnames ranges strand | object2Val > object1Val > <rle> <iranges> <rle> | <integer> > <integer> > [1] chr1_random [272531, 272581] + | 800 88 > [2] chr1_random [272850, 272911] + | 450 45 > > > > > On Tuesday, October 15, 2013 12:36 PM, Lukasz [guest] < > guest@bioconductor.org> wrote: > > > > > [[alternative HTML version deleted]]

ADD REPLY • link 11.5 years ago Michael Lawrence ★ 11k

0

Entering edit mode

It should work. Very smart! You might like to add it to some range packages in R, which might be interesting for some users. Thanks. On Wednesday, October 16, 2013 11:27 AM, Michael Lawrence <lawrence.michael@gene.com> wrote: Here is the code: mergeAndAggregate <- function(x, y, mcol.name = "val", FUN = sum) { u <- union(x, y) aggregateViaOverlap <- function(r) { ur <- findOverlaps(u, r) FUN(splitAsList(mcols(r)[[mcol.name]][subjectHits(ur)], factor(queryHits(ur), seq_len(length(u))))) } u$valX <- aggregateViaOverlap(x) u$valY <- aggregateViaOverlap(y) u } x <- GRanges("1", IRanges(c(272531, 272871), c(272571, 272911)), val = c(88, 45)) y <- GRanges("1", IRanges(c(272531, 272850), c(272581, 272911)), val = c(800, 450)) mergeAndAggregate(x, y) GRanges with 2 ranges and 2 metadata columns: seqnames ranges strand | valX valY <rle> <iranges> <rle> | <numeric> <numeric> [1] 1 [272531, 272581] * | 88 800 [2] 1 [272850, 272911] * | 45 450 --- seqlengths: 1 NA Most of the complexity is due to the chance that multiple ranges in x/y could fall into one union range. The function will aggregate using a specified function, by default sum, for example: x <- GRanges("1", IRanges(c(272531, 272871), c(272971, 272911)), val = c(88, 45)) mergeAndAggregate(x, y) GRanges with 1 range and 2 metadata columns: seqnames ranges strand | valX valY <rle> <iranges> <rle> | <numeric> <numeric> [1] 1 [272531, 272971] * | 133 1250 --- seqlengths: 1 NA Hope that helps you stay in R. Michael On Wed, Oct 16, 2013 at 8:03 AM, John linux-user <johnlinuxuser@yahoo.com> wrote: but the union seems only pileup the range row and it seemed not concern the a given overlapped range field (e.g 10 bp) as shown in my example above. > > > >On Wednesday, October 16, 2013 10:45 AM, Michael Lawrence <lawrence.michael@gene.com> wrote: > >Actually this might not be that hard. > >Do a union(x, y) to merge the ranges, then use findOverlaps(x/y, union, select="first") to get a match vector to merge the columns. > > > > >On Wed, Oct 16, 2013 at 7:39 AM, John linux-user <johnlinuxuser@yahoo.com> wrote: > >Thanks for your reply, but both match and merge functions do not solve the problem because the range field might be different for two objects as exampled below. I made a program using Python to solve this problem but hopefully some one in R should write a simple function for this problem. >> >> >> >>On Wednesday, October 16, 2013 10:23 AM, Michael Lawrence <lawrence.michael@gene.com> wrote: >> >>If you just need to merge the Val from obj2 as a column in obj1, i.e., there is no need to add ranges from obj2 to obj1, then it's simple using match(). If you need the equivalent of merge(all=TRUE), then you need to convert to data frames, use merge() and convert back. There is a merge,VRanges method that does this. Nothing for GRanges yet though. >> >> >> >> >>On Wed, Oct 16, 2013 at 7:15 AM, John linux-user <johnlinuxuser@yahoo.com> wrote: >> >>Hello everyone, >>> >>>I am wondering how to simply merge two GRanges objects by range field and add the value by additional vector. For example, I have two objects below >>> >>>obj1 >>> >>>seqnames ranges strand | Val >>> <rle> <iranges> <rle> | <integer> >>> [1] chr1_random [272531, 272571] + | 88 >>> [2] chr1_random [272871, 272911] + | 45 >>> >>>obj2 >>> seqnames ranges strand | Val >>> <rle> <iranges> <rle> | <integer> >>> [1] chr1_random [272531, 272581] + | 800 >>> [2] chr1_random [272850, 272911] + | 450 >>> >>>after merged, it should be an object as the following mergedObject and it would concern the differences in IRANGE data (e.g. 581 and 850 in obj2 above were different from those of obj1, which were 571 and 871 respectively) >>> >>>mergedObject >>> >>> seqnames ranges strand | object2Val object1Val >>> <rle> <iranges> <rle> | <integer> <integer> >>> [1] chr1_random [272531, 272581] + | 800 88 >>> [2] chr1_random [272850, 272911] + | 450 45 >>> >>> >>> >>> >>>On Tuesday, October 15, 2013 12:36 PM, Lukasz [guest] <guest@bioconductor.org> wrote: >>> >>> > > > [[alternative HTML version deleted]]

ADD REPLY • link 11.5 years ago John linux-user ▴ 210

0

Entering edit mode

Hi John, On 10/16/2013 07:15 AM, John linux-user wrote: > Hello everyone, > > I am wondering how to simply merge two GRanges objects by range field and add the value by additional vector. For example, I have two objects below > > obj1 > > seqnames ranges strand | Val > <rle> <iranges> <rle> | <integer> > [1] chr1_random [272531, 272571] + | 88 > [2] chr1_random [272871, 272911] + | 45 > > obj2 > seqnames ranges strand | Val > <rle> <iranges> <rle> | <integer> > [1] chr1_random [272531, 272581] + | 800 > [2] chr1_random [272850, 272911] + | 450 > > after merged, it should be an object as the following mergedObject and it would concern the differences in IRANGE data (e.g. 581 and 850 in obj2 above were different from those of obj1, which were 571 and 871 respectively) > > mergedObject > > seqnames ranges strand | object2Val object1Val > <rle> <iranges> <rle> | <integer> <integer> > [1] chr1_random [272531, 272581] + | 800 88 > [2] chr1_random [272850, 272911] + | 450 45 > I fail to see how this result makes sense. If you think of the "val" metadata column as a numerical variable defined along the genome, then, in the merged object, it takes the value 88 over the [272531, 272581] interval but in original object 'obj1' it was taking this value over the [272531, 272571] interval. The following merged object would make much more sense to me: seqnames ranges strand | object2Val object1Val <rle> <iranges> <rle> | <integer> <integer> [1] chr1_random [272531, 272571] + | 800 88 [1] chr1_random [272572, 272581] + | 800 NA [2] chr1_random [272850, 272870] + | 450 NA [2] chr1_random [272871, 272911] + | 450 45 Sounds like you need all the ranges in disjoin(union(obj1, obj2)) if you want to be able to represent the 2 numerical variables accurately. But maybe I'm missing completely what you're trying to achieve. H. > > > > On Tuesday, October 15, 2013 12:36 PM, Lukasz [guest] <guest at="" bioconductor.org=""> wrote: > > > Hi! > > Problem summary: How to retrieve part of the sequence of mRNA around given location. > > I have the locations of the binding to mRNA events as GRanges (GRevents) and need to retrieve sequence for motif finding. The problem is that if I use getSeq(flank(GRevents, width=n)) then I get the genomic sequence not transcript sequence, i.e. not accounting for introns or mRNA border. I have tried solving it with exonsBy("transcriptDb object", "tx") function but without success. > > Question: Is there a bioconductor-supported way of getting resolving the problem? With CLIPseq being more and more popular this will be very demanded function. > > Thanks, > Lukasz > > -- output of sessionInfo(): > >> sessionInfo() > R version 2.15.0 (2012-03-30) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] rtracklayer_1.16.0 GenomicFeatures_1.8.1 > [3] AnnotationDbi_1.18.4 Biobase_2.16.0 > [5] BSgenome.Mmusculus.UCSC.mm9_1.3.17 BSgenome_1.24.0 > [7] Biostrings_2.24.1 GenomicRanges_1.8.3 > [9] IRanges_1.14.2 BiocGenerics_0.2.0 > > loaded via a namespace (and not attached): > [1] biomaRt_2.12.0 bitops_1.0-4.1 DBI_0.2-5 RCurl_1.91-1 > [5] Rsamtools_1.8.0 RSQLite_0.11.1 stats4_2.15.0 tools_2.15.0 > [9] XML_3.9-4 zlibbioc_1.2.0 > > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]] > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319

ADD REPLY • link 11.5 years ago Hervé Pagès 16k

0

Entering edit mode

On 10/16/2013 10:54 AM, Hervé Pagès wrote: > Hi John, > > On 10/16/2013 07:15 AM, John linux-user wrote: >> Hello everyone, >> >> I am wondering how to simply merge two GRanges objects by range field >> and add the value by additional vector. For example, I have two >> objects below >> >> obj1 >> >> seqnames ranges strand | Val >> <rle> <iranges> <rle> | <integer> >> [1] chr1_random [272531, 272571] + | 88 >> [2] chr1_random [272871, 272911] + | 45 >> >> obj2 >> seqnames ranges strand | Val >> <rle> <iranges> <rle> | <integer> >> [1] chr1_random [272531, 272581] + | 800 >> [2] chr1_random [272850, 272911] + | 450 >> >> after merged, it should be an object as the following mergedObject and >> it would concern the differences in IRANGE data (e.g. 581 and 850 in >> obj2 above were different from those of obj1, which were 571 and 871 >> respectively) >> >> mergedObject >> >> seqnames ranges strand | >> object2Val object1Val >> <rle> <iranges> <rle> | >> <integer> <integer> >> [1] chr1_random [272531, 272581] + | 800 88 >> [2] chr1_random [272850, 272911] + | 450 45 >> > > I fail to see how this result makes sense. If you think of the "val" > metadata column as a numerical variable defined along the genome, then, > in the merged object, it takes the value 88 over the [272531, 272581] > interval but in original object 'obj1' it was taking this value > over the [272531, 272571] interval. > > The following merged object would make much more sense to me: > > seqnames ranges strand | object2Val object1Val > <rle> <iranges> <rle> | <integer> <integer> > [1] chr1_random [272531, 272571] + | 800 88 > [1] chr1_random [272572, 272581] + | 800 NA > [2] chr1_random [272850, 272870] + | 450 NA > [2] chr1_random [272871, 272911] + | 450 45 > > Sounds like you need all the ranges in disjoin(union(obj1, obj2)) I meant disjoin(c(obj1, obj2)) here, sorry. H. > if you want to be able to represent the 2 numerical variables > accurately. But maybe I'm missing completely what you're trying > to achieve. > > H. > >> >> >> >> On Tuesday, October 15, 2013 12:36 PM, Lukasz [guest] >> <guest at="" bioconductor.org=""> wrote: >> >> >> Hi! >> >> Problem summary: How to retrieve part of the sequence of mRNA around >> given location. >> >> I have the locations of the binding to mRNA events as GRanges >> (GRevents) and need to retrieve sequence for motif finding. The >> problem is that if I use getSeq(flank(GRevents, width=n)) then I get >> the genomic sequence not transcript sequence, i.e. not accounting for >> introns or mRNA border. I have tried solving it with >> exonsBy("transcriptDb object", "tx") function but without success. >> >> Question: Is there a bioconductor-supported way of getting resolving >> the problem? With CLIPseq being more and more popular this will be >> very demanded function. >> >> Thanks, >> Lukasz >> >> -- output of sessionInfo(): >> >>> sessionInfo() >> R version 2.15.0 (2012-03-30) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=C LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] rtracklayer_1.16.0 GenomicFeatures_1.8.1 >> [3] AnnotationDbi_1.18.4 Biobase_2.16.0 >> [5] BSgenome.Mmusculus.UCSC.mm9_1.3.17 BSgenome_1.24.0 >> [7] Biostrings_2.24.1 GenomicRanges_1.8.3 >> [9] IRanges_1.14.2 BiocGenerics_0.2.0 >> >> loaded via a namespace (and not attached): >> [1] biomaRt_2.12.0 bitops_1.0-4.1 DBI_0.2-5 RCurl_1.91-1 >> [5] Rsamtools_1.8.0 RSQLite_0.11.1 stats4_2.15.0 tools_2.15.0 >> [9] XML_3.9-4 zlibbioc_1.2.0 >> >> >> -- >> Sent via the guest posting facility at bioconductor.org. >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> [[alternative HTML version deleted]] >> >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319

ADD REPLY • link 11.5 years ago Hervé Pagès 16k

Login before adding your answer.