Extend GRanges start/end
@antonio-miguel-de-jesus-domingues-5182
Germany
Hi Bioconductors,

I have a GRanges object, created with the GenomicFeatures package, that contains gene coordinates:

##########################################################
hg19RefGenes <- loadFeatures(file='hg19RefGenes.sqlite')
hg19RefGenes

# generate a GRanges object containing contiguous transcribed regions of the genome (from the 1st to last exon)
GeneRegions <- transcriptsBy(hg19RefGenes, by='gene')
gene.bounds <- seqapply(GeneRegions, range, progress="text")
GeneRegions <- unlist(gene.bounds)
head(GeneRegions)
GRanges with 6 ranges and 0 elementMetadata cols:
            seqnames                 ranges strand
               <Rle>              <IRanges>  <Rle>
          1    chr19 [ 58858172,  58864865]      -
         10     chr8 [ 18248755,  18258723]      +
        100    chr20 [ 43248163,  43280376]      -
       1000    chr18 [ 25530930,  25757445]      -
      10000     chr1 [243651535, 244006886]      -
  100008586     chrX [ 49217771,  49342266]      +
  ---
  seqlengths:
        chr1       chr2 ... chr18_gl000207_random
   249250621  243199373 ...                  4262
##########################################################

I want to do something that should be simple: extend the ranges so that they also include 1 kb upstream of the gene (~promoter region). For example, chr19 [58858172, 58864865] becomes chr19 [58858172, 58865865] (this gene is on the minus strand, so upstream means higher coordinates and only the end moves). Looking at the GenomicFeatures functions, it seems that neither resize nor flank will do. Is there another function that I am missing that will do the job?

Cheers,
António

--
António Miguel de Jesus Domingues, PhD
Neugebauer group
Max Planck Institute of Molecular Cell Biology and Genetics, Dresden
Pfotenhauerstrasse 108
01307 Dresden
Germany

e-mail: domingue@mpi-cbg.de
tel. +49 351 210 2481
The Unbearable Lightness of Molecular Biology
Genetics GenomicFeatures
Paul Shannon
@paul-shannon-5161
Hi Antonio,

New methods in GenomicFeatures 1.10.0,

  promoters
  getPromoterSeq

might help you out here. The latter uses the former, and a documented example can be found here:

http://www.bioconductor.org/help/workflows/gene-regulation-tfbs/#compact-summary

If this is insufficient, please let us know, and we can dig further into your problem.

Cheers,

- Paul

On Oct 29, 2012, at 3:37 AM, António Miguel de Jesus Domingues wrote:
> [...] Is there another function that I am missing that will do the job?
Thank you for your reply, Paul.

I had only read the vignette and should also have read the reference manual. That said, it is still not working. The example:

gr <- GRanges("chr1", IRanges(rep(10, 3), width=6), c("+", "-", "*"))
gr
promoters(gr, 2, 2)

gives an error:

Error: could not find function "promoters"

I've updated GenomicRanges to 1.10.0 (sessionInfo below). Also, from what I've read in the reference manual, "promoters" will return the region around the TSS: "The return object is a GRanges of promoter regions around the transcription start site the span of which is defined by upstream and downstream. Ranges on the * strand are treated the same as those on the + strand." That is something like upstream-TSS-downstream (very similar to the function flank), whereas I am interested in extending the range to something like upstream + full gene. Please correct me if I am wrong.

On another note, did the behaviour of transcriptsBy(hg19RefGenes, by='gene') change in the last update? I thought the output was a GRanges object, but now it is a GRangesList.

Best regards,
António

On 29 October 2012 13:55, Paul Shannon <pshannon@fhcrc.org> wrote:
> Hi Antonio,
>
> New methods in GenomicFeatures 1.10.0 [...] might help you out here. [...]
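António's reading of the documentation is correct: promoters() returns a window of fixed size around each TSS rather than extending the whole range. A minimal sketch on his toy object, assuming a current Bioconductor installation in which promoters() is available once GenomicRanges is loaded (the object name `p` is illustrative):

```r
library(GenomicRanges)

# Toy object from the message above: chr1:10-15 on "+", "-" and "*"
gr <- GRanges("chr1", IRanges(rep(10, 3), width = 6), c("+", "-", "*"))

# 2 bp upstream and 2 bp downstream of each TSS.
# "+" and "*": TSS is start(gr) = 10, window [8, 11]
# "-":         TSS is end(gr)   = 15, window [14, 17]
p <- promoters(gr, upstream = 2, downstream = 2)
p
```

All three results have width 4 regardless of the original width of 6, so "upstream + whole gene" needs a different tool.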
Hi Antonio,

GenomicFeatures::promoters really is there. Truly! Your sessionInfo() results did not come through. Could you try that again?

All that said, Michael's suggestion of IRanges::resize may be a better fit for your problem, since you already have the full range of the gene.

- Paul

On Oct 29, 2012, at 7:53 AM, António Miguel de Jesus Domingues wrote:
> [...]
> Error: could not find function "promoters"
> [...]
I believe you, Paul. I just realized that I needed to restart R after the update :) Now all I have to do is modify my script with the renamed functions. Thanks a lot for your help.

Just one naïve question/suggestion/brainstorm: why isn't there a function to extract all the continuously transcribed regions of the genome (= genes)? Is it because this is hard to define, and so it is left to individual users to choose which approach to take (while being aware of the caveats)?

António

On 29 October 2012 16:30, Paul Shannon <pshannon@fhcrc.org> wrote:
> Hi Antonio,
>
> GenomicFeatures::promoters really is there. Truly! [...]
Please ignore the last paragraph, about the behaviour of transcriptsBy. My mistake.

António

On 29 October 2012 15:53, António Miguel de Jesus Domingues <amjdomingues@gmail.com> wrote:
> [...]
> On another note, did the behaviour of transcriptsBy(hg19RefGenes, by='gene') change in the last update? I thought the output was a GRanges object, but now it is a GRangesList.
> [...]
@michael-lawrence-3846
United States
On Mon, Oct 29, 2012 at 3:37 AM, António Miguel de Jesus Domingues <amjdomingues@gmail.com> wrote:

> GeneRegions <- transcriptsBy(hg19RefGenes, by='gene')
> gene.bounds <- seqapply(GeneRegions, range, progress="text")

You should be able to simply call range() on GeneRegions, i.e., there is no need to call seqapply. By the way, what is the progress="text" argument here? I'm not aware of this feature.

> I want to do something that should be simple: extend the ranges so that they also include 1Kb upstream of the gene (~Promoter region) [...]
> Is there another function that I am missing that will do the job?

resize() should work. Simply call resize(GeneRegions, width(GeneRegions) + 1000L, fix = "end").

Michael
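Michael's one-liner can be checked against the coordinates in the question. The sketch below, assuming a current GenomicRanges installation, rebuilds the first two gene spans from the head(GeneRegions) printout (the names `gene_bounds`, `ext` and `alt` are illustrative) and extends them with resize(). It also shows that flank(), which on its own returns only the 1 kb upstream window, gives the same result when combined with the gene span through punion().

```r
library(GenomicRanges)

# First two gene spans from the question's head(GeneRegions) output
gene_bounds <- GRanges(
  seqnames = c("chr19", "chr8"),
  ranges   = IRanges(start = c(58858172L, 18248755L),
                     end   = c(58864865L, 18258723L)),
  strand   = c("-", "+")
)
names(gene_bounds) <- c("1", "10")

# resize() is strand-aware: fix = "end" keeps the 3' end of each gene in
# place (end() on "+", start() on "-") and adds the extra 1000 bp at the
# 5' end, i.e. upstream of the TSS.
ext <- resize(gene_bounds, width = width(gene_bounds) + 1000L, fix = "end")
ext
# chr19 "-": [58858172, 58865865]   chr8 "+": [18247755, 18258723]

# Cross-check: flank() gives only the adjacent 1 kb upstream window;
# its parallel union with the gene span yields the same extended range.
alt <- punion(flank(gene_bounds, 1000L), gene_bounds)
identical(start(ext), start(alt)) && identical(end(ext), end(alt))
# TRUE
```

If seqlengths are set, genes close to a chromosome boundary may be extended past it; GenomicRanges will typically warn about such out-of-bound ranges, and trim() can clip them back to the chromosome limits.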
Thanks for your suggestions, Michael.

On 29 October 2012 14:35, Michael Lawrence <lawrence.michael@gene.com> wrote:

> You should be able to simply call range() on GeneRegions, i.e., no need to call seqapply. Btw, what is the progress="text" argument here? I'm not aware of this feature.

In this case I actually have a GRangesList, and it does not work. I was doing one too many things and posted the result after seqapply/unlist, not the output of transcriptsBy. Ah, progress="text" is not a feature as far as I'm aware (and it does not work). I was experimenting with something else and it got into the mail somehow.

> resize() should work. Simply call resize(GeneRegions, width(GeneRegions) + 1000L, fix = "end").

This tip is great; it works beautifully! Cheers.
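Michael's range() suggestion also bears on António's question about extracting the continuously transcribed region of each gene. A sketch, assuming current GenomicRanges: the GRangesList below is hypothetical (made-up gene names and coordinates) and stands in for the output of transcriptsBy(txdb, by = "gene"). In current releases, range() on a GRangesList works element by element and returns a GRangesList; unlist() turns that into a plain GRanges with one span per gene. More recent GenomicFeatures releases also offer genes(), which returns gene extents directly from a TxDb.

```r
library(GenomicRanges)

# Hypothetical transcripts grouped by gene (stand-in for transcriptsBy output)
tx_by_gene <- GRangesList(
  geneA = GRanges("chr1", IRanges(c(100, 150), c(300, 400)), strand = "+"),
  geneB = GRanges("chr2", IRanges(c(1000, 1200), c(1500, 1300)), strand = "-")
)

# Element-wise range(): from the first transcript start to the last
# transcript end of each gene. A gene whose transcripts sit on more than
# one chromosome or strand would get more than one span here.
spans <- range(tx_by_gene)

# Flatten to a GRanges; the gene IDs become the range names.
gene_spans <- unlist(spans)
gene_spans
```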
