retrieving upstream/intronic sequences using biomaRt
@shamit-soneji-1677
Is it possible, using biomaRt (or any other R/BioC means), to download the upstream and intron sequences for any given Ensembl ID?

I know this can be done using BioMart directly, but a facility like this from R would be very useful if one wants to search for TF binding sites.

Many thanks,
Shamit
@steffen-durinck-1780
Steffen Durinck wrote (Wednesday, September 13, 2006 2:25 PM):

Hi Shamit,

Yes, with biomaRt you can get the upstream sequences, but currently not the intronic sequences. Try:

library(biomaRt)
ensmart = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
getSequence(id = "ENSG00000139618", type = "ensembl_gene_id", mart = ensmart, seqType = "5utr")

Cheers,
Steffen

--
Steffen Durinck, Ph.D.
Oncogenomics Section
Pediatric Oncology Branch
National Cancer Institute, National Institutes of Health
URL: http://home.ccr.cancer.gov/oncology/oncogenomics/
Phone: 301-402-8103
Address: Advanced Technology Center, 8717 Grovemont Circle, Gaithersburg, MD 20877

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
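For readers who want flanking sequence rather than the 5' UTR, the following is a minimal sketch, not a definitive recipe. It assumes a current biomaRt installation and a reachable Ensembl BioMart service. The helper names `fetch_upstream` and `intron_ranges` and the 1000 bp default are hypothetical choices made for illustration. The second helper only derives intron coordinates from exon coordinates that you already have for one transcript; it does not fetch any sequence.

```r
# Sketch only: assumes the biomaRt package is installed and Ensembl BioMart
# is reachable. The function name and the 1000 bp default are illustrative.
fetch_upstream <- function(gene_id, n_upstream = 1000) {
  mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  biomaRt::getSequence(
    id       = gene_id,
    type     = "ensembl_gene_id",
    seqType  = "gene_flank",   # flanking sequence, not the 5' UTR
    upstream = n_upstream,     # number of bases upstream of the gene
    mart     = mart
  )
}

# Introns are the gaps between consecutive exons of a single transcript.
# exon_start and exon_end are 1-based, inclusive genomic coordinates.
intron_ranges <- function(exon_start, exon_end) {
  ord <- order(exon_start)
  s <- exon_start[ord]
  e <- exon_end[ord]
  n <- length(s)
  if (n < 2) {
    return(data.frame(start = numeric(0), end = numeric(0)))
  }
  data.frame(start = e[-n] + 1, end = s[-1] - 1)
}

# Usage (the first call needs network access, so it is left commented out):
# seqs <- fetch_upstream("ENSG00000139618")
introns <- intron_ranges(c(100, 300, 500), c(200, 400, 600))
```

The intron coordinates can then be used with any tool that extracts subsequences by position.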
Henrik Hornshøj Jensen wrote (19 September 2006 10:33):

Does anyone know of a package that will predict regulatory sites in upstream regions?

Regards,
Henrik
Krys Kelly wrote (Tuesday, September 19, 2006 12:35 PM):

Hi Henrik,

A package? The more one looks, the more one finds! The attached spreadsheet is very much a work in progress and a bit messy and incomplete. We started it after a very quick review of the literature, so it is also far from comprehensive. However, it will probably give you more than enough information to get started.

This review should be helpful:

Tompa et al. (2005) Assessing computational tools for the discovery of transcription factor binding sites. Nature Biotechnology 23(1), 137-144.

The three 'old-timer' programs that everyone seems to use are AlignACE, MEME and Consensus. We have also been using Weeder, Sombrero and NestedMica.

Be aware that some of the programs (e.g. AlignACE) can give quite different answers on different runs, even with the same parameters, and that the different programs can give very different answers. I am aware that a number of people (including ourselves) run several of the programs and take the motifs that turn up in most of them for further study.

There are also programs that search for known motifs (e.g. MAST (companion to MEME), MSCAN, SiteSeer). Two well-known databases of transcription factor binding sites are TRANSFAC and JASPAR.

Hope this helps.

Krys

Dr Krystyna A Kelly
University of Cambridge
Department of Pathology
Molteno Building, Tennis Court Road
Cambridge CB2 1QP
Tel: 01223 333331
Email: kak28 at cam.ac.uk
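As a rough illustration of what "searching for a known motif" means in R, here is a minimal sketch that uses `matchPattern` from the Biostrings package. It is not one of the programs named above. The toy sequence and the TATA-like consensus `TATAWAW` are made up for the example, and only the given strand is scanned. Real binding-site searches usually score position weight matrices (for example from JASPAR or TRANSFAC) rather than a single consensus string.

```r
library(Biostrings)

# Toy upstream sequence and consensus; both are invented for illustration.
upstream  <- DNAString("GGCTATAAAAGGCGTATATATCC")
consensus <- DNAString("TATAWAW")   # W is the IUPAC code for A or T

# fixed = FALSE lets IUPAC ambiguity codes in the pattern match any of the
# bases they stand for. Only the forward strand is searched here.
hits <- matchPattern(consensus, upstream, fixed = FALSE)

hit_starts <- start(hits)   # 1-based start positions of each match
```

To cover the other strand, the same call can be repeated on `reverseComplement(upstream)`.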
Henrik Hornshøj Jensen wrote:

Thanks for your help, although I was thinking of a Bioconductor package.

Regards,
Henrik
Steffen Durinck wrote:

No, I don't think there is a package to find motifs in the current repository. It would be nice to have one.

Best,
Steffen
Robert Gentleman wrote:

Depending on what you know (start positions, etc.), Biostrings is a reasonable tool for extracting these sequences efficiently. A number of genomes are available now, and more can be added.

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
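The idea of extracting sequence from known positions can be sketched as follows. This is an illustration under stated assumptions, not a description of any particular workflow. The toy `chr` string stands in for a real chromosome (which might come from a genome data package such as those built on BSgenome), and the helper `upstream_seq` and its 5 bp default are hypothetical. Coordinates are 1-based and inclusive on the forward strand.

```r
library(Biostrings)

# Toy "chromosome"; a real one would be loaded from a genome data package.
chr <- DNAString("ACGTACGTTTGACCAGGCTA")

# Return up to n bases upstream of a feature, read 5'->3' on the feature's
# own strand. start/end are forward-strand coordinates of the feature.
upstream_seq <- function(seq, start, end, strand = "+", n = 5) {
  if (strand == "+") {
    from <- max(1, start - n)
    to   <- start - 1
    if (to < from) return(DNAString(""))
    subseq(seq, start = from, end = to)
  } else {
    # On the minus strand, "upstream" lies beyond the feature's end
    # coordinate and must be reverse-complemented.
    from <- end + 1
    to   <- min(length(seq), end + n)
    if (to < from) return(DNAString(""))
    reverseComplement(subseq(seq, start = from, end = to))
  }
}

up_plus  <- upstream_seq(chr, start = 11, end = 16, strand = "+")
up_minus <- upstream_seq(chr, start = 5,  end = 10, strand = "-")
```

The strand handling matters for TF binding site searches: for a minus-strand gene the promoter sits at higher forward-strand coordinates, so taking the bases before `start` would return the wrong region.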