Obtaining exon structure of a gene via Bioconductor
@ruppert-valentino-1376
Last seen 10.2 years ago
Hello,

I want to do heteroduplex analysis on each exon of around 50 genes. Getting the exon structure for each gene from Ensembl and manually identifying the exon sequences seems very laborious.

Is there a way to use a Bioconductor package to get the exon sequences for all the transcripts of a gene? If so, how can I do this? Would biomaRt do it, and if so, how?

Any example scripts or ideas would be greatly appreciated, as it takes hours to get all the exon sequences for a gene split up into files to use for PCR.

Thanks in advance for any help on this.

Raphael
@steve-lianoglou-2771
Last seen 21 months ago
United States
Hi,

On Tue, Feb 2, 2010 at 11:08 AM, Ruppert Valentino wrote:
> [...] it takes hours to get all the exon sequences for a gene split up into files to use for PCR.

I'm not sure that it really takes hours to get the exon structure ...

I've actually been developing and using a package to do this: http://wiki.github.com/lianos/GenomeAnnotations

I'm not necessarily recommending that you use this package, but I outlined the steps you could take to download the RefSeq gene annotations for mm9 in the "Downloading the Gene Annotation File" section here: http://wiki.github.com/lianos/GenomeAnnotations/installing-annotation-packages

You'll get a tab-delimited file with one line per transcript. The exonStart and exonEnd columns are comma-separated lists of coordinates that hold the information you're looking for. If you only want a few genes, then parsing that file shouldn't be too bad ...

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
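As a rough illustration of the parsing step described above, here is a minimal R sketch. It assumes the file follows the UCSC refGene-style layout, where the two columns are named exonStarts and exonEnds, each value ends with a trailing comma, and starts are 0-based. The record below is a made-up toy example, and the helper names split_coords and exon_table are hypothetical.

```r
# Toy record in a UCSC refGene-style layout (all values are invented).
rec <- data.frame(
  name       = "NM_000000",
  chrom      = "chr1",
  strand     = "+",
  exonStarts = "100,300,600,",
  exonEnds   = "200,450,700,",
  stringsAsFactors = FALSE
)

# Split one comma-separated coordinate string into integers.
# strsplit() drops the empty field after a trailing comma.
split_coords <- function(x) {
  as.integer(strsplit(x, ",", fixed = TRUE)[[1]])
}

# Expand a single transcript record into one row per exon.
# Exons are numbered in genomic order, not transcript order.
exon_table <- function(rec) {
  starts <- split_coords(rec$exonStarts)
  ends   <- split_coords(rec$exonEnds)
  stopifnot(length(starts) == length(ends))
  data.frame(
    transcript = rec$name,
    chrom      = rec$chrom,
    exon       = seq_along(starts),
    start      = starts + 1L,  # assumption: 0-based starts, shifted to 1-based
    end        = ends,
    width      = ends - starts,
    stringsAsFactors = FALSE
  )
}

exons <- exon_table(rec)
print(exons)
```

For a real download, the file would first be read with something like read.delim(), and the column names would need to be matched to whatever that particular file provides.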
@michael-dondrup-3849
Last seen 10.2 years ago
Hi,

this is also possible with BioMart and therefore also with biomaRt. The following query is an example: it fetches all exon sequences for the C. elegans gene with Ensembl gene ID T24D1.1 in FASTA format (try this URL):

http://www.biomart.org/biomart/martview?VIRTUALSCHEMANAME=default&ATTRIBUTES=celegans_gene_ensembl.default.sequences.ensembl_gene_id|celegans_gene_ensembl.default.sequences.ensembl_transcript_id|celegans_gene_ensembl.default.sequences.gene_exon&FILTERS=celegans_gene_ensembl.default.filters.ensembl_gene_id."T24D1.1"|celegans_gene_ensembl.default.filters.biotype."protein_coding"&VISIBLEPANEL=resultspanel

If you like this, the parameters can be translated almost directly into the corresponding biomaRt query, although I don't think that is necessary in this case.

Best,
Michael
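To make the "almost direct" translation concrete, the sketch below reads the dataset, attributes and filters off the URL above and passes them to biomaRt. It is only an outline under assumptions: the wrapper fetch_exons is a hypothetical name, the actual request needs network access and an installed biomaRt package, and the identifiers are those of the 2010 URL, which a current Ensembl release may no longer accept.

```r
# Query pieces taken from the martview URL:
# ATTRIBUTES=...sequences.<attribute> and FILTERS=...filters.<filter>."<value>"
query <- list(
  dataset    = "celegans_gene_ensembl",
  attributes = c("ensembl_gene_id", "ensembl_transcript_id", "gene_exon"),
  filters    = c("ensembl_gene_id", "biotype"),
  values     = list("T24D1.1", "protein_coding")  # one entry per filter
)

# Hypothetical wrapper: connects to the Ensembl mart and runs the query.
# With more than one filter, getBM() expects 'values' as a list.
fetch_exons <- function(q) {
  mart <- biomaRt::useMart("ensembl", dataset = q$dataset)
  biomaRt::getBM(
    attributes = q$attributes,
    filters    = q$filters,
    values     = q$values,
    mart       = mart
  )
}

# Not run here because it needs network access:
# exons <- fetch_exons(query)
```

The result would be a data.frame with one row per exon sequence rather than a FASTA file, so it still needs to be written out if FASTA is the goal.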
Thanks Michael,

This looks great. Could you direct me to a page that explains the database schema that Ensembl uses? I am interested in human genes and am not sure what to put in the query to get, say, the exonic sequences of the human TP53 gene.

Thanks
Hi Ruppert,

Ruppert Valentino wrote:
> This looks great. I wonder if you could direct me to a page that
> explains the database schema that ensembl uses as I am interested in
> human genes and not sure what to put in the query to get say human
> TP53 gene exonic sequences?

You don't need to know anything about the database schema to query BioMart. You can just go to the martview page

http://www.biomart.org/biomart/martview/

and then go through the GUI and make your selections. I assume Michael did that and then clicked on the 'URL' button to get the URL that he sent you.

Alternatively, and probably easier in the long run, you can use biomaRt. Your query is quite simple:

    library(biomaRt)
    # connect to the Ensembl mart and select the human gene dataset
    mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
    # arguments in order: attributes, filters, values, mart
    seqs <- getBM("gene_exon", "hgnc_symbol", "TP53", mart)

You can also add other things, like the Ensembl transcript ID, to the output by simply appending to the first argument (the attributes argument), like this:

    seqs <- getBM(c("ensembl_transcript_id", "gene_exon"), "hgnc_symbol", "TP53", mart)

You can also query multiple gene symbols at one time. If you need to do many genes, do them all at once and parse the resulting data.frame. In that case you are advised to add hgnc_symbol to the attributes as well, as the returned data are not necessarily sorted in the way you might expect.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
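Since the original goal was one set of exon sequences per gene, split into files, here is a hedged sketch of the "parse the resulting data.frame" step. It runs on a mock data.frame shaped like the result of a multi-gene getBM() call with the attributes hgnc_symbol, ensembl_transcript_id and gene_exon. The sequences and transcript IDs are invented, write_exon_fasta is a hypothetical helper, and whether hgnc_symbol can be requested together with gene_exon is best checked with listAttributes(mart).

```r
# Mock result standing in for something like:
# seqs <- getBM(c("hgnc_symbol", "ensembl_transcript_id", "gene_exon"),
#               "hgnc_symbol", c("TP53", "BRCA1"), mart)
seqs <- data.frame(
  hgnc_symbol           = c("TP53", "TP53", "BRCA1"),
  ensembl_transcript_id = c("TX_A", "TX_A", "TX_B"),
  gene_exon             = c("ATGGAGGAGCCG", "CAGTCAGATCCT", "ATGGATTTATCT"),
  stringsAsFactors = FALSE
)

# Write one FASTA file per gene symbol and return the file paths.
# Records are labelled by row position because the returned order
# is not guaranteed to match exon rank.
write_exon_fasta <- function(seqs, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  by_gene <- split(seqs, seqs$hgnc_symbol)
  vapply(names(by_gene), function(gene) {
    d <- by_gene[[gene]]
    headers <- paste0(">", gene, "|", d$ensembl_transcript_id,
                      "|row", seq_len(nrow(d)))
    # interleave header and sequence lines: header1, seq1, header2, seq2, ...
    lines <- as.vector(rbind(headers, d$gene_exon))
    path <- file.path(out_dir, paste0(gene, "_exons.fa"))
    writeLines(lines, path)
    path
  }, character(1))
}

out_dir <- file.path(tempdir(), "exon_fasta")
paths <- write_exon_fasta(seqs, out_dir)
print(paths)
```

Splitting on hgnc_symbol is the reason for keeping that column in the attributes: without it, rows from different genes could not be told apart after a multi-gene query.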
Hi James and Ruppert,

yes, I clicked on the URL button. I actually like the idea of trying out queries in the web interface first, because I find it quite intuitive: you can see at a glance the available parameters, which can differ between databases. I am sorry, I should have explained that a bit better.

It just came to my mind that it would be great if BioMart, in addition to the Perl, XML and URL buttons, also had an R button that provides the query in terms of biomaRt. Maybe this could be done by conversion from the XML output? Just an idea.

Michael