biomaRt 'getSequence' question
@glazko-galina-1653
Last seen 10.3 years ago
An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20080207/33e833e8/attachment.pl
Steffen ▴ 500
@steffen-2351
Last seen 10.3 years ago
Hi Galina,

With biomaRt you can currently specify only an upstream or a downstream flank in a single query, not both, so you'll need at least two queries to do this. The help page (?getSequence) explains that seqType "gene_exon_intron" returns the exons plus introns of a gene. Note that seqType "gene_exon_intron" already includes the 5' and 3' UTRs flanking the coding region. If you also want to include the promoter region in this query, you can set upstream=4000. If you need sequence downstream of the transcribed region, you'll have to run a second query and match up the two query results.

Cheers,
Steffen

----- Original Message -----
From: "Glazko, Galina" <galina_glazko@urmc.rochester.edu>
Date: Thursday, February 7, 2008 12:48 pm
Subject: [BioC] boiomaRt 'getSequence' question
To: bioconductor at stat.math.ethz.ch

> Dear all,
>
> I have a list of Ensembl gene IDs and I need to get the corresponding
> sequences together with 5' upstream (4000 bp), 3' downstream (4000 bp)
> and all introns.
>
> I know that I can probably do this using a combination of commands:
>
>     tmp1 <- getSequence(id="ENSG00000128714", type="ensembl_gene_id", seqType="coding_gene_flank", upstream=4000, mart=human)
>     tmp2 <- getSequence(id="ENSG00000128714", type="ensembl_gene_id", seqType="coding_gene_flank", downstream=4000, mart=human)
>     tmp3 <- getSequence(id="ENSG00000128714", type="ensembl_gene_id", seqType="cdna", mart=human)
>
> and then concatenate tmp1, tmp2 and tmp3, but I am not sure that the 'cdna'
> seqType will give me introns...
>
> Also, I hope that there is a simpler way to get all these sequences
> using just one command with the right 'seqType' specification.
>
> Could someone please clarify this for me?
>
> Thank you!
>
> Best regards
> Galina
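The two-query approach described in the reply could be sketched roughly as follows. This is an illustration, not a tested recipe from the thread: the helper names `fetch_gene_with_flanks` and `join_flanks` are hypothetical, the use of seqType "gene_flank" for the downstream query is an assumption (the reply does not name a seqType for it), and the sketch also assumes that getSequence returns a data frame with one column named after the seqType and one named after the ID type.

```r
# Hypothetical helper: match the two query results by gene ID and append the
# downstream flank to the gene body (which already carries the upstream flank).
join_flanks <- function(body_up, down,
                        id_col = "ensembl_gene_id",
                        body_col = "gene_exon_intron",
                        flank_col = "gene_flank") {
  merged <- merge(body_up, down, by = id_col)
  data.frame(
    ensembl_gene_id = merged[[id_col]],
    sequence = paste0(merged[[body_col]], merged[[flank_col]]),
    stringsAsFactors = FALSE
  )
}

# Hypothetical wrapper around the two biomaRt queries (needs network access
# and the biomaRt package; it is only defined here, not called).
fetch_gene_with_flanks <- function(ids, mart, flank = 4000) {
  # Query 1: exons + introns (UTRs included) plus the upstream flank.
  body_up <- biomaRt::getSequence(id = ids, type = "ensembl_gene_id",
                                  seqType = "gene_exon_intron",
                                  upstream = flank, mart = mart)
  # Query 2: downstream flank only. The seqType here is an assumption.
  down <- biomaRt::getSequence(id = ids, type = "ensembl_gene_id",
                               seqType = "gene_flank",
                               downstream = flank, mart = mart)
  join_flanks(body_up, down)
}
```

With a mart object such as the `human` one in the question, a call would look like `fetch_gene_with_flanks("ENSG00000128714", mart = human)`. It is worth checking the orientation of the returned flanks on a known gene before relying on the concatenated sequence.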
Steffen ▴ 500
@steffen-2351
Last seen 10.3 years ago
Hi Galina,

Yes, this is possible. However, you can retrieve only one sequence at a time this way, and you'll need RMySQL installed. Here's how to do it:

    library(biomaRt)
    ensembl = useMart("ensembl", dataset="hsapiens_gene_ensembl", mysql=TRUE)
    getSequence(chromosome = 10, start = 200000, end = 200010, mart = ensembl)

You'll get:

      chromosome start    end    sequence
    1         10 2e+05 200010 TGTGTTCCCCT

Cheers,
Steffen

----- Original Message -----
From: "Glazko, Galina" <galina_glazko@urmc.rochester.edu>
Date: Thursday, February 7, 2008 4:02 pm
Subject: RE: [BioC] boiomaRt 'getSequence' question
To: Steffen Durinck <sdurinck at lbl.gov>

> Steffen,
>
> Thank you very much!
> But I also have chromosomal coordinates.
> Is it possible to specify the chromosome number and coordinates instead
> of a gene ID, and then retrieve the entire sequence? Is there a
> 'seqType' appropriate for this?
>
> Thank you!
>
> Best regards
> Galina
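Because the coordinate-based call returns one sequence per query, several regions have to be fetched in a loop. A minimal sketch of such a loop is given below. The helper `fetch_regions` and its `fetch_one` argument are hypothetical names introduced for illustration. The length check assumes inclusive coordinates, which is consistent with the example output above (11 bases for positions 200000 to 200010).

```r
# Hypothetical helper: call a single-region fetcher once per row of `regions`
# (columns: chromosome, start, end) and stack the results.
# `fetch_one(chromosome, start, end)` must return a one-row data frame with a
# `sequence` column, like the output shown in the reply.
fetch_regions <- function(regions, fetch_one) {
  rows <- lapply(seq_len(nrow(regions)), function(i) {
    chr <- regions$chromosome[i]
    s <- regions$start[i]
    e <- regions$end[i]
    stopifnot(s <= e)
    res <- fetch_one(chr, s, e)
    # Assumes inclusive coordinates: expected length is end - start + 1.
    res$length_ok <- nchar(as.character(res$sequence)) == (e - s + 1)
    res
  })
  do.call(rbind, rows)
}

# Possible use with the mart from the reply (not run here; needs network
# access, biomaRt and RMySQL):
# fetch_regions(regions, function(chr, s, e)
#   getSequence(chromosome = chr, start = s, end = e, mart = ensembl))
```

Passing the fetcher as an argument keeps the loop independent of how the mart was created, so it can be tried with a stand-in function before issuing real queries.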