mpg33@drexel.edu
I am trying to get the DNA sequence of the upstream regulatory region of
a number of genes using the biomaRt package. I start with a list of
EntrezGene IDs and would like to extract the sequence from 10 kb upstream
of the end of the 5'UTR to the end of the 5'UTR. I wrote a neat little
package of scripts to do this with biomaRt and export the data in FASTA
format. This works well when I search for one gene at a time. But when I
pass a list of EntrezGene IDs to the getSequence function, it returns
sequences that do not always match what I get when I search for each gene
individually. Sending calls to biomaRt one gene at a time will clearly be
much slower, but fast searches do me little good if I don't know whether
the answer is correct.
The code I have written is pretty straightforward. The meat of it is the
getSequence function, which I call as follows:
library(biomaRt)
biomart <- useMart('ensembl')
# 'dataset' is a variable holding the Ensembl dataset name
martDataset <- useDataset(dataset, mart = biomart)
getSequence(id = '6720', type = 'entrezgene', seqType = 'coding_gene_flank',
            upstream = 10000, mart = martDataset)
As an example, when I search for gene 5869 by itself, I get the same
answer as the BioMart web-based tool. When I search for gene 5869 as part
of a list like the following, I get a different sequence.
c('8317', '1435', '1063', '4751', '3832', '3070', '5869', '675', '81624',
  '7249', '1186', '3801', '672', '1058', '22974', '23654', '4171', '1062',
  '3148', '4001', '3007', '26271', '9314')
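For reference, here is a minimal sketch of how the one-at-a-time and batch results can be compared by gene ID rather than by row position. It assumes that getSequence returns a data frame with one column named after the seqType and one named after the ID type, and that the batch rows may come back in a different order than the input IDs. I have not confirmed that row order is the cause of the mismatch; this check only rules it in or out. The helper compare_by_id and the toy data frames (with made-up sequences) are hypothetical and stand in for real getSequence output.

```r
# Compare sequences retrieved one ID at a time with a batch retrieval,
# matching rows on the ID column instead of relying on row order.
compare_by_id <- function(single, batch, id_col, seq_col) {
  single_ids <- as.character(single[[id_col]])
  batch_ids <- as.character(batch[[id_col]])
  ids <- unique(single_ids)
  same <- vapply(ids, function(id) {
    # An ID may map to more than one sequence, so compare the sorted sets
    s <- sort(unique(as.character(single[[seq_col]][single_ids == id])))
    b <- sort(unique(as.character(batch[[seq_col]][batch_ids == id])))
    identical(s, b)
  }, logical(1))
  data.frame(id = ids, same = unname(same), stringsAsFactors = FALSE)
}

# Toy stand-ins for getSequence() output; the sequences are made up.
single <- data.frame(coding_gene_flank = c("AAAC", "GGGT"),
                     entrezgene = c("5869", "675"),
                     stringsAsFactors = FALSE)
# Same content, but rows in a different order than the input IDs
batch <- single[c(2, 1), ]

res <- compare_by_id(single, batch, "entrezgene", "coding_gene_flank")
print(res)
```

With real data, single would be built by calling getSequence once per ID and row-binding the results, and batch by one call with the full ID vector. Any ID flagged as not matching here differs for some reason other than row order.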
Thanks for any help with this problem. This exact application is shown in
the biomaRt vignette, so I am not sure where mine is going wrong. Let me
know if you need more info. I am open to other ways of solving this
problem without biomaRt getSequence.
Thanks,
Michael Gormley