mpg33@drexel.edu
I am trying to get the DNA sequence of the upstream regulatory region
of a number of genes using the biomaRt package. I start with a list of
EntrezGene IDs and would like to extract the sequence from 10 kb
upstream of the end of the 5' UTR to the end of the 5' UTR. I wrote a
neat little package of scripts to do this with biomaRt and export the
data in FASTA format. This works well when I search for one gene at a
time. But when I pass a list of EntrezGene IDs to the getSequence
function, it returns sequences that do not always match the ones I get
when I search for each gene individually. Sending calls to biomaRt one
gene at a time will clearly be much slower, but fast searches do me
little good if I don't know whether the answer is correct.
The code I have written is pretty straightforward. The core of it is
the getSequence function, which I call as follows:
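library(biomaRt)
# Connect to the Ensembl mart
biomart <- useMart('ensembl')
# 'dataset' holds the name of the Ensembl dataset and is defined elsewhere
martDataset <- useDataset(dataset, mart = biomart)
# 10 kb of upstream flank for a single EntrezGene ID
getSequence(id = '6720', type = 'entrezgene',
            seqType = 'coding_gene_flank', upstream = 10000,
            mart = martDataset)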
As an example, when I search for gene 5869 by itself, I get the same
answer as the one provided by the BioMart web-based tool. When I
search for gene 5869 as part of a list such as the following, I get a
different sequence:
c('8317', '1435', '1063', '4751', '3832', '3070', '5869', '675',
  '81624', '7249', '1186', '3801', '672', '1058', '22974', '23654',
  '4171', '1062', '3148', '4001', '3007', '26271', '9314')
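One check that may help narrow this down is to compare the two results
by gene ID rather than by row position, in case the batch result does
not come back in the order of the input list. The following is only a
minimal sketch under that assumption. compare_by_id is a hypothetical
helper, the sequences in the toy data are made up, and the column
names assume that getSequence names its two output columns after the
seqType and type arguments.

```r
# Hypothetical helper: compare one-at-a-time results with a batch result
# by matching rows on the ID column instead of on row position.
compare_by_id <- function(single, batch,
                          id_col = "entrezgene",
                          seq_col = "coding_gene_flank") {
  # Coerce IDs to character so that '5869' and 5869 compare equal.
  single_ids <- as.character(single[[id_col]])
  batch_ids <- as.character(batch[[id_col]])
  # Row of each single-query ID in the batch result (NA if absent).
  idx <- match(single_ids, batch_ids)
  data.frame(
    id = single_ids,
    in_batch = !is.na(idx),
    same_sequence = !is.na(idx) &
      single[[seq_col]] == batch[[seq_col]][idx],
    stringsAsFactors = FALSE
  )
}

# Toy data (made-up sequences): the batch rows are in a different order.
single <- data.frame(
  coding_gene_flank = c("AAAC", "GGTT", "CCCA"),
  entrezgene = c("8317", "5869", "675"),
  stringsAsFactors = FALSE
)
batch <- single[c(3, 1, 2), ]

result <- compare_by_id(single, batch)
print(result)
```

If every row reports same_sequence as TRUE, the apparent mismatch
would come from row order only. Note that match() uses the first
matching row, so an ID that returns several rows in the batch result
would need a closer look.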
Thanks for any help with this problem. Let me know if you need more
info. I am open to other ways of solving this problem without biomaRt
getSequence.
Thanks,
Michael Gormley