How do I use biomaRt to get upstreamFlank Genomic Sequence for many Genomes?
Noah Dowell ▴ 410
@noah-dowell-3791
Hello All,

Problem: I would like to obtain the genomic sequence upstream (~500 bp) of a specific bacterial gene, and I want this sequence for every bacterial genome that has the gene. On EcoCyc I see that many (> 100) bacteria have the gene, but I do not know how to retrieve all of the sequences in a high-throughput manner. My plan was to use biomaRt to get the sequences and send them to alignment programs later.

I have read through the vignette and tried to get the function to work with a non-Ensembl Mart, to no avail. I was also presented with an error (see below) that suggested I report it to the mailing list. It also looks like I will have to query each of the 249 bacterial genomes in the "bacterial_mart_7" Mart individually (with getLDS or getBM), which does not seem high-throughput at all. Are there any other suggestions that will allow me to take advantage of the large amount of bacterial genomic data for homology studies?

Thank you for your help.

Noah

Attempted Solution (for a single genome):

> bacGenome = useMart("bacterial_mart_7", dataset = "esc_20_gene")
Checking attributes ... ok
Checking filters ... ok
> filters = c("external_gene_id")
> attributes = c("external_gene_id","upstream_flank")
> values = list(external_gene_id = c("fis"), 500)
> seq = getBM(attributes=attributes, filters = filters, values = values, mart= bacGenome,
+ checkFilters= FALSE)
  V1
1 fis
Error in getBM(attributes = attributes, filters = filters, values = values, :
  The query to the BioMart webservice returned an invalid result: the number of columns in the result table does not equal the number of attributes in the query. Please report this to the mailing list.
> sessionInfo()
R version 2.11.0 (2010-04-22)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rtracklayer_1.8.1 RCurl_1.3-1 bitops_1.0-4.1 biomaRt_2.4.0

loaded via a namespace (and not attached):
[1] Biobase_2.8.0 Biostrings_2.16.0 BSgenome_1.16.0 GenomicRanges_1.0.1 IRanges_1.6.0
[6] tools_2.11.0 XML_2.8-1
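
One thing that stands out in the attempted query is that `values` has two elements while `filters` has only one, and `upstream_flank` is passed as an attribute. In Ensembl marts, `upstream_flank` is a filter and the flanking sequence itself is requested through an attribute such as `gene_flank`. Whether "bacterial_mart_7" uses the same names is an assumption here, so it is worth checking `listFilters()` and `listAttributes()` first. Below is a minimal sketch of how the per-genome loop might look under those assumptions. The helper names `flank_query`, `fetch_one_flank` and `fetch_flanks` are hypothetical and not part of biomaRt; only `useMart()` and `getBM()` are real biomaRt calls.

```r
# Build matching filter/value pairs: biomaRt expects one element of
# `values` per entry in `filters`, in the same order.
flank_query <- function(gene, upstream = 500, id_filter = "external_gene_id") {
  list(
    filters = c(id_filter, "upstream_flank"),
    values  = list(gene, upstream)
  )
}

# Query one dataset. ASSUMPTION: the mart exposes a "gene_flank"
# sequence attribute, as Ensembl marts do; adjust after listAttributes().
fetch_one_flank <- function(biomart, dataset, q) {
  mart <- biomaRt::useMart(biomart, dataset = dataset)
  biomaRt::getBM(
    attributes   = c("gene_flank", q$filters[1]),
    filters      = q$filters,
    values       = q$values,
    mart         = mart,
    checkFilters = FALSE
  )
}

# Run the same query against many datasets. Datasets that error out or
# return no rows (gene absent) are skipped. `fetch` is a parameter so the
# loop can be exercised without a network connection.
fetch_flanks <- function(datasets, gene, upstream = 500,
                         biomart = "bacterial_mart_7",
                         fetch = fetch_one_flank) {
  q <- flank_query(gene, upstream)
  results <- lapply(datasets, function(ds) {
    tryCatch({
      res <- fetch(biomart, ds, q)
      if (is.null(res) || nrow(res) == 0) {
        NULL
      } else {
        res$dataset <- ds  # remember which genome each row came from
        res
      }
    }, error = function(e) NULL)
  })
  results <- Filter(Negate(is.null), results)
  if (length(results) == 0) NULL else do.call(rbind, results)
}
```

The dataset names could be taken from `listDatasets(useMart("bacterial_mart_7"))$dataset` and passed as `datasets`, for example `fetch_flanks(datasets, "fis", upstream = 500)`. This still issues one request per genome, so it automates the loop rather than removing it.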
Alignment biomaRt genomes