Hello,
This might be a daft question, but I'd like to know which R package to use to BLAST a DNA sequence of around 500 bp with the "blastn" program at NCBI. I have a data frame of antigen sequences in R and, rather than copy/pasting each one into the BLAST website, I'd like to query them through R.
There are about 5-6 different R packages that offer a blast sequence function of some type, e.g.
blast{BoSSA}
blastSequences{annotate}
blastSeq{hoardeR}
blast{rBLAST}
Then there's the rentrez package hosted by rOpenSci.org that allows searching of Entrez databases, BLAST being one of them.
I've tried them all. For some reason blastSequences{annotate} always seems to time out; blast{BoSSA} needs the input to be of class DNAbin and just returns NAs in the output; blastSeq{hoardeR} requests an email address as input and returns XML as output; and blast{rBLAST} seems to be an R wrapper for running BLAST+ locally.
What are other people's experiences with running a blast query from R? I imagine this would be quite a common task but there's not much information out there. Honestly, I would just like something akin to the qblast function from the Bio.Blast.NCBIWWW Biopython module.
Many thanks,
Miha
I think annotate is the only one of these packages that is a Bioconductor package? For me
example(blastSequences)
performs several queries that complete without timing out; does it help to increase the timeout argument to something larger? Can you share the specific example of what you are trying to query? As far as I can tell from the blastSequences code, this depends entirely on the latency of the BLAST server.
Thanks Martin. For some reason it was constantly timing out yesterday but it seems perfectly fine today. Definitely the best package in my opinion.
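For a whole column of sequences, the timeout suggestion above could be sketched roughly as follows. This is only a sketch: blast_all is a hypothetical helper (not part of annotate), the default of 120 seconds is an arbitrary assumption, and the only annotate arguments used are the timeout and as arguments that appear in the example call in this thread.

```r
# Hypothetical helper, not part of annotate: BLAST each sequence in turn
# with a longer timeout than the 40 seconds used in example(blastSequences).
blast_all <- function(sequences, timeout = 120) {
  stopifnot(is.character(sequences), length(sequences) > 0)
  lapply(sequences, function(s) {
    # tryCatch so that one slow or failed query does not abort the whole run;
    # a failed query yields NULL in the result list
    tryCatch(
      annotate::blastSequences(s, timeout = timeout, as = "data.frame"),
      error = function(e) {
        warning("BLAST query failed: ", conditionMessage(e))
        NULL
      }
    )
  })
}
```

Assuming a data frame called antigens with a character column called sequence (both names are made up here), it would be called as blast_all(antigens$sequence). The queries are sent one after another, so the total run time grows with the number of sequences and with the server latency mentioned above.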
I'm trying to use blastSequences but I always get the same error
>example(blastSequences)
blstSq> ## x can be an entrez gene ID
blstSq> blastSequences(17702, timeout=40, as="data.frame")
Error: failed to load external entity "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?QUERY=17702&DATABASE=nr&HITLIST_SIZE=10&FILTER=L&EXPECT=10&PROGRAM=blastn&CMD=Put"
Any idea on what is missing?
Thanks,
Xavier
Are you using a current version of the package? Please post the output of the command
sessionInfo().
NCBI is expecting an https:// URL, so the http:// request in the error above fails.
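To make the difference concrete, here is a small sketch that rebuilds the request from the error message with only the scheme changed to https. build_blast_put_url is a hypothetical helper, not a function in annotate; the parameter names and defaults are copied from the failing URL above, and the host is simply the one from that URL, which is an assumption about what a fixed version of the package would use.

```r
# Hypothetical helper (not part of annotate): rebuild the "Put" request URL
# from the error message, but with an https:// scheme instead of http://.
build_blast_put_url <- function(query,
                                base = "https://www.ncbi.nlm.nih.gov/blast/Blast.cgi",
                                database = "nr", hitListSize = 10,
                                filter = "L", expect = 10,
                                program = "blastn") {
  # c() coerces the numeric values to character, e.g. 10 becomes "10"
  params <- c(QUERY = utils::URLencode(as.character(query), reserved = TRUE),
              DATABASE = database,
              HITLIST_SIZE = hitListSize,
              FILTER = filter,
              EXPECT = expect,
              PROGRAM = program,
              CMD = "Put")
  # Join as name=value pairs separated by "&"
  paste0(base, "?", paste(names(params), params, sep = "=", collapse = "&"))
}

url <- build_blast_put_url(17702)
print(url)
```

This only builds the URL string and does not contact NCBI. In practice, updating to a current version of annotate is the simpler route than patching URLs by hand.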