Blast analysis of two sequences in R
1
0
Entering edit mode
@prabhakar-ghorpade-6464
Last seen 10.2 years ago
Hello,       I have following sequences for which I want to BLAST and see result only for sequences showing > 95% Query coverage and >90% identity? Sequences >1 CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC >2 CTATGTCATCCTCAATCTCTATGAAAGCAACACCGCTACCATAGA Can you please Suggest how can I select them in R in NCBI BLAST so that I get sequences showing > 95% Query coverage and >90% identity. Is there program in R to select them? I want to detect number of organism showing uniques results for given sequences.  Thanks. Dr. Ghorpade Prabhakar B. PhD Scholar ( Veterinary Biochemistry), IVRI, Izatnagar, Bareilly, U.P., India [[alternative HTML version deleted]]
Coverage Coverage • 4.2k views
ADD COMMENT
1
Entering edit mode
@martin-morgan-1513
Last seen 3 months ago
United States
On 04/21/2014 03:51 AM, prabhakar ghorpade wrote: > Hello, > I have following sequences for which I want to BLAST and see result only for sequences showing > 95% Query coverage and >90% identity? > > Sequences >> 1 > CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC >> 2 > CTATGTCATCCTCAATCTCTATGAAAGCAACACCGCTACCATAGA The 'annotate' package has 'blastSequences'; I'm not sure that it's useful enough for your purposes. In the 'devel' branch (see http://bioconductor.org/developers/how-to/useDevel/ it has been updated to be more responsive and to return richer data, e.g., df = blastSequences("CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC", timeout=40, as="data.frame") > head(df, 1) Hit_num Hit_id 1 1 gi|380719094|gb|JQ281544.1| Hit_def Hit_accession Hit_len Hsp_num 1 Expression vector pAV-UCSF, complete sequence JQ281544 11534 1 Hsp_bit-score Hsp_score Hsp_evalue Hsp_query-from Hsp_query-to Hsp_hit-from 1 82.4379 90 1.2063e-13 1 45 2126 Hsp_hit-to Hsp_query-frame Hsp_hit-frame Hsp_identity Hsp_positive Hsp_gaps 1 2170 1 1 45 45 0 Hsp_align-len Hsp_qseq 1 45 CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC Hsp_hseq 1 CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC Hsp_midline 1 ||||||||||||||||||||||||||||||||||||||||||||| > Hope that helps; would be happy to hear of other R solutions. Martin > > > Can you please Suggest how can I select them in R in NCBI BLAST so that I get sequences showing > 95% Query coverage and >90% identity. Is there program in R to select them? I want to detect number of organism showing uniques results for given sequences. > Thanks. > > > Dr. Ghorpade Prabhakar B. > PhD Scholar ( Veterinary Biochemistry), > IVRI, > Izatnagar, > Bareilly, U.P., > India > [[alternative HTML version deleted]] > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENT
0
Entering edit mode
Hi, I tend in cases like this to shirk wrappers and stay closer to the command line usage of tools such as blast. Below is a generic example of run NCBI blastn (part of blast+ package) under R, and post filter the results. The approach should work with minor changes if you have not upgraded to NCBI's blast+. ~Malcolm s<- ## The sequences to be blasted and their fasta 'deflines' c('>s1','CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC' ,'>s2','CTATGTCATCCTCAATCTCTATGAAAGCAACACCGCTACCATAGA' ) blast.f6<- ## The fields you want back from blast. c.f. `blastn -help` c('qseqid', 'sseqid', 'pident', 'qcovs') blast.out<- ## The system call to blastn system2('blastn',c('-db',"'nt'" ,'-outfmt',sprintf('"6 %s"',paste(collapse=' ',blast.f6)) , '-perc_identity',"'.90'" ,'-num_threads', 15 # use 'em if you got 'em ! ) ,input=s ,stdout=TRUE ) blast.out.df<- ## parse blast.out as a table and assign names to it `names<-`(read.table(sep='\t',textConnection(b)),blast.f6) # Query the data frame head(blast.out.df[with(b.df,pident>=90 & qcovs>95),],3) qseqid sseqid pident qcovs 1 s1 gi|380719094|gb|JQ281544.1| 100 100 2 s1 gi|161015434|gb|EU143287.2| 100 100 3 s1 gi|161015430|gb|EU143282.2| 100 100 ~ Malcolm >-----Original Message----- >From: bioconductor-bounces at r-project.org [mailto:bioconductor- bounces at r-project.org] On Behalf Of Martin Morgan >Sent: Wednesday, April 23, 2014 12:12 PM >To: prabhakar ghorpade; bioconductor at r-project.org >Subject: Re: [BioC] Blast analysis of two sequences in R > >On 04/21/2014 03:51 AM, prabhakar ghorpade wrote: >> Hello, >> I have following sequences for which I want to BLAST and see result only for sequences showing > 95% Query coverage and >>90% identity? >> >> Sequences >>> 1 >> CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC >>> 2 >> CTATGTCATCCTCAATCTCTATGAAAGCAACACCGCTACCATAGA > >The 'annotate' package has 'blastSequences'; I'm not sure that it's useful >enough for your purposes. In the 'devel' branch (see > > http://bioconductor.org/developers/how-to/useDevel/ > >it has been updated to be more responsive and to return richer data, e.g., > > df = blastSequences("CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC", > timeout=40, as="data.frame") > > > head(df, 1) > Hit_num Hit_id >1 1 gi|380719094|gb|JQ281544.1| > Hit_def Hit_accession Hit_len Hsp_num >1 Expression vector pAV-UCSF, complete sequence JQ281544 11534 1 > Hsp_bit-score Hsp_score Hsp_evalue Hsp_query-from Hsp_query-to Hsp_hit-from >1 82.4379 90 1.2063e-13 1 45 2126 > Hsp_hit-to Hsp_query-frame Hsp_hit-frame Hsp_identity Hsp_positive Hsp_gaps >1 2170 1 1 45 45 0 > Hsp_align-len Hsp_qseq >1 45 CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC > Hsp_hseq >1 CTTTGTCTTCCCCTATGTCATCCTCAATCTCTATGAAAGCAACAC > Hsp_midline >1 ||||||||||||||||||||||||||||||||||||||||||||| > > > >Hope that helps; would be happy to hear of other R solutions. > >Martin > >> >> >> Can you please Suggest how can I select them in R in NCBI BLAST so that I get sequences showing > 95% Query coverage and >90% >identity. Is there program in R to select them? I want to detect number of organism showing uniques results for given sequences. >> Thanks. >> >> >> Dr. Ghorpade Prabhakar B. >> PhD Scholar ( Veterinary Biochemistry), >> IVRI, >> Izatnagar, >> Bareilly, U.P., >> India >> [[alternative HTML version deleted]] >> >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > >-- >Computational Biology / Fred Hutchinson Cancer Research Center >1100 Fairview Ave. N. >PO Box 19024 Seattle, WA 98109 > >Location: Arnold Building M1 B861 >Phone: (206) 667-2793 > >_______________________________________________ >Bioconductor mailing list >Bioconductor at r-project.org >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY
0
Entering edit mode

Hi Malcolm,

 

Would you mind please give me a reference where I could learn the NCBI query in R? I'm really interested to learn but unfortunately couldn't find the good reference yet.

 

Thanks a lot and looking forward.

Maah

ADD REPLY

Login before adding your answer.

Traffic: 503 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6