matching sRNA sequences with whole data
chawla ▴ 190
@chawla-4416
Last seen 10.3 years ago
Hi,

I want to know a faster method of obtaining the frequency of only perfect matches between a data sequence file and a target sequence file. Both are sets of nucleotide sequences, but in large numbers. I tried:

    for (i in 1:100)
    #for (i in 1:nrow(urfreq))
    {
      pos1 <- which(glr4[,1] == urfreq[i,1])
      pos2 <- which(glr5[,1] == urfreq[i,1])
      pos3 <- which(glr6[,1] == urfreq[i,1])
      if (length(pos1) > 0)
      {
        urfreq[i,2] <- length(pos1)
      }
      if (length(pos2) > 0)
      {
        urfreq[i,3] <- length(pos2)
      }
      if (length(pos3) > 0)
      {
        urfreq[i,4] <- length(pos3)
      }
    }

Since the target data file is huge, this piece of code takes 22 min for only 100 sequences, while I need to find the frequency of over 3 million sequences in the three samples (glr4, glr5 and glr6).

Is there any package or function for such matching?

Thanks,
Konika
@valerie-obenchain-4275
Last seen 2.9 years ago
United States
Hi Konika,

The "Biostrings BSgenome Overview" link on this page is a great summary of string matching:

http://bioconductor.org/help/course-materials/2011/BioC2011/

Specifically, I think the vmatchPattern() and matchPDict() functions will be most helpful to you.

Valerie

On 08/08/2011 04:25 AM, chawla wrote:
> Hi
> I want to know the faster method of obtaining the frequency of only
> perfect matches between a data seq and seq target file
> [...]
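Because the loop in the question compares whole sequences with `==`, the counts can probably also be obtained without any pattern matching, by tabulating each sample once and looking the query sequences up in that table. The following is only a minimal base R sketch of that idea, not the Biostrings approach suggested above. The toy data and the helper name `count_exact` are hypothetical, and it assumes column 1 of `glr4`, `glr5`, `glr6` and `urfreq` holds the sequences as character strings, as in the question.

```r
# Hypothetical toy data standing in for the real objects in the question.
glr4 <- data.frame(seq = c("ACGT", "ACGT", "TTGA", "GGCC"), stringsAsFactors = FALSE)
glr5 <- data.frame(seq = c("TTGA", "TTGA", "TTGA"), stringsAsFactors = FALSE)
glr6 <- data.frame(seq = c("GGCC", "ACGT"), stringsAsFactors = FALSE)
urfreq <- data.frame(seq = c("ACGT", "TTGA", "CCCC"),
                     glr4 = 0L, glr5 = 0L, glr6 = 0L,
                     stringsAsFactors = FALSE)

# Hypothetical helper: count exact, whole-sequence occurrences of each query.
count_exact <- function(queries, reads) {
  tab <- table(reads)                 # tabulate the sample once
  idx <- match(queries, names(tab))   # locate each query in the table (NA if absent)
  counts <- as.integer(tab)[idx]
  counts[is.na(counts)] <- 0L         # queries not present in the sample
  counts
}

# Fill one column per sample, replacing the row-by-row loop.
urfreq$glr4 <- count_exact(urfreq$seq, glr4[, 1])
urfreq$glr5 <- count_exact(urfreq$seq, glr5[, 1])
urfreq$glr6 <- count_exact(urfreq$seq, glr6[, 1])
```

Unlike the original loop, this writes an explicit 0 where a sequence is absent from a sample. It only handles full-length equality; if the sRNA sequences need to be located inside longer target sequences, the Biostrings functions named above are the place to look.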