a question about trimLRPatterns?
@harris-a-jaffee-3972
Last seen 10.1 years ago
United States
Just getting to my mail after the power outage in New Jersey.

I'm laying claim to the sapply quoted here, verbatim as far as I can tell. I sent it off-list (my bad) about a year ago to offer an exploratory approach to setting max.Rmismatch. The conclusion would be, for this subject sequence and the first Rpattern here, that 0 is a good value, and in the second case, as Hervé has said, that 2 is good when 1 was not enough.

trimLRPatterns does not actually use any nedit function or an sapply, although it does "stop" (at the C level) at the first position satisfying max.Rmismatch, if any, which of course can vary over the subject space.

On Oct 30, 2012, at 12:58 PM, wang peter wrote:

> i want to know how this function works?
>
> for example:
> trimLRPatterns(Rpattern = Rpattern, subject = subject,
>                max.Rmismatch=1, with.Lindels=TRUE)
>
> subject = "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
> Rpattern = "GAATAGTACTGTAGGCACCATCAATAGATCGGAA"
>
> the function will try to calculate the distance by such coding:
>
> sapply((nchar(subject)-nchar(Rpattern)+1):nchar(subject), function(j) {
>   s = substr(subject, j, nchar(subject))
>   p = substr(Rpattern, 1, nchar(subject)-j+1)
>   neditEndingAt(ending.at=nchar(s), pattern = p, subject = s,
>                 with.indels=TRUE)
> })
>  [1] 0 2 4 6 8 10 12 14 15 14 13 12 11 10 9 9 8 7 8 7 6 5 6 6 5 4 4 4 3 2 1 0
> [33] 1 1
>
> when the function find the value which is first satisfy the
> max.Rmismatch value, it will stop
> in this case,they function will stop at the first position.
>
> IF
> subject = "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
> Rpattern = "GAATAGTACTGTAGGCACCATCAATAGATCGGTT"
> The results
>  [1] 2 3 4 6 8 10 12 14 15 14 13 12 11 10 9 9 8 7 8 7 6 5 6 6 5 4 4 4 3 2 1 0
> [33] 1 1
> it will stop
> in this case,they function will stop at
> subject = "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
> Rpattern = "GAATAGTACTGTAGGCACCATCAATAGATCGGTT"
>
> so the shortcoming is the trimLRPatterns cannot find the shared
> sequence between subject and Rpattern
> "GAATAGTACTGTAGGCACCATCAATAGATCGG"
>
> --
> shan gao
> Room 231(Dr.Fei lab)
> Boyce Thompson Institute
> Cornell University
> Tower Road, Ithaca, NY 14853-1801
> Office phone: 1-607-254-1267(day)
> Official email:sg839 at cornell.edu
> Facebook:http://www.facebook.com/profile.php?id=100001986532253
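
For anyone who wants to try this kind of exploration without Biostrings loaded, here is a minimal base-R sketch of the same idea. It is not what trimLRPatterns does internally. It counts plain mismatches (Hamming distance, no indels) instead of calling neditEndingAt with with.indels=TRUE, so its values will generally differ from the edit distances quoted above. The helper names suffix_mismatches and first_hit are made up for illustration, and the single scalar threshold is a simplification of max.Rmismatch.

```r
# Exploratory sketch, base R only. Assumes nchar(Rpattern) <= nchar(subject).
subject   <- "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
Rpattern1 <- "GAATAGTACTGTAGGCACCATCAATAGATCGGAA"
Rpattern2 <- "GAATAGTACTGTAGGCACCATCAATAGATCGGTT"

# For each candidate start j, compare the suffix of `subject` beginning at j
# with the equally long prefix of `Rpattern`, and count mismatching letters.
# (Hypothetical helper; Hamming distance only, no indels.)
suffix_mismatches <- function(subject, Rpattern) {
  n <- nchar(subject)
  starts <- (n - nchar(Rpattern) + 1):n
  sapply(starts, function(j) {
    s <- strsplit(substr(subject, j, n), "")[[1]]
    p <- strsplit(substr(Rpattern, 1, n - j + 1), "")[[1]]
    sum(s != p)
  })
}

# Index of the first candidate whose mismatch count is within the threshold,
# or NA if none qualifies. (Hypothetical helper; scalar threshold only.)
first_hit <- function(mismatches, max_mismatch) {
  hits <- which(mismatches <= max_mismatch)
  if (length(hits) == 0) NA_integer_ else hits[1]
}

m1 <- suffix_mismatches(subject, Rpattern1)
m2 <- suffix_mismatches(subject, Rpattern2)

first_hit(m1, 0)  # full-length Rpattern1 matches the end of subject exactly
first_hit(m2, 2)  # full-length Rpattern2 differs only in its last two letters
```

Under these assumptions the first element of m1 is 0 and the first element of m2 is 2, which agrees with the first values in the quoted output and with the suggestion that 0 and 2 are reasonable thresholds for the two patterns.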