a question about trimLRPatterns?

wang peter ★ 2.0k

@wang-peter-4647

Last seen 10.6 years ago

Hello all,

I want to know how this function processes its data.

For the left match, max.Lmismatch is taken as a "rate" and is converted to max.Lmismatch = as.integer(1:nLp * rate). It then tries to match the suffix substring(Lpattern, nLp - i + 1, nLp) of Lpattern against the first i letters of subject. Does i start from 1 or from nLp? And is the corresponding allowed number of mismatches max.Lmismatch[i]?

For the right match, max.Rmismatch is taken as a "rate" and is converted to max.Rmismatch = as.integer(1:nRp * rate). It then tries to match between the suffix substring(Rpattern, nRp - i + 1, nRp) of subject and the first i letters of Rpattern. Does i start from 1 or from nRp? And is the corresponding allowed number of mismatches max.Rmismatch[i]?

--
shan gao
Room 231 (Dr. Fei lab)
Boyce Thompson Institute
Cornell University
Tower Road, Ithaca, NY 14853-1801
Office phone: 1-607-254-1267 (day)
Official email: sg839 at cornell.edu
Facebook: http://www.facebook.com/profile.php?id=100001986532253

PROcess PROcess • 1.0k views

ADD COMMENT • link updated 13.2 years ago by Harris A. Jaffee ▴ 590 • written 13.2 years ago by wang peter ★ 2.0k

Harris A. Jaffee ▴ 590

@harris-a-jaffee-3972

Last seen 10.5 years ago

United States

To quote from ?trimLRPatterns, for Lpattern here:

  Once the integer vector is constructed using the rules given above, when 'with.Lindels' is 'FALSE', 'max.Lmismatch[i]' is the number of acceptable mismatches (errors) between the suffix 'substring(Lpattern, nLp - i + 1, nLp)' of 'Lpattern' and the first 'i' letters of 'subject'. When 'with.Lindels' is 'TRUE', 'max.Lmismatch[i]' represents the allowed "edit distance" between that suffix of 'Lpattern' and 'subject', starting at position '1' of 'subject' (as in 'matchPattern' and 'isMatchingStartingAt').

  For a given element 's' of the 'subject', the initial segment (prefix) 'substring(s, 1, j)' of 's' is trimmed if 'j' is the largest 'i' for which there is an acceptable match, if any.

If you are asking about implementation, the sub-patterns, i.e., suffixes of Lpattern or prefixes of Rpattern, are tested "longest first", using the relevant max.mismatch vector "from the top, down". (Intuitively, you should think of your max.mismatch vectors as being monotone increasing, perhaps not strictly.) The testing process at the relevant side of the subject stops if/when an acceptable match is seen.

The See Also section refers to ?`lowlevel-matching`, where you will find which.isMatchingStartingAt() and which.isMatchingEndingAt(). These functions are called with auto.reduce.pattern=TRUE, which allows a single "pattern" and a single "at" value to be passed in the context of a *vector* "max.mismatch" value, the actual pattern being tested getting iteratively shorter by 1 character as necessary, for each element of the subject, automatically.

Let me know if I didn't get at your question.

On Jan 19, 2012, at 3:15 PM, wang peter wrote:

> hello all:
>
> i want to know how this function process data?
> [...]
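The "longest first" testing described in this answer can be illustrated with a small base-R sketch. This is not the Biostrings implementation (which is in C); `n_mismatch()` and `trim_left_sketch()` are hypothetical helper names introduced here, the example sequences are made up, and only the no-indel case for a single subject string is covered.

```r
# Hypothetical helper (not part of Biostrings): number of mismatching
# positions between two strings of equal length.
n_mismatch <- function(a, b) {
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Hypothetical sketch of left trimming for ONE subject string, assuming
# no indels. max.Lmismatch[i] is the tolerance for the length-i suffix.
trim_left_sketch <- function(Lpattern, subject, max.Lmismatch) {
  nLp <- nchar(Lpattern)
  n <- min(nLp, nchar(subject))
  # Longest suffix first: i runs from the top down to 1.
  for (i in rev(seq_len(n))) {
    suffix <- substring(Lpattern, nLp - i + 1, nLp)
    prefix <- substring(subject, 1, i)
    if (n_mismatch(suffix, prefix) <= max.Lmismatch[i]) {
      # Stop at the first acceptable match and trim the first i letters.
      return(substring(subject, i + 1))
    }
  }
  subject  # no acceptable match: nothing is trimmed
}

Lpattern <- "TTCTGCTTG"
rate <- 0.2
# The rate-to-vector conversion quoted in the question.
max.Lmismatch <- as.integer(seq_len(nchar(Lpattern)) * rate)
max.Lmismatch
# 0 0 0 0 1 1 1 1 1

trim_left_sketch(Lpattern, "CTGCTTGACGT", max.Lmismatch)  # "ACGT"
trim_left_sketch(Lpattern, "CTGATTGACGT", max.Lmismatch)  # "ACGT" (1 mismatch allowed at i = 7)
trim_left_sketch(Lpattern, "AAAAAAAAAA", max.Lmismatch)   # unchanged
```

In this sketch i starts at nLp (or at the subject length, if shorter) and decreases, and the tolerance applied to the length-i suffix is max.Lmismatch[i].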

ADD COMMENT • link 13.2 years ago Harris A. Jaffee ▴ 590

On Jan 19, 2012, at 4:20 PM, Harris A. Jaffee wrote:

> [...] These functions are called with auto.reduce.pattern=TRUE, which allows
> a single "pattern" and single "at" value to be passed in the context of a
> *vector* "max.mismatch" value, the actual pattern being tested getting
> iteratively shorter by 1 character as necessary, for each element of the
> subject, automatically.

To clarify, in the C code there are two loops. There is an outer loop over the subject, and then, for each subject element, the specified single pattern is iteratively "auto-reduced" as necessary.
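The two loops mentioned in this reply might look roughly like the following base-R sketch for the right side. It is only an illustration under stated assumptions, not the actual C code: `trim_right_sketch()` is a hypothetical name, the sequences are made up, indels are ignored, and it assumes (following the answer's "prefixes of Rpattern") that max.Rmismatch[i] applies to the length-i prefix of Rpattern compared with the last i letters of each subject element.

```r
# Hypothetical sketch (not the Biostrings C code) of right trimming
# over a character vector of subjects, assuming no indels.
trim_right_sketch <- function(Rpattern, subjects, max.Rmismatch) {
  pat <- strsplit(Rpattern, "")[[1]]
  out <- character(length(subjects))
  # Outer loop: one pass per subject element.
  for (k in seq_along(subjects)) {
    s <- strsplit(subjects[k], "")[[1]]
    n <- length(s)
    keep <- n  # number of leading letters kept if nothing matches
    # Inner loop: "auto-reduce" the pattern, longest prefix first,
    # shortening it by one character per iteration.
    for (i in rev(seq_len(min(length(pat), n)))) {
      tail_i <- s[(n - i + 1):n]
      if (sum(pat[seq_len(i)] != tail_i) <= max.Rmismatch[i]) {
        keep <- n - i  # first acceptable match: stop reducing
        break
      }
    }
    out[k] <- paste(s[seq_len(keep)], collapse = "")
  }
  out
}

subjects <- c("ACGTTTCT", "ACGTACGA", "GGGTTCTGCTTG")
trim_right_sketch("TTCTGCTTG", subjects, max.Rmismatch = rep(0L, 9))
# "ACGT" "ACGTACGA" "GGG"
```

With an all-zero tolerance vector, the first subject loses its trailing "TTCT" (a length-4 prefix of the pattern), the second is left alone, and the third loses the full pattern.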

ADD REPLY • link 13.2 years ago Harris A. Jaffee ▴ 590
