a question about trimLRPatterns?
wang peter ★ 2.0k
@wang-peter-4647
I want to know how this function works. For example:

    trimLRPatterns(Rpattern = Rpattern, subject = subject,
                   max.Rmismatch = 1, with.Lindels = TRUE)

    subject  = "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
    Rpattern = "GAATAGTACTGTAGGCACCATCAATAGATCGGAA"

As I understand it, the function calculates the distances with code like this:

    sapply((nchar(subject) - nchar(Rpattern) + 1):nchar(subject), function(j) {
        s = substr(subject, j, nchar(subject))
        p = substr(Rpattern, 1, nchar(subject) - j + 1)
        neditEndingAt(ending.at = nchar(s), pattern = p, subject = s,
                      with.indels = TRUE)
    })
     [1]  0  2  4  6  8 10 12 14 15 14 13 12 11 10  9  9  8  7  8  7  6  5
    [23]  6  6  5  4  4  4  3  2  1  0
    [33]  1  1

When the function finds the first value that satisfies max.Rmismatch, it stops. In this case, the function stops at the first position.

If instead

    subject  = "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
    Rpattern = "GAATAGTACTGTAGGCACCATCAATAGATCGGTT"

the results are

     [1]  2  3  4  6  8 10 12 14 15 14 13 12 11 10  9  9  8  7  8  7  6  5
    [23]  6  6  5  4  4  4  3  2  1  0
    [33]  1  1

In this case, the function will stop at

    subject  = "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
    Rpattern = "GAATAGTACTGTAGGCACCATCAATAGATCGGTT"

So the shortcoming is that trimLRPatterns cannot find the sequence shared between subject and Rpattern:

    "GAATAGTACTGTAGGCACCATCAATAGATCGG"

-- shan gao
Boyce Thompson Institute, Cornell University
@herve-pages-1542
Hi there,

On 10/30/2012 09:58 AM, wang peter wrote:
> i want to know how this function works?
> [...]
> so the shortcoming is the trimLRPatterns cannot find the shared
> sequence between subject and Rpattern
> "GAATAGTACTGTAGGCACCATCAATAGATCGG"

trimLRPatterns is about trimming the subject by finding/removing the largest possible *prefix* and/or *suffix* in the subject that looks like the left and right pattern, respectively. It's not a tool for finding/removing the longest common substring between the subject and pattern.

Note that, in your case, you would get the result I believe you're looking for by just using max.Rmismatch=2 instead of max.Rmismatch=1.

Cheers,
H.

-- Hervé Pagès
Program in Computational Biology, Fred Hutchinson Cancer Research Center
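To see why max.Rmismatch=2 is enough in the second example, it may help to count plain mismatches between each suffix of the subject and the prefix of Rpattern of the same length. The base R sketch below does this with a hypothetical helper, suffix_mismatches(), which is not part of Biostrings. It ignores indels, so its values need not equal the edit distances quoted in the question, and it only illustrates the prefix/suffix idea, not how trimLRPatterns is actually implemented.

```r
subject   <- "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
Rpattern1 <- "GAATAGTACTGTAGGCACCATCAATAGATCGGAA"
Rpattern2 <- "GAATAGTACTGTAGGCACCATCAATAGATCGGTT"

# Hypothetical helper (not a Biostrings function): for k = nchar(pattern)
# down to 1, count the positions where the last k letters of `subject`
# differ from the first k letters of `pattern` (Hamming distance, no indels).
suffix_mismatches <- function(subject, pattern) {
  n  <- nchar(subject)
  ks <- seq(min(n, nchar(pattern)), 1)
  vapply(ks, function(k) {
    s <- strsplit(substr(subject, n - k + 1, n), "")[[1]]
    p <- strsplit(substr(pattern, 1, k), "")[[1]]
    sum(s != p)
  }, integer(1))
}

d1 <- suffix_mismatches(subject, Rpattern1)
d2 <- suffix_mismatches(subject, Rpattern2)

d1[1]  # full-length pattern against the last 34 letters: 0 mismatches
d2[1]  # same comparison for the pattern ending in "TT": 2 mismatches

# What remains of the subject once the full-length suffix is removed
trimmed <- substr(subject, 1, nchar(subject) - nchar(Rpattern2))
trimmed
```

The full-length comparison for the second pattern has exactly two mismatches (the trailing "TT" against "AA"), which is consistent with the suggestion to allow 2 mismatches rather than 1.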
Yeah, I see. But for another case:

    subject  = "TATAGTAGATATTGGAATNNNNGCACCATCAATAGATCGGAA"
    Rpattern = "NNNN"

trimLRPatterns cannot do it.

Thanks
Hi,

On Tue, Oct 30, 2012 at 3:20 PM, wang peter <wng.peter at gmail.com> wrote:
> yeah
> i see
> but for other case
>
> subject = "TATAGTAGATATTGGAATNNNNGCACCATCAATAGATCGGAA"
> Rpattern = "NNNN"
>
> trimLRPatterns CANNOT DO IT

Note that the title of the help page for `?trimLRPatterns` says: "Trim Flanking Patterns from Sequences"

The NNNN pattern you are trying to "trim" is about as far away from the definition of "flanking" as one can get ...

-steve

-- Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
@steve-lianoglou-2771
Hi,

On Tue, Oct 30, 2012 at 4:11 PM, wang peter <wng.peter at gmail.com> wrote:
> thx
> i must use another function to do glocal match
> and trim them
>
> right?

Sounds about right to me. Perhaps you're looking for `matchPattern` or `pairwiseAlignment`?

Out of curiosity, are you pre-processing NGS data?

If so, what type of reads have artifacts in the center that you want to remove but still "trust" the flanking sequence?

-steve

-- Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
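For an internal run such as the NNNN in the earlier example, trimming flanks does not apply, but locating the run and cutting around it does. The sketch below uses only base R (regexpr with fixed = TRUE) so that it stays self-contained; it illustrates the idea and is not the Biostrings approach. Within Biostrings, `matchPattern` on a DNAString would be the natural counterpart. The variable names are made up for this example.

```r
subject <- "TATAGTAGATATTGGAATNNNNGCACCATCAATAGATCGGAA"
pattern <- "NNNN"

# Start of the first literal occurrence of the pattern (-1 if absent)
start <- as.integer(regexpr(pattern, subject, fixed = TRUE))

if (start > 0) {
  end   <- start + nchar(pattern) - 1
  left  <- substr(subject, 1, start - 1)             # sequence before the run
  right <- substr(subject, end + 1, nchar(subject))  # sequence after the run
}
```

Whether to keep `left`, `right`, or both depends on which flank of the read is trusted, which is the question raised above.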
On 10/30/2012 01:21 PM, Steve Lianoglou wrote:
> On Tue, Oct 30, 2012 at 4:11 PM, wang peter <wng.peter at gmail.com> wrote:
>> thx
>> i must use another function to do glocal match
                                      ^^^^^^
Is this a new word for "global-local"? ;-)

Sorry I couldn't resist.

H.

-- Hervé Pagès
Program in Computational Biology, Fred Hutchinson Cancer Research Center