[biocpkgs] suggestions on package matchprobes
rgentleman ★ 5.5k
@rgentleman-7725
Last seen 9.6 years ago
United States
Please ask these sorts of questions on the Bioconductor mailing list - redirected there

and for generic sequence matching Biostrings is a better tool - we will look into this,

thanks
Robert

Xinxia Peng wrote:
> =+=+=+=+=+=+=+=+=+ biocpkgs mailing list +=+=+=+=+=+=+=+=+=
> Dear Bioc Team,
>
> It appears that the function 'matchprobes' will not work with sequences in lower case. Also it might be nice not to match empty string. See the following example:
>
> > test.seq
>  [1] "atggcggcgcaaagtagtggtgggggtggaggttgtggtgaggaagataaagatgccaaatatatgtttgataggatagggaaagaagtgcacgacgaag"
>  [2] "atgaaaagggtaatgcaacaatttgtggatcgtacaacacaacgatttcacgaatatgatgaaaggatgaaaactacacgccaaaaatgtaaagaacgat"
>  [3] "atgaaacttcactgctctaaaatattattatttttacttccattaaatatattagtaacatcattatcaaatgtgcataataataataaactatacaaca"
>  [4] "atgaaagtccattatattaatatattattgtttgctcttccattaaatatattggaacataataaaaatgaaccacacaccacaccaaatcatacacaaa"
>  [5] "atgtttacaacaaaaaaaaaaattaaatatattataattatatgtggcatctttcgaaaatatttcaaattcggaagaattattgaggttccaatgatgc"
>  [6] "atgaaactgcactactctaatatattattatttttctttccattaaatatattagtaacatcatatcatgtatataataaaaataaaatatacatcacac"
>  [7] "atgtgtgctattggagaattactatcatctacagataaggaatatactcttaatttctttggtttagttaaagatggagcatcgattgatgaaatgaaag"
>  [8] "atgattaagatgaaattccattatgtaggatattattctgaagaagaaaatatgaaaaatacactgaaaatttgttccgttagacaaatatttttaaatt"
>  [9] "atgttattatttgctttattatttaatgcacttttattatcacaaaatgtaaattgccgaaacaacaattataatataagattcactcaaacgataacac"
> [10] "atgatataccacagaaggattatagcttatctcataaatcatctaccattaggtatatcccttacagaagtggtcgatataaatgaagaacatatattta"
> > test.p
> [1] "atggcggcgcaaagtagtggtgggg"
> > matchprobes(test.seq, test.p)
> $match
> $match[[1]]
> numeric(0)
>
> $match[[2]]
> numeric(0)
>
> $match[[3]]
> numeric(0)
>
> $match[[4]]
> numeric(0)
>
> $match[[5]]
> numeric(0)
>
> $match[[6]]
> numeric(0)
>
> $match[[7]]
> numeric(0)
>
> $match[[8]]
> numeric(0)
>
> $match[[9]]
> numeric(0)
>
> $match[[10]]
> numeric(0)
>
> > matchprobes(toupper(test.seq), toupper(c(test.p, "")))
> $match
> $match[[1]]
> [1] 1 2
>
> $match[[2]]
> [1] 2
>
> $match[[3]]
> [1] 2
>
> $match[[4]]
> [1] 2
>
> $match[[5]]
> [1] 2
>
> $match[[6]]
> [1] 2
>
> $match[[7]]
> [1] 2
>
> $match[[8]]
> [1] 2
>
> $match[[9]]
> [1] 2
>
> $match[[10]]
> [1] 2
>
> Thanks,
> Xinxia Peng
> Seattle Biomedical Research Institute
>
> __________________________________________________________________
> biocpkgs mailing list
> To unsubscribe from this mailing list send a blank email to
> biocpkgs-leave at lists.fhcrc.org
> You can also unsubscribe or change your personal options at
> http://lists.fhcrc.org/mailman/listinfo/biocpkgs

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
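On the empty-string point in the quoted report: an empty probe is trivially contained in every sequence, which is why index 2 shows up for all ten sequences in the second call. One possible workaround on the user side is to filter such probes out before matching. The following is only a base-R sketch; `drop_empty_probes` is a hypothetical helper name and is not part of matchprobes or Biostrings.

```r
# Hypothetical helper (not part of matchprobes): remove zero-length probes
# before matching, and remember where the kept probes sat in the input.
drop_empty_probes <- function(probes) {
  keep <- nzchar(probes)          # FALSE for "" entries
  list(probes = probes[keep],     # probes that can match meaningfully
       index  = which(keep))      # their positions in the original vector
}

probes  <- c("ATGGCGGCGCAAAGTAGTGGTGGGG", "")
cleaned <- drop_empty_probes(probes)
cleaned$probes   # only the non-empty probe remains
cleaned$index    # position of that probe in the original vector
```

Keeping `index` lets the hits be mapped back to the original probe numbering after matching on `cleaned$probes`.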
Cancer Biostrings • 1.0k views
@wolfgang-huber-3550
Last seen 3 months ago
EMBL European Molecular Biology Laborat…
Hi Xinxia,

thanks!

1. The problem with the cases was simple: the function 'matchprobes' calls C code to do the actual work, and it was:

    matchprobes <- function(query, records, probepos=FALSE)
      .Call("MP_matchprobes", toupper(query), records, probepos,
            PACKAGE="matchprobes")

I removed the "toupper" in matchprobes_1.5.1, this should make you happier. There is no good reason why it should have been there, and that it was not documented was a bug. So now it is gone.

2. As Robert said, for generic sequence matching please use "Biostrings", that is much better. "matchprobes" only still exists for backward compatibility.

Best wishes
Wolfgang

--
Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber
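To make the effect of that one-sided "toupper" concrete: only `query` was upper-cased, so a lower-case entry in `records` could never be found. The sketch below reproduces this in plain R. Note that `find_records`, `match_old` and `match_new` are hypothetical stand-ins written for illustration; the real matching is done in the package's C code and may differ in detail.

```r
# Pure-R stand-in for the substring search (an assumption for illustration;
# the real package does this in C). For each query sequence, return the
# indices of the records that occur in it.
find_records <- function(query, records) {
  lapply(query, function(q) {
    which(vapply(records, function(r) grepl(r, q, fixed = TRUE),
                 logical(1), USE.NAMES = FALSE))
  })
}

# Old behaviour as described above: only `query` is upper-cased.
match_old <- function(query, records) find_records(toupper(query), records)
# Behaviour after the change: both arguments are compared as given.
match_new <- function(query, records) find_records(query, records)

probe <- "atggcggcgcaaagtagtggtgggg"
seqs  <- c(paste0(probe, "gtggaggttg"),        # contains the probe
           "atgaaaagggtaatgcaacaatttgtgg")     # does not contain it

match_old(seqs, probe)            # no hits: upper-case query, lower-case probe
match_new(seqs, probe)            # probe 1 is found in sequence 1
match_old(seqs, toupper(probe))   # hits return once the probe is upper-cased too
```

With the "toupper" removed, matching is consistently case-sensitive, so callers who want case-insensitive behaviour need to normalise both arguments themselves.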
Xinxia Peng ▴ 120
@xinxia-peng-1881
Last seen 10.2 years ago
Thanks a lot!

Maybe you can remind users that the matching is case-sensitive. For DNA sequences, people might tend to treat the lowercase and the uppercase the same.

I was looking at the library "altcdfenvs" to create an alternative CDF environment. I do not see a straightforward connection between "altcdfenvs" and "Biostrings". Any suggestions?

BTW, my previous reply was held because this email address was not subscribed to this list. Now it should work.

Best,
Xinxia

-----Original Message-----
From: Wolfgang Huber [mailto:huber@ebi.ac.uk]
Sent: Thursday, September 14, 2006 2:31 AM
To: Robert Gentleman
Cc: Xinxia Peng; Bioconductor
Subject: Re: [BioC] [biocpkgs] suggestions on package matchprobes
> Thanks a lot!
>
> Maybe you can remind users that the matching is case-sensitive. For DNA sequences, people might tend to treat the lowercase and the uppercase the same.
> I was looking at the library "altcdfenvs" to create an alternative CDF environment. I do not see a straightforward connection between
> "altcdfenvs" and "Biostrings". Any suggestions?

You raise a good point here. I would have liked to use something generic for DNA/RNA sequences, but "Biostrings" was in its infancy when "altcdfenvs" was written... and now, because of my current occupation, further work on the package is unlikely to happen soon.

Low-energy approaches would be to either write a function that transforms a list of 'BStringViews' to a list such as the one returned by 'matchprobes' and feed it to 'buildCdfEnv.matchprobes' (as the vignette and documentation for 'buildCdfEnv.matchprobes' indicate), or to modify 'buildCdfEnv.matchprobes' to accept a list of 'BStringViews' as input (and in that last case you will mostly only have to work on the following code:

    xy <- getxy.probeseq(probeseq=probe.tab, i.row=matches$match[[i]],
                         x.colname = x.colname, y.colname = y.colname)

)

Hoping this helps,
Laurent
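As a rough illustration of the first "low-energy" approach (producing a list shaped like the one 'matchprobes' returns), here is a base-R sketch. It assumes the search results have already been reduced to one integer vector per probe, giving the indices of the sequences in which that probe was found; that intermediate format, the function name `as_matchprobes_list` and the argument `n_sequences` are all hypothetical, and whether 'buildCdfEnv.matchprobes' accepts the result unchanged has not been checked here.

```r
# Hypothetical converter. `hits_by_probe[[j]]` holds the indices of the
# sequences in which probe j was found (an assumed intermediate format).
# The result mimics the `$match` list printed earlier in the thread: one
# element per sequence, each holding the indices of the probes that hit it.
as_matchprobes_list <- function(hits_by_probe, n_sequences) {
  per_seq <- lapply(seq_len(n_sequences), function(i) {
    which(vapply(hits_by_probe, function(h) i %in% h,
                 logical(1), USE.NAMES = FALSE))
  })
  list(match = per_seq)
}

# Probe 1 was found in sequence 1; probe 2 in sequences 1 and 3.
hits_by_probe <- list(c(1L), c(1L, 3L))
res <- as_matchprobes_list(hits_by_probe, n_sequences = 3)
res$match[[1]]   # probes that matched sequence 1
res$match[[2]]   # empty: no probe matched sequence 2
```

The indices here are integers, whereas the output quoted in the thread prints `numeric(0)` for empty entries; if downstream code is strict about types, wrapping the result in `as.numeric()` would be a small adjustment.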
