Biostrings: print sequence alignment to file
@herve-pages-1542
Seattle, WA, United States
Hi Martin,

On 06/22/2012 12:09 PM, Hervé Pagès wrote:
> Hi Martin,
>
> On 06/14/2012 06:55 AM, Martin Preusse wrote:
>> Hi guys,
>>
>> anything new on the sequence output? Maybe I missed something :)
>> please tell me if you need testing etc.
>
> Still on my list. Will work on this in the next couple of weeks. I'll
> let you know. Thanks for the reminder.

There is now a writePairwiseAlignments() function (in Biostrings 2.25.8) for doing this. It produces a file in the "pair" format (as described on the EMBOSS website, at the URL you sent earlier):

    > library(Biostrings)
    > pattern <- DNAString("CGTACGTAACGTTCGT")
    > subject <- DNAString("CGTCGTCGTCCGTAA")
    > x1 <- pairwiseAlignment(pattern, subject)
    > x1
    Global PairwiseAlignmentsSingleSubject (1 of 1)
    pattern: [1] CGTACGTAACGTTCGT
    subject: [1] CGT-CGT--CGTCCGT
    score: -32.11822
    > writePairwiseAlignments(x1, block.width=10)  # 'block.width' default is 50
    ########################################
    # Program: Biostrings (version 2.25.8), a Bioconductor package
    # Rundate: Wed Jul 18 23:05:46 2012
    ########################################
    #=======================================
    #
    # Aligned_sequences: 2
    # 1: P1
    # 2: S1
    # Matrix: NA
    # Gap_penalty: 14.0
    # Extend_penalty: 4.0
    #
    # Length: 18
    # Identity:      12/18 (66.7%)
    # Similarity:    NA/18 (NA%)
    # Gaps:           5/18 (27.8%)
    # Score: -32.11822
    #
    #
    #=======================================

    P1                 1 CGTACGTAAC     10
                         ||| |||  |
    S1                 1 CGT-CGT--C      7

    P1                11 GTTCGT--     16
                         || |||
    S1                 8 GTCCGTAA     15


    #---------------------------------------
    #---------------------------------------

It is only lightly tested and not necessarily fast (no C code). Please have a look at the man page for some caveats, especially if you plan to use it on NON global alignments. Feedback welcome.

Thanks,
H.

> H.
>
>> Cheers
>> Martin
>>
>> On Saturday, 21 April 2012 at 11:55, Martin Preusse wrote:
>>
>>> Hi Hervé,
>>>
>>> thanks for your help! If you need suggestions, help or testing, just
>>> say the word.
>>>
>>> Will you implement the header also? If you do so, I would be thankful
>>> for an option like "header=F" for the output.
>>>
>>> Cheers
>>> Martin
>>>
>>> On Saturday, 21 April 2012 at 02:12, Hervé Pagès wrote:
>>>
>>>> Thanks Martin and Thomas for the useful feedback. The 'pair' and
>>>> 'markx0' formats supported by EMBOSS seem indeed appropriate for
>>>> printing the output of pairwiseAlignment() to a file. I'll add
>>>> support for those 2 formats in Biostrings. Won't be before 1 week
>>>> or 2 though...
>>>>
>>>> Cheers,
>>>> H.
>>>>
>>>> On 04/18/2012 03:20 AM, Martin Preusse wrote:
>>>>> Hi,
>>>>>
>>>>> I just found this function to print a pairwise alignment in
>>>>> blocks. It doesn't add the match/mismatch indicators between
>>>>> sequences, but might be a starting point:
>>>>>
>>>>> http://a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter4.html#viewing-a-long-pairwise-alignment
>>>>>
>>>>> Cheers
>>>>> Martin
>>>>>
>>>>> On Wednesday, 18 April 2012 at 12:16, Martin Preusse wrote:
>>>>>
>>>>>> Hi everybody,
>>>>>>
>>>>>> I think the output format depends on the purpose of the alignment.
>>>>>>
>>>>>> A pairwise sequence alignment is usually done to compare two
>>>>>> sequences base by base. In my case, I compare sequencing results
>>>>>> of cloned expression constructs with the desired sequence. Thus,
>>>>>> the best output format would be "BLAST like".
>>>>>>
>>>>>> seq1: 1 ATCTGC 7
>>>>>>         | | | . . |
>>>>>> seq2: 1 ATCAAC 7
>>>>>>
>>>>>> When doing MSA, most people might rather be interested in the
>>>>>> consensus sequence, e.g. in the context of conservation between
>>>>>> species.
>>>>>>
>>>>>> So write.PairwiseAlignedXStringSet() and write.MultipleAlignment()
>>>>>> are quite different and BLAST doesn't make much sense for multiple
>>>>>> alignments. This means it would be best to put the output in the
>>>>>> PairwiseAlignment/MultipleAlignment and not to the XStringSet, right?
>>>>>>
>>>>>> This is an overview of sequence alignment formats used by EMBOSS:
>>>>>> http://emboss.sourceforge.net/docs/themes/AlignFormats.html
>>>>>>
>>>>>> 'pair' or 'markx0' would be perfectly fine.
>>>>>>
>>>>>> Cheers
>>>>>> Martin
>>>>>>
>>>>>> On Tuesday, 17 April 2012 at 22:13, Thomas Girke wrote:
>>>>>>
>>>>>>> Hi Hervé,
>>>>>>>
>>>>>>> To me, the most basic and versatile MSA or pairwise alignment
>>>>>>> format to output to would be FASTA since it is compatible with
>>>>>>> almost any other alignment editing software. For text-based
>>>>>>> viewing purposes my preference would be to also output to a
>>>>>>> format similar to the one shown in the following example. When
>>>>>>> there are only two sequences then one could show, instead of a
>>>>>>> consensus line, the pipe characters between the two sequences to
>>>>>>> indicate identical residues, which mimics the BLAST output. A
>>>>>>> more standardized version of this pairwise alignment format can
>>>>>>> be found here:
>>>>>>> http://emboss.sourceforge.net/apps/cvs/emboss/apps/needle.html
>>>>>>>
>>>>>>> library(Biostrings)
>>>>>>> p450 <- read.AAStringSet("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/Samples/p450.mul",
>>>>>>>                          "fasta")
>>>>>>>
>>>>>>> StringSet2html <- function(msa=p450, file="p450.html", start=1,
>>>>>>>                            end=length(p450[[1]]), counter=20, browser=TRUE, ...) {
>>>>>>>     if(class(msa)=="AAStringSet") msa <- AAStringSet(msa, start=start, end=end)
>>>>>>>     if(class(msa)=="DNAStringSet") msa <- DNAStringSet(msa, start=start, end=end)
>>>>>>>     msavec <- sapply(msa, toString)
>>>>>>>     offset <- (counter-1)-nchar(nchar(msavec[1]))
>>>>>>>     legend <- paste(paste(paste(paste(rep(" ", offset), collapse=""),
>>>>>>>                     format(seq(0, nchar(msavec[1]), by=counter)[-1])),
>>>>>>>                     collapse=""), collapse="")
>>>>>>>     consensus <- consensusString(msavec, ambiguityMap=".", ...)
>>>>>>>     msavec <- paste(msavec, rowSums(as.matrix(msa) != "-"), sep=" ")
>>>>>>>     msavec <- paste(format(c("", names(msa), "Consensus"), justify="left"),
>>>>>>>                     c(legend, msavec, consensus), sep=" ")
>>>>>>>     msavec <- c("<html><pre>", msavec, "</pre></html>")
>>>>>>>     writeLines(msavec, file)
>>>>>>>     if(browser==TRUE) { browseURL(file) }
>>>>>>> }
>>>>>>> StringSet2html(msa=p450, file="p450.html", start=1,
>>>>>>>                end=length(p450[[1]]), counter=20, browser=T, threshold=1.0)
>>>>>>> StringSet2html(msa=p450, file="p450.html", start=450, end=470,
>>>>>>>                counter=20, browser=T, threshold=1.0)
>>>>>>>
>>>>>>> Thomas
>>>>>>>
>>>>>>> On Tue, Apr 17, 2012 at 07:43:30PM +0000, Hervé Pagès wrote:
>>>>>>>> Hi Thomas,
>>>>>>>>
>>>>>>>> On 04/17/2012 11:49 AM, Thomas Girke wrote:
>>>>>>>>> What about providing an option in pairwiseAlignment to output
>>>>>>>>> to the MultipleAlignment class in Biostrings and then write
>>>>>>>>> the latter to different alignment formats?
>>>>>>>>
>>>>>>>> Or we could provide coercion methods to switch between
>>>>>>>> PairwiseAlignedXStringSet and MultipleAlignment.
>>>>>>>>
>>>>>>>> Anyway that kind of moves Martin's problem from having a
>>>>>>>> write.PairwiseAlignedXStringSet() function that produces BLAST
>>>>>>>> output to having a write.MultipleAlignment() function that
>>>>>>>> produces BLAST output. For the specific case of BLAST output,
>>>>>>>> would it make sense to support it for MultipleAlignment? Can
>>>>>>>> someone point me to an example of such output? Or even better,
>>>>>>>> to the specs of such a format?
>>>>>>>>
>>>>>>>> Note that right now there is the write.phylip() function in
>>>>>>>> Biostrings for writing a MultipleAlignment object to a file,
>>>>>>>> but the Phylip format looks very different from the BLAST
>>>>>>>> output:
>>>>>>>>
>>>>>>>> hpages@latitude:~$ head -n 20 phylip_test.txt
>>>>>>>>  9 2343
>>>>>>>> Mask       0000000000 0000000000 0000000000 0000000000 0000000000
>>>>>>>> Human      -----TCCCG TCTCCGCAGC AAAAAAGTTT GAGTCGCCGC TGCCGGGTTG
>>>>>>>> Chimp      ---------- ---------- ---------- ---------- ----------
>>>>>>>> Cow        ---------- ---------- ---------- ---------- ----------
>>>>>>>> Mouse      ---------- ---------- --AAAAGTTG GAGTCTTCGC TTGAGAGTTG
>>>>>>>> Rat        ---------- ---------- ---------- ---------- ----------
>>>>>>>> Dog        ---------- ---------- ---------- ---------- ----------
>>>>>>>> Chicken    ---------- ----CGGCTC CGCAGCGCCT CACTCGCGCA GTCCCCGCGC
>>>>>>>> Salmon     GGGGGAGACT TCAGAAGTTG TTGTCCTCTC CGCTGATAAC AGTTGAGATG
>>>>>>>>
>>>>>>>>            0000000000 0000000000 0000000000 0001111111 1111111111
>>>>>>>>            CCAGCGGAGT CGCGCGTCGG GAGCTACGTA GGGCAGAGAA GTCA-TGGCT
>>>>>>>>            ---------- ---------- ---------- ---------- ---A-TGGCT
>>>>>>>>            ---------- ---------- ---------- ---GAGAGAA GTCA-TGGCT
>>>>>>>>            CCAGCGGAGT CGCGCGCCGA CAGCTACGCG GCGCAGA-AA GTCA-TGGCT
>>>>>>>>            ---------- ---------- ---------- ---------- ---A-TGGCT
>>>>>>>>            ---------- ---------- ---------- ---------- ---A-TGGCT
>>>>>>>>            AGGGCCGGGC AGAGGCGCAC GCAGCTCCCC GGGCGGCCCC GCTC-CAGCC
>>>>>>>>            CGCATATTAT TATTACCTTT AGGACAAGTT GAATGTGTTC GTCAACATCT
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> H.
>>>>>>>>
>>>>>>>>> Thomas
>>>>>>>>>
>>>>>>>>> On Tue, Apr 17, 2012 at 05:59:24PM +0000, Hervé Pagès wrote:
>>>>>>>>>> Hi Martin,
>>>>>>>>>>
>>>>>>>>>> On 04/16/2012 04:06 AM, Martin Preusse wrote:
>>>>>>>>>>> Hi Charles,
>>>>>>>>>>>
>>>>>>>>>>> thanks! Your solution allows printing the two alignment
>>>>>>>>>>> strings separately.
>>>>>>>>>>>
>>>>>>>>>>> I was thinking of an output as generated by alignment tools:
>>>>>>>>>>>
>>>>>>>>>>> AGT-TCTAT
>>>>>>>>>>> | | | | | | | | |
>>>>>>>>>>> AGTATCTAT
>>>>>>>>>>
>>>>>>>>>> This looks like BLAST output. Is this what you have in mind?
>>>>>>>>>> Note that there are many alignment tools and many ways to
>>>>>>>>>> output the result to a file. I'm not really familiar with the
>>>>>>>>>> BLAST output format. Is it specified somewhere? Would it make
>>>>>>>>>> sense to add something like a
>>>>>>>>>> write.PairwiseAlignedXStringSet() function to Biostrings for
>>>>>>>>>> writing the result of pairwiseAlignment() to a file? We could
>>>>>>>>>> do this and support the BLAST format if that's a commonly
>>>>>>>>>> used format.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> H.
>>>>>>>>>>
>>>>>>>>>>> For this I would have to write a function to output the
>>>>>>>>>>> strings in blocks of e.g. 60 nucleotides, right?
>>>>>>>>>>>
>>>>>>>>>>> Cheers
>>>>>>>>>>> Martin
>>>>>>>>>>>
>>>>>>>>>>> On Friday, 13 April 2012 at 19:21, Chu, Charles wrote:
>>>>>>>>>>>
>>>>>>>>>>>> write.XStringSet
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Bioconductor mailing list
>>>>>>>>>>> Bioconductor at r-project.org
>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>>>>> Search the archives:
>>>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
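For readers who want to see how the blocks and the header counts in the "pair" output above fit together, here is a minimal base-R sketch. It is not how writePairwiseAlignments() is implemented; the helper names `alignment_stats` and `format_pair_blocks` are hypothetical, and the two gapped strings are simply copied from the blocks printed above rather than extracted from `x1`.

```r
# Sketch only: base R, no Biostrings required. The helper names are
# hypothetical and are not part of the Biostrings API.

# Gapped pattern and subject, copied from the "pair" output above.
p <- "CGTACGTAACGTTCGT--"
s <- "CGT-CGT--CGTCCGTAA"

# Count alignment columns, identical columns and columns containing a gap.
alignment_stats <- function(p, s) {
  stopifnot(nchar(p) == nchar(s))
  pc <- strsplit(p, "")[[1]]
  sc <- strsplit(s, "")[[1]]
  gap <- pc == "-" | sc == "-"
  c(length = length(pc),
    identity = sum(pc == sc & !gap),
    gaps = sum(gap))
}

# Split both strings into blocks of 'block.width' columns, add a match
# line ("|" for identical residues, " " otherwise) and start/end
# coordinates that count residues only, not gaps.
format_pair_blocks <- function(p, s, block.width = 50) {
  stopifnot(nchar(p) == nchar(s), block.width >= 1)
  pc <- strsplit(p, "")[[1]]
  sc <- strsplit(s, "")[[1]]
  marks <- ifelse(pc == sc & pc != "-", "|", " ")
  p_pos <- 0L  # last pattern residue already printed
  s_pos <- 0L  # last subject residue already printed
  out <- character(0)
  for (from in seq(1, length(pc), by = block.width)) {
    idx <- from:min(from + block.width - 1, length(pc))
    p_n <- sum(pc[idx] != "-")
    s_n <- sum(sc[idx] != "-")
    out <- c(out,
      sprintf("P1 %6d %s %6d", p_pos + 1L, paste(pc[idx], collapse = ""), p_pos + p_n),
      sprintf("   %6s %s", "", paste(marks[idx], collapse = "")),
      sprintf("S1 %6d %s %6d", s_pos + 1L, paste(sc[idx], collapse = ""), s_pos + s_n),
      "")
    p_pos <- p_pos + p_n
    s_pos <- s_pos + s_n
  }
  out
}

alignment_stats(p, s)   # length 18, identity 12, gaps 5, as in the header
writeLines(format_pair_blocks(p, s, block.width = 10))
```

With `block.width = 10` the coordinates come out as 1-10 / 1-7 for the first block and 11-16 / 8-15 for the second, matching the output above; the exact column padding of the real function may differ.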
Alignment Cancer Biostrings • 2.2k views
@martin-preusse-5224
Hi Hervé,

thanks! This looks great. I have a question though: how can I install the latest version of Biostrings?

When I install with:

    source("http://bioconductor.org/biocLite.R")
    biocLite("Biostrings")

I get version 2.22.0 ;)

Cheers
Martin

On Thursday, 19 July 2012 at 08:10, Hervé Pagès wrote:

> There is now a writePairwiseAlignments() function (in Biostrings 2.25.8)
> for doing this. It produces a file in the "pair" format (as described on
> the EMBOSS website, at the URL you sent earlier).
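The version gap in this post (2.22.0 installed, 2.25.8 needed for writePairwiseAlignments()) can be checked from R before calling the function. The following is a small sketch in base R using only the two version strings quoted in the thread; the variable names are illustrative.

```r
# Base R only: compare the version reported in the post with the
# version that, according to the thread, first provides
# writePairwiseAlignments().
installed <- package_version("2.22.0")
required  <- package_version("2.25.8")

installed >= required                     # FALSE: the release build is too old
utils::compareVersion("2.22.0", "2.25.8") # -1 means the first version is older

# With Biostrings installed, the same check would read as follows
# (left commented out so the snippet runs without the package):
# packageVersion("Biostrings") >= "2.25.8"
```

A check like this explains why biocLite("Biostrings") on the release branch did not provide the new function: the function only exists in the devel line of the package at this point in the thread.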
Hi Martin,

On 07/19/2012 04:30 AM, Martin Preusse wrote:
> Hi Hervé,
>
> thanks! This looks great. I have a question though: How can I install
> the latest version of Biostrings?
>
> When I install with:
>
>     source("http://bioconductor.org/biocLite.R")
>     biocLite("Biostrings")
>
> I get Version 2.22.0 ? ;)

Biostrings 2.25 is part of the current devel version of BioC (BioC 2.11). To install BioC devel, see this link:

  http://bioconductor.org/developers/useDevel/

Note that today Biostrings 2.25.7 is the latest version available through biocLite():

  http://bioconductor.org/packages/2.11/bioc/html/Biostrings.html

If everything goes well, 2.25.8 will become available tomorrow morning around 10 am (Seattle time).

Please let me know if you run into any issue with this.

Thanks!
H.

>>>>>>>> On Tuesday, 17 April 2012 at 22:13, Thomas Girke wrote:
>>>>>>>>
>>>>>>>>> Hi Hervé,
>>>>>>>>>
>>>>>>>>> To me, the most basic and versatile MSA or pairwise alignment
>>>>>>>>> format to output to would be FASTA since it is compatible with
>>>>>>>>> almost any other alignment editing software. For text-based
>>>>>>>>> viewing purposes my preference would be to also output to a
>>>>>>>>> format similar to the one shown in the following example. When
>>>>>>>>> there are only two sequences then one could show, instead of a
>>>>>>>>> consensus line, the pipe characters between the two sequences
>>>>>>>>> to indicate identical residues, which mimics the BLAST output.
>>>>>>>>> A more standardized version of this pairwise alignment format
>>>>>>>>> can be found here:
>>>>>>>>> http://emboss.sourceforge.net/apps/cvs/emboss/apps/needle.html
>>>>>>>>>
>>>>>>>>> library(Biostrings)
>>>>>>>>> p450 <- read.AAStringSet("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/Samples/p450.mul",
>>>>>>>>>                          "fasta")
>>>>>>>>>
>>>>>>>>> StringSet2html <- function(msa=p450, file="p450.html", start=1,
>>>>>>>>>                            end=length(p450[[1]]), counter=20, browser=TRUE, ...)
{ >>>>>>>>> if(class(msa)=="AAStringSet") msa<- AAStringSet(msa, start=start, >>>>>>>>> end=end) >>>>>>>>> if(class(msa)=="DNAStringSet") msa<- DNAStringSet(msa, >>>>>>>>> start=start, end=end) >>>>>>>>> msavec<- sapply(msa, toString) >>>>>>>>> offset<- (counter-1)-nchar(nchar(msavec[1])) >>>>>>>>> legend<- paste(paste(paste(paste(rep(" ", offset), collapse=""), >>>>>>>>> format(seq(0, >>>>>>>>> nchar(msavec[1]), by=counter)[-1])), collapse=""), collapse="") >>>>>>>>> consensus<- consensusString(msavec, ambiguityMap=".", ...) >>>>>>>>> msavec<- paste(msavec, rowSums(as.matrix(msa) != "-"), sep=" ") >>>>>>>>> msavec<- paste(format(c("", names(msa), "Consensus"), >>>>>>>>> justify="left"), c(legend, msavec, >>>>>>>>> consensus), sep=" ") >>>>>>>>> msavec<- c("<html>
", msavec,"
</html>") >>>>>>>>> writeLines(msavec, file) >>>>>>>>> if(browser==TRUE) { browseURL(file) } >>>>>>>>> } >>>>>>>>> StringSet2html(msa=p450, file="p450.html", start=1, >>>>>>>>> end=length(p450[[1]]), counter=20, browser=T, threshold=1.0) >>>>>>>>> StringSet2html(msa=p450, file="p450.html", start=450, end=470, >>>>>>>>> counter=20, browser=T, threshold=1.0) >>>>>>>>> >>>>>>>>> >>>>>>>>> Thomas >>>>>>>>> >>>>>>>>> On Tue, Apr 17, 2012 at 07:43:30PM +0000, Hervé Pagès wrote: >>>>>>>>>> Hi Thomas, >>>>>>>>>> >>>>>>>>>> On 04/17/2012 11:49 AM, Thomas Girke wrote: >>>>>>>>>>> What about providing an option in pairwiseAlignment to output >>>>>>>>>>> to the >>>>>>>>>>> MultipleAlignment class in Biostrings and then write the latter to >>>>>>>>>>> different alignment formats? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Or we could provide coercion methods to switch between >>>>>>>>>> PairwiseAlignedXStringSet and MultipleAlignment. >>>>>>>>>> >>>>>>>>>> Anyway that kind of moves Martin's problem from having a >>>>>>>>>> write.PairwiseAlignedXStringSet() function that produces BLAST >>>>>>>>>> output >>>>>>>>>> to having a write.MultipleAlignment() function that produces BLAST >>>>>>>>>> output. For the specific case of BLAST output, would it make sense >>>>>>>>>> to support it for MultipleAlignment? Can someone point me to an >>>>>>>>>> example >>>>>>>>>> of such output? Or even better, to the specs of such format? 
>>>>>>>>>> >>>>>>>>>> Note that right now there is the write.phylip() function in >>>>>>>>>> Biostrings >>>>>>>>>> for writing a MultipleAlignment object to a file but the Phylip >>>>>>>>>> format >>>>>>>>>> looks very different from the BLAST output: >>>>>>>>>> >>>>>>>>>> hpages at latitude:~$ head -n 20 phylip_test.txt >>>>>>>>>> 9 2343 >>>>>>>>>> Mask 0000000000 0000000000 0000000000 0000000000 0000000000 >>>>>>>>>> Human -----TCCCG TCTCCGCAGC AAAAAAGTTT GAGTCGCCGC TGCCGGGTTG >>>>>>>>>> Chimp ---------- ---------- ---------- ---------- ---------- >>>>>>>>>> Cow ---------- ---------- ---------- ---------- ---------- >>>>>>>>>> Mouse ---------- ---------- --AAAAGTTG GAGTCTTCGC TTGAGAGTTG >>>>>>>>>> Rat ---------- ---------- ---------- ---------- ---------- >>>>>>>>>> Dog ---------- ---------- ---------- ---------- ---------- >>>>>>>>>> Chicken ---------- ----CGGCTC CGCAGCGCCT CACTCGCGCA GTCCCCGCGC >>>>>>>>>> Salmon GGGGGAGACT TCAGAAGTTG TTGTCCTCTC CGCTGATAAC AGTTGAGATG >>>>>>>>>> >>>>>>>>>> 0000000000 0000000000 0000000000 0001111111 1111111111 >>>>>>>>>> CCAGCGGAGT CGCGCGTCGG GAGCTACGTA GGGCAGAGAA GTCA-TGGCT >>>>>>>>>> ---------- ---------- ---------- ---------- ---A-TGGCT >>>>>>>>>> ---------- ---------- ---------- ---GAGAGAA GTCA-TGGCT >>>>>>>>>> CCAGCGGAGT CGCGCGCCGA CAGCTACGCG GCGCAGA-AA GTCA-TGGCT >>>>>>>>>> ---------- ---------- ---------- ---------- ---A-TGGCT >>>>>>>>>> ---------- ---------- ---------- ---------- ---A-TGGCT >>>>>>>>>> AGGGCCGGGC AGAGGCGCAC GCAGCTCCCC GGGCGGCCCC GCTC-CAGCC >>>>>>>>>> CGCATATTAT TATTACCTTT AGGACAAGTT GAATGTGTTC GTCAACATCT >>>>>>>>>> >>>>>>>>>> Thanks! >>>>>>>>>> H. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thomas >>>>>>>>>>> >>>>>>>>>>> On Tue, Apr 17, 2012 at 05:59:24PM +0000, Hervé Pagès wrote: >>>>>>>>>>>> Hi Martin, >>>>>>>>>>>> >>>>>>>>>>>> On 04/16/2012 04:06 AM, Martin Preusse wrote: >>>>>>>>>>>>> Hi Charles, >>>>>>>>>>>>> >>>>>>>>>>>>> thanks! 
Your solution allows to print the two alignment >>>>>>>>>>>>> strings separately. >>>>>>>>>>>>> >>>>>>>>>>>>> I was thinking of an output as generated by alignment tools: >>>>>>>>>>>>> >>>>>>>>>>>>> AGT-TCTAT >>>>>>>>>>>>> | | | | | | | | | >>>>>>>>>>>>> AGTATCTAT >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> This looks like BLAST output. Is this what you have in mind? >>>>>>>>>>>> Note that >>>>>>>>>>>> there are many alignment tools and many ways to output the >>>>>>>>>>>> result to a >>>>>>>>>>>> file. I'm not really familiar with the BLAST output format. Is it >>>>>>>>>>>> specified somewhere? Would that make sense to add something >>>>>>>>>>>> like a >>>>>>>>>>>> write.PairwiseAlignedXStringSet() function to Biostrings for >>>>>>>>>>>> writing >>>>>>>>>>>> the result of pairwiseAlignment() to a file? We could do this and >>>>>>>>>>>> support the BLAST format if that's a commonly used format. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> H. >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> For this I would have to write a function to output the >>>>>>>>>>>>> strings in blocks of e.g. 60 nucleotides, right? >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers >>>>>>>>>>>>> Martin >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Am Freitag, 13. 
April 2012 um 19:21 schrieb Chu, Charles: >>>>>>>>>>>>> >>>>>>>>>>>>>> write.XStringSet >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> Bioconductor mailing list >>>>>>>>>>>>> Bioconductor at r-project.org (mailto:Bioconductor at r-project.org) >>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>>>>>>>>> Search the archives: >>>>>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Hervé Pagès >>>>>>>>>>>> >>>>>>>>>>>> Program in Computational Biology >>>>>>>>>>>> Division of Public Health Sciences >>>>>>>>>>>> Fred Hutchinson Cancer Research Center >>>>>>>>>>>> 1100 Fairview Ave. N, M1-B514 >>>>>>>>>>>> P.O. Box 19024 >>>>>>>>>>>> Seattle, WA 98109-1024 >>>>>>>>>>>> >>>>>>>>>>>> E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) >>>>>>>>>>>> Phone: (206) 667-5791 >>>>>>>>>>>> Fax: (206) 667-1319 >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Bioconductor mailing list >>>>>>>>>>>> Bioconductor at r-project.org (mailto:Bioconductor at r-project.org) >>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>>>>>>>> Search the archives: >>>>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Hervé Pagès >>>>>>>>>> >>>>>>>>>> Program in Computational Biology >>>>>>>>>> Division of Public Health Sciences >>>>>>>>>> Fred Hutchinson Cancer Research Center >>>>>>>>>> 1100 Fairview Ave. N, M1-B514 >>>>>>>>>> P.O. 
Box 19024 >>>>>>>>>> Seattle, WA 98109-1024 >>>>>>>>>> >>>>>>>>>> E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) >>>>>>>>>> Phone: (206) 667-5791 >>>>>>>>>> Fax: (206) 667-1319 >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Hervé Pagès >>>>>> >>>>>> Program in Computational Biology >>>>>> Division of Public Health Sciences >>>>>> Fred Hutchinson Cancer Research Center >>>>>> 1100 Fairview Ave. N, M1-B514 >>>>>> P.O. Box 19024 >>>>>> Seattle, WA 98109-1024 >>>>>> >>>>>> E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) >>>>>> Phone: (206) 667-5791 >>>>>> Fax: (206) 667-1319 >>>>> >>>> >>> >> >> >> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) >> Phone: (206) 667-5791 >> Fax: (206) 667-1319 > > > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
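The version numbers mentioned in this reply (2.22.0 from the release repository, 2.25.7 and 2.25.8 in devel) can be compared programmatically before relying on writePairwiseAlignments(). The following is only a minimal sketch in base R, assuming the 2.25.8 threshold stated in the thread; the helper name `meets_minimum` is hypothetical and not part of Biostrings.

```r
# Hypothetical helper (not part of Biostrings): TRUE if `installed` is at
# least `required`, using base R's version comparison.
meets_minimum <- function(installed, required = "2.25.8") {
  utils::compareVersion(as.character(installed), required) >= 0
}

# Versions mentioned in the thread.
meets_minimum("2.22.0")  # release version Martin got from biocLite()
meets_minimum("2.25.7")  # devel version available at the time of the reply
meets_minimum("2.25.8")  # version announced as adding writePairwiseAlignments()

# Check the locally installed package, if there is one.
if (requireNamespace("Biostrings", quietly = TRUE)) {
  print(meets_minimum(utils::packageVersion("Biostrings")))
} else {
  message("Biostrings is not installed")
}
```

A check like this only compares version strings; it does not confirm that the function behaves the same in later releases.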
Hi everyone,

Could you please advise how to boxplot a big list (~2 million * 70) into a small PDF file? I did it using R's boxplot() and it generated a PDF file of more than 20 MB that is hard to open. Any suggestions will be appreciated.

Best,

John
Hi John,

On 7/19/2012 3:45 PM, John linux-user wrote:
> Hi everyone,
>
> Could you please advise how to boxplot a big list (~2million *70) into a small pdf file? I did it using R boxplot and it generated a pdf file in >20MB that is hard to open. Any suggestions will be appreciated.

There are two things that come to my mind. First, do you really need to use all those data? I would bet that a sample of, say, 1e5 from each list item would give essentially the same picture.

Alternatively, you might consider using a bitmapped graphic rather than a vector graphic. I am assuming here that you have lots of outlier points (outside the whiskers) that are slowing things down. You could use a PNG, and as long as you aren't resizing it, it should look OK.

Best,

Jim

> Best,
>
> John

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
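Both suggestions in this reply (subsampling each list item and writing a bitmap instead of a vector PDF) can be sketched in base R. This is only an illustration under assumptions: the data are taken to be a list of numeric vectors, a small simulated list stands in for the real one, and the helper name `subsample_list` is hypothetical.

```r
# Hypothetical helper: keep at most `n` randomly chosen values per list item.
subsample_list <- function(x, n = 1e5) {
  lapply(x, function(v) if (length(v) > n) sample(v, n) else v)
}

set.seed(1)
# Simulated stand-in for the real data (assumed: a list of numeric vectors).
big <- replicate(5, rnorm(2e5), simplify = FALSE)
names(big) <- paste0("s", seq_along(big))

# Suggestion 1: plot a random sample from each list item.
small <- subsample_list(big, n = 1e4)

# Suggestion 2: write a PNG (bitmap), so the file size does not grow with
# the number of outlier points drawn.
out <- tempfile(fileext = ".png")
png(out, width = 1200, height = 600)
boxplot(small, pch = ".", las = 2)
dev.off()
```

The sample size and image dimensions here are arbitrary; for the real data they would be chosen to keep the picture stable at the size it will be viewed.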
If the problem is tons of outliers, you could consider setting range=0, which will cause the full range of the data to be included in the whiskers.

Janet Smith
Research Scientist
Department of Biology
MIT Building 68-540D
Cambridge, MA 02139

________________________________________
From: bioconductor-bounces@r-project.org [bioconductor-bounces@r-project.org] on behalf of John linux-user [johnlinuxuser@yahoo.com]
Sent: Thursday, July 19, 2012 3:45 PM
To: bioconductor at r-project.org
Subject: [BioC] boxplot a big list?

Hi everyone,

Could you please advise how to boxplot a big list (~2million *70) into a small pdf file? I did it using R boxplot and it generated a pdf file in >20MB that is hard to open. Any suggestions will be appreciated.

Best,

John
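The effect of range=0 can be checked without drawing anything, because boxplot() passes `range` on as the `coef` argument of boxplot.stats(). A small sketch with simulated data (the vector `x` is made up for illustration):

```r
set.seed(1)
# Simulated data with two extreme values added on purpose.
x <- c(rnorm(1000), 10, -12)

# Default behaviour (range = 1.5 in boxplot): points beyond the whiskers
# are reported, and drawn, as individual outliers.
default_stats <- boxplot.stats(x, coef = 1.5)

# range = 0 in boxplot() corresponds to coef = 0: the whiskers extend to
# the data extremes and no outliers are returned.
full_stats <- boxplot.stats(x, coef = 0)

length(default_stats$out)
length(full_stats$out)
range(x)
full_stats$stats[c(1, 5)]
```

In a plot this corresponds to calling boxplot(x, range = 0), which draws no individual outlier points and so keeps a vector PDF small.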
I installed 2.25.8 and it seems to work fine. I'll test it with my scripts that process sequencing data and report any problems.