Entering edit mode
Hi Martin,
On 06/22/2012 12:09 PM, Hervé Pagès wrote:
> Hi Martin,
>
> On 06/14/2012 06:55 AM, Martin Preusse wrote:
>> Hi guys,
>>
>> anything new on the sequence output? Maybe I missed something :)
>> please tell me if you need testing etc.
>
> Still on my list. Will work on this in the next couple of weeks.
I'll
> let you know. Thanks for the reminder.
There is now a writePairwiseAlignments() function (in Biostrings
2.25.8)
for doing this. It produces a file in the "pair" format (as described
on
the EMBOSS website, at the URL you sent earlier):
> library(Biostrings)
> pattern <- DNAString("CGTACGTAACGTTCGT")
> subject <- DNAString("CGTCGTCGTCCGTAA")
> x1 <- pairwiseAlignment(pattern, subject)
> x1
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] CGTACGTAACGTTCGT
subject: [1] CGT-CGT--CGTCCGT
score: -32.11822
> writePairwiseAlignments(x1, block.width=10) # 'block.width'
default is 50
########################################
# Program: Biostrings (version 2.25.8), a Bioconductor package
# Rundate: Wed Jul 18 23:05:46 2012
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: P1
# 2: S1
# Matrix: NA
# Gap_penalty: 14.0
# Extend_penalty: 4.0
#
# Length: 18
# Identity: 12/18 (66.7%)
# Similarity: NA/18 (NA%)
# Gaps: 5/18 (27.8%)
# Score: -32.11822
#
#
#=======================================
P1 1 CGTACGTAAC 10
||| ||| |
S1 1 CGT-CGT--C 7
P1 11 GTTCGT-- 16
|| |||
S1 8 GTCCGTAA 15
#---------------------------------------
#---------------------------------------
Only lightly tested. Not necessarily very performant (no C code).
Please
have a look at the man page for some caveats (especially if you plan
to
use it on NON global alignments). Feedback welcome.
Thanks,
H.
>
> H.
>
>>
>> Cheers
>> Martin
>>
>>
>> Am Samstag, 21. April 2012 um 11:55 schrieb Martin Preusse:
>>
>>> Hi Herv?,
>>>
>>> thanks for your help! If you need suggestions, help or testing,
just
>>> say the word.
>>>
>>> Will you implement the header also? If you do so, I would be
thankful
>>> for an option like "header=F" for the output.
>>>
>>>
>>> Cheers
>>> Martin
>>>
>>>
>>> Am Samstag, 21. April 2012 um 02:12 schrieb Hervé Pagès:
>>>
>>>> Thanks Martin and Thomas for the useful feedback. The 'pair' and
>>>> 'markx0' formats supported by Emboss seem indeed appropriate for
>>>> printing the output of pairwiseAlignment() to a file. I'll add
>>>> support for those 2 formats in Biostrings. Won't be before 1 week
>>>> or 2 though...
>>>>
>>>> Cheers,
>>>> H.
>>>>
>>>> On 04/18/2012 03:20 AM, Martin Preusse wrote:
>>>>> Hi,
>>>>>
>>>>> I just found this function to print a pairwise alignments in
>>>>> blocks. Doesn't add the match/mismatch indicators between
>>>>> sequences, but might be a starting point:
>>>>>
>>>>> http://a-little-book-of-r-for-
bioinformatics.readthedocs.org/en/latest/src/chapter4.html#viewing-a
-long-pairwise-alignment
>>>>>
>>>>>
>>>>>
>>>>> Cheers
>>>>> Martin
>>>>>
>>>>>
>>>>>
>>>>> Am Mittwoch, 18. April 2012 um 12:16 schrieb Martin Preusse:
>>>>>
>>>>>> Hi everybody,
>>>>>>
>>>>>> I think the output format depends on the purpose of the
alignment.
>>>>>>
>>>>>> A pairwise sequence alignment is usually done to compare two
>>>>>> sequences base by base. In my case, I compare sequencing
results
>>>>>> of cloned expression constructs with the desired sequence.
Thus,
>>>>>> the best output format would be "BLAST like".
>>>>>>
>>>>>> seq1: 1 ATCTGC 7
>>>>>> | | | . . |
>>>>>> seq2: 1 ATCAAC 7
>>>>>>
>>>>>> When doing MSA, most people might rather be interested in the
>>>>>> consensus sequence. E.g. in the context of conservation between
>>>>>> species.
>>>>>>
>>>>>> So write.PairwiseAlignedXStringSet() and
write.MultipleAlignment()
>>>>>> are quite different and BLAST doesn't make much sense for
multiple
>>>>>> alignments. This means it would be best to put the output in
the
>>>>>> PairwiseAlignment/MultipleAlignment and not to the XStringSet,
right?
>>>>>>
>>>>>> This is an overview of sequence alignment formats used by
EMBOSS:
>>>>>> http://emboss.sourceforge.net/docs/themes/AlignFormats.html
>>>>>>
>>>>>> 'pair' or 'markx0' would be perfectly fine.
>>>>>>
>>>>>>
>>>>>> Cheers
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>>
>>>>>> Am Dienstag, 17. April 2012 um 22:13 schrieb Thomas Girke:
>>>>>>
>>>>>>> Hi Herv?,
>>>>>>>
>>>>>>> To me, the most basic and versatile MSA or pairwise alignment
>>>>>>> format to output
>>>>>>> to would be FASTA since it is compatible with almost any other
>>>>>>> alignment
>>>>>>> editing software. For text-based viewing purposes my
preference
>>>>>>> would be
>>>>>>> to also output to a format similar to the one shown in the
following
>>>>>>> example. When there are only two sequences then one could show
>>>>>>> instead
>>>>>>> of a consensus line the pipe characters between the two
sequences to
>>>>>>> indicate identical residues which mimics the blast output. A
more
>>>>>>> standardized version of this pairwise alignment format can be
found
>>>>>>> here:
>>>>>>> http://emboss.sourceforge.net/apps/cvs/emboss/apps/needle.html
>>>>>>>
>>>>>>> library(Biostrings)
>>>>>>> p450<-
>>>>>>> read.AAStringSet("http://faculty.ucr.edu/~tgirke/Documents/R_B
ioCond/Samples/p450.mul",
>>>>>>> "fasta")
>>>>>>>
>>>>>>> StringSet2html<- function(msa=p450, file="p450.html", start=1,
>>>>>>> end=length(p450[[1]]), counter=20, browser=TRUE, ...) {
>>>>>>> if(class(msa)=="AAStringSet") msa<- AAStringSet(msa,
start=start,
>>>>>>> end=end)
>>>>>>> if(class(msa)=="DNAStringSet") msa<- DNAStringSet(msa,
>>>>>>> start=start, end=end)
>>>>>>> msavec<- sapply(msa, toString)
>>>>>>> offset<- (counter-1)-nchar(nchar(msavec[1]))
>>>>>>> legend<- paste(paste(paste(paste(rep(" ", offset),
collapse=""),
>>>>>>> format(seq(0,
>>>>>>> nchar(msavec[1]), by=counter)[-1])), collapse=""),
collapse="")
>>>>>>> consensus<- consensusString(msavec, ambiguityMap=".", ...)
>>>>>>> msavec<- paste(msavec, rowSums(as.matrix(msa) != "-"), sep="
")
>>>>>>> msavec<- paste(format(c("", names(msa), "Consensus"),
>>>>>>> justify="left"), c(legend, msavec,
>>>>>>> consensus), sep=" ")
>>>>>>> msavec<- c("<html>
", msavec,"</html>") >>>>>>> writeLines(msavec, file) >>>>>>> if(browser==TRUE) { browseURL(file) } >>>>>>> } >>>>>>> StringSet2html(msa=p450, file="p450.html", start=1, >>>>>>> end=length(p450[[1]]), counter=20, browser=T, threshold=1.0) >>>>>>> StringSet2html(msa=p450, file="p450.html", start=450, end=470, >>>>>>> counter=20, browser=T, threshold=1.0) >>>>>>> >>>>>>> >>>>>>> Thomas >>>>>>> >>>>>>> On Tue, Apr 17, 2012 at 07:43:30PM +0000, Hervé Pagès wrote: >>>>>>>> Hi Thomas, >>>>>>>> >>>>>>>> On 04/17/2012 11:49 AM, Thomas Girke wrote: >>>>>>>>> What about providing an option in pairwiseAlignment to output >>>>>>>>> to the >>>>>>>>> MultipleAlignment class in Biostrings and then write the latter to >>>>>>>>> different alignment formats? >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Or we could provide coercion methods to switch between >>>>>>>> PairwiseAlignedXStringSet and MultipleAlignment. >>>>>>>> >>>>>>>> Anyway that kind of moves Martin's problem from having a >>>>>>>> write.PairwiseAlignedXStringSet() function that produces BLAST >>>>>>>> output >>>>>>>> to having a write.MultipleAlignment() function that produces BLAST >>>>>>>> output. For the specific case of BLAST output, would it make sense >>>>>>>> to support it for MultipleAlignment? Can someone point me to an >>>>>>>> example >>>>>>>> of such output? Or even better, to the specs of such format? >>>>>>>> >>>>>>>> Note that right now there is the write.phylip() function in >>>>>>>> Biostrings >>>>>>>> for writing a MultipleAlignment object to a file but the Phylip >>>>>>>> format >>>>>>>> looks very different from the BLAST output: >>>>>>>> >>>>>>>> hpages at latitude:~$ head -n 20 phylip_test.txt >>>>>>>> 9 2343 >>>>>>>> Mask 0000000000 0000000000 0000000000 0000000000 0000000000 >>>>>>>> Human -----TCCCG TCTCCGCAGC AAAAAAGTTT GAGTCGCCGC TGCCGGGTTG >>>>>>>> Chimp ---------- ---------- ---------- ---------- ---------- >>>>>>>> Cow ---------- ---------- ---------- ---------- ---------- >>>>>>>> Mouse ---------- ---------- --AAAAGTTG GAGTCTTCGC TTGAGAGTTG >>>>>>>> Rat ---------- ---------- ---------- ---------- ---------- >>>>>>>> Dog ---------- ---------- ---------- ---------- ---------- >>>>>>>> Chicken ---------- ----CGGCTC CGCAGCGCCT CACTCGCGCA GTCCCCGCGC >>>>>>>> Salmon GGGGGAGACT TCAGAAGTTG TTGTCCTCTC CGCTGATAAC AGTTGAGATG >>>>>>>> >>>>>>>> 0000000000 0000000000 0000000000 0001111111 1111111111 >>>>>>>> CCAGCGGAGT CGCGCGTCGG GAGCTACGTA GGGCAGAGAA GTCA-TGGCT >>>>>>>> ---------- ---------- ---------- ---------- ---A-TGGCT >>>>>>>> ---------- ---------- ---------- ---GAGAGAA GTCA-TGGCT >>>>>>>> CCAGCGGAGT CGCGCGCCGA CAGCTACGCG GCGCAGA-AA GTCA-TGGCT >>>>>>>> ---------- ---------- ---------- ---------- ---A-TGGCT >>>>>>>> ---------- ---------- ---------- ---------- ---A-TGGCT >>>>>>>> AGGGCCGGGC AGAGGCGCAC GCAGCTCCCC GGGCGGCCCC GCTC-CAGCC >>>>>>>> CGCATATTAT TATTACCTTT AGGACAAGTT GAATGTGTTC GTCAACATCT >>>>>>>> >>>>>>>> Thanks! >>>>>>>> H. >>>>>>>> >>>>>>>>> >>>>>>>>> Thomas >>>>>>>>> >>>>>>>>> On Tue, Apr 17, 2012 at 05:59:24PM +0000, Hervé Pagès wrote: >>>>>>>>>> Hi Martin, >>>>>>>>>> >>>>>>>>>> On 04/16/2012 04:06 AM, Martin Preusse wrote: >>>>>>>>>>> Hi Charles, >>>>>>>>>>> >>>>>>>>>>> thanks! Your solution allows to print the two alignment >>>>>>>>>>> strings separately. >>>>>>>>>>> >>>>>>>>>>> I was thinking of an output as generated by alignment tools: >>>>>>>>>>> >>>>>>>>>>> AGT-TCTAT >>>>>>>>>>> | | | | | | | | | >>>>>>>>>>> AGTATCTAT >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> This looks like BLAST output. Is this what you have in mind? >>>>>>>>>> Note that >>>>>>>>>> there are many alignment tools and many ways to output the >>>>>>>>>> result to a >>>>>>>>>> file. I'm not really familiar with the BLAST output format. Is it >>>>>>>>>> specified somewhere? Would that make sense to add something >>>>>>>>>> like a >>>>>>>>>> write.PairwiseAlignedXStringSet() function to Biostrings for >>>>>>>>>> writing >>>>>>>>>> the result of pairwiseAlignment() to a file? We could do this and >>>>>>>>>> support the BLAST format if that's a commonly used format. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> H. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> For this I would have to write a function to output the >>>>>>>>>>> strings in blocks of e.g. 60 nucleotides, right? >>>>>>>>>>> >>>>>>>>>>> Cheers >>>>>>>>>>> Martin >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Am Freitag, 13. April 2012 um 19:21 schrieb Chu, Charles: >>>>>>>>>>> >>>>>>>>>>>> write.XStringSet >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Bioconductor mailing list >>>>>>>>>>> Bioconductor at r-project.org (mailto:Bioconductor at r-project.org) >>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>>>>>>> Search the archives: >>>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Hervé Pagès >>>>>>>>>> >>>>>>>>>> Program in Computational Biology >>>>>>>>>> Division of Public Health Sciences >>>>>>>>>> Fred Hutchinson Cancer Research Center >>>>>>>>>> 1100 Fairview Ave. N, M1-B514 >>>>>>>>>> P.O. Box 19024 >>>>>>>>>> Seattle, WA 98109-1024 >>>>>>>>>> >>>>>>>>>> E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) >>>>>>>>>> Phone: (206) 667-5791 >>>>>>>>>> Fax: (206) 667-1319 >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Bioconductor mailing list >>>>>>>>>> Bioconductor at r-project.org (mailto:Bioconductor at r-project.org) >>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>>>>>> Search the archives: >>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Hervé Pagès >>>>>>>> >>>>>>>> Program in Computational Biology >>>>>>>> Division of Public Health Sciences >>>>>>>> Fred Hutchinson Cancer Research Center >>>>>>>> 1100 Fairview Ave. N, M1-B514 >>>>>>>> P.O. Box 19024 >>>>>>>> Seattle, WA 98109-1024 >>>>>>>> >>>>>>>> E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) >>>>>>>> Phone: (206) 667-5791 >>>>>>>> Fax: (206) 667-1319 >>>>>>> >>>>>> >>>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> Hervé Pagès >>>> >>>> Program in Computational Biology >>>> Division of Public Health Sciences >>>> Fred Hutchinson Cancer Research Center >>>> 1100 Fairview Ave. N, M1-B514 >>>> P.O. Box 19024 >>>> Seattle, WA 98109-1024 >>>> >>>> E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) >>>> Phone: (206) 667-5791 >>>> Fax: (206) 667-1319 >>> >> >> >> > > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319