Hi Herv?,
To me, the most basic and versatile MSA or pairwise alignment format
to output
to would be FASTA since it is compatible with almost any other
alignment
editing software. For text-based viewing purposes my preference would
be
to also output to a format similar to the one shown in the following
example. When there are only two sequences then one could show instead
of a consensus line the pipe characters between the two sequences to
indicate identical residues which mimics the blast output. A more
standardized version of this pairwise alignment format can be found
here:
http://emboss.sourceforge.net/apps/cvs/emboss/apps/needle.html
library(Biostrings)
p450 <- read.AAStringSet("
http://faculty.ucr.edu/~tgirke/Documents/R_B
ioCond/Samples/p450.mul", "fasta")
StringSet2html <- function(msa=p450, file="p450.html", start=1,
end=length(p450[[1]]), counter=20, browser=TRUE, ...) {
if(class(msa)=="AAStringSet") msa <- AAStringSet(msa,
start=start, end=end)
if(class(msa)=="DNAStringSet") msa <- DNAStringSet(msa,
start=start, end=end)
msavec <- sapply(msa, toString)
offset <- (counter-1)-nchar(nchar(msavec[1]))
legend <- paste(paste(paste(paste(rep(" ", offset),
collapse=""), format(seq(0,
nchar(msavec[1]), by=counter)[-1])), collapse=""),
collapse="")
consensus <- consensusString(msavec, ambiguityMap=".", ...)
msavec <- paste(msavec, rowSums(as.matrix(msa) != "-"), sep="
")
msavec <- paste(format(c("", names(msa), "Consensus"),
justify="left"), c(legend, msavec,
consensus), sep=" ")
msavec <- c("<html>
", msavec, "
</html>")
writeLines(msavec, file)
if(browser==TRUE) { browseURL(file) }
}
StringSet2html(msa=p450, file="p450.html", start=1,
end=length(p450[[1]]), counter=20, browser=T, threshold=1.0)
StringSet2html(msa=p450, file="p450.html", start=450, end=470,
counter=20, browser=T, threshold=1.0)
Thomas
On Tue, Apr 17, 2012 at 07:43:30PM +0000, Hervé Pagès wrote:
> Hi Thomas,
>
> On 04/17/2012 11:49 AM, Thomas Girke wrote:
> > What about providing an option in pairwiseAlignment to output to
the
> > MultipleAlignment class in Biostrings and then write the latter to
> > different alignment formats?
>
> Or we could provide coercion methods to switch between
> PairwiseAlignedXStringSet and MultipleAlignment.
>
> Anyway that kind of moves Martin's problem from having a
> write.PairwiseAlignedXStringSet() function that produces BLAST
output
> to having a write.MultipleAlignment() function that produces BLAST
> output. For the specific case of BLAST output, would it make sense
> to support it for MultipleAlignment? Can someone point me to an
example
> of such output? Or even better, to the specs of such format?
>
> Note that right now there is the write.phylip() function in
Biostrings
> for writing a MultipleAlignment object to a file but the Phylip
format
> looks very different from the BLAST output:
>
> hpages at latitude:~$ head -n 20 phylip_test.txt
> 9 2343
> Mask 0000000000 0000000000 0000000000 0000000000 0000000000
> Human -----TCCCG TCTCCGCAGC AAAAAAGTTT GAGTCGCCGC TGCCGGGTTG
> Chimp ---------- ---------- ---------- ---------- ----------
> Cow ---------- ---------- ---------- ---------- ----------
> Mouse ---------- ---------- --AAAAGTTG GAGTCTTCGC TTGAGAGTTG
> Rat ---------- ---------- ---------- ---------- ----------
> Dog ---------- ---------- ---------- ---------- ----------
> Chicken ---------- ----CGGCTC CGCAGCGCCT CACTCGCGCA GTCCCCGCGC
> Salmon GGGGGAGACT TCAGAAGTTG TTGTCCTCTC CGCTGATAAC AGTTGAGATG
>
> 0000000000 0000000000 0000000000 0001111111 1111111111
> CCAGCGGAGT CGCGCGTCGG GAGCTACGTA GGGCAGAGAA GTCA-TGGCT
> ---------- ---------- ---------- ---------- ---A-TGGCT
> ---------- ---------- ---------- ---GAGAGAA GTCA-TGGCT
> CCAGCGGAGT CGCGCGCCGA CAGCTACGCG GCGCAGA-AA GTCA-TGGCT
> ---------- ---------- ---------- ---------- ---A-TGGCT
> ---------- ---------- ---------- ---------- ---A-TGGCT
> AGGGCCGGGC AGAGGCGCAC GCAGCTCCCC GGGCGGCCCC GCTC-CAGCC
> CGCATATTAT TATTACCTTT AGGACAAGTT GAATGTGTTC GTCAACATCT
>
> Thanks!
> H.
>
> >
> > Thomas
> >
> > On Tue, Apr 17, 2012 at 05:59:24PM +0000, Hervé Pagès wrote:
> >> Hi Martin,
> >>
> >> On 04/16/2012 04:06 AM, Martin Preusse wrote:
> >>> Hi Charles,
> >>>
> >>> thanks! Your solution allows to print the two alignment strings
separately.
> >>>
> >>> I was thinking of an output as generated by alignment tools:
> >>>
> >>> AGT-TCTAT
> >>> | | | | | | | | |
> >>> AGTATCTAT
> >>
> >> This looks like BLAST output. Is this what you have in mind? Note
that
> >> there are many alignment tools and many ways to output the result
to a
> >> file. I'm not really familiar with the BLAST output format. Is it
> >> specified somewhere? Would that make sense to add something like
a
> >> write.PairwiseAlignedXStringSet() function to Biostrings for
writing
> >> the result of pairwiseAlignment() to a file? We could do this and
> >> support the BLAST format if that's a commonly used format.
> >>
> >> Thanks,
> >> H.
> >>
> >>>
> >>> For this I would have to write a function to output the strings
in blocks of e.g. 60 nucleotides, right?
> >>>
> >>> Cheers
> >>> Martin
> >>>
> >>>
> >>>
> >>> Am Freitag, 13. April 2012 um 19:21 schrieb Chu, Charles:
> >>>
> >>>> write.XStringSet
> >>>
> >>> _______________________________________________
> >>> Bioconductor mailing list
> >>> Bioconductor at r-project.org
> >>>
https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >>
> >> --
> >> Hervé Pagès
> >>
> >> Program in Computational Biology
> >> Division of Public Health Sciences
> >> Fred Hutchinson Cancer Research Center
> >> 1100 Fairview Ave. N, M1-B514
> >> P.O. Box 19024
> >> Seattle, WA 98109-1024
> >>
> >> E-mail: hpages at fhcrc.org
> >> Phone: (206) 667-5791
> >> Fax: (206) 667-1319
> >>
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor at r-project.org
> >>
https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319