Alogmail2@aol.com
Hi Martin,
Thanks for the clarification. Does the variant with
pattern="DNAString",
subject="DNAString"
look more appropriate for aligning DNA sequences?
One more question.
What pairwise alignment tool would you recommend if I need to align a
list#1 of 50-mer probes against a list#2 of longer contigs (hundreds of
bases)? I take each 50-mer probe from list#1 and align it to each contig
from list#2. Finally, for each 50-mer probe I need to find the best match
by some criterion, e.g. the alignment score.
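The procedure described above (each probe against each contig, then the best score per probe) can be sketched in R roughly as follows. This is only an illustration under assumptions: the toy sequences and the names `probes`, `contigs` and `score_matrix` are hypothetical, and the sketch relies on pairwiseAlignment() accepting a DNAStringSet pattern together with a single DNAString subject and the scoreOnly=TRUE argument. It is not meant as a fast solution for thousands of sequences.

```r
library(Biostrings)
# Assumption: in newer Bioconductor releases pairwiseAlignment() is provided
# by the 'pwalign' package, so attach it when it is installed.
if (requireNamespace("pwalign", quietly = TRUE)) library(pwalign)

# Hypothetical toy data standing in for list#1 (probes) and list#2 (contigs)
probes  <- DNAStringSet(c(p1 = "ACGTACGT", p2 = "TTGACCA"))
contigs <- DNAStringSet(c(c1 = "GGGACGTACGTGGG",
                          c2 = "CCTTGACCACC",
                          c3 = "AAAAAAAAAA"))

# For each contig, align all probes to it in one call and keep only the scores
scores <- vapply(seq_along(contigs), function(j) {
  pairwiseAlignment(pattern = probes, subject = contigs[[j]],
                    type = "global-local", scoreOnly = TRUE)
}, numeric(length(probes)))

# Rows are probes, columns are contigs
score_matrix <- matrix(scores, nrow = length(probes),
                       dimnames = list(names(probes), names(contigs)))

# Best-scoring contig for each probe
best_contig <- colnames(score_matrix)[max.col(score_matrix, ties.method = "first")]
best_score  <- apply(score_matrix, 1, max)
data.frame(probe = names(probes), best_contig = best_contig,
           best_score = best_score)
```

Keeping only the scores avoids building full alignment objects; if only the best contig per probe is needed, the running maximum could be tracked instead of storing the whole matrix.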
I have tried pairwiseAlignment(), but it is slow and looks infeasible
for a list#1 of 16798 probes vs. a list#2 of 14338 contigs, i.e. when
the output matrix would be 16798 by 14338.
I have looked through the mailing list and found suggestions to use
matchPattern/vmatchPattern, but they only report whether a substring
matches within a string rather than an alignment score that could be
used to compare alignments.
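To illustrate the distinction drawn above, here is a small sketch of what the matching functions report. The sequences and the names `probe`, `contigs` and `hits` are hypothetical, and the max.mismatch value is an arbitrary choice for the example.

```r
library(Biostrings)

# Hypothetical probe and contigs
probe   <- DNAString("ACGTACGT")
contigs <- DNAStringSet(c(c1 = "GGGACGTACGTGGG",   # exact occurrence
                          c2 = "GGGACGTTCGTGGG",   # occurrence with one mismatch
                          c3 = "AAAAAAAAAAAAAA"))  # no occurrence

# Number of hits per contig, allowing at most one mismatch
hits <- vcountPattern(probe, contigs, max.mismatch = 1)
names(hits) <- names(contigs)
hits

# Start positions of the hits within a single contig
hit_starts <- start(matchPattern(probe, contigs[["c1"]], max.mismatch = 1))
hit_starts
```

The result says where (and how often) the probe occurs within the allowed number of mismatches, but it carries no score that would rank one contig against another.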
Thanks
Best regards,
Alex
In a message dated 3/14/2011 9:04:04 A.M. Pacific Daylight Time,
mtmorgan@fhcrc.org writes:
On 03/13/2011 06:58 PM, Alogmail2@aol.com wrote:
> Dear Mailing List,
>
> Why does pairwiseAlignment() generate different outcomes for the same
> input sequences defined differently in terms of classes (see showMethods
> below):
>
> for
>
> pattern="character", subject="character"
Hi Alex --
here 'character' could be anything -- 'the quick brown fox', whereas
>
> vs.
>
> pattern="DNAString", subject="DNAString"
here pattern and subject are drawn from a restricted alphabet and some
symbols have particular meaning (e.g., 'N' does not mean the nucleotide
'N').
Martin
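A short sketch of the difference described above, assuming the Biostrings package is attached; the variable names `free_text` and `result` are made up for the example.

```r
library(Biostrings)

# A plain character string can hold any text
free_text <- "the quick brown fox"

# DNAString() only accepts letters from the DNA alphabet, so the
# conversion fails and the error handler returns a message instead
result <- tryCatch(DNAString(free_text),
                   error = function(e) "not a valid DNA sequence")
result

# The DNA alphabet includes IUPAC ambiguity codes in addition to A, C, G, T
DNA_ALPHABET

# 'N' is one of those codes: it stands for any of the four nucleotides
IUPAC_CODE_MAP[["N"]]
```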
>
> ?
>
> It generates the same outcomes for the case of
>
>
> pattern="character", subject="character"
>
> vs.
>
> pattern="character", subject="DNAString"
>
>
> It looks like a bug.
>
> Thanks
>
> Alex
>
>
>
>
>
>
> #showMethods("pairwiseAlignment")
> #Function: pairwiseAlignment (package Biostrings)
> #pattern="character", subject="character"
> #pattern="character", subject="DNAString"
> # (inherited from: pattern="character", subject="XString")
> #pattern="DNAString", subject="DNAString"
>
>> pattern.1
> 50-letter "DNAString" instance
> seq: CTGCCATGGCAAAGCTCGCTGCCTCAGAGGCCGCCACAATGGTTGCGCAC
>> pattern.2
> [1] "CTGCCATGGCAAAGCTCGCTGCCTCAGAGGCCGCCACAATGGTTGCGCAC"
>> subject.1
> 543-letter "DNAString" instance
> seq: AAAAAAAAAAAAAAAAAAAAATGAAATCCGAACTTCTTGGAGCCTCGTCTGAAGGCCATCTCGGCTCTTCAGATT...CAGTCAACAAGTTCCAACGGGACTAGATTACGGGGCGTATACGCCGTACGGCCAGGCCAGTAGTTACGCGTCGT
>> subject.2
> [1] "AAAAAAAAAAAAAAAAAAAAATGAAATCCGAACTTCTTGGAGCCTCGTCTGAAGGCCATCTCGGCTCTTCAGATTCGCCCTTCGTCAGCGACGCTCTGGCTGCCGTCACCGGTGACTACCAATCGGCCTACGCTGCTTCCTATTACAGCAGCGCGATGCAGGCCTACAATAGTCAATCGACGTCGGCCTACATGCCAAGCAGTGGATTCTATAATGGCGCATCTTCGCAGACGCCCTACGGAGTCCTGGCGCCCTCCACTTACACAACGATGGGCGTTCCCAGTACAAGAGGTTTAGGCCAACAATGTAAAAATGGACAATCATTAGCACAAACGCCTCCGTATTTGAGCTCGTACGGGTCGGCATTCGGTGGTGTCACAGCCAGCAGTTCGCCTTCGGGTCCACCCGCCTACGCGTCCGCTTATGGATCGGCATACAATAGCGCCACCGCCGCCCAATCGTTCACCAACAGTCAACAAGTTCCAACGGGACTAGATTACGGGGCGTATACGCCGTACGGCCAGGCCAGTAGTTACGCGTCGT"
>> pairwiseAlignment(pattern=pattern.1,subject=subject.1,type="global-local")
> Global-Local PairwiseAlignedFixedSubject (1 of 1)
> pattern: [1] CTGCCATGGCAAAGCTCGCTGCCTCAGAGGCCGCCACAATGGTTGCGCAC
> subject: [429] TCGGCATACAATAGC--GCCACC------GCCGCC-CAATCGTT---CAC
> score: -91.50367
>> pairwiseAlignment(pattern=pattern.2,subject=subject.2,type="global-local")
> Global-Local PairwiseAlignedFixedSubject (1 of 1)
> pattern: [1] CTGC--CATGGCAAAGCTC--GCTGCC-TCAGAGGCCGCCAC-AATGGTTGCGCAC
> subject: [80] CTTCGTCA--GCGACGCTCTGGCTGCCGTCACCGGTGACTACCAATCG--GCCTAC
> score: 95.69296
>> pairwiseAlignment(pattern=pattern.2,subject=subject.1,type="global-local")
> Global-Local PairwiseAlignedFixedSubject (1 of 1)
> pattern: [1] CTGCCATGGCAAAGCTCGCTGCCTCAGAGGCCGCCACAATGGTTGCGCAC
> subject: [429] TCGGCATACAATAGC--GCCACC------GCCGCC-CAATCGTT---CAC
> score: -91.50367
>
>> sessionInfo()
> R version 2.12.2 (2011-02-25)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] altcdfenvs_2.12.0 hypergraph_1.22.0 graph_1.28.0 makecdfenv_1.28.0
> [5] affy_1.28.0 Biobase_2.10.0 GeneR_2.20.0 seqinr_3.0-1
> [9] Biostrings_2.18.4 IRanges_1.8.9 limma_3.6.9
>
> loaded via a namespace (and not attached):
> [1] affyio_1.18.0 preprocessCore_1.12.0 tools_2.12.2
>
>
>
>
> ###############################
>
> Nutritional Sciences and Toxicology,
> 119 Morgan Hall
> UC.Berkeley,CA 94720
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793