Question

Extracting the findPalindromes() results

meyerlaker • 0

@3a6449cf

Last seen 3.9 years ago

Austria

Hello!

I am trying to find palindromes in a whole gene sequence with findPalindromes() and save them, so that I can then look for overlaps between these palindromes and nucleotides that are mismatched between two organisms, in order to design strain-specific qPCR probes and primers. I've tried converting the findPalindromes() output to a data frame and exporting it with export.fasta() from the bios2mds package, but the data frame has only one column, containing the palindromes, and not their locations. Can you help me?

I am new to programming and bioinformatics, so sorry if it's a dumb question or has an obvious answer ;-)

All the best! Vicki

findPalindrome() Palindrome • 1.1k views

ADD COMMENT • link 3.9 years ago meyerlaker • 0

score 1 · Accepted Answer · 2021-05-06

Hervé Pagès 16k

@herve-pages-1542

Last seen 6 hours ago

Seattle, WA, United States

Hi,

I don't know anything about the bios2mds package (it doesn't seem to be a Bioconductor package). Note that you don't need to turn the output of Biostrings::findPalindromes() into a data.frame; this could be very inefficient. Instead, turn it into a DNAStringSet object, set the ranges as the names of this object, and write the object to a FASTA file with Biostrings::writeXStringSet(). It should look something like this:

library(Biostrings)

...

pals <- findPalindromes(...)

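# Coerce the views to a DNAStringSet (one sequence per palindrome)
sequences <- as(pals, "DNAStringSet")
# Use each palindrome's range as its name, which becomes the FASTA header
names(sequences) <- as.character(as(pals, "IRanges"))
writeXStringSet(sequences, "path/to/file.fa")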
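
For a concrete end-to-end illustration, here is a minimal sketch of the same steps on a toy sequence. The sequence, the min.armlength value, and the temporary output file are assumptions made up for this example. The names are built with start() and end() rather than by coercing to IRanges, which is just an alternative way to label each record with its position.

```r
library(Biostrings)

# Toy DNA sequence (made up for illustration); it contains GGAATTCC,
# which is its own reverse complement.
subject <- DNAString("TTACGGAATTCCATGCA")

# Palindromic regions, returned as an XStringViews object
pals <- findPalindromes(subject, min.armlength = 4)

# Keep the sequences and label each one with its "start-end" range
sequences <- as(pals, "DNAStringSet")
names(sequences) <- paste0(start(pals), "-", end(pals))

# Write to a temporary FASTA file and read it back to check the headers
fasta_file <- tempfile(fileext = ".fa")
writeXStringSet(sequences, fasta_file)
roundtrip <- readDNAStringSet(fasta_file)
names(roundtrip)
```

Reading the file back with readDNAStringSet() is only a sanity check that the positions survive as FASTA headers; in a real workflow the path would be wherever you want the file saved.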

Hope this helps.

H.

ADD COMMENT • link 3.9 years ago Hervé Pagès 16k

Also, you have a point: the output of Biostrings::findPalindromes() is an XStringViews object and as.data.frame() does a poor job on these objects:

pals
# Views on a 34-letter BString subject
# subject: abbbaabbcbbaccacabbbccbcaabbabacca
# views:
#       start end width
#   [1]     3   8     6 [bbaabb]
#   [2]     6  12     7 [abbcbba]
#   [3]    10  19    10 [bbaccacabb]

as.data.frame(pals)
#            x
# 1     bbaabb
# 2    abbcbba
# 3 bbaccacabb

I've just changed this in the devel version of Biostrings. Now it does:

as.data.frame(pals)
#   start end width        seq
# 1     3   8     6     bbaabb
# 2     6  12     7    abbcbba
# 3    10  19    10 bbaccacabb

This is with Biostrings 2.59.3 (part of BioC 3.13, not released yet).
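
On a Biostrings version older than 2.59.3, a similar table can be assembled by hand from the view accessors. The sketch below rebuilds the three views shown above with Views() purely for illustration (the findPalindromes() call that produced them is not shown in this thread), and the column names are chosen to mirror the new as.data.frame() output.

```r
library(Biostrings)

# Recreate the example views from the printout above (illustration only)
subject <- BString("abbbaabbcbbaccacabbbccbcaabbabacca")
pals <- Views(subject, start = c(3, 6, 10), end = c(8, 12, 19))

# Build the data frame by hand: one row per view, positions plus sequence
pal_df <- data.frame(
  start = start(pals),
  end   = end(pals),
  width = width(pals),
  seq   = as.character(pals),
  stringsAsFactors = FALSE
)
pal_df
```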

Best,

H.

ADD REPLY • link 3.9 years ago Hervé Pagès 16k

This worked great. Thanks a lot!! And cool to see the change in the new version :-)

ADD REPLY • link 3.9 years ago meyerlaker • 0