Question

Extracting Sequences from DNAString

eulerphi8 • 0

Sorry for the stupid question, but I'm stuck on a really simple problem. I'm trying to output the results of some analysis and want a column of the DNA sequences stored in a DNAString. For example, in the output below, I can use start(DNAString), end(DNAString), and width(DNAString) to extract the start, end, and width columns, but how do I extract the last column, which contains the actual DNA sequences? Thank you in advance for any help.

Views on a 249250621-letter DNAString subject
subject: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
views:
          start       end width
  [1]   1534983   1534990     8 [ACGTCGGG]
  [2]   2017082   2017089     8 [ACGTCGGG]
  [3]   2017138   2017145     8 [ACGTCGGG]
  [4]   2017194   2017201     8 [ACGTCGGG]
  [5]   2946138   2946145     8 [ACGTCGGG]
  ...       ...       ...   ... ...
[121] 245722047 245722054     8 [ACGTCGGG]
[122] 245752365 245752372     8 [ACGTCGGG]
[123] 246954563 246954570     8 [ACGTCGGG]
[124] 247282857 247282864     8 [ACGTCGGG]
[125] 249239868 249239875     8 [ACGTCGGG]

dnastring • 1.1k views

Updated 9.9 years ago by James W. MacDonald • written 9.9 years ago by eulerphi8

score 2 · Accepted Answer · 2015-06-15

That's not a DNAString. It's a Views() on a DNAString. Regardless, you want as.character(), which returns one character string per view.

> library(Biostrings)
> d <- DNAString("TTGTCAATGTGC")
> dd2 <- Views(d, start = 3:1, end = 5:7)
> dd2
  Views on a 12-letter DNAString subject
subject: TTGTCAATGTGC
views:
    start end width
[1]     3   5     3 [GTC]
[2]     2   6     5 [TGTCA]
[3]     1   7     7 [TTGTCAA]
> as.character(dd2)
[1] "GTC"     "TGTCA"   "TTGTCAA"
> as.character(d)
[1] "TTGTCAATGTGC"
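
If the goal is a single table with the sequences as the last column, something along these lines should work. This is only a minimal sketch on the same toy example; the object name `res` and the column names are illustrative choices, not anything Biostrings requires.

```r
library(Biostrings)

# Same toy subject and views as in the session above
d <- DNAString("TTGTCAATGTGC")
v <- Views(d, start = 3:1, end = 5:7)

# One row per view: the coordinates plus the sequence as a plain character string
res <- data.frame(
  start    = start(v),
  end      = end(v),
  width    = width(v),
  sequence = as.character(v),
  stringsAsFactors = FALSE
)
res
```

From there the data frame can be written out with write.table() or write.csv() like any other.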