How to combine sequences from different stringsets?
@reubenmcgregor88-13722

I have imported two sets of sequences from FASTA files as DNAStringSets using DECIPHER (which I believe uses Biostrings for the basic manipulation of nucleic acid string sets):

> extraemm_DNA_untrim1

A DNAStringSet instance of length 5
    width seq                                                 names               
[1]   346 GCATCCGTAGCGGTCGCTGTGGCT...AAGATGTAGAACGTCACTATCTTA EMM88 emm88.0 (em...
[2]   391 GCATCAGTAGCGGTTGCTTTGACT...AATTAGCGGATAAGCAAGAACATC EMM225 emm225.0 (...
[3]   499 GGTACTGCTTCAGTAGCGGTTGGT...AAGAAGCAGAGCAGAAAAAACTTA EMM52 emm52.0 (em...
[4]   360 GCATCCGTAGCAGTCGCTGTGGCT...GAACGCCAAAGTCAACGAGAAGTC EMM2 emm2.0 (emm-...
[5]   384 AGCAGTTGCTGTGGCTGTTTTAGG...AATAGACAAGCGTTATCAAGAACA EMM114 emm114.0 (...
> extraemm_DNA_untrim2

A DNAStringSet instance of length 5
    width seq                                                 names               
[1]   345 CATCCGTAGCGGTCGCTGTGGCTG...AAGATGTAGAACGTCACTATCTTA EMM88 emm88.0 (em...
[2]   390 CATCAGTAGCGGTTGCTTTGACTG...AATTAGCGGATAAGCAAGAACATC EMM225 emm225.0 (...
[3]   498 GTACTGCTTCAGTAGCGGTTGGTT...AAGAAGCAGAGCAGAAAAAACTTA EMM52 emm52.0 (em...
[4]   359 CATCCGTAGCAGTCGCTGTGGCTG...GAACGCCAAAGTCAACGAGAAGTC EMM2 emm2.0 (emm-...
[5]   383 GCAGTTGCTGTGGCTGTTTTAGGA...AATAGACAAGCGTTATCAAGAACA EMM114 emm114.0 (...
 

Very simply, I would like to get a new DNAStringSet containing sequences 1 and 2 from extraemm_DNA_untrim1 and sequences 3, 4 and 5 from extraemm_DNA_untrim2.

I realise this is probably a very simple question, but I cannot find much introductory documentation on Biostrings basics, and I am quite new to using it.

Thanks

Mike Smith ★ 6.6k
@mike-smith
EMBL Heidelberg

You can select entries in a DNAStringSet using the [] notation, and you can combine DNAStringSets using c(). So if you want to select the first two entries from one set and entries 3 to 5 from the other, and then combine them, you can do:

c( extraemm_DNA_untrim1[1:2], extraemm_DNA_untrim2[3:5] )

Here's a little toy example:

> library(Biostrings)
> 
> ## create two BStringSets
> bs1 <- BStringSet(c("A1", "A2", "A3", "A4"))
> bs2 <- BStringSet(c("B1", "B2", "B3", "B4"))
> 
> ## Print them out so we can see the contents
> bs1
  A BStringSet instance of length 4
    width seq
[1]     2 A1
[2]     2 A2
[3]     2 A3
[4]     2 A4
> bs2
  A BStringSet instance of length 4
    width seq
[1]     2 B1
[2]     2 B2
[3]     2 B3
[4]     2 B4
> 
> ## combine first two entries from one, last two from the other
> c(bs1[1:2], bs2[3:4])
  A BStringSet instance of length 4
    width seq
[1]     2 A1
[2]     2 A2
[3]     2 B3
[4]     2 B4
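
The same pattern carries over to named DNAStringSets like the ones in the question. Below is a minimal sketch, assuming short made-up sequences; set1 and set2 are hypothetical stand-ins for extraemm_DNA_untrim1 and extraemm_DNA_untrim2, not the real data. It shows that the sequence names travel with the selected entries.

```r
library(Biostrings)

# Hypothetical stand-ins for the two imported sets (sequences are made up)
set1 <- DNAStringSet(c(EMM88 = "GCATCCGTAG", EMM225 = "GCATCAGTAG", EMM52 = "GGTACTGCTT"))
set2 <- DNAStringSet(c(EMM88 = "CATCCGTAG", EMM225 = "CATCAGTAG", EMM52 = "GTACTGCTT"))

# Single brackets keep each selection as a DNAStringSet; c() joins the sets
combined <- c(set1[1:2], set2[3])

length(combined)   # number of sequences in the combined set
names(combined)    # names are carried over from the source sets
width(combined)    # per-sequence lengths
```

Because the result is itself a DNAStringSet, it can be indexed and combined again in the same way.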

Strange. I did try something similar, and it combined the DNA from all the selected sequences into one long string, but I cannot remember how I managed that.

Thanks for this answer
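
One way to end up with a single long sequence, offered as a guess rather than a diagnosis of what happened here, is to extract elements with double brackets ([[ ]]) or to call unlist(). Both give DNAString objects rather than a DNAStringSet, and c() on DNAStrings joins them end to end. A small sketch with made-up sequences (the object name dna is hypothetical):

```r
library(Biostrings)

# Made-up two-sequence set for illustration
dna <- DNAStringSet(c(a = "ACGT", b = "TTGG"))

# [ keeps a DNAStringSet, so c() yields a set of two separate sequences
as_set <- c(dna[1], dna[2])
length(as_set)

# [[ extracts DNAString objects; c() then joins them into one sequence
joined <- c(dna[[1]], dna[[2]])
as.character(joined)

# unlist() likewise collapses the whole set into a single DNAString
as.character(unlist(dna))
```

Sticking to single brackets when selecting entries keeps everything as a DNAStringSet and avoids the merge.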
