Keeping sequence names with QSutils' Collapse function
@algaraber-22855

Is it possible to keep the sequence names when haplotypes are collapsed with the Collapse function of the QSutils package?

I have a number of FASTA-formatted sequences, some of which are identical and therefore redundant for a phylogenetic analysis. To remove the sequences that are already represented, I am using the Collapse function of QSutils. My issue is that after collapsing, the sequences are renamed from ">Species1", ">Species2", ">Species3", etc. to "1", "2", "3", and so on. I would like to keep the name of at least one of the sequences in each collapsed group rather than have them replaced by numbers. Is this possible with Collapse?
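For reference, the goal described above (one representative per group of identical sequences, keeping its original name) can be sketched in base R without Collapse. This is a minimal sketch, not QSutils functionality: it assumes identical sequences are exactly equal strings, and it uses a plain named character vector holding the same five example sequences in place of the DNAStringSet. The names `seqs`, `kept` and `counts` are illustrative only.

```r
# Minimal base-R sketch (not part of QSutils): drop duplicate sequences
# while keeping the name of the first sequence in each identical group.
seqs <- c(
  Example_A = "ATTAGACACCAGAGGCTT",
  Example_B = "ATTAGACATCAGAGGCTT",
  Example_C = "ATTAGACATCAGAGGCTT",
  Example_D = "ATTAGACACCAGAGGCTT",
  Example_E = "ATTAGACACGTTAGGCTT"
)

# duplicated() flags every repeat after the first occurrence;
# subsetting with [ keeps the names of the retained elements.
kept <- seqs[!duplicated(seqs)]

# Count how many input sequences fall into each kept haplotype, in the
# same order as `kept` (analogous to the $nr element returned by Collapse).
counts <- as.vector(table(factor(seqs, levels = unique(seqs))))

print(kept)
print(counts)
```

Here `kept` retains the names Example_A, Example_B and Example_E, and `counts` is 2 2 1. With Biostrings loaded, the same subsetting should also work on the DNAStringSet itself (`example_sequences[!duplicated(example_sequences)]`); if in doubt, convert with `as.character()` and rebuild with `DNAStringSet()`.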

Example:

> example_sequences
  A DNAStringSet instance of length 5
    width seq                names
[1]    18 ATTAGACACCAGAGGCTT Example_A
[2]    18 ATTAGACATCAGAGGCTT Example_B
[3]    18 ATTAGACATCAGAGGCTT Example_C
[4]    18 ATTAGACACCAGAGGCTT Example_D
[5]    18 ATTAGACACGTTAGGCTT Example_E


> Collapse(example_sequences)
$nr
[1] 2 2 1

$hseqs
  A DNAStringSet instance of length 3
    width seq                names
[1]    18 ATTAGACACCAGAGGCTT 1
[2]    18 ATTAGACATCAGAGGCTT 2
[3]    18 ATTAGACACGTTAGGCTT 3
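One possible workaround, sketched here under assumptions, is to keep using Collapse and then relabel its `$hseqs` by matching each haplotype back to the original sequences. The helper `label_haplotypes()` below is hypothetical (it is not part of QSutils), and it assumes every collapsed haplotype is an exact copy of at least one input sequence. It can return either the first matching name or all matching names joined by a separator. To stay runnable without Bioconductor, the example uses plain named character vectors that mirror the output above.

```r
# Hypothetical helper (not part of QSutils): return one label per collapsed
# haplotype, taken from the names of the original sequences.
#   all_names = FALSE -> name of the first identical input sequence
#   all_names = TRUE  -> names of all identical input sequences, joined by sep
# A haplotype with no exact match in `original` gets NA (first-name mode).
label_haplotypes <- function(hseqs, original, all_names = FALSE, sep = "|") {
  orig_names <- names(original)          # read names before as.character() drops them
  original_chr <- as.character(original)
  hseqs <- as.character(hseqs)
  if (!all_names) {
    # match() returns the index of the first identical input sequence
    return(orig_names[match(hseqs, original_chr)])
  }
  vapply(hseqs,
         function(h) paste(orig_names[original_chr == h], collapse = sep),
         character(1), USE.NAMES = FALSE)
}

# Stand-ins for example_sequences and Collapse(example_sequences)$hseqs
original <- c(
  Example_A = "ATTAGACACCAGAGGCTT",
  Example_B = "ATTAGACATCAGAGGCTT",
  Example_C = "ATTAGACATCAGAGGCTT",
  Example_D = "ATTAGACACCAGAGGCTT",
  Example_E = "ATTAGACACGTTAGGCTT"
)
hseqs <- c("1" = "ATTAGACACCAGAGGCTT",
           "2" = "ATTAGACATCAGAGGCTT",
           "3" = "ATTAGACACGTTAGGCTT")

names(hseqs) <- label_haplotypes(hseqs, original)
print(hseqs)
print(label_haplotypes(hseqs, original, all_names = TRUE))
```

With the real objects, the intended usage would be `res <- Collapse(example_sequences)` followed by `names(res$hseqs) <- label_haplotypes(res$hseqs, example_sequences)`, since `as.character()` and `names()` both work on a DNAStringSet. The `all_names = TRUE` variant keeps a record of which taxa were merged into each haplotype, which may be useful when interpreting the tree.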

Result of sessionInfo():

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] QSutils_1.4.0       Biostrings_2.54.0   XVector_0.26.0      IRanges_2.20.2      S4Vectors_0.24.3    BiocGenerics_0.32.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3          lattice_0.20-38     ape_5.3             psych_1.9.12.31     grid_3.6.1          nlme_3.1-143       
 [7] zlibbioc_1.32.0     tools_3.6.1         compiler_3.6.1      mnormt_1.5-6        BiocManager_1.30.10
Tags: QSutils, Collapse, Haplotypes