Entering edit mode
I have a DNAStringSet from which I need to reverse complement some of the sequences. I found a Biostars response to a very similar question https://www.biostars.org/p/405892/ but that used sub-setting to return only the reverse complemented sequences when I would like to retain all of the sequences. I assumed I could use lapply to vectorise an ifelse function but I can't quite figure out how to make it work.
> # example DNAStringSet
> myFasta <- DNAStringSet(c("CCGAAAACCATGATGGTTGCCAG", "TTACACTCTTTCCCTACACGACGCTC", "TTCCGATCTGTAAAACGACGGCCAGT"))
> # vector of which sequences I would like to be reverse complemented
> myFasta_rc <- c(TRUE, FALSE, TRUE)
> # attempt to lapply ifelse across the DNAStringSet
> lapply(myFasta, function(x) {ifelse(myFasta_rc[] == "TRUE", reverseComplement(myFasta[x]), myFasta[x])})
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'reverseComplement': invalid subscript
> sessionInfo( )
R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS
Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so; LAPACK version 3.9.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8
[5] LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8 LC_PAPER=C.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] BSgenome.Hsapiens.UCSC.hg19_1.4.3 BSgenome_1.70.2
[3] rtracklayer_1.62.0 BiocIO_1.12.0
[5] lubridate_1.9.3 forcats_1.0.0
[7] stringr_1.5.1 dplyr_1.1.4
[9] purrr_1.0.2 readr_2.1.5
[11] tidyr_1.3.1 tibble_3.2.1
[13] ggplot2_3.5.1 tidyverse_2.0.0
[15] ShortRead_1.60.0 GenomicAlignments_1.38.2
[17] SummarizedExperiment_1.32.0 Biobase_2.62.0
[19] MatrixGenerics_1.14.0 matrixStats_1.4.1
[21] Rsamtools_2.18.0 GenomicRanges_1.54.1
[23] Biostrings_2.70.3 GenomeInfoDb_1.38.8
[25] XVector_0.42.0 IRanges_2.36.0
[27] S4Vectors_0.40.2 BiocGenerics_0.48.1
[29] BiocManager_1.30.25 BiocParallel_1.36.0
loaded via a namespace (and not attached):
[1] gtable_0.3.5 rjson_0.2.23 hwriter_1.3.2.1
[4] latticeExtra_0.6-30 lattice_0.22-5 tzdb_0.4.0
[7] vctrs_0.6.5 tools_4.3.3 bitops_1.0-8
[10] generics_0.1.3 parallel_4.3.3 fansi_1.0.6
[13] pkgconfig_2.0.3 Matrix_1.6-5 RColorBrewer_1.1-3
[16] lifecycle_1.0.4 GenomeInfoDbData_1.2.11 compiler_4.3.3
[19] deldir_2.0-4 munsell_0.5.1 codetools_0.2-20
[22] yaml_2.3.10 RCurl_1.98-1.16 pillar_1.9.0
[25] crayon_1.5.3 DelayedArray_0.28.0 abind_1.4-8
[28] tidyselect_1.2.1 stringi_1.8.4 restfulr_0.0.15
[31] grid_4.3.3 colorspace_2.1-1 cli_3.6.3
[34] SparseArray_1.2.4 magrittr_2.0.3 S4Arrays_1.2.1
[37] XML_3.99-0.17 utf8_1.2.4 withr_3.0.1
[40] scales_1.3.0 timechange_0.3.0 jpeg_0.1-10
[43] interp_1.1-6 png_0.1-8 hms_1.1.3
[46] rlang_1.1.4 Rcpp_1.0.13 glue_1.8.0
[49] rstudioapi_0.16.0 R6_2.5.1 zlibbioc_1.48.2
Fantastic, thank you