Problems with iteration (sapply) over RNAStringSet
Kemal Akat
Hi,

I want to iterate over an RNAStringSet (rs) and, for each sequence: 1) get the sequence, 2) do the calculations, 3) plot the results, and 4) use the sequence name (names(rs)) in plot legends and titles, e.g. plot(x, main = paste(sequence_name, 'in condition X', sep = ' ')). The name I want to use is the first field of the FASTA description line; I don't want to use the rest of it. However, extracting the name does not work as I expected.

The input FASTA file looks like this:

>Gene1 Description
UUUUUUUUUUUUUUUUUUUUUUU
>Gene2 Description
AAAAAAAAAAAAAAAAAAAAAAA
>Gene3 Description
GGGGGGGGGGGGGGGGGGGGGGG
>Gene4 Description
CCCCCCCCCCCCCCCCCCCCCCC

library("Biostrings")
rs = read.RNAStringSet('test.fa')

R> rs
  A RNAStringSet instance of length 4
    width seq                       names
[1]    23 UUUUUUUUUUUUUUUUUUUUUUU   Gene1 Description
[2]    23 AAAAAAAAAAAAAAAAAAAAAAA   Gene2 Description
[3]    23 GGGGGGGGGGGGGGGGGGGGGGG   Gene3 Description
[4]    23 CCCCCCCCCCCCCCCCCCCCCCC   Gene4 Description

The following commands return what I was expecting:

R> strsplit(names(rs), split = ' ')[[1]][1]
[1] "Gene1"
R> strsplit(toString(rs), split = ',')[[1]][1]
[1] "UUUUUUUUUUUUUUUUUUUUUUU"

To iterate, I wrote this function:

myFun = function(x){
    name = strsplit(names(x), split = ' ')[[1]][1]
    seq = strsplit(toString(x), split = ',')[[1]][1]
    names(seq) = name
    return(seq)
}

However, calling it through sapply returns an error:

R> sapply(rs, myFun)
Error in strsplit(names(x), split = " ") : non-character argument
Calls: sapply ... lapply -> lapply -> lapply -> FUN -> FUN -> strsplit

Simplifying the function to

R> myFun = function(x){
+     seq = strsplit(toString(x), split = ',')[[1]][1]
+ }

returns the sequences, but named with the full description lines as entered in the original FASTA file:
R> sapply(rs, myFun)
        Gene1 Description         Gene2 Description         Gene3 Description
"UUUUUUUUUUUUUUUUUUUUUUU" "AAAAAAAAAAAAAAAAAAAAAAA" "GGGGGGGGGGGGGGGGGGGGGGG"
        Gene4 Description
"CCCCCCCCCCCCCCCCCCCCCCC"

I would appreciate it if anyone could offer a solution or explain why strsplit() does not work inside the loop (sapply).

Thank you!
Kemal

R> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] illuminaHumanv4.db_1.14.0
 [3] RSQLite_0.11.1            DBI_0.2-5
 [5] AnnotationDbi_1.18.1      beadarray_2.6.0
 [7] Biobase_2.16.0            ShortRead_1.14.4
 [9] latticeExtra_0.6-19       RColorBrewer_1.0-5
[11] Rsamtools_1.8.5           lattice_0.20-6
[13] GenomicRanges_1.8.7       ggplot2_0.9.1
[15] edgeR_2.6.7               limma_3.12.1
[17] Biostrings_2.24.1         IRanges_1.14.3
[19] BiocGenerics_0.2.0        colorout_0.9-9

loaded via a namespace (and not attached):
 [1] BeadDataPackR_1.8.0 bitops_1.0-4.1      colorspace_1.1-1
 [4] dichromat_1.2-4     digest_0.5.2        grid_2.15.0
 [7] hwriter_1.3         labeling_0.1        MASS_7.3-18
[10] memoise_0.1         munsell_0.3         plyr_1.7.1
[13] proto_0.3-9.2       reshape2_1.2.1      scales_0.2.1
[16] stats4_2.15.0       stringr_0.6         tools_2.15.0
[19] zlibbioc_1.2.0
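A minimal sketch of one possible workaround, assuming that sapply()/lapply() over an RNAStringSet pass each element to FUN without its name. That assumption is consistent with the NULL that strsplit() rejects in the error above, and the sketch demonstrates the same behaviour on a plain named list. To run without Biostrings, it uses character vectors as stand-ins for names(rs) and the sequences in test.fa; the nchar() call is a hypothetical placeholder for the real calculation. The idea is to shorten all names once with sub() and then loop over indices with seq_along(), so each iteration can look up both the sequence and its short name.

```r
# Stand-ins so the sketch runs without Biostrings (hypothetical data mirroring
# test.fa). With Biostrings loaded, full_names would be names(rs).
full_names <- c("Gene1 Description", "Gene2 Description",
                "Gene3 Description", "Gene4 Description")
seqs <- strrep(c("U", "A", "G", "C"), 23)  # the four 23-nt sequences

# Why myFun fails: lapply()/sapply() call FUN on X[[i]], and extracting a
# single element drops its name, so names(x) is NULL inside the function.
name_is_null <- sapply(list(Gene1 = "UUU"), function(x) is.null(names(x)))
# strsplit() then rejects the NULL with "non-character argument".
err <- tryCatch(strsplit(NULL, split = " "), error = conditionMessage)

# Fix: shorten all names once, outside the loop, by dropping everything
# from the first space onwards.
# With Biostrings: names(rs) <- sub(" .*", "", names(rs))
short_names <- sub(" .*", "", full_names)

# Iterate over indices instead of elements so the name stays available.
lengths_by_gene <- vapply(seq_along(seqs), function(i) {
  s <- seqs[i]  # with Biostrings: as.character(rs[[i]])
  # The real calculation and plot would go here, e.g.
  # plot(x, main = paste(short_names[i], "in condition X"))
  nchar(s)      # hypothetical placeholder result
}, integer(1))
names(lengths_by_gene) <- short_names

titles <- paste(short_names, "in condition X")
print(lengths_by_gene)
print(titles)
```

Looping over seq_along() rather than over the set itself is a design choice: it avoids depending on whether a given method keeps names on individual elements, because the index is all that is needed to retrieve both pieces.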

