Hi,
What is the proper way to read a DataFrame back in from a text file when it has CharacterList columns? With the code below, I can see that write.table() writes each element of the CharacterList column as a c() call, and read.table() then returns those calls as plain character strings. I'm guessing there's a simple argument change or a function that lets you read this information back in, but I'm not finding it.
Thank you,
Leonardo
    > library('S4Vectors')
    Loading required package: stats4
    Loading required package: BiocGenerics
    Loading required package: parallel

    Attaching package: ‘BiocGenerics’

    The following objects are masked from ‘package:parallel’:

        clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply,
        parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB

    The following objects are masked from ‘package:stats’:

        IQR, mad, xtabs

    The following objects are masked from ‘package:base’:

        anyDuplicated, append, as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq, Filter, Find, get,
        grep, grepl, intersect, is.unsorted, lapply, lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int,
        pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, union, unique,
        unsplit, which, which.max, which.min

    Attaching package: ‘S4Vectors’

    The following objects are masked from ‘package:base’:

        colMeans, colSums, expand.grid, rowMeans, rowSums

    > library('GenomicRanges')
    Loading required package: IRanges
    Loading required package: GenomeInfoDb
    There were 12 warnings (use warnings() to see them)
    > df <- DataFrame(x = 1:5, y = CharacterList(lapply(1:5, function(i) {
    +     letters[seq_len(i)]}
    + )))
    >
    > write.table(df, file = 'test.tsv', sep = '\t', row.names = FALSE, quote = FALSE)
    > system('head test.tsv')
    x   y
    1   a
    2   c("a", "b")
    3   c("a", "b", "c")
    4   c("a", "b", "c", "d")
    5   c("a", "b", "c", "d", "e")
    >
    > df2 <- read.table('test.tsv', header = TRUE, sep = '\t', stringsAsFactors = FALSE)
    > df2
      x                y
    1 1                a
    2 2          c(a, b)
    3 3       c(a, b, c)
    4 4    c(a, b, c, d)
    5 5 c(a, b, c, d, e)
    >
    > options(width = 120)
    > devtools::session_info()
    Session info -----------------------------------------------------------------------------------------------------------
     setting  value
     version  R version 3.3.0 RC (2016-05-01 r70572)
     system   x86_64, darwin13.4.0
     ui       AQUA
     language (EN)
     collate  en_US.UTF-8
     tz       America/New_York
     date     2016-06-16

    Packages ---------------------------------------------------------------------------------------------------------------
     package       * version date       source
     BiocGenerics  * 0.19.1  2016-06-11 Bioconductor
     devtools        1.11.1  2016-04-21 CRAN (R 3.3.0)
     digest          0.6.9   2016-01-08 CRAN (R 3.3.0)
     GenomeInfoDb  * 1.9.1   2016-05-13 Bioconductor
     GenomicRanges * 1.25.4  2016-06-10 Bioconductor
     IRanges       * 2.7.6   2016-06-10 Bioconductor
     memoise         1.0.0   2016-01-29 CRAN (R 3.3.0)
     S4Vectors     * 0.11.4  2016-06-11 Bioconductor
     withr           1.0.1   2016-02-04 CRAN (R 3.3.0)
     XVector         0.13.0  2016-05-05 Bioconductor
     zlibbioc        1.19.0  2016-05-05 Bioconductor

    ## Doesn't work to simply use DataFrame
    > DataFrame(df2)
    DataFrame with 5 rows and 2 columns
              x                y
      <integer>      <character>
    1         1                a
    2         2          c(a, b)
    3         3       c(a, b, c)
    4         4    c(a, b, c, d)
    5         5 c(a, b, c, d, e)
Thanks for the info, Michael. If I need to read these files back in, I'll use `strsplit()` to split the strings into their elements.
Best,
Leonardo
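
For anyone landing here later, this is roughly what I had in mind with `strsplit()`. It's only a minimal sketch: it assumes the column looks exactly like the `df2$y` output in my question, and that no element itself contains `", "` or starts with `c(`. The names `y_raw`, `y_clean` and `y_list` are just illustrative, and the last step assumes IRanges is installed.

```r
# Strings as they appear in df2$y after read.table()
# (copied from the output in the question; in practice this would be df2$y)
y_raw <- c("a", "c(a, b)", "c(a, b, c)", "c(a, b, c, d)", "c(a, b, c, d, e)")

# Drop the wrapping "c(" and ")" where present; single elements are left as-is
y_clean <- sub("^c\\((.*)\\)$", "\\1", y_raw)

# Split each string on ", " to recover the individual elements
# (assumption: elements never contain ", " themselves)
y_list <- strsplit(y_clean, ", ", fixed = TRUE)

# Optionally turn the plain list back into a CharacterList
if (requireNamespace("IRanges", quietly = TRUE)) {
    y_cl <- IRanges::CharacterList(y_list)
}
```

Writing the column with an explicit delimiter in the first place (for example, pasting each element together with `;` before calling `write.table()`) would probably make the round trip less fragile than parsing the `c()` text afterwards.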