I am trying to read some data using `DropletUtils::read10xCounts`. However, I get an error:
```{r}
library(DropletUtils)
sce <- DropletUtils::read10xCounts("/scratch/GRCz10.e87/")
```
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
  line 13 did not have 2 elements
The folder "/scratch/rulands/zebrafish_brain_christian_lange/bfx908.full_data/filtered_gene_bc_matrices/GRCz10.e87/" contains the "matrix.mtx", "genes.tsv" and "barcodes.tsv" files. However, I did not create those files myself, so I am not sure whether they might be corrupted. I cannot upload the complete data, and I do not understand how I could create a minimal dataset to reproduce the error. I can read "matrix.mtx" using `read10xMatrix`. Does anyone know how I can read the full data?
```{r}
traceback()
```
3: scan(file = file, what = what, sep = sep, quote = quote, dec = dec, nmax = nrows, skip = 0, na.strings = na.strings, quiet = TRUE, fill = fill, strip.white = strip.white, blank.lines.skip = blank.lines.skip, multi.line = FALSE, comment.char = comment.char, allowEscapes = allowEscapes, flush = flush, encoding = encoding, skipNul = skipNul)
2: read.table(gene.loc, header = FALSE, colClasses = "character", stringsAsFactors = FALSE)
1: DropletUtils::read10xCounts("/scratch/GRCz10.e87/")
```{r}
BiocInstaller::biocValid()
```
[1] TRUE
```{r}
sessionInfo()
```
R version 3.5.0 (2018-04-23) Platform: x86_64-pc-linux-gnu (64-bit) Running under: openSUSE Leap 42.3 Matrix products: default BLAS: /usr/local/R/3.5.0/lib64/R/lib/libRblas.so LAPACK: /usr/local/R/3.5.0/lib64/R/lib/libRlapack.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 LC_PAPER=en_US.utf8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C attached base packages: [1] grid splines stats4 parallel stats graphics grDevices utils datasets methods base other attached packages: [1] DropletUtils_1.0.1 pheatmap_1.0.10 [3] slingshot_0.99.6 princurve_1.1-12 [5] M3Drop_1.6.0 numDeriv_2016.8-1 [7] org.Dr.eg.db_3.6.0 biomaRt_2.36.1 [9] Rgraphviz_2.24.0 topGO_2.32.0 [11] SparseM_1.77 GO.db_3.6.0 [13] graph_1.58.0 TSCAN_1.18.0 [15] TxDb.Drerio.UCSC.danRer10.refGene_3.4.3 GenomicFeatures_1.32.0 [17] AnnotationDbi_1.42.1 stringr_1.3.1 [19] scater_1.8.0 SingleCellExperiment_1.2.0 [21] SummarizedExperiment_1.10.1 DelayedArray_0.6.0 [23] BiocParallel_1.14.1 matrixStats_0.53.1 [25] GenomicRanges_1.32.3 GenomeInfoDb_1.16.0 [27] IRanges_2.14.10 S4Vectors_0.18.2 [29] SC3_1.8.0 readxl_1.1.0 [31] monocle_2.8.0 DDRTree_0.1.5 [33] irlba_2.3.2 VGAM_1.0-5 [35] Biobase_2.40.0 BiocGenerics_0.26.0 [37] Matrix_1.2-14 magrittr_1.5 [39] Hmisc_4.1-1 ggplot2_2.2.1 [41] Formula_1.2-3 survival_2.42-3 [43] lattice_0.20-35 ggsci_2.9 [45] cluster_2.0.7-1 data.table_1.11.4 loaded via a namespace (and not attached): [1] rtracklayer_1.40.2 prabclus_2.2-6 pkgmaker_0.27 tidyr_0.8.1 [5] acepack_1.4.1 bit64_0.9-7 knitr_1.20 rpart_4.1-13 [9] RCurl_1.95-4.10 doParallel_1.0.11 RSQLite_2.1.1 RANN_2.5.1 [13] combinat_0.0-8 bit_1.1-13 phylobase_0.8.4 xml2_1.2.0 [17] httpuv_1.4.3 assertthat_0.2.0 viridis_0.5.1 tximport_1.8.0 [21] evaluate_0.10.1 promises_1.0.1 BiocInstaller_1.30.0 DEoptimR_1.0-8 [25] progress_1.1.2 caTools_1.17.1 dendextend_1.8.0 igraph_1.2.1 [29] DBI_1.0.0 htmlwidgets_1.2 
sparsesvd_0.1-4 purrr_0.2.4 [33] RSpectra_0.13-1 crosstalk_1.0.0 dplyr_0.7.5 backports_1.1.2 [37] trimcluster_0.1-2 gridBase_0.4-7 locfdr_1.1-8 ROCR_1.0-7 [41] withr_2.1.2 robustbase_0.93-0 checkmate_1.8.5 GenomicAlignments_1.16.0 [45] prettyunits_1.0.2 mclust_5.4 ape_5.1 lazyeval_0.2.1 [49] edgeR_3.22.2 pkgconfig_2.0.1 slam_0.1-43 nlme_3.1-137 [53] vipor_0.4.5 nnet_7.3-12 bindr_0.1.1 rlang_0.2.0 [57] diptest_0.75-7 miniUI_0.1.1.1 registry_0.5 cellranger_1.1.0 [61] rprojroot_1.3-2 rngtools_1.3.1 Rhdf5lib_1.2.1 base64enc_0.1-3 [65] beeswarm_0.2.3 whisker_0.3-2 viridisLite_0.3.0 rjson_0.2.19 [69] bitops_1.0-6 shinydashboard_0.7.0 rncl_0.8.2 KernSmooth_2.23-15 [73] Biostrings_2.48.0 blob_1.1.1 DelayedMatrixStats_1.2.0 rgl_0.99.16 [77] doRNG_1.6.6 manipulateWidget_0.9.0 scales_0.5.0 memoise_1.1.0 [81] plyr_1.8.4 howmany_0.3-1 gplots_3.0.1 bibtex_0.4.2 [85] gdata_2.18.0 zlibbioc_1.26.0 compiler_3.5.0 HSMMSingleCell_0.114.0 [89] bbmle_1.0.20 RColorBrewer_1.1-2 rrcov_1.4-4 Rsamtools_1.32.0 [93] ade4_1.7-11 XVector_0.20.0 htmlTable_1.12 MASS_7.3-50 [97] mgcv_1.8-23 tidyselect_0.2.4 stringi_1.2.2 densityClust_0.3 [101] yaml_2.1.19 locfit_1.5-9.1 latticeExtra_0.6-28 ggrepel_0.8.0 [105] tools_3.5.0 rstudioapi_0.7 uuid_0.1-2 foreach_1.4.4 [109] foreign_0.8-70 RNeXML_2.1.1 gridExtra_2.3 Rtsne_0.13 [113] digest_0.6.15 FNN_1.1 shiny_1.1.0 qlcMatrix_0.9.7 [117] fpc_2.1-11 bindrcpp_0.2.2 Rcpp_0.12.17 later_0.7.2 [121] WriteXLS_4.0.0 httr_1.3.1 kernlab_0.9-26 colorspace_1.3-2 [125] XML_3.98-1.11 clusterExperiment_2.0.2 statmod_1.4.30 flexmix_2.3-14 [129] xtable_1.8-2 jsonlite_1.5 modeltools_0.2-21 R6_2.2.2 [133] pillar_1.2.3 htmltools_0.3.6 mime_0.5 NMF_0.21.0 [137] glue_1.2.0 class_7.3-14 codetools_0.2-15 pcaPP_1.9-73 [141] mvtnorm_1.0-7 tibble_1.4.2 ggbeeswarm_0.6.0 gtools_3.5.0 [145] limma_3.36.1 rmarkdown_1.9 docopt_0.4.5 fastICA_1.2-1 [149] munsell_0.4.3 e1071_1.6-8 rhdf5_2.24.0 GenomeInfoDbData_1.1.0 [153] iterators_1.0.9 HDF5Array_1.8.0 reshape2_1.4.3 gtable_0.2.0
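Since the traceback points at the `read.table` call on the gene file, one way to locate the offending line is to compare field counts per line under whitespace splitting and under tab splitting. This is only a sketch: the small file below is a hypothetical stand-in for `genes.tsv` (the IDs and gene names are made up), and it assumes the real file is meant to be tab-separated with two columns.

```r
# Hypothetical stand-in for genes.tsv: two tab-separated columns,
# with a space inside the gene name on the third line.
genes_file <- tempfile(fileext = ".tsv")
writeLines(c(
  "ID_0001\tgene_a",
  "ID_0002\tgene_b",
  "ID_0003\tgene c"
), genes_file)

# Fields per line when splitting on any whitespace (read.table's default)
# versus splitting on tabs only.
n_ws  <- count.fields(genes_file, sep = "",   quote = "", comment.char = "")
n_tab <- count.fields(genes_file, sep = "\t", quote = "", comment.char = "")

# Lines where the two counts disagree contain whitespace inside a field.
bad_lines <- which(n_ws != n_tab)
print(bad_lines)
print(readLines(genes_file)[bad_lines])
```

On the real data, `genes_file` would be replaced by the path to `genes.tsv` in the 10x directory.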
Very good hint, thanks for your help! The gene name in line 13 contains a space. Maybe changing

`read.table(gene.loc, header = FALSE, colClasses = "character", stringsAsFactors = FALSE)`

to

`read.table(gene.loc, header = FALSE, colClasses = "character", stringsAsFactors = FALSE, sep = "\t")`

would solve this. For now, I am thinking about how to work around the issue. Should I modify the data or read it in a different way?

First 20 lines of `genes.tsv`:

After replacing every space in `genes.tsv` with an underscore, I can read the data just fine.

Yes, that's right, or switching to `read.delim`. I have done this and pushed the change to the GitHub repository; you can either install the new version or wait for it to show up on the BioC build machines in 1-2 days. Or you can just edit `genes.tsv` to get rid of the space, which probably shouldn't be there in the first place.
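For anyone who wants to reproduce the behaviour without the original data, here is a minimal sketch in base R. The file contents are invented for illustration (hypothetical IDs and gene names); the only assumption carried over from the thread is that the gene file has two tab-separated columns and that one gene name contains a space. The space is placed on line 6 because `read.table` infers the number of columns from the first five lines of input.

```r
# Hypothetical miniature genes.tsv: five clean lines, then one gene name
# containing a space.
gene_lines <- c(sprintf("ID_%04d\tgene_%d", 1:5, 1:5), "ID_0006\tgene 6")
genes_file <- tempfile(fileext = ".tsv")
writeLines(gene_lines, genes_file)

# Default separator (any whitespace): line 6 splits into three fields,
# so the read fails; the error message is captured instead of stopping.
default_read <- tryCatch(
  read.table(genes_file, header = FALSE, colClasses = "character",
             stringsAsFactors = FALSE),
  error = function(e) conditionMessage(e)
)

# Explicit tab separator: the space stays inside the second column.
tab_read <- read.table(genes_file, header = FALSE, colClasses = "character",
                       stringsAsFactors = FALSE, sep = "\t")

# read.delim uses sep = "\t" by default, so it also keeps the space.
delim_read <- read.delim(genes_file, header = FALSE, colClasses = "character",
                         stringsAsFactors = FALSE)

# Workaround that leaves the reader untouched: replace every space with an
# underscore in a copy of the file and read the copy.
fixed_file <- tempfile(fileext = ".tsv")
writeLines(gsub(" ", "_", readLines(genes_file), fixed = TRUE), fixed_file)
fixed_read <- read.table(fixed_file, header = FALSE, colClasses = "character",
                         stringsAsFactors = FALSE)
```

Writing the cleaned lines to a copy rather than overwriting `genes.tsv` keeps the original file available in case the gene names need to be matched against another annotation later.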