Question

Error when reading data with DropletUtils::read10xCounts

fabrost • 0

@fabrost-15946

I am trying to read some data using DropletUtils::read10xCounts. However, I get an error:

```{r}
library(DropletUtils)
sce <- DropletUtils::read10xCounts("/scratch/GRCz10.e87/")
```

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 13 did not have 2 elements

The folder "/scratch/rulands/zebrafish_brain_christian_lange/bfx908.full_data/filtered_gene_bc_matrices/GRCz10.e87/" contains the "matrix.mtx", "genes.tsv" and "barcodes.tsv" files. However, I did not create those files myself, so I am not sure whether they are corrupted. I cannot upload the complete data, and I do not know how to create a minimal dataset that reproduces the error. I can read "matrix.mtx" using read10xMatrix. Does anyone know how I can read the full data?

```{r}
traceback()
```

3: scan(file = file, what = what, sep = sep, quote = quote, dec = dec,
       nmax = nrows, skip = 0, na.strings = na.strings, quiet = TRUE,
       fill = fill, strip.white = strip.white, blank.lines.skip = blank.lines.skip,
       multi.line = FALSE, comment.char = comment.char, allowEscapes = allowEscapes,
       flush = flush, encoding = encoding, skipNul = skipNul)
2: read.table(gene.loc, header = FALSE, colClasses = "character",
       stringsAsFactors = FALSE)
1: DropletUtils::read10xCounts("/scratch/GRCz10.e87/")

```{r}
BiocInstaller::biocValid()
```

[1] TRUE

```{r}
sessionInfo()
```

R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: openSUSE Leap 42.3

Matrix products: default
BLAS: /usr/local/R/3.5.0/lib64/R/lib/libRblas.so
LAPACK: /usr/local/R/3.5.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8      LC_NUMERIC=C              LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8    
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8    LC_PAPER=en_US.utf8       LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C            LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C      

attached base packages:
 [1] grid      splines   stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DropletUtils_1.0.1                      pheatmap_1.0.10                        
 [3] slingshot_0.99.6                        princurve_1.1-12                       
 [5] M3Drop_1.6.0                            numDeriv_2016.8-1                      
 [7] org.Dr.eg.db_3.6.0                      biomaRt_2.36.1                         
 [9] Rgraphviz_2.24.0                        topGO_2.32.0                           
[11] SparseM_1.77                            GO.db_3.6.0                            
[13] graph_1.58.0                            TSCAN_1.18.0                           
[15] TxDb.Drerio.UCSC.danRer10.refGene_3.4.3 GenomicFeatures_1.32.0                 
[17] AnnotationDbi_1.42.1                    stringr_1.3.1                          
[19] scater_1.8.0                            SingleCellExperiment_1.2.0             
[21] SummarizedExperiment_1.10.1             DelayedArray_0.6.0                     
[23] BiocParallel_1.14.1                     matrixStats_0.53.1                     
[25] GenomicRanges_1.32.3                    GenomeInfoDb_1.16.0                    
[27] IRanges_2.14.10                         S4Vectors_0.18.2                       
[29] SC3_1.8.0                               readxl_1.1.0                           
[31] monocle_2.8.0                           DDRTree_0.1.5                          
[33] irlba_2.3.2                             VGAM_1.0-5                             
[35] Biobase_2.40.0                          BiocGenerics_0.26.0                    
[37] Matrix_1.2-14                           magrittr_1.5                           
[39] Hmisc_4.1-1                             ggplot2_2.2.1                          
[41] Formula_1.2-3                           survival_2.42-3                        
[43] lattice_0.20-35                         ggsci_2.9                              
[45] cluster_2.0.7-1                         data.table_1.11.4                      

loaded via a namespace (and not attached):
  [1] rtracklayer_1.40.2       prabclus_2.2-6           pkgmaker_0.27            tidyr_0.8.1             
  [5] acepack_1.4.1            bit64_0.9-7              knitr_1.20               rpart_4.1-13            
  [9] RCurl_1.95-4.10          doParallel_1.0.11        RSQLite_2.1.1            RANN_2.5.1              
 [13] combinat_0.0-8           bit_1.1-13               phylobase_0.8.4          xml2_1.2.0              
 [17] httpuv_1.4.3             assertthat_0.2.0         viridis_0.5.1            tximport_1.8.0          
 [21] evaluate_0.10.1          promises_1.0.1           BiocInstaller_1.30.0     DEoptimR_1.0-8          
 [25] progress_1.1.2           caTools_1.17.1           dendextend_1.8.0         igraph_1.2.1            
 [29] DBI_1.0.0                htmlwidgets_1.2          sparsesvd_0.1-4          purrr_0.2.4             
 [33] RSpectra_0.13-1          crosstalk_1.0.0          dplyr_0.7.5              backports_1.1.2         
 [37] trimcluster_0.1-2        gridBase_0.4-7           locfdr_1.1-8             ROCR_1.0-7              
 [41] withr_2.1.2              robustbase_0.93-0        checkmate_1.8.5          GenomicAlignments_1.16.0
 [45] prettyunits_1.0.2        mclust_5.4               ape_5.1                  lazyeval_0.2.1          
 [49] edgeR_3.22.2             pkgconfig_2.0.1          slam_0.1-43              nlme_3.1-137            
 [53] vipor_0.4.5              nnet_7.3-12              bindr_0.1.1              rlang_0.2.0             
 [57] diptest_0.75-7           miniUI_0.1.1.1           registry_0.5             cellranger_1.1.0        
 [61] rprojroot_1.3-2          rngtools_1.3.1           Rhdf5lib_1.2.1           base64enc_0.1-3         
 [65] beeswarm_0.2.3           whisker_0.3-2            viridisLite_0.3.0        rjson_0.2.19            
 [69] bitops_1.0-6             shinydashboard_0.7.0     rncl_0.8.2               KernSmooth_2.23-15      
 [73] Biostrings_2.48.0        blob_1.1.1               DelayedMatrixStats_1.2.0 rgl_0.99.16             
 [77] doRNG_1.6.6              manipulateWidget_0.9.0   scales_0.5.0             memoise_1.1.0           
 [81] plyr_1.8.4               howmany_0.3-1            gplots_3.0.1             bibtex_0.4.2            
 [85] gdata_2.18.0             zlibbioc_1.26.0          compiler_3.5.0           HSMMSingleCell_0.114.0  
 [89] bbmle_1.0.20             RColorBrewer_1.1-2       rrcov_1.4-4              Rsamtools_1.32.0        
 [93] ade4_1.7-11              XVector_0.20.0           htmlTable_1.12           MASS_7.3-50             
 [97] mgcv_1.8-23              tidyselect_0.2.4         stringi_1.2.2            densityClust_0.3        
[101] yaml_2.1.19              locfit_1.5-9.1           latticeExtra_0.6-28      ggrepel_0.8.0           
[105] tools_3.5.0              rstudioapi_0.7           uuid_0.1-2               foreach_1.4.4           
[109] foreign_0.8-70           RNeXML_2.1.1             gridExtra_2.3            Rtsne_0.13              
[113] digest_0.6.15            FNN_1.1                  shiny_1.1.0              qlcMatrix_0.9.7         
[117] fpc_2.1-11               bindrcpp_0.2.2           Rcpp_0.12.17             later_0.7.2             
[121] WriteXLS_4.0.0           httr_1.3.1               kernlab_0.9-26           colorspace_1.3-2        
[125] XML_3.98-1.11            clusterExperiment_2.0.2  statmod_1.4.30           flexmix_2.3-14          
[129] xtable_1.8-2             jsonlite_1.5             modeltools_0.2-21        R6_2.2.2                
[133] pillar_1.2.3             htmltools_0.3.6          mime_0.5                 NMF_0.21.0              
[137] glue_1.2.0               class_7.3-14             codetools_0.2-15         pcaPP_1.9-73            
[141] mvtnorm_1.0-7            tibble_1.4.2             ggbeeswarm_0.6.0         gtools_3.5.0            
[145] limma_3.36.1             rmarkdown_1.9            docopt_0.4.5             fastICA_1.2-1           
[149] munsell_0.4.3            e1071_1.6-8              rhdf5_2.24.0             GenomeInfoDbData_1.1.0  
[153] iterators_1.0.9          HDF5Array_1.8.0          reshape2_1.4.3           gtable_0.2.0

DropletUtils • 1.7k views

score 2 · Accepted Answer · 2018-05-28

Aaron Lun ★ 28k

@alun

I daresay that this is due to some unusual symbol on line 13 of genes.tsv; probably a gene name with a quote in it, if I had to guess. Could you confirm this by running something like `head -20 genes.tsv` and checking what is on and around line 13?
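
The same check can also be done from within R. The following is a minimal sketch, assuming genes.tsv is tab-separated with two columns; the temporary file and the names `genes_file`, `n_fields` and `bad_lines` are hypothetical stand-ins for the real data. It uses `count.fields()` from base R to count the fields on each line when splitting on any whitespace, which is what `read.table` does by default.

```{r}
# Hypothetical stand-in for the real genes.tsv (assumed tab-separated).
genes_file <- tempfile(fileext = ".tsv")
writeLines(c(
    "ENSDARG00000102226\tgpr19",
    "ENSDARG00000104049\tcrebl2",
    "ENSDARG00000102474\tdusp16",
    "ENSDARG00000100143\tlrp6",
    "ENSDARG00000104839\tmansc1",
    "ENSDARG00000104373\tsi:zfos-932h1.2",
    "ENSDARG00000098311\tsi: zfos-932h1.3",
    "ENSDARG00000102121\tprr5b"
), genes_file)

# Number of fields per line when splitting on any whitespace (sep = "").
# Quote and comment handling are switched off so every character is literal.
n_fields <- count.fields(genes_file, sep = "", quote = "", comment.char = "")

# Line numbers that do not have the expected two fields.
bad_lines <- which(n_fields != 2)
bad_lines

# Print the offending lines verbatim for inspection.
readLines(genes_file)[bad_lines]
```

For the real data, `genes_file` would point at the genes.tsv inside the directory passed to read10xCounts.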

Very good hint, thanks for your help! The gene name on line 13 contains a space. Changing `read.table(gene.loc, header = FALSE, colClasses = "character", stringsAsFactors = FALSE)` to `read.table(gene.loc, header = FALSE, colClasses = "character", stringsAsFactors = FALSE, sep = "\t")` might solve this. In the meantime, I am looking for a workaround: should I modify the data, or read it in a different way?

First 20 lines of genes.tsv:

ENSDARG00000104632    rerg
ENSDARG00000100660    si:ch73-252i11.1
ENSDARG00000098417    syn3
ENSDARG00000100422    ptpro
ENSDARG00000102128    eps8
ENSDARG00000103095    tbk1
ENSDARG00000102226    gpr19
ENSDARG00000104049    crebl2
ENSDARG00000102474    dusp16
ENSDARG00000100143    lrp6
ENSDARG00000104839    mansc1
ENSDARG00000104373    si:zfos-932h1.2
ENSDARG00000098311    si: zfos-932h1.3
ENSDARG00000102121    prr5b
ENSDARG00000102123    phtf2
ENSDARG00000102141    CABZ01102632.1
ENSDARG00000105725    si:cabz01088622.2
ENSDARG00000099787    echdc3
ENSDARG00000070546    msgn1
ENSDARG00000045914    si:ch211-51e12.7
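
A small reproduction of the problem, as a sketch rather than the actual DropletUtils code: it writes a few of the lines above to a temporary file (assuming the real file is tab-separated; `genes_file` and `default_result` are hypothetical names), calls `read.table` as shown in the traceback, and then repeats the call with `sep = "\t"`. The line with the space is placed after the fifth line because `read.table` infers the number of columns from the first five lines.

```{r}
# Hypothetical miniature genes.tsv; the gene symbol on line 7 contains a space.
genes_file <- tempfile(fileext = ".tsv")
writeLines(c(
    "ENSDARG00000102226\tgpr19",
    "ENSDARG00000104049\tcrebl2",
    "ENSDARG00000102474\tdusp16",
    "ENSDARG00000100143\tlrp6",
    "ENSDARG00000104839\tmansc1",
    "ENSDARG00000104373\tsi:zfos-932h1.2",
    "ENSDARG00000098311\tsi: zfos-932h1.3",
    "ENSDARG00000102121\tprr5b"
), genes_file)

# Default sep = "" splits on any whitespace, so line 7 yields three fields
# where two are expected. The error is caught and its message returned.
default_result <- tryCatch(
    read.table(genes_file, header = FALSE, colClasses = "character",
               stringsAsFactors = FALSE),
    error = function(e) conditionMessage(e)
)
default_result

# Splitting on tabs only keeps the space as part of the gene symbol.
genes <- read.table(genes_file, header = FALSE, colClasses = "character",
                    stringsAsFactors = FALSE, sep = "\t")
genes[7, ]
```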

ADD REPLY • link 6.9 years ago fabrost • 0

After replacing every space in genes.tsv with an underscore, I can read the data just fine.
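
For anyone wanting to do the same from R, here is a minimal sketch of that replacement. It assumes the columns are separated by tabs, so that every literal space belongs to a gene name; the file names and the variables `fixed` and `fixed_file` are hypothetical. The result is written to a separate file here, so the original stays untouched.

```{r}
# Hypothetical miniature genes.tsv with a space in one gene symbol.
genes_file <- tempfile(fileext = ".tsv")
writeLines(c(
    "ENSDARG00000102226\tgpr19",
    "ENSDARG00000104049\tcrebl2",
    "ENSDARG00000102474\tdusp16",
    "ENSDARG00000100143\tlrp6",
    "ENSDARG00000104839\tmansc1",
    "ENSDARG00000104373\tsi:zfos-932h1.2",
    "ENSDARG00000098311\tsi: zfos-932h1.3",
    "ENSDARG00000102121\tprr5b"
), genes_file)

# Replace literal spaces only; the tab separators are left alone.
fixed <- gsub(" ", "_", readLines(genes_file), fixed = TRUE)

# Write to a new file instead of overwriting the original.
fixed_file <- tempfile(fileext = ".tsv")
writeLines(fixed, fixed_file)

# The whitespace-splitting call from the traceback now succeeds.
genes <- read.table(fixed_file, header = FALSE, colClasses = "character",
                    stringsAsFactors = FALSE)
genes[7, ]
```

To use the cleaned file with read10xCounts, it would have to be named genes.tsv and sit next to matrix.mtx and barcodes.tsv, so keeping a backup of the original is advisable.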

ADD REPLY • link 6.9 years ago fabrost • 0

Yes, that's right; switching to read.delim also works. I have made this fix and pushed it to the GitHub repository; you can either install the new version from there, or wait for it to show up on the BioC build machines in 1-2 days. Alternatively, you can just edit genes.tsv to get rid of the space, which probably shouldn't be there in the first place.
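
As an illustration of the read.delim route (a sketch only; the exact call used in the pushed fix is not shown in this thread, and `genes_file` is a hypothetical stand-in for the real genes.tsv): `read.delim` defaults to `sep = "\t"`, but it also defaults to `header = TRUE`, so `header = FALSE` still has to be given for a file without a header row.

```{r}
# Hypothetical miniature genes.tsv with a space in one gene symbol.
genes_file <- tempfile(fileext = ".tsv")
writeLines(c(
    "ENSDARG00000102226\tgpr19",
    "ENSDARG00000104049\tcrebl2",
    "ENSDARG00000102474\tdusp16",
    "ENSDARG00000100143\tlrp6",
    "ENSDARG00000104839\tmansc1",
    "ENSDARG00000104373\tsi:zfos-932h1.2",
    "ENSDARG00000098311\tsi: zfos-932h1.3",
    "ENSDARG00000102121\tprr5b"
), genes_file)

# read.delim splits on tabs by default; header = FALSE keeps the first
# line as data rather than treating it as column names.
genes <- read.delim(genes_file, header = FALSE, colClasses = "character",
                    stringsAsFactors = FALSE)
dim(genes)
genes[7, ]
```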

ADD REPLY • link 6.9 years ago Aaron Lun ★ 28k