Reading in data generated with dexseq_count.py using DEXSeqDataSetFromHTSeq() throws the following error:
Error in FUN(X[[i]], ...) : subscript out of bounds
4. lapply(X = X, FUN = FUN, ...)
3. sapply(splitted, "[[", 2)
2. sapply(splitted, "[[", 2)
1. DEXSeqDataSetFromHTSeq(countFiles, sampleData = sampleTable,
design = ~sample + exon + condition:exon, flattenedfile = flattenedFile)
The traceback leads back to the line inside the function where the count files are read:
lf <- lapply(countfiles, function(x) read.table(x, header = FALSE,
stringsAsFactors = FALSE))
For some reason, the default behaviour of read.table() (which, with sep = "", splits fields on any whitespace) reads the file as 3 columns rather than the 2 originally intended. Specifying the delimiter explicitly solves the issue, i.e.
lf <- lapply(countfiles, function(x) read.table(x, header = FALSE,
stringsAsFactors = FALSE, sep = '\t'))
I'm not sure where to post this fix or if this is the best way to solve this issue. Maybe someone with more experience could provide a better solution.
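To check whether a particular set of count files is affected, the two read.table() calls above can be compared file by file. This is only a minimal sketch in base R: the helper names columns_seen() and check_count_files() are hypothetical and not part of DEXSeq, and a file that read.table() cannot parse at all is reported as NA rather than stopping the check.

```r
# Hypothetical helper (not part of DEXSeq): number of columns read.table()
# sees in one count file for a given separator, or NA if the read fails.
columns_seen <- function(path, sep = "") {
  tryCatch(
    ncol(read.table(path, header = FALSE, stringsAsFactors = FALSE, sep = sep)),
    error = function(e) NA_integer_
  )
}

# Compare the default whitespace splitting (sep = "") with explicit
# tab splitting (sep = "\t") for every count file.
check_count_files <- function(countfiles) {
  data.frame(
    file        = basename(countfiles),
    default_sep = vapply(countfiles, columns_seen, integer(1), sep = ""),
    tab_sep     = vapply(countfiles, columns_seen, integer(1), sep = "\t"),
    row.names   = NULL
  )
}
```

Calling check_count_files(countFiles) returns one row per file. Any file where default_sep is not 2 while tab_sep is 2 should be one that the unmodified function trips over.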
dxd = DEXSeqDataSetFromHTSeq(
countFiles,
sampleData=sampleTable,
design= ~ sample + exon + condition:exon,
flattenedfile=flattenedFile )
$ Error in FUN(X[[i]], ...) : subscript out of bounds
traceback()
$ 4: lapply(X = X, FUN = FUN, ...)
$ 3: sapply(splitted, "[[", 2)
$ 2: sapply(splitted, "[[", 2)
$ 1: DEXSeqDataSetFromHTSeq(countFiles, sampleData = sampleTable,
design = ~sample + exon + condition:exon, flattenedfile = flattenedFile)
sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)
Matrix products: default
locale:
[1] LC_COLLATE=English_India.utf8 LC_CTYPE=English_India.utf8 LC_MONETARY=English_India.utf8 LC_NUMERIC=C LC_TIME=English_India.utf8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] DEXSeq_1.42.0 RColorBrewer_1.1-3 AnnotationDbi_1.58.0 DESeq2_1.36.0 SummarizedExperiment_1.26.1
[6] GenomicRanges_1.48.0 GenomeInfoDb_1.32.2 IRanges_2.30.0 S4Vectors_0.34.0 MatrixGenerics_1.8.1
[11] matrixStats_0.62.0 Biobase_2.56.0 BiocGenerics_0.42.0 BiocParallel_1.30.3 reshape2_1.4.4
[16] kableExtra_1.3.4 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.9 purrr_0.3.4
[21] readr_2.1.2 tidyr_1.2.0 tibble_3.1.8 ggplot2_3.3.6 tidyverse_1.3.2
[26] data.table_1.14.2
loaded via a namespace (and not attached):
[1] googledrive_2.0.0 colorspace_2.0-3 hwriter_1.3.2.1 ellipsis_0.3.2 XVector_0.36.0 fs_1.5.2
[7] rstudioapi_0.13 bit64_4.0.5 fansi_1.0.3 lubridate_1.8.0 xml2_1.3.3 codetools_0.2-18
[13] splines_4.2.1 cachem_1.0.6 geneplotter_1.74.0 knitr_1.39 jsonlite_1.8.0 Rsamtools_2.12.0
[19] broom_1.0.0 annotate_1.74.0 dbplyr_2.2.1 png_0.1-7 compiler_4.2.1 httr_1.4.3
[25] backports_1.4.1 assertthat_0.2.1 Matrix_1.4-1 fastmap_1.1.0 gargle_1.2.0 cli_3.3.0
[31] prettyunits_1.1.1 htmltools_0.5.3 tools_4.2.1 gtable_0.3.0 glue_1.6.2 GenomeInfoDbData_1.2.8
[37] rappdirs_0.3.3 Rcpp_1.0.9 cellranger_1.1.0 vctrs_0.4.1 Biostrings_2.64.0 svglite_2.1.0
[43] xfun_0.32 rvest_1.0.2 lifecycle_1.0.1 pacman_0.5.1 statmod_1.4.36 XML_3.99-0.10
[49] googlesheets4_1.0.0 zlibbioc_1.42.0 scales_1.2.0 hms_1.1.1 parallel_4.2.1 curl_4.3.2
[55] memoise_2.0.1 biomaRt_2.52.0 stringi_1.7.8 RSQLite_2.2.15 genefilter_1.78.0 filelock_1.0.2
[61] rlang_1.0.4 pkgconfig_2.0.3 systemfonts_1.0.4 bitops_1.0-7 evaluate_0.16 lattice_0.20-45
[67] bit_4.0.4 tidyselect_1.1.2 plyr_1.8.7 magrittr_2.0.3 R6_2.5.1 generics_0.1.3
[73] DelayedArray_0.22.0 DBI_1.1.3 pillar_1.8.0 haven_2.5.0 withr_2.5.0 survival_3.4-0
[79] KEGGREST_1.36.3 RCurl_1.98-1.8 modelr_0.1.8 crayon_1.5.1 utf8_1.2.2 BiocFileCache_2.4.0
[85] tzdb_0.3.0 rmarkdown_2.14 progress_1.2.2 locfit_1.5-9.6 grid_4.2.1 readxl_1.4.0
[91] blob_1.2.3 reprex_2.0.1 digest_0.6.29 webshot_0.5.3 xtable_1.8-4 munsell_0.5.0
[97] viridisLite_0.4.0
Thanks for your detailed post. I think this might be related to this issue: DEXSeq errors "Error in scan( line ... did not have 3 elements" and "Error in FUN(X[[i]], ...) : subscript out of bounds" (DEXSeqDataSetFromHTSeq)
Just to verify, could you please post the first lines of your count files?
Yes, indeed it is the same error. Here are the first few lines of my count file.
As already mentioned by Arthur in that thread, I also thought of modifying the count files to remove the quotes around the colon, and that also solves the issue. I tried -