This is my first time analyzing RNA sequencing data for gene expression. I am trying to import count data from Kallisto to DESeq2 using the tximport package following the instructions here. After running this code:
filenames <- list.files("./Data", full.names = TRUE, pattern = "abundance\\.h5$")
files <- filenames %>% `names<-`(str_extract(filenames, "SWS[:digit:]*"))
txi.kallisto <- tximport(files, type = "kallisto", txOut = TRUE)
I get the following error:
Note: importing `abundance.h5` is typically faster than `abundance.tsv`
reading in files with read_tsv
1 Warning: 4894 parsing failures.
row col expected actual file
2 <U+0089>HDF embedded null './Data/SWS01_abundance.h5'
2 NA 1 columns 2 columns './Data/SWS01_abundance.h5'
5 <U+0089>HDF embedded null './Data/SWS01_abundance.h5'
9 <U+0089>HDF embedded null './Data/SWS01_abundance.h5'
10 <U+0089>HDF embedded null './Data/SWS01_abundance.h5'
... ........... ......... ............. ...........................
See problems(...) for more details.
Error in tximport(files, type = "kallisto", tx2gene = tx2gene, txOut = TRUE) :
all(c(lengthCol, abundanceCol) %in% names(raw)) is not TRUE
In addition: Warning message:
Unnamed `col_types` should have the same length as `col_names`. Using smaller of the two.
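A note on the parsing failures above: the `<U+0089>HDF` entries match the start of the fixed 8-byte signature that every HDF5 file begins with, which suggests that the binary .h5 files are being read as text by read_tsv rather than through the HDF5 reader. As a minimal sketch for confirming that the files on disk really are HDF5, the base R check below compares the first bytes of a file against that signature. The helper name `is_hdf5` is hypothetical and not part of tximport or rhdf5, and the two temporary files only stand in for real data.

```r
# HDF5 files begin with a fixed 8-byte signature: 0x89 "HDF" \r \n 0x1a \n.
hdf5_signature <- as.raw(c(0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a))

# Hypothetical helper (not part of tximport or rhdf5):
# TRUE if the file at `path` starts with the HDF5 signature.
is_hdf5 <- function(path) {
  header <- readBin(path, what = "raw", n = length(hdf5_signature))
  identical(header, hdf5_signature)
}

# Self-contained demonstration on two temporary stand-in files.
fake_h5 <- tempfile(fileext = ".h5")
writeBin(hdf5_signature, fake_h5)

fake_tsv <- tempfile(fileext = ".tsv")
writeLines("target_id\tlength\teff_length\test_counts\ttpm", fake_tsv)

h5_check <- is_hdf5(fake_h5)    # file starts with the signature
tsv_check <- is_hdf5(fake_tsv)  # plain text, no signature
```

On the real data, `vapply(files, is_hdf5, logical(1))` would run the same check over every path in `files`.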
I'm trying to import the .h5 files, but when I peek in the .tsv files, they are formatted like this:
# A tibble: 105,129 x 5
target_id length eff_length est_counts tpm
<chr> <dbl> <dbl> <dbl> <dbl>
1 ENSMUST00000177564.1-Trdd2 16 17 0 0
2 ENSMUST00000196221.1-Trdd1 9 10 0 0
3 ENSMUST00000179664.1-Trdd1 11 12 0 0
4 ENSMUST00000178537.1-Trbd1 12 13 0 0
5 ENSMUST00000178862.1-Trbd2 14 15 0 0
6 ENSMUST00000179520.1-Ighd4-1 11 12 0 0
7 ENSMUST00000179883.1-Ighd3-2 16 17 0 0
8 ENSMUST00000195858.1-Ighd5-6 10 11 0 0
9 ENSMUST00000179932.1-Ighd5-6 12 13 0 0
10 ENSMUST00000180001.1-Ighd2-8 17 18 0 0
# ... with 105,119 more rows
Here's my session info:
R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] DESeq2_1.22.2 SummarizedExperiment_1.12.0 DelayedArray_0.8.0 BiocParallel_1.16.5 matrixStats_0.54.0 tximport_1.10.1
[7] rhdf5_2.26.2 GenomicFeatures_1.34.1 AnnotationDbi_1.44.0 Biobase_2.42.0 GenomicRanges_1.34.0 GenomeInfoDb_1.18.1
[13] IRanges_2.16.0 S4Vectors_0.20.1 BiocGenerics_0.28.0 forcats_0.3.0 stringr_1.3.1 dplyr_0.7.8
[19] purrr_0.2.5 readr_1.3.1 tidyr_0.8.2 tibble_1.4.2 ggplot2_3.1.0 tidyverse_1.2.1
loaded via a namespace (and not attached):
[1] colorspace_1.3-2 htmlTable_1.13.1 XVector_0.22.0 base64enc_0.1-3 rstudioapi_0.9.0 bit64_0.9-7 fansi_0.4.0
[8] lubridate_1.7.4 xml2_1.2.0 splines_3.5.2 geneplotter_1.60.0 knitr_1.21 Formula_1.2-3 jsonlite_1.6
[15] Rsamtools_1.34.0 broom_0.5.1 annotate_1.60.0 cluster_2.0.7-1 compiler_3.5.2 httr_1.4.0 backports_1.1.3
[22] assertthat_0.2.0 Matrix_1.2-15 lazyeval_0.2.1 cli_1.0.1 acepack_1.4.1 htmltools_0.3.6 prettyunits_1.0.2
[29] tools_3.5.2 bindrcpp_0.2.2 gtable_0.2.0 glue_1.3.0 GenomeInfoDbData_1.2.0 Rcpp_1.0.0 cellranger_1.1.0
[36] Biostrings_2.50.2 nlme_3.1-137 rtracklayer_1.42.1 xfun_0.4 rvest_0.3.2 XML_3.98-1.16 zlibbioc_1.28.0
[43] scales_1.0.0 hms_0.4.2 RColorBrewer_1.1-2 yaml_2.2.0 memoise_1.1.0 gridExtra_2.3 biomaRt_2.38.0
[50] rpart_4.1-13 latticeExtra_0.6-28 stringi_1.2.4 RSQLite_2.1.1 genefilter_1.64.0 checkmate_1.8.5 rlang_0.3.1
[57] pkgconfig_2.0.2 bitops_1.0-6 lattice_0.20-38 Rhdf5lib_1.4.2 bindr_0.1.1 GenomicAlignments_1.18.1 htmlwidgets_1.3
[64] bit_1.1-14 tidyselect_0.2.5 plyr_1.8.4 magrittr_1.5 R6_2.3.0 generics_0.0.2 Hmisc_4.1-1
[71] DBI_1.0.0 pillar_1.3.1 haven_2.0.0 foreign_0.8-71 withr_2.1.2 survival_2.43-3 RCurl_1.95-4.11
[78] nnet_7.3-12 modelr_0.1.2 crayon_1.3.4 utf8_1.1.4 progress_1.2.0 locfit_1.5-9.1 grid_3.5.2
[85] readxl_1.2.0 data.table_1.11.8 blob_1.1.1 digest_0.6.18 xtable_1.8-3 munsell_0.5.0
Any ideas to help solve my import problem?
Thanks for your help!
When I run the commands using the abundance.tsv files:
It actually imports all my files, but then produces the following error:
Thanks again!
Consider taking the advice that is printed in the error message.
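For the original .h5 import, the note "importing `abundance.h5` is typically faster than `abundance.tsv`" followed by "reading in files with read_tsv" suggests that tximport did not recognise the renamed `SWS01_abundance.h5` files as HDF5 input. One possible workaround, assuming (this is not confirmed against the tximport 1.10.1 source) that detection depends on the file being named exactly `abundance.h5`, is to keep one directory per sample with the default kallisto file name. The sketch below uses only base R; `stage_kallisto_h5` is a hypothetical helper, and the demo files are placeholders rather than real kallisto output.

```r
# Hypothetical helper: copy each "<sample>_abundance.h5" to
# "<out_dir>/<sample>/abundance.h5" and return the new paths,
# named by sample. `files` must be a named character vector.
stage_kallisto_h5 <- function(files, out_dir) {
  stopifnot(!is.null(names(files)), !anyDuplicated(names(files)))
  staged <- file.path(out_dir, names(files), "abundance.h5")
  for (i in seq_along(files)) {
    dir.create(dirname(staged[i]), recursive = TRUE, showWarnings = FALSE)
    ok <- file.copy(files[[i]], staged[i], overwrite = TRUE)
    if (!ok) stop("could not copy ", files[[i]])
  }
  names(staged) <- names(files)
  staged
}

# Demonstration with placeholder files in a temporary directory.
demo_dir <- tempfile("kallisto_demo")
dir.create(demo_dir)
demo_files <- file.path(demo_dir, c("SWS01_abundance.h5", "SWS02_abundance.h5"))
for (f in demo_files) writeLines("placeholder", f)
names(demo_files) <- c("SWS01", "SWS02")

staged <- stage_kallisto_h5(demo_files, file.path(demo_dir, "staged"))

# With real data, the staged paths would replace `files` in the original call:
# txi.kallisto <- tximport(staged, type = "kallisto", txOut = TRUE)
```

Copying rather than renaming leaves the original files untouched, so the step is easy to undo if the assumption about file-name detection turns out to be wrong.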