I downloaded RSEM-generated gene count files from ENCODE and am hoping to produce a normalized matrix. Unfortunately, tximport
doesn't seem to be able to read them: the ENCODE files include additional output columns beyond the ones it expects.
ENCODE's RSEM-generated gene count file headers and example row:
gene_id transcript_id(s) length effective_length expected_count TPM FPKM posterior_mean_count posterior_standard_deviation_of_count pme_TPM pme_FPKM TPM_ci_lower_bound TPM_ci_upper_bound TPM_coefficient_of_quartile_variation FPKM_ci_lower_bound FPKM_ci_upper_bound FPKM_coefficient_of_quartile_variation
ENSG00000000003.14 ENST00000373020.8,ENST00000494424.1,ENST00000496771.5,ENST00000612152.4,ENST00000614008.4 1745.64 1646.64 8.00 0.12 0.15 8.00 0.00 0.24 0.30 0.0992221 0.38994 0.218542 0.126724 0.498276 0.218431
Error message:
reading in files with read_tsv
1
Warning message:
“Unnamed `col_types` should have the same length as `col_names`. Using smaller of the two.”
Warning message:
“59526 parsing failures.
row col expected actual file
1 -- 7 columns 17 columns '/Users/alex/Documents/AChroMap/data/raw/ENCODE/rna/downloads/ENCFF488ZHV.tsv'
2 -- 7 columns 17 columns '/Users/alex/Documents/AChroMap/data/raw/ENCODE/rna/downloads/ENCFF488ZHV.tsv'
3 -- 7 columns 17 columns '/Users/alex/Documents/AChroMap/data/raw/ENCODE/rna/downloads/ENCFF488ZHV.tsv'
4 -- 7 columns 17 columns '/Users/alex/Documents/AChroMap/data/raw/ENCODE/rna/downloads/ENCFF488ZHV.tsv'
5 -- 7 columns 17 columns '/Users/alex/Documents/AChroMap/data/raw/ENCODE/rna/downloads/ENCFF488ZHV.tsv'
... ... ......... .......... ..............................................................................
See problems(...) for more details.
”
The command was:
txi.rsem <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE)
I'm new to R (coming from Python), so I would benefit most from a detailed answer. Many thanks,
Environment:
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin20.4.0 (64-bit)
Running under: macOS Big Sur 11.3
Matrix products: default
BLAS: /usr/local/Cellar/openblas/0.3.15_1/lib/libopenblasp-r0.3.15.dylib
LAPACK: /usr/local/Cellar/r/4.1.0/lib/R/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.14.0 tximportData_1.20.0 readr_1.4.0
[4] tximport_1.20.0
loaded via a namespace (and not attached):
[1] magrittr_2.0.1 hms_1.1.0 uuid_0.1-4 R6_2.5.0
[5] rlang_0.4.11 fansi_0.5.0 tools_4.1.0 utf8_1.2.1
[9] htmltools_0.5.1.1 ellipsis_0.3.2 digest_0.6.27 tibble_3.1.2
[13] lifecycle_1.0.0 crayon_1.4.1 IRdisplay_1.0 repr_1.1.3
[17] base64enc_0.1-3 vctrs_0.3.8 IRkernel_1.2 evaluate_0.14
[21] pbdZMQ_0.3-5 compiler_4.1.0 pillar_1.6.1 jsonlite_1.7.2
[25] pkgconfig_2.0.3
I've updated the post with the error message. I _have_ tried chopping off the latter columns, and it appears to work.
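For anyone following along, here is a minimal sketch of the column-trimming workaround in base R. It assumes that keeping only the first 7 tab-separated columns is enough, which is what the "7 columns" in the warning suggests. `trim_rsem_columns` is a hypothetical helper name of my own, not part of tximport, and the output location is an arbitrary choice.

```r
# Hypothetical helper (not part of tximport): write a copy of an RSEM gene
# results file that keeps only the first `n_keep` tab-separated columns.
trim_rsem_columns <- function(infile, outfile, n_keep = 7) {
  # Read every column as character so values are copied through unchanged;
  # check.names = FALSE preserves headers such as "transcript_id(s)".
  tab <- read.delim(infile, header = TRUE, sep = "\t",
                    check.names = FALSE, stringsAsFactors = FALSE,
                    colClasses = "character")
  stopifnot(ncol(tab) >= n_keep)
  write.table(tab[, seq_len(n_keep), drop = FALSE], file = outfile,
              sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(outfile)
}

# Usage sketch (assumes `files` is the character vector of ENCODE .tsv paths
# used in the tximport call above):
# trimmed <- file.path(tempdir(), basename(files))
# for (i in seq_along(files)) trim_rsem_columns(files[i], trimmed[i])
# txi.rsem <- tximport(trimmed, type = "rsem", txIn = FALSE, txOut = FALSE)
```

The original downloads are left untouched; only the trimmed copies are passed to tximport.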