tximport errors with h5 and tsv files
@sp21-22495

I am trying to use tximport to make data matrices from kallisto output files. I have tried using both the h5 and tsv output files, and both are producing errors. When I try to use the tsv files, I am running the following:

```r
library(tximport)
library(GenomicFeatures)  # TxDb support; keys()/select() come from AnnotationDbi

# txdb: a TxDb object created earlier from my annotation (not shown)
dir <- "/Users/My_Name/Downloads"
sampleruns <- c("SRR3402457_1.tsv", "SRR3402460_1.tsv", "SRR3402456_1.tsv", "SRR3402459_1.tsv")
files <- file.path(dir, sampleruns)
k <- keys(txdb, keytype = "TXNAME")
tx2gene <- select(txdb, k, "GENEID", "TXNAME")
txi.kallisto.tsv <- tximport(files, type = "kallisto", tx2gene = tx2gene,
                             ignoreAfterBar = TRUE, ignoreTxVersion = TRUE)
```

This gives me the following error:

```
Error in .local(object, ...) : 
  None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.

Example IDs (file): [TR100009|c0_g1_i1|m, TR100024|c0_g1_i1|m, TR100032|c0_g1_i1|m, ...]

Example IDs (tx2gene): [ORF TR100009|c0_g1_i1|g.500685 TR100009|c0_g1_i1|m.500685 type:internal len:127 (-), ORF TR100024|c0_g1_i1|g.500687 TR100024|c0_g1_i1|m.500687 type:complete len:111 (+), ORF TR100032|c0_g1_i1|g.500688 TR100032|c0_g1_i1|m.500688 type:complete len:120 (-), ...]

  This can sometimes (not always) be fixed using 'ignoreTxVersion' or 'ignoreAfterBar'.
```
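Before calling tximport again, it can help to compare the two ID sets directly, outside tximport. A minimal base-R sketch; `id_overlap` is a hypothetical helper (not a tximport function), and the IDs are copied from the error message above:

```r
# Hypothetical helper: fraction of quantification IDs that appear verbatim
# among the tx2gene IDs, plus a few examples that do not match.
id_overlap <- function(quant_ids, tx2gene_ids, n_examples = 3) {
  matched <- quant_ids %in% tx2gene_ids
  list(
    fraction_matched = mean(matched),
    unmatched_examples = head(quant_ids[!matched], n_examples)
  )
}

# IDs as reported in the tximport error message
quant_ids <- c("TR100009|c0_g1_i1|m", "TR100024|c0_g1_i1|m", "TR100032|c0_g1_i1|m")
tx2gene_ids <- c(
  "ORF TR100009|c0_g1_i1|g.500685 TR100009|c0_g1_i1|m.500685 type:internal len:127 (-)",
  "ORF TR100024|c0_g1_i1|g.500687 TR100024|c0_g1_i1|m.500687 type:complete len:111 (+)",
  "ORF TR100032|c0_g1_i1|g.500688 TR100032|c0_g1_i1|m.500688 type:complete len:120 (-)"
)

res <- id_overlap(quant_ids, tx2gene_ids)
print(res$fraction_matched)    # 0: no exact matches
print(res$unmatched_examples)
```

With real data, `quant_ids` could come from `read.delim(files[1])$target_id` (the ID column of kallisto's abundance.tsv, before tximport trims it) and `tx2gene_ids` from `tx2gene$TXNAME`.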

For reference, the first six rows of my tx2gene table look like this (R printed the wide GENEID column below TXNAME):

```
                                                                                 TXNAME
1       TR100009|c0_g1_i1|g.500685 TR100009|c0_g1_i1|m.500685 type:internal len:127 (-)
2       TR100024|c0_g1_i1|g.500687 TR100024|c0_g1_i1|m.500687 type:complete len:111 (+)
3       TR100032|c0_g1_i1|g.500688 TR100032|c0_g1_i1|m.500688 type:complete len:120 (-)
4 TR100037|c0_g1_i1|g.500691 TR100037|c0_g1_i1|m.500691 type:3prime_partial len:101 (-)
5           TR10004|c1_g1_i1|g.85724 TR10004|c1_g1_i1|m.85724 type:internal len:189 (+)
6       TR100051|c0_g1_i1|g.500696 TR100051|c0_g1_i1|m.500696 type:internal len:147 (-)
                                                                                 GENEID
1       TR100009|c0_g1_i1|g.500685 TR100009|c0_g1_i1|m.500685 type:internal len:127 (-)
2       TR100024|c0_g1_i1|g.500687 TR100024|c0_g1_i1|m.500687 type:complete len:111 (+)
3       TR100032|c0_g1_i1|g.500688 TR100032|c0_g1_i1|m.500688 type:complete len:120 (-)
4 TR100037|c0_g1_i1|g.500691 TR100037|c0_g1_i1|m.500691 type:3prime_partial len:101 (-)
5           TR10004|c1_g1_i1|g.85724 TR10004|c1_g1_i1|m.85724 type:internal len:189 (+)
6       TR100051|c0_g1_i1|g.500696 TR100051|c0_g1_i1|m.500696 type:internal len:147 (-)
```

When I instead try to use h5 files, I am running the following:

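```r
library(tximport)  # the rhdf5 package must be installed to read kallisto .h5 files

dir <- "/Users/My_Name/Downloads"
sampleruns <- c("SRR3402457_1.h5", "SRR3402460_1.h5", "SRR3402456_1.h5", "SRR3402459_1.h5")
files <- file.path(dir, sampleruns)
names(files) <- paste0("sample", 1:4)
# txOut = TRUE returns transcript-level estimates (no gene-level summarization)
txi.kallisto <- tximport(files, type = "kallisto", txOut = TRUE, tx2gene = tx2gene)
```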

I have tried the above with and without the tx2gene argument. Either way, I am getting the following error:

```
Note: importing `abundance.h5` is typically faster than `abundance.tsv`
reading in files with read.delim (install 'readr' package for speed up)
1 Error in make.names(col.names, unique = TRUE) : 
  invalid multibyte string at '<89>HDF'
In addition: Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 3 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 4 appears to contain embedded nulls
3: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 5 appears to contain embedded nulls
```
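The `<89>HDF` in this error matches the first bytes of the HDF5 file signature (byte 0x89 followed by `HDF`), and the `read.delim` message suggests the .h5 files were parsed as tab-delimited text rather than through rhdf5. One possibility, which is an assumption to check against the tximport documentation for version 1.12.3, is that this version chooses the HDF5 reader from the file name (such as kallisto's default `abundance.h5`) rather than from the `.h5` extension. A minimal base-R sketch to confirm the files themselves are HDF5; `is_hdf5` is a hypothetical helper, not a tximport function:

```r
# Hypothetical helper: TRUE if the file begins with the 8-byte HDF5
# signature 0x89 'H' 'D' 'F' '\r' '\n' 0x1a '\n'.
is_hdf5 <- function(path) {
  sig <- as.raw(c(0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a))
  con <- file(path, open = "rb")
  on.exit(close(con))
  bytes <- readBin(con, what = "raw", n = 8L)
  length(bytes) == 8L && all(bytes == sig)
}

# Self-contained demo with two temporary files
h5_like <- tempfile(fileext = ".h5")
writeBin(as.raw(c(0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a, 0x00)), h5_like)
tsv_like <- tempfile(fileext = ".tsv")
writeLines("target_id\tlength\teff_length\test_counts\ttpm", tsv_like)

print(is_hdf5(h5_like))   # TRUE
print(is_hdf5(tsv_like))  # FALSE
```

On the question's `files` vector, `vapply(files, is_hdf5, logical(1))` should return TRUE for every sample if the inputs are genuine kallisto HDF5 output.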

Lastly, here is the output of sessionInfo():

```
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tximportData_1.12.0    GenomicFeatures_1.36.4 AnnotationDbi_1.46.1   Biobase_2.44.0         GenomicRanges_1.36.1  
 [6] GenomeInfoDb_1.20.0    IRanges_2.18.3         S4Vectors_0.22.1       BiocGenerics_0.30.0    rhdf5_2.28.1          
[11] tximport_1.12.3        edgeR_3.26.8           limma_3.40.6          

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3                  compiler_3.6.1              pillar_1.4.2                BiocManager_1.30.10        
 [5] XVector_0.24.0              prettyunits_1.0.2           progress_1.2.2              bitops_1.0-6               
 [9] tools_3.6.1                 zlibbioc_1.30.0             biomaRt_2.40.5              zeallot_0.1.0              
[13] digest_0.6.23               bit_1.1-14                  RSQLite_2.1.3               memoise_1.1.0              
[17] tibble_2.1.3                lattice_0.20-38             pkgconfig_2.0.3             rlang_0.4.2                
[21] Matrix_1.2-18               DelayedArray_0.10.0         DBI_1.0.0                   yaml_2.2.0                 
[25] GenomeInfoDbData_1.2.1      rtracklayer_1.44.4          httr_1.4.1                  stringr_1.4.0              
[29] hms_0.5.2                   Biostrings_2.52.0           vctrs_0.2.0                 locfit_1.5-9.1             
[33] bit64_0.9-7                 grid_3.6.1                  R6_2.4.1                    BiocParallel_1.18.1        
[37] XML_3.98-1.20               magrittr_1.5                Rhdf5lib_1.6.3              blob_1.2.0                 
[41] matrixStats_0.55.0          GenomicAlignments_1.20.1    Rsamtools_2.0.3             backports_1.1.5            
[45] SummarizedExperiment_1.14.1 assertthat_0.2.1            stringi_1.4.3               RCurl_1.95-4.12            
[49] crayon_1.3.4
```

Any help or insight into how I might solve EITHER of these errors (I just need one filetype to work) would be greatly appreciated. Thank you!

@mikelove

tximport can only do so much. Look at the difference it is reporting between the IDs in the quantification files and those in the tx2gene table:

```
TR100009|c0_g1_i1|m
ORF TR100009|c0_g1_i1|g.500685 TR100009|c0_g1_i1|m.500685 type:internal len:127 (-)
```

Apart from what comes after a `|` or a `.`, do you see any other difference? You have to do a little legwork yourself to provide the right input so that the software can do the right thing.
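A minimal sketch of that legwork in base R, assuming the kallisto index was built from the same sequences and that the transcript ID is the token containing `|m.` inside each long tx2gene description (an assumption based only on the example IDs in the error). `extract_tx_id` and `strip_after_dot` are hypothetical helpers, not tximport functions:

```r
# Hypothetical helper: pull the transcript ID out of a long description like
#   "ORF TR100009|c0_g1_i1|g.500685 TR100009|c0_g1_i1|m.500685 type:internal len:127 (-)"
# Assumption: the ID is the first whitespace-separated token containing "|m.".
extract_tx_id <- function(desc) {
  tokens <- strsplit(desc, "\\s+")
  vapply(tokens, function(tk) {
    hit <- grep("|m.", tk, fixed = TRUE, value = TRUE)
    if (length(hit) == 0) NA_character_ else hit[1]
  }, character(1), USE.NAMES = FALSE)
}

# Drop everything from the first "." onward, the same trimming that
# ignoreTxVersion = TRUE applied to the quantification IDs in the error.
strip_after_dot <- function(id) sub("\\..*", "", id)

# Example IDs copied from the error message
desc <- c(
  "ORF TR100009|c0_g1_i1|g.500685 TR100009|c0_g1_i1|m.500685 type:internal len:127 (-)",
  "ORF TR100024|c0_g1_i1|g.500687 TR100024|c0_g1_i1|m.500687 type:complete len:111 (+)"
)
quant_ids <- c("TR100009|c0_g1_i1|m", "TR100024|c0_g1_i1|m")

tx_ids <- strip_after_dot(extract_tx_id(desc))
print(tx_ids)
print(all(quant_ids %in% tx_ids))  # TRUE for these two examples
```

Applied to the table from the question, something like `tx2gene$TXNAME <- strip_after_dot(extract_tx_id(tx2gene$TXNAME))` before rerunning `tximport()` with `ignoreTxVersion = TRUE` should make the first column comparable. The `GENEID` column is left untouched here and still needs its own decision about what counts as a gene.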

