Hi, I'm an undergrad working in a lab and I'm new to Bioconductor, so please bear with me.
I used kallisto to get transcript-level abundances for my data, and am now trying to use tximport to summarize them to the gene level. I followed the instructions in the vignette, but I get this error message:
Error in summarizeToGene(txi, tx2gene, ignoreTxVersion, countsFromAbundance) : None of the transcripts in the quantification files are present in the first column of tx2gene. Check to see that you are using the same annotation for both.
Here is what I'm doing.
I used a FASTA file from Ensembl, so I used the EnsDb.Hsapiens.v86 package instead of the one in the vignette, which is what other posts on the forum said to do (I couldn't find one for the current Ensembl 88 release).
    library(EnsDb.Hsapiens.v86)
    txdb <- EnsDb.Hsapiens.v86
    k <- keys(txdb, keytype = "GENEID")
    df <- select(txdb, keys = k, keytype = "GENEID", columns = "TXNAME")
    tx2gene <- df[, 2:1]
When I looked at the first few entries, the tx2gene data frame looked the way it was supposed to:
    head(tx2gene)
               TXNAME          GENEID
    1 ENST00000373020 ENSG00000000003
    2 ENST00000494424 ENSG00000000003
    3 ENST00000496771 ENSG00000000003
    4 ENST00000612152 ENSG00000000003
    5 ENST00000614008 ENSG00000000003
    6 ENST00000373031 ENSG00000000005
When I go to use the tximport function, I get the error message.
    library(tximport)
    txi <- tximport("abundance.tsv", type = "kallisto", tx2gene = tx2gene)
    Note: importing `abundance.h5` is typically faster than `abundance.tsv`
    reading in files with read_tsv
    1
    Error in summarizeToGene(txi, tx2gene, ignoreTxVersion, countsFromAbundance) :
      None of the transcripts in the quantification files are present
      in the first column of tx2gene. Check to see that you are using
      the same annotation for both.
I double-checked to make sure my file is in the working directory. I'm not too familiar with Bioconductor or R yet, so I'm a little stumped. Any help would be appreciated.
Here is my sessionInfo:
    R version 3.4.0 Patched (2017-05-08 r72665)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 8.1 x64 (build 9600)

    Matrix products: default

    locale:
    [1] LC_COLLATE=English_United States.1252
    [2] LC_CTYPE=English_United States.1252
    [3] LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252

    attached base packages:
    [1] stats4    parallel  stats     graphics  grDevices utils
    [7] datasets  methods   base

    other attached packages:
     [1] tximport_1.4.0           EnsDb.Hsapiens.v86_2.1.0
     [3] ensembldb_2.0.1          AnnotationFilter_1.0.0
     [5] GenomicFeatures_1.28.0   AnnotationDbi_1.38.0
     [7] Biobase_2.36.2           GenomicRanges_1.28.1
     [9] GenomeInfoDb_1.12.0      IRanges_2.10.0
    [11] S4Vectors_0.14.0         BiocGenerics_0.22.0
    [13] BiocInstaller_1.26.0

    loaded via a namespace (and not attached):
     [1] Rcpp_0.12.10               compiler_3.4.0
     [3] XVector_0.16.0             AnnotationHub_2.8.1
     [5] ProtGenerics_1.8.0         bitops_1.0-6
     [7] tools_3.4.0                zlibbioc_1.22.0
     [9] biomaRt_2.32.0             digest_0.6.12
    [11] rhdf5_2.20.0               tibble_1.3.0
    [13] RSQLite_1.1-2              memoise_1.1.0
    [15] lattice_0.20-35            Matrix_1.2-10
    [17] shiny_1.0.3                DelayedArray_0.2.2
    [19] DBI_0.6-1                  yaml_2.1.14
    [21] GenomeInfoDbData_0.99.0    rtracklayer_1.36.0
    [23] httr_1.2.1                 hms_0.3
    [25] Biostrings_2.44.0          grid_3.4.0
    [27] R6_2.2.1                   XML_3.98-1.7
    [29] BiocParallel_1.10.1        readr_1.1.0
    [31] htmltools_0.3.6            Rsamtools_1.28.0
    [33] matrixStats_0.52.2         GenomicAlignments_1.12.0
    [35] SummarizedExperiment_1.6.1 xtable_1.8-2
    [37] mime_0.5                   interactiveDisplayBase_1.14.0
    [39] httpuv_1.3.3               RCurl_1.95-4.8
    [41] lazyeval_0.2.0
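The error message suggests checking that the quantification files and tx2gene use the same transcript identifiers. A minimal sketch of such a check follows, using toy stand-in values rather than the real data: the version suffixes (".8", ".1", ".5") are hypothetical, and the names `quant_ids` and `tx2gene_toy` are made up for illustration. It assumes, without the thread confirming it, that the IDs in the quantification file carry an Ensembl version suffix while the EnsDb-derived names do not. In practice `quant_ids` would be read from the `target_id` column of abundance.tsv.

```r
# Toy stand-ins (hypothetical values) for the transcript IDs on each side.
quant_ids <- c("ENST00000373020.8", "ENST00000494424.1", "ENST00000496771.5")
tx2gene_toy <- data.frame(
  TXNAME = c("ENST00000373020", "ENST00000494424", "ENST00000496771"),
  GENEID = rep("ENSG00000000003", 3),
  stringsAsFactors = FALSE
)

# How many quantification IDs match the first column of tx2gene exactly?
n_exact <- sum(quant_ids %in% tx2gene_toy$TXNAME)

# Strip a trailing ".<digits>" version suffix and compare again.
quant_ids_noversion <- sub("\\.[0-9]+$", "", quant_ids)
n_stripped <- sum(quant_ids_noversion %in% tx2gene_toy$TXNAME)

cat("exact matches:", n_exact, "\n")
cat("matches after stripping versions:", n_stripped, "\n")
```

If the second count is much larger than the first, the mismatch is only in the version suffix. The `ignoreTxVersion` argument that appears in the error message's call to summarizeToGene looks intended for that situation, though it is worth confirming against the tximport man page.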
Ah, that did the trick. Thank you so much!
On a side note, how do I run multiple files at once instead of one at a time?
What do you mean "run multiple files at a time?" Sorry if I'm being obtuse but I have no idea what you might be referring to. Can you provide more context to the question?
Yes, note that the first argument to tximport is files, with an "s" on the end. files is a character vector of the paths to the multiple files (although of course it can be of length 1 if you only have a single experiment). Please see the examples of tximport in the man pages and vignette, where multiple files are specified.

If you're using kallisto to quantify your samples, I'd suggest using sleuth instead of tximport followed by another DE package, as sleuth is more accurate. See https://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.4324.html.