I used Salmon 1.0 to quantify my samples against the GENCODE M23 transcripts.
With tximeta v1.5.2, I get the error below, which says that the transcript IDs in my quant.sf files aren't in the GTF.
I think the problem may be that the names in the GENCODE transcript FASTA file (which match those in quant.sf) look like this:
ENSMUST00000193812.1|ENSMUSG00000102693.1|OTTMUSG00000049935.1|OTTMUST00000127109.1|4933401J01Rik-201|4933401J01Rik|1070|TEC|
whereas the corresponding line in the GTF looks like this:
$ grep ENSMUST00000193812.1 gencode.vM23.annotation.gtf
chr1 HAVANA transcript 3073253 3074322 . + . gene_id "ENSMUSG00000102693.1"; transcript_id "ENSMUST00000193812.1"; gene_type "TEC"; gene_name "4933401J01Rik"; transcript_type "TEC"; transcript_name "4933401J01Rik-201"; level 2; transcript_support_level "NA"; mgi_id "MGI:1918292"; tag "basic"; havana_gene "OTTMUSG00000049935.1"; havana_transcript "OTTMUST00000127109.1";
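The mismatch can be checked from the shell. A minimal sketch, assuming the FASTA header format shown above: the ID the GTF knows about is only the first `|`-delimited field, so the trimmed ID matches a `transcript_id` value while the full header does not. The helper names `first_tx_id` and `in_gtf` are hypothetical.

```shell
# first_tx_id HEADER: print the first '|'-delimited field of a GENCODE FASTA
# header (a leading '>' is dropped), i.e. the bare transcript ID.
first_tx_id() {
    printf '%s\n' "$1" | sed 's/^>//' | cut -d'|' -f1
}

# in_gtf ID GTF: succeed if ID appears as a transcript_id value in the GTF file.
in_gtf() {
    grep -q -F "transcript_id \"$1\";" "$2"
}

# Example (the GTF path is a placeholder for your local copy):
#   h='ENSMUST00000193812.1|ENSMUSG00000102693.1|OTTMUSG00000049935.1|OTTMUST00000127109.1|4933401J01Rik-201|4933401J01Rik|1070|TEC|'
#   in_gtf "$h" gencode.vM23.annotation.gtf || echo "full header: not found"
#   in_gtf "$(first_tx_id "$h")" gencode.vM23.annotation.gtf && echo "trimmed ID: found"
```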
Thanks for any advice!
- John Tobias
Error text:
```
se <- tximeta(coldata)
importing quantifications
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
found matching transcriptome:
[ GENCODE - Mus musculus - release M23 ]
loading existing TxDb created: 2019-11-25 21:57:58
Loading required package: GenomicFeatures
loading existing transcript ranges created: 2019-11-25 21:58:44
Error in checkAssays2Txps(assays, txps) :
  none of the transcripts in the quantification files are in the GTF
In addition: Warning message:
In class(object) <- "environment" :
  Setting class(x) to "environment" sets attribute to NULL; result will no longer be an S4 object

sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.1

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] GenomicFeatures_1.36.4 org.Mm.eg.db_3.8.2
[3] AnnotationDbi_1.46.1 openxlsx_4.1.3
[5] webshot_0.5.2 BiocStyle_2.12.0
[7] biomaRt_2.40.5 pcaExplorer_2.10.1
[9] tximportData_1.12.0 tximport_1.12.3
[11] tximeta_1.5.2 DESeq2_1.24.0
[13] SummarizedExperiment_1.14.1 DelayedArray_0.10.0
[15] BiocParallel_1.18.1 matrixStats_0.55.0
[17] Biobase_2.44.0 GenomicRanges_1.36.1
[19] GenomeInfoDb_1.20.0 IRanges_2.18.3
[21] S4Vectors_0.22.1 BiocGenerics_0.30.0
[23] readr_1.3.1

loaded via a namespace (and not attached):
[1] GOstats_2.50.0 backports_1.1.5 Hmisc_4.3-0
[4] BiocFileCache_1.8.0 NMF_0.21.0 plyr_1.8.4
[7] igraph_1.2.4.1 lazyeval_0.2.2 GSEABase_1.46.0
[10] shinydashboard_0.7.1 splines_3.6.1 crosstalk_1.0.0
[13] ggplot2_3.2.1 gridBase_0.4-7 digest_0.6.23
[16] foreach_1.4.7 ensembldb_2.8.1 htmltools_0.4.0
[19] GO.db_3.8.2 magrittr_1.5 checkmate_1.9.4
[22] memoise_1.1.0 cluster_2.1.0 doParallel_1.0.15
[25] limma_3.40.6 Biostrings_2.52.0 annotate_1.62.0
[28] prettyunits_1.0.2 colorspace_1.4-1 blob_1.2.0
[31] rappdirs_0.3.1 ggrepel_0.8.1 xfun_0.11
[34] dplyr_0.8.3 crayon_1.3.4 RCurl_1.95-4.12
[37] jsonlite_1.6 graph_1.62.0 genefilter_1.66.0
[40] zeallot_0.1.0 survival_3.1-7 iterators_1.0.12
[43] glue_1.3.1 registry_0.5-1 gtable_0.3.0
[46] zlibbioc_1.30.0 XVector_0.24.0 Rgraphviz_2.28.0
[49] SparseM_1.77 scales_1.1.0 pheatmap_1.0.12
[52] DBI_1.0.0 rngtools_1.4 bibtex_0.4.2
[55] Rcpp_1.0.3 xtable_1.8-4 progress_1.2.2
[58] htmlTable_1.13.2 foreign_0.8-72 bit_1.1-14
[61] Formula_1.2-3 DT_0.10 AnnotationForge_1.26.0
[64] htmlwidgets_1.5.1 httr_1.4.1 threejs_0.3.1
[67] RColorBrewer_1.1-2 shinyAce_0.4.1 acepack_1.4.1
[70] pkgconfig_2.0.3 XML_3.98-1.20 nnet_7.3-12
[73] dbplyr_1.4.2 locfit_1.5-9.1 tidyselect_0.2.5
[76] rlang_0.4.2 reshape2_1.4.3 later_1.0.0
[79] munsell_0.5.0 tools_3.6.1 RSQLite_2.1.2
[82] shinyBS_0.61 evaluate_0.14 stringr_1.4.0
[85] fastmap_1.0.1 yaml_2.2.0 knitr_1.26
[88] bit64_0.9-7 zip_2.0.4 purrr_0.3.3
[91] AnnotationFilter_1.8.0 RBGL_1.60.0 mime_0.7
[94] compiler_3.6.1 rstudioapi_0.10 curl_4.2
[97] png_0.1-7 tibble_2.1.3 geneplotter_1.62.0
[100] stringi_1.4.3 lattice_0.20-38 ProtGenerics_1.16.0
[103] Matrix_1.2-17 vctrs_0.2.0 pillar_1.4.2
[106] lifecycle_0.1.0 BiocManager_1.30.10 d3heatmap_0.6.1.2
[109] data.table_1.12.6 bitops_1.0-6 httpuv_1.5.2
[112] rtracklayer_1.44.4 R6_2.4.1 latticeExtra_0.6-28
[115] promises_1.1.0 topGO_2.36.0 gridExtra_2.3
[118] codetools_0.2-16 assertthat_0.2.1 Category_2.50.0
[121] pkgmaker_0.27 withr_2.1.2 GenomicAlignments_1.20.1
[124] Rsamtools_2.0.3 GenomeInfoDbData_1.2.1 hms_0.5.2
[127] grid_3.6.1 rpart_4.1-15 tidyr_1.0.0
[130] rmarkdown_1.17 shiny_1.4.0 base64enc_0.1-3
```
Thanks for the quick reply!
I tried this, but I get the same error:
se <- tximeta(coldata, ignoreAfterBar = TRUE)
Should I be doing something else to pass this argument to tximport?
Could it be that I need to update tximport to the devel version as well?
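For reference, `ignoreAfterBar = TRUE` tells tximport to split each transcript name on the `|` character and keep only the part before it. The effect on a quant.sf file can be sketched with awk, assuming the standard tab-separated quant.sf layout with a header row (Name, Length, EffectiveLength, TPM, NumReads). `trim_quant_names` is a hypothetical helper that only illustrates the trimming; it is not tximport's or tximeta's own code.

```shell
# trim_quant_names IN OUT: copy a salmon quant.sf file, keeping only the text
# before the first '|' in the Name column (column 1). The header row and the
# numeric columns are passed through unchanged; output stays tab-separated.
trim_quant_names() {
    awk 'BEGIN { FS = OFS = "\t" }
         NR > 1 { sub(/[|].*/, "", $1) }
         { print }' "$1" > "$2"
}
```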
I take it back. It's tximeta that needs to handle ignoreAfterBar, and I don't have code for that yet. I can try to fix this tomorrow in the devel branch.
In the meantime, the other solution would be to index with `--gencode`, which is recommended for GENCODE transcripts.fa files (note that it also makes the quant files smaller, because the transcript IDs lose a lot of unnecessary characters).
John,
Thanks for the bug report. I just pushed version 1.5.3 to Bioconductor and GitHub. Would you mind checking whether it solves the problem on your end? Just run `tximeta(coldata)`; no other arguments are needed.
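For anyone taking the re-indexing route instead, the invocation is along the lines of `salmon index -t gencode.vM23.transcripts.fa.gz -i salmon_index --gencode`, where the index directory name is a placeholder. `--gencode` makes Salmon split each transcript name at the first `|`. If you want a FASTA whose headers already have that trimmed form, a minimal sed sketch of the same trimming follows; `trim_fasta_headers` is a hypothetical helper, not part of Salmon.

```shell
# trim_fasta_headers: read a FASTA on stdin and write it to stdout with each
# header line cut at the first '|', so '>ENSMUST...|ENSMUSG...|...' becomes
# '>ENSMUST...'. Sequence lines pass through unchanged.
trim_fasta_headers() {
    sed '/^>/s/|.*//'
}

# Hypothetical usage (file and directory names are placeholders):
#   gzip -dc gencode.vM23.transcripts.fa.gz | trim_fasta_headers > transcripts.trimmed.fa
#   salmon index -t transcripts.trimmed.fa -i salmon_index
```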