Hello, I am trying to quantify human transcripts mapped with salmon in some RNA sequencing (GSE107467) using tximeta.
This is the code I use to import the annotation file and the transcript file.
dir2 <- system.file("extdata", package="tximeta")
indexDir <- file.path(dir2, "gencode_v29_idx")
fastaFTP <- "ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.transcripts.fa.gz"
gtfPath <- "ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.primary_assembly.annotation.gtf.gz"
suppressPackageStartupMessages(library(tximeta))
makeLinkedTxome(indexDir=indexDir,
source="GENCODE",
organism="Homo sapiens",
release="29",
genome="GRCh38",
fasta=fastaFTP,
gtf=gtfPath,
write=FALSE)
and this is the code to import the salmon quantification files in tximeta
dir = '../nowakowski'
setwd(dir)
list.files()
files <-list.files(path='data')
files2 <- file.path(dir, "data", files, "quant.sf")
cond = c(rep('E19',5),rep('E15',4))
list = as.data.frame(cbind(files,files2,cond), stringsAsfactors=FALSE)
colnames(list)=c('names','files','condition')
list[,1] <- gsub('_1.sf', '', list[,1])
file.exists(as.vector(list[,2]))
se <- tximeta(list)
which outputs:
importing quantifications
reading in files with read_tsv
1 2 3 4 5 6 7 8 9
found matching linked transcriptome:
[ Gencode - Homo sapiens - release 29 ]
loading existing TxDb created: 2019-12-09 21:58:05
Loading required package: GenomicFeatures
generating transcript ranges
fetching genome info
Error in FUN(genome = names(SUPPORTED_UCSC_GENOMES)[idx], circ_seqs = supported_genome$circ_seqs, :
cannot map the following UCSC seqlevel(s) to an NCBI seqlevel: chr8_KZ208915v1_fix, chr15_KN538374v1_fix,
chr15_KQ031389v1_alt, chr16_KV880768v1_fix, chr12_KZ208916v1_fix, chr14_KZ208920v1_fix, chr7_KZ208913v1_alt,
chr5_KV575244v1_fix, chr7_KZ208912v1_fix, chr1_KV880763v1_alt, chr12_KN538369v1_fix, chr2_KQ983256v1_alt,
chr2_KQ031384v1_fix, chr16_KZ559113v1_fix, chr7_KV880765v1_fix, chr1_KQ031383v1_fix, chr1_KN538360v1_fix,
chr3_KN196475v1_fix, chr4_KV766193v1_alt, chr10_KN538367v1_fix, chr3_KN538364v1_fix, chr3_KV766192v1_fix,
chr18_KQ090028v1_fix, chr19_KQ458386v1_fix, chr3_KQ031385v1_fix, chr19_KN196484v1_fix, chr2_KN538363v1_fix,
chr5_KV575243v1_alt, chr13_KN538372v1_fix, chr1_KQ458383v1_alt, chr9_KN196479v1_fix, chr1_KZ208906v1_fix,
chr6_KQ031387v1_fix, chr12_KQ759760v1_fix, chr3_KN196476v1_fix, chr1_KN538361v1_fix, chr11_KZ559108v1_fix,
chr3_KZ559103v1_alt, chr11_KZ559110v1_alt, chr19_KV575249v1_alt, chr17_KV766196v1_fix, chr11_KZ559109v1_fix,
chr1_KQ98
I am using a salmon index obtained from the same transcript file to map the reads... I do not understand. The same code works for mouse RNAseq. This works:
dir2 <- system.file("extdata", package="tximeta")
indexDir <- file.path(dir2, "gencode_vM23_idx")
fastaFTP <- "ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M23/gencode.vM23.transcripts.fa.gz"
gtfPath <- "ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M23/gencode.vM23.primary_assembly.annotation.gtf.gz"
suppressPackageStartupMessages(library(tximeta))
makeLinkedTxome(indexDir=indexDir,
source="GENCODE",
organism="Mus musculus",
release="M23",
genome="GRCm38",
fasta=fastaFTP,
gtf=gtfPath,
write=FALSE)
se <- tximeta(list)
There is a problem with the patched human genome fetched by tximeta: https://support.bioconductor.org/p/117808/
Sequences like chr15KN538374v1fix don't belong to GRCh38 but to recent patched versions of GRCh38 (e.g. GRCh38.p12, which is the latest). Unfortunately it looks like the UCSC folks updated the chromInfo table for hg38 a couple of days ago, and that the table now contains chrom info for GRCh38.p12 (or maybe GRCh38.p11). You can see a dump of this SQL table here:
But i do not know how to solve it. Can anyone help me please?
Thanks, Marco
ps: I add sessionInfo
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /pstore/apps/OpenBLAS/0.2.13-GCC-4.8.4-LAPACK-3.5.0/lib/libopenblas_prescottp-r0.2.13.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8 LC_ADDRESS=en_US.UTF-8
[10] LC_TELEPHONE=en_US.UTF-8 LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicFeatures_1.34.8 apeglm_1.4.2 PoiClaClu_1.0.2.1 pheatmap_1.0.12
[5] ggplot2_3.2.1 dplyr_0.8.3 DESeq2_1.20.0 SummarizedExperiment_1.10.1
[9] DelayedArray_0.6.4 BiocParallel_1.14.2 matrixStats_0.54.0 GenomicRanges_1.32.6
[13] GenomeInfoDb_1.16.0 tximeta_1.0.3 STRINGdb_1.22.0 igraph_1.2.4.2
[17] xlsx_0.6.1 RColorBrewer_1.1-2 org.Hs.eg.db_3.7.0 org.Mm.eg.db_3.7.0
[21] AnnotationDbi_1.44.0 IRanges_2.14.10 S4Vectors_0.18.3 Biobase_2.40.0
[25] BiocGenerics_0.26.0 gplots_3.0.1 Glimma_1.10.1 edgeR_3.22.3
[29] limma_3.36.2
loaded via a namespace (and not attached):
[1] colorspace_1.3-2 htmlTable_1.12 XVector_0.20.0 base64enc_0.1-3
[5] rstudioapi_0.10 hash_2.2.6.1 bit64_0.9-7 sqldf_0.4-11
[9] splines_3.5.1 tximport_1.10.1 geneplotter_1.58.0 knitr_1.20
[13] zeallot_0.1.0 Formula_1.2-3 jsonlite_1.6 Rsamtools_1.32.2
[17] annotate_1.58.0 rJava_0.9-10 cluster_2.0.7-1 dbplyr_1.4.2
[21] png_0.1-7 readr_1.3.1 compiler_3.5.1 httr_1.4.1
[25] backports_1.1.2 assertthat_0.2.0 Matrix_1.3-0 lazyeval_0.2.1
[29] acepack_1.4.1 htmltools_0.3.6 prettyunits_1.0.2 tools_3.5.1
[33] coda_0.19-1 gtable_0.2.0 glue_1.3.0 GenomeInfoDbData_1.1.0
[37] rappdirs_0.3.1 Rcpp_1.0.3 bbmle_1.0.20 vctrs_0.2.0
[41] Biostrings_2.48.0 gdata_2.18.0 rtracklayer_1.40.3 stringr_1.4.0
[45] xlsxjars_0.6.1 proto_1.0.0 ensembldb_2.6.8 gtools_3.8.1
[49] XML_3.98-1.13 MASS_7.3-50 zlibbioc_1.26.0 scales_0.5.0
[53] BiocInstaller_1.30.0 hms_0.5.2 ProtGenerics_1.14.0 AnnotationFilter_1.6.0
[57] yaml_2.2.0 curl_3.2 memoise_1.1.0 gridExtra_2.3
[61] emdbook_1.3.11 biomaRt_2.36.1 rpart_4.1-13 latticeExtra_0.6-28
[65] stringi_1.2.4 RSQLite_2.1.1 genefilter_1.62.0 plotrix_3.7-7
[69] checkmate_1.8.5 caTools_1.17.1.1 chron_2.3-52 rlang_0.4.2
[73] pkgconfig_2.0.1 bitops_1.0-6 lattice_0.20-35 purrr_0.3.3
[77] GenomicAlignments_1.16.0 htmlwidgets_1.5.1 bit_1.1-14 tidyselect_0.2.5
[81] plyr_1.8.4 magrittr_1.5 R6_2.2.2 Hmisc_4.1-1
[85] DBI_1.0.0 withr_2.1.2 gsubfn_0.7 pillar_1.4.2
[89] foreign_0.8-71 survival_2.42-6 RCurl_1.95-4.11 nnet_7.3-12
[93] tibble_2.1.3 crayon_1.3.4 KernSmooth_2.23-15 BiocFileCache_1.6.0
[97] progress_1.2.0 locfit_1.5-9.1 grid_3.5.1 data.table_1.11.4
[101] blob_1.1.1 digest_0.6.15 xtable_1.8-2 numDeriv_2016.8-1
[105] munsell_0.5.0
Thanks Michael, you are right. I tried to update all packages but GenomeInfoDb gave an error and I did not noticed. I had to update it manually downloading the package. Thanks again now it works like a charm.