Hello,
I am trying to run swish on a non-reference transcript assembly.
I've tried:
- Importing the data via tximeta, although I am having trouble getting tximeta to recognise my custom transcriptome (which consists of referenced transcripts from Ensembl.GRCh38.v85 as well as novel transcripts assembled from StringTie). When using a LinkedTxome as follows:
#make the linkedTxome
indexDir = file.path("salmon_index/agg-agg-agg.gtf.salmon.index")
gtfPath = file.path("export/agg-agg-agg.gtf") ##<<--- custom transcriptome with REF+StringTie txs... getting errors
#gtfPath = "http://ftp.ensembl.org/pub/release-85/gtf/homo_sapiens/Homo_sapiens.GRCh38.85.gtf.gz" ##<<--- this one works but then removes all the novel transcripts from StringTie when tximeta is called
fastaPath = file.path("salmon_index/agg-agg-aggtranscripts.fasta")
tmp = tempdir()
jsonFile = file.path(tmp, paste0(basename(indexDir), ".json"))
library(ensembldb)
makeLinkedTxome(indexDir = indexDir,
source = "Ensembl", organism = "Homo sapien",
release = "85", genome = "GRCh38",
fasta=fastaPath, gtf=gtfPath,
jsonFile = jsonFile)
se = tximeta(sample_table, tx2gene="expression.dir/csvdb_files/tx2gene.txt")
I get the following errors:
importing quantifications
reading in files with read_tsv
1 2 3 4
found matching linked transcriptome:
[ Ensembl - Homo sapien - release 85 ]
useHub=TRUE: checking for EnsDb via 'AnnotationHub'
snapshotDate(): 2020-10-27
did not find matching EnsDb via 'AnnotationHub'
building EnsDb with 'ensembldb' package
Importing GTF file ... OK
Processing metadata ... OK
Processing genes ...
Attribute availability:
o gene_id ... OK
o gene_name ... OK
o entrezid ... Nope
o gene_biotype ... Nope
OK
Processing transcripts ...
Attribute availability:
o transcript_id ... OK
o gene_id ... OK
o source ... OK
I can't find type=='CDS'! The resulting database will lack CDS information!
Error in FUN(X[[i]], ...) :
Column 'tx_cds_seq_start' is not of type integer!
#subsequent runs give this error
loading existing EnsDb created: 2021-04-27 09:55:06
Error in EnsDb(loadpath) :
Required tables tx, tx2exon, exon, chromosome are not present in the database!
I am primarily looking at UTRs so am not in need of the CDS information. is this an issue with how I have set up the linkedtxome? or is there another way to get tximeta to work without the cds information?
- I also wondered whether I may be able to do this via Tximport, however, the inferential replicates from Salmon seem to come out different, i.e. as "infReps" as opposed to "infRep1, infRep2, infRepN" (as with tximeta):
library(tximport)
files = sample_table$files
names(files) = sample_table$names
se = tximport(files, type="salmon", txOut=TRUE)
se = scaleInfReps(se)
Gives the error:
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'metadata' for signature '"list"'
Is there a way to link the metadata from sample_table back to the se object? (e.g. similar to that of a DESeqDataSetFromTximport, does something similar to this exist for swish?)
Any help or advice to try to get this working would be greatly appreciated.
Many thanks in advance.
Jack.
sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] tximport_1.18.0 org.Hs.eg.db_3.12.0 ensembldb_2.14.1 AnnotationFilter_1.14.0
[5] GenomicFeatures_1.42.3 AnnotationDbi_1.52.0 SummarizedExperiment_1.20.0 Biobase_2.50.0
[9] GenomicRanges_1.42.0 GenomeInfoDb_1.26.7 IRanges_2.24.1 S4Vectors_0.28.1
[13] BiocGenerics_0.36.0 MatrixGenerics_1.2.1 matrixStats_0.58.0 qvalue_2.22.0
[17] samr_3.0 dplyr_1.0.5 tximeta_1.8.5 fishpond_1.6.0
loaded via a namespace (and not attached):
[1] colorspace_2.0-0 ellipsis_0.3.1 XVector_0.30.0 fs_1.5.0
[5] bit64_4.0.5 interactiveDisplayBase_1.28.0 fansi_0.4.2 xml2_1.3.2
[9] splines_4.0.4 cachem_1.0.4 impute_1.64.0 knitr_1.33
[13] jsonlite_1.7.2 Rsamtools_2.6.0 dbplyr_2.1.1 shiny_1.6.0
[17] BiocManager_1.30.12 readr_1.4.0 compiler_4.0.4 httr_1.4.2
[21] assertthat_0.2.1 Matrix_1.3-2 fastmap_1.1.0 lazyeval_0.2.2
[25] later_1.1.0.1 htmltools_0.5.1.1 prettyunits_1.1.1 tools_4.0.4
[29] gtable_0.3.0 glue_1.4.2 GenomeInfoDbData_1.2.4 reshape2_1.4.4
[33] rappdirs_0.3.3 tinytex_0.31 Rcpp_1.0.6 vctrs_0.3.6
[37] Biostrings_2.58.0 rtracklayer_1.49.5 xfun_0.22 stringr_1.4.0
[41] openxlsx_4.2.3 mime_0.10 lifecycle_1.0.0 gtools_3.8.2
[45] XML_3.99-0.5 AnnotationHub_2.22.1 zlibbioc_1.36.0 scales_1.1.1
[49] hms_1.0.0 promises_1.2.0.1 ProtGenerics_1.22.0 yaml_2.2.1
[53] curl_4.3 memoise_2.0.0 ggplot2_3.3.3 biomaRt_2.46.3
[57] stringi_1.5.3 RSQLite_2.2.4 BiocVersion_3.12.0 zip_2.1.1
[61] BiocParallel_1.24.1 GSA_1.03.1 rlang_0.4.10 pkgconfig_2.0.3
[65] bitops_1.0-6 evaluate_0.14 lattice_0.20-41 purrr_0.3.4
[69] GenomicAlignments_1.26.0 bit_4.0.4 tidyselect_1.1.0 plyr_1.8.6
[73] magrittr_2.0.1 R6_2.5.0 generics_0.1.0 DelayedArray_0.16.3
[77] DBI_1.1.1 pillar_1.6.0 withr_2.4.2 svMisc_1.1.4
[81] RCurl_1.98-1.3 tibble_3.1.0 crayon_1.4.1 shinyFiles_0.9.0
[85] utf8_1.1.4 BiocFileCache_1.14.0 rmarkdown_2.7 progress_1.2.2
[89] grid_4.0.4 blob_1.2.1 digest_0.6.27 xtable_1.8-4
[93] httpuv_1.5.5 openssl_1.4.3 munsell_0.5.0 askpass_1.1
Hi Michael,
Thank you for the rapid reply. I seem to be getting the same error with the development build as well:
However, as you suggested, using ModifiedEnsembl worked perfectly.
Thanks again
Jack.
Ok thanks for the report. I'll take a look at the code.
I think I do want to support people putting any string in
source
when they are building a linkedTxome but right now it seems "Ensembl" triggers an attempted download of the Ensembl txome via ensembldb. I need to do some more testing on my end.Thanks again for the report Jack.
I've added some more messaging and documentation about what happens with linkedTxome when source="Ensembl" or "GENCODE".
I think the current behavior is actually best, but more messaging will help users realize that if they've modified the GTF, then they want to avoid having tximeta think that ensembldb is the correct package for parsing it.
If you get a chance could you see on your end (with the new github code) that the messaging appears when you try to import the data with source="Ensembl"?
Hi Michael,
Thank you for your help. As requested I can see that when trying with source="Ensembl" on the newest github version (1.9.6) that the warning message does appear.
I hadn't initially realised any string could be used, I assumed from the docs it was only "Ensembl", "GENCODE" or "de-novo", so this warning message helps to clarify that in that instance.
Jack.
Thanks, and I realized that the documentation was also confusing, and have cleared that up in the new man page:
I also added this to the vignette: