Entering edit mode
Using airway2
(https://github.com/mikelove/airway2/tree/master/inst) salmon transcript level count files, this is supposed to remove the versioning of Ensemble gene IDs:
library(tximeta)
srrs = list.files(path = "~/Downloads/airway2-master/inst/extdata/quants/", full.names = TRUE)
txm_raw = tximeta(file.path(srrs, "quant.sf.gz"), type = "salmon")
txm_con = summarizeToGene(txm_raw, ignoreTxVersion = TRUE)
loading existing TxDb created: 2019-07-02 20:01:47
obtaining transcript-to-gene mapping from TxDb
Error in .local(object, ...) :
None of the transcripts in the quantification files are present
in the first column of tx2gene. Check to see that you are using
the same annotation for both.
Example IDs (file): [ENST00000456328, ENST00000450305, ENST00000488147, ...]
Example IDs (tx2gene): [ENST00000456328.2, ENST00000450305.2, ENST00000473358.1, ...]
This can sometimes (not always) be fixed using 'ignoreTxVersion' or 'ignoreAfterBar'.
Similarly, the operation fails using mapIds
of AnnotationDbi
(for which it might be nice to have as well an option like ignoreTxVersion
).
Current work-around:
names(GRanges_object) <- sapply(names(GRanges_object), function(x) unlist(strsplit(x, "\\."))[[1]])
Session Info:
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods
[9] base
other attached packages:
[1] GenomicFeatures_1.36.3 AnnotationDbi_1.46.0 Biobase_2.44.0
[4] GenomicRanges_1.36.0 GenomeInfoDb_1.20.0 IRanges_2.18.1
[7] S4Vectors_0.22.0 BiocGenerics_0.30.0 tximeta_1.2.1
loaded via a namespace (and not attached):
[1] httr_1.4.0 tidyr_0.8.3 vsn_3.52.0
[4] bit64_0.9-7 jsonlite_1.6 assertthat_0.2.1
[7] BiocManager_1.30.4 affy_1.62.0 BiocFileCache_1.8.0
[10] blob_1.1.1 GenomeInfoDbData_1.2.1 Rsamtools_2.0.0
[13] progress_1.2.2 pillar_1.4.2 RSQLite_2.1.1
[16] lattice_0.20-38 limma_3.40.2 glue_1.3.1
[19] digest_0.6.19 XVector_0.24.0 colorspace_1.4-1
[22] preprocessCore_1.46.0 Matrix_1.2-17 XML_3.98-1.20
[25] pkgconfig_2.0.2 biomaRt_2.40.1 zlibbioc_1.30.0
[28] purrr_0.3.2 scales_1.0.0 affyio_1.54.0
[31] BiocParallel_1.18.0 tibble_2.1.3 AnnotationFilter_1.8.0
[34] ggplot2_3.2.0 SummarizedExperiment_1.14.0 lazyeval_0.2.2
[37] magrittr_1.5 crayon_1.3.4 memoise_1.1.0
[40] tools_3.6.0 prettyunits_1.0.2 hms_0.4.2
[43] matrixStats_0.54.0 stringr_1.4.0 munsell_0.5.0
[46] DelayedArray_0.10.0 ensembldb_2.8.0 Biostrings_2.52.0
[49] compiler_3.6.0 rlang_0.4.0 grid_3.6.0
[52] RCurl_1.95-4.12 tximport_1.12.3 rstudioapi_0.10
[55] rappdirs_0.3.1 bitops_1.0-6 gtable_0.3.0
[58] DBI_1.0.0 curl_3.3 R6_2.4.0
[61] GenomicAlignments_1.20.1 knitr_1.23 dplyr_0.8.2
[64] rtracklayer_1.43.3 bit_1.1-14 ProtGenerics_1.16.0
[67] readr_1.3.1 stringi_1.4.3 Rcpp_1.0.1
[70] dbplyr_1.4.2 tidyselect_0.2.5 xfun_0.8
Thank you, Michael. I didn't know about the
tximeta::addIds()
function.A minor point with regards to this solution:
Given, I've run a
DESeq2
analysis ongse
(ortxm
) and I would like to add the ENTREZID afterwards, theaddIds(dds_results, "ENTREZID")
function will not work with an object ofDESeqResults
class (even ifmetadata(txm)$txomeInfo -> metadata(dds_results)$txomeInfo
. I can imagine this holds true for a variety of objects derived fromSummarizedExperiments
that end up asdata.frame
or similar.In this case, I still would have to do something like
and given that
row.names(dds_results)
are versioned, this would throw an error.Of course, one can work around as specified in the original question by stripping off the versioning ... however, it would be nice to harmonize
tximeta::addIds
andAnnotationDbi::mapIds
for such cases. Maybe a good feature request forAnnotationDbi
? (Or a request to allow mappings viaaddIds()
for any vector given a validmetadata(obj)$txomeInfo
?)DESeqResults is a DataFrame, coming from the S4Vectors package.
So far, I've just built out tximeta functions for SummarizedExperiments, and so one could add the IDs upstream of DESeq2 or I think on a DESeqDataSet.
I can look into changing the code to support GRanges and/or DataFrames.