Question

tximport no counts from abundance using EnsDb.Hsapiens.v86

TJ • 0

Hi,

I have generated quant.sf files using salmon for six samples (2 conditions, each in triplicate). I then followed the tximport workflow, but I am stuck at the tx2gene step below because the result contains $countsFromAbundance [1] "no".

Many thanks in advance for your kind help and your time.

> dir <- "/Volumes/Mac_LJ_TJ/TJ/RNA/salmon/salmon_tutorial/quants"
> dir
[1] "/Volumes/Mac_LJ_TJ/TJ/RNA/salmon/salmon_tutorial/quants"
> samples <- read.table(file.path(dir, "samples.txt"), header = TRUE)
> samples

  samples                   treatment
1 sample1 WTCHG_823938_71575133_quant
2 sample2 WTCHG_823938_71585134_quant
3 sample3 WTCHG_823938_71595135_quant
4 sample4 WTCHG_823938_71605136_quant
5 sample5 WTCHG_823938_71615137_quant
6 sample6 WTCHG_823938_71625138_quant

> files <- file.path(dir, "salmon", samples$treatment, "quant.sf")
> names(files) <- paste0("sample", 1:6)
> all(file.exists(files))
[1] TRUE
> edb <- EnsDb.Hsapiens.v86
> txs <- transcripts(edb, return.type = "DataFrame")
> txi <- tximport(files, type = "salmon", tx2gene = txs, ignoreTxVersion = TRUE)

reading in files with read_tsv
1 2 3 4 5 6 
transcripts missing from tx2gene: 17113
summarizing abundance
summarizing counts
summarizing length

> head(txi$counts)
                               sample1  sample2  sample3 sample4  sample5  sample6
3prime_overlapping_ncRNA         0.000    1.052    0.000    0.00    0.000    0.000
antisense                     3623.785 4702.966 3690.267 4604.10 4791.529 4241.543
bidirectional_promoter_lncRNA   80.000   95.001   67.000   36.01   48.146   40.001
IG_C_gene                        1.000    0.000    1.000    0.00    3.000    0.000
IG_C_pseudogene                  0.000    2.000    0.000    5.00    2.000    2.000
IG_D_gene                        0.000    0.000    0.000    0.00    0.000    0.000
> 
> txi

$counts
                                        sample1      sample2      sample3      sample4      sample5
3prime_overlapping_ncRNA                  0.000        1.052        0.000        0.000        0.000
antisense                              3623.785     4702.966     3690.267     4604.100     4791.529
bidirectional_promoter_lncRNA            80.000       95.001       67.000       36.010       48.146
IG_C_gene                                 1.000        0.000        1.000        0.000        3.000
IG_C_pseudogene                           0.000        2.000        0.000        5.000        2.000
IG_D_gene                                 0.000        0.000        0.000        0.000        0.000
IG_J_gene                                 0.000        0.000        0.000        0.000        0.000
IG_J_pseudogene                           0.000        0.000        0.000        0.000        0.000
IG_pseudogene                             0.000        0.000        0.000        0.000        0.000
IG_V_gene                               205.441      217.536      202.516      203.137      234.523
IG_V_pseudogene                          24.000       27.316       26.035       30.000       13.002
lincRNA                               11653.474    15322.133    11335.241    12584.395    13595.131
non_stop_decay                        12568.506    15920.272    12464.009    11477.102    13866.972
nonsense_mediated_decay             1403931.042  1707027.271  1358093.215  1436845.152  1659518.433
polymorphic_pseudogene                   78.955       92.044       72.319       71.573       65.095
processed_pseudogene                  92082.091   126241.017    86318.116   110677.782   138921.071
processed_transcript                1096540.715  1303838.063  1040602.805  1159030.527  1288276.075
protein_coding                     52869867.637 67005440.158 51747759.212 54946608.147 64955809.746
pseudogene                                1.000       12.000        0.000        5.000        2.000
retained_intron                     2155812.260  2209586.729  1980159.016  2322248.306  2343544.031
rRNA                                    287.000      576.000      324.000     1320.000     1533.000
sense_intronic                         1990.255     2750.951     2047.189     2868.783     3277.976
sense_overlapping                      2038.084     3315.309     2158.184     2604.718     2920.475
TEC                                    3667.636     3851.902     3238.114     4082.728     3961.662
TR_C_gene                                 0.000        0.000        1.000        0.000        0.000
TR_D_gene                                 0.000        0.000        0.000        0.000        0.000
TR_J_gene                                 0.000        0.000        0.000        0.000        0.000
TR_J_pseudogene                           0.000        0.000        0.000        0.000        0.000
TR_V_gene                                25.000       19.000       20.000       21.000       28.000
TR_V_pseudogene                           9.000       13.000       13.000       10.000        6.000
transcribed_processed_pseudogene       9987.584    12754.142     9282.894    10549.953    12358.831
transcribed_unitary_pseudogene           66.438      147.927      146.285      203.033      124.679
transcribed_unprocessed_pseudogene    18345.868    20242.087    18016.170    19232.998    19849.919
unitary_pseudogene                      272.691      380.859      245.403      225.672      228.680
unprocessed_pseudogene                25769.595    24008.149    25430.889    28503.149    27337.853
                                        sample6
3prime_overlapping_ncRNA                  0.000
antisense                              4241.543
bidirectional_promoter_lncRNA            40.001
IG_C_gene                                 0.000
IG_C_pseudogene                           2.000
IG_D_gene                                 0.000
IG_J_gene                                 0.000
IG_J_pseudogene                           0.000
IG_pseudogene                             0.000
IG_V_gene                               195.168
IG_V_pseudogene                          15.968
lincRNA                               11456.067
non_stop_decay                        11973.694
nonsense_mediated_decay             1443531.471
polymorphic_pseudogene                   44.688
processed_pseudogene                 120897.648
processed_transcript                1109199.937
protein_coding                     56409697.402
pseudogene                                3.000
retained_intron                     1876370.615
rRNA                                   1539.000
sense_intronic                         2854.213
sense_overlapping                      2553.777
TEC                                    3149.544
TR_C_gene                                 2.000
TR_D_gene                                 0.000
TR_J_gene                                 0.000
TR_J_pseudogene                           0.000
TR_V_gene                                26.000
TR_V_pseudogene                           9.000
transcribed_processed_pseudogene      11716.976
transcribed_unitary_pseudogene          131.237
transcribed_unprocessed_pseudogene    17375.298
unitary_pseudogene                      149.448
unprocessed_pseudogene                20904.643


$countsFromAbundance
[1] "no"

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tximport_1.18.0           EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.14.0         
 [4] AnnotationFilter_1.14.0   GenomicFeatures_1.42.1    AnnotationDbi_1.52.0     
 [7] Biobase_2.50.0            GenomicRanges_1.42.0      GenomeInfoDb_1.26.2      
[10] IRanges_2.24.1            S4Vectors_0.28.1          BiocGenerics_0.36.0      
[13] tximportData_1.18.0      

loaded via a namespace (and not attached):
 [1] MatrixGenerics_1.2.0        httr_1.4.2                  bit64_4.0.5                
 [4] jsonlite_1.7.2              assertthat_0.2.1            askpass_1.1                
 [7] BiocFileCache_1.14.0        blob_1.2.1                  GenomeInfoDbData_1.2.4     
[10] Rsamtools_2.6.0             progress_1.2.2              sessioninfo_1.1.1          
[13] pillar_1.4.7                RSQLite_2.2.3               lattice_0.20-41            
[16] glue_1.4.2                  XVector_0.30.0              Matrix_1.3-2               
[19] XML_3.99-0.5                pkgconfig_2.0.3             biomaRt_2.46.2             
[22] zlibbioc_1.36.0             purrr_0.3.4                 BiocParallel_1.24.1        
[25] tibble_3.0.6                openssl_1.4.3               generics_0.1.0             
[28] ellipsis_0.3.1              cachem_1.0.1                withr_2.4.1                
[31] SummarizedExperiment_1.20.0 lazyeval_0.2.2              cli_2.2.0                  
[34] magrittr_2.0.1              crayon_1.3.4                memoise_2.0.0              
[37] fansi_0.4.2                 xml2_1.3.2                  tools_4.0.3                
[40] prettyunits_1.1.1           hms_1.0.0                   lifecycle_0.2.0            
[43] matrixStats_0.57.0          stringr_1.4.0               DelayedArray_0.16.1        
[46] Biostrings_2.58.0           compiler_4.0.3              tinytex_0.29               
[49] rlang_0.4.10                grid_4.0.3                  RCurl_1.98-1.2             
[52] rstudioapi_0.13             rappdirs_0.3.2              bitops_1.0-6               
[55] DBI_1.1.1                   curl_4.3                    R6_2.5.0                   
[58] GenomicAlignments_1.26.0    dplyr_1.0.3                 rtracklayer_1.50.0         
[61] fastmap_1.1.0               bit_4.0.4                   ProtGenerics_1.22.0        
[64] readr_1.4.0                 stringi_1.5.3               Rcpp_1.0.6                 
[67] vctrs_0.3.6                 dbplyr_2.0.0                tidyselect_1.1.0           
[70] xfun_0.20

Note: in the txi output above, I removed $abundance and $length from this post to save space.

score 0 · Answer 1 · 2021-02-01

Michael Love 43k

There is no error. See the man page (?tximport), in particular the Value section, which describes what the function returns.
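As a rough illustration of the point about the returned value, here is a minimal sketch. It assumes the tximport and tximportData packages are installed and uses the example salmon files and the tx2gene.gencode.v27.csv table shipped with tximportData, not the files from the question. The name txi_scaled is made up for this example.

```r
# Sketch only: uses the example data from tximportData,
# not the quant.sf files from the question.
library(tximport)
library(tximportData)

dir <- system.file("extdata", package = "tximportData")
samples <- read.table(file.path(dir, "samples.txt"), header = TRUE)
files <- file.path(dir, "salmon", samples$run, "quant.sf.gz")
names(files) <- paste0("sample", seq_along(files))

# tx2gene: transcript ID in column 1, gene ID in column 2
tx2gene <- read.csv(file.path(dir, "tx2gene.gencode.v27.csv"))

# Default call: the countsFromAbundance argument is left at "no"
txi <- tximport(files, type = "salmon", tx2gene = tx2gene)
names(txi)               # elements of the returned list
txi$countsFromAbundance  # records the argument that was used

# Asking for counts derived from abundances changes the recorded value
# (txi_scaled is a hypothetical name used only here)
txi_scaled <- tximport(files, type = "salmon", tx2gene = tx2gene,
                       countsFromAbundance = "lengthScaledTPM")
txi_scaled$countsFromAbundance
```

The $countsFromAbundance element only reports which setting was used, so "no" is what the default call returns. Separately, tximport expects tx2gene to be a two-column table with transcript IDs first and gene IDs second. The row names in the question's output are transcript biotypes (protein_coding, lincRNA, and so on), which suggests the second column of the table passed as tx2gene held biotypes rather than gene IDs. That is a guess from the printed output, not something confirmed in this thread.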
