TJ
@a7167d98
Hi,
I generated quant.sf files with Salmon for six samples (two conditions, each in triplicate). I then followed the tximport workflow, but I am stuck at the tx2gene step shown below: the output ends with $countsFromAbundance [1] "no".
Many thanks in advance for your help and your time.
> dir <- "/Volumes/Mac_LJ_TJ/TJ/RNA/salmon/salmon_tutorial/quants"
> dir
[1] "/Volumes/Mac_LJ_TJ/TJ/RNA/salmon/salmon_tutorial/quants"
> samples <- read.table(file.path(dir, "samples.txt"), header = TRUE)
> samples
samples treatment
1 sample1 WTCHG_823938_71575133_quant
2 sample2 WTCHG_823938_71585134_quant
3 sample3 WTCHG_823938_71595135_quant
4 sample4 WTCHG_823938_71605136_quant
5 sample5 WTCHG_823938_71615137_quant
6 sample6 WTCHG_823938_71625138_quant
> files <- file.path(dir, "salmon", samples$treatment, "quant.sf")
> names(files) <- paste0("sample", 1:6)
> all(file.exists(files))
[1] TRUE
> edb <- EnsDb.Hsapiens.v86
> txs <- transcripts(edb, return.type = "DataFrame")
> txi <- tximport(files, type = "salmon", tx2gene = txs, ignoreTxVersion = TRUE)
reading in files with read_tsv
1 2 3 4 5 6
transcripts missing from tx2gene: 17113
summarizing abundance
summarizing counts
summarizing length
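The message "transcripts missing from tx2gene: 17113" means that many transcript IDs in quant.sf found no match in the first column of tx2gene. Below is a minimal base-R sketch of that check. It assumes that ignoreTxVersion = TRUE strips everything from the first dot onward; the IDs are hypothetical toy values, not taken from these samples.

```r
# Hypothetical transcript IDs as they might appear in quant.sf (with versions)
quant_ids <- c("ENST00000000001.5", "ENST00000000002.1", "ENST00000000003.2")

# Hypothetical tx2gene table whose first column lacks one of those transcripts
tx2gene <- data.frame(
  tx_id   = c("ENST00000000001", "ENST00000000002"),
  gene_id = c("ENSG00000000001", "ENSG00000000001")
)

# Strip the version suffix (assumed behaviour: drop the first dot and after)
quant_ids_nover <- sub("\\..*", "", quant_ids)

# Transcripts quantified by Salmon but absent from column 1 of tx2gene
missing <- setdiff(quant_ids_nover, tx2gene[[1]])
length(missing)
```

A large count here can indicate that the annotation release used to build the Salmon index differs from the one used to build tx2gene.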
> head(txi$counts)
sample1 sample2 sample3 sample4 sample5 sample6
3prime_overlapping_ncRNA 0.000 1.052 0.000 0.00 0.000 0.000
antisense 3623.785 4702.966 3690.267 4604.10 4791.529 4241.543
bidirectional_promoter_lncRNA 80.000 95.001 67.000 36.01 48.146 40.001
IG_C_gene 1.000 0.000 1.000 0.00 3.000 0.000
IG_C_pseudogene 0.000 2.000 0.000 5.00 2.000 2.000
IG_D_gene 0.000 0.000 0.000 0.00 0.000 0.000
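The row names above are transcript biotypes, not gene IDs. The tximport documentation describes tx2gene as a two-column table, with the transcript ID in column 1 and the gene ID in column 2; the column names themselves are ignored. The transcripts() DataFrame from an EnsDb has more than two columns, and judging by these row names its second column appears to be tx_biotype. The base-R sketch below imitates the summarization with rowsum() on hypothetical toy counts. It shows how column 2 determines the output rows and is an illustration only, not tximport's own code.

```r
# Hypothetical transcript-level counts: 4 transcripts x 2 samples
counts <- matrix(
  c(10, 5, 3, 2,
    20, 0, 7, 1),
  nrow = 4,
  dimnames = list(c("tx1", "tx2", "tx3", "tx4"), c("s1", "s2"))
)

# Hypothetical annotation with a biotype column in position 2,
# mimicking the column order of an EnsDb transcripts() table
txs_toy <- data.frame(
  tx_id      = c("tx1", "tx2", "tx3", "tx4"),
  tx_biotype = c("protein_coding", "protein_coding", "lincRNA", "protein_coding"),
  gene_id    = c("geneA", "geneA", "geneB", "geneC")
)

# Sum transcript counts into groups defined by column 2 of the mapping table
summarize_by_col2 <- function(counts, tx2gene) {
  group <- tx2gene[[2]][match(rownames(counts), tx2gene[[1]])]
  rowsum(counts, group)
}

# Whole table: column 2 is the biotype, so the rows are biotypes
by_biotype <- summarize_by_col2(counts, txs_toy)

# Transcript ID and gene ID only: the rows are genes
by_gene <- summarize_by_col2(counts, txs_toy[, c("tx_id", "gene_id")])
```

Here by_biotype has the rows lincRNA and protein_coding, while by_gene has the rows geneA, geneB and geneC.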
>
> txi
$counts
sample1 sample2 sample3 sample4 sample5 sample6
3prime_overlapping_ncRNA 0.000 1.052 0.000 0.000 0.000 0.000
antisense 3623.785 4702.966 3690.267 4604.100 4791.529 4241.543
bidirectional_promoter_lncRNA 80.000 95.001 67.000 36.010 48.146 40.001
IG_C_gene 1.000 0.000 1.000 0.000 3.000 0.000
IG_C_pseudogene 0.000 2.000 0.000 5.000 2.000 2.000
IG_D_gene 0.000 0.000 0.000 0.000 0.000 0.000
IG_J_gene 0.000 0.000 0.000 0.000 0.000 0.000
IG_J_pseudogene 0.000 0.000 0.000 0.000 0.000 0.000
IG_pseudogene 0.000 0.000 0.000 0.000 0.000 0.000
IG_V_gene 205.441 217.536 202.516 203.137 234.523 195.168
IG_V_pseudogene 24.000 27.316 26.035 30.000 13.002 15.968
lincRNA 11653.474 15322.133 11335.241 12584.395 13595.131 11456.067
non_stop_decay 12568.506 15920.272 12464.009 11477.102 13866.972 11973.694
nonsense_mediated_decay 1403931.042 1707027.271 1358093.215 1436845.152 1659518.433 1443531.471
polymorphic_pseudogene 78.955 92.044 72.319 71.573 65.095 44.688
processed_pseudogene 92082.091 126241.017 86318.116 110677.782 138921.071 120897.648
processed_transcript 1096540.715 1303838.063 1040602.805 1159030.527 1288276.075 1109199.937
protein_coding 52869867.637 67005440.158 51747759.212 54946608.147 64955809.746 56409697.402
pseudogene 1.000 12.000 0.000 5.000 2.000 3.000
retained_intron 2155812.260 2209586.729 1980159.016 2322248.306 2343544.031 1876370.615
rRNA 287.000 576.000 324.000 1320.000 1533.000 1539.000
sense_intronic 1990.255 2750.951 2047.189 2868.783 3277.976 2854.213
sense_overlapping 2038.084 3315.309 2158.184 2604.718 2920.475 2553.777
TEC 3667.636 3851.902 3238.114 4082.728 3961.662 3149.544
TR_C_gene 0.000 0.000 1.000 0.000 0.000 2.000
TR_D_gene 0.000 0.000 0.000 0.000 0.000 0.000
TR_J_gene 0.000 0.000 0.000 0.000 0.000 0.000
TR_J_pseudogene 0.000 0.000 0.000 0.000 0.000 0.000
TR_V_gene 25.000 19.000 20.000 21.000 28.000 26.000
TR_V_pseudogene 9.000 13.000 13.000 10.000 6.000 9.000
transcribed_processed_pseudogene 9987.584 12754.142 9282.894 10549.953 12358.831 11716.976
transcribed_unitary_pseudogene 66.438 147.927 146.285 203.033 124.679 131.237
transcribed_unprocessed_pseudogene 18345.868 20242.087 18016.170 19232.998 19849.919 17375.298
unitary_pseudogene 272.691 380.859 245.403 225.672 228.680 149.448
unprocessed_pseudogene 25769.595 24008.149 25430.889 28503.149 27337.853 20904.643
$countsFromAbundance
[1] "no"
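The $countsFromAbundance element is not itself an error. tximport records the value of its countsFromAbundance argument in the returned list, and "no" is that argument's default. The part that more likely needs attention is the tx2gene table. Below is a minimal sketch of reducing a transcripts-style table to the two required columns and sanity-checking it. A hypothetical toy data.frame stands in for the real txs object. With the real DataFrame, txs[, c("tx_id", "gene_id")] should give the same two-column shape, but that is an untested assumption here.

```r
# Hypothetical stand-in for the transcripts() DataFrame: extra columns
# sit between the transcript ID and the gene ID
txs_toy <- data.frame(
  tx_id            = c("ENST0001", "ENST0002", "ENST0003"),
  tx_biotype       = c("protein_coding", "retained_intron", "lincRNA"),
  tx_cds_seq_start = c(100L, NA, NA),
  gene_id          = c("ENSG0001", "ENSG0001", "ENSG0002")
)

# Keep only transcript ID (first) and gene ID (second), in that order
tx2gene <- txs_toy[, c("tx_id", "gene_id")]

# Sanity checks before passing the table to tximport(..., tx2gene = tx2gene)
stopifnot(ncol(tx2gene) == 2)                  # exactly two columns
stopifnot(anyDuplicated(tx2gene$tx_id) == 0)   # one row per transcript
stopifnot(!anyNA(tx2gene$gene_id))             # every transcript has a gene
```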
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] tximport_1.18.0 EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.14.0
[4] AnnotationFilter_1.14.0 GenomicFeatures_1.42.1 AnnotationDbi_1.52.0
[7] Biobase_2.50.0 GenomicRanges_1.42.0 GenomeInfoDb_1.26.2
[10] IRanges_2.24.1 S4Vectors_0.28.1 BiocGenerics_0.36.0
[13] tximportData_1.18.0
loaded via a namespace (and not attached):
[1] MatrixGenerics_1.2.0 httr_1.4.2 bit64_4.0.5
[4] jsonlite_1.7.2 assertthat_0.2.1 askpass_1.1
[7] BiocFileCache_1.14.0 blob_1.2.1 GenomeInfoDbData_1.2.4
[10] Rsamtools_2.6.0 progress_1.2.2 sessioninfo_1.1.1
[13] pillar_1.4.7 RSQLite_2.2.3 lattice_0.20-41
[16] glue_1.4.2 XVector_0.30.0 Matrix_1.3-2
[19] XML_3.99-0.5 pkgconfig_2.0.3 biomaRt_2.46.2
[22] zlibbioc_1.36.0 purrr_0.3.4 BiocParallel_1.24.1
[25] tibble_3.0.6 openssl_1.4.3 generics_0.1.0
[28] ellipsis_0.3.1 cachem_1.0.1 withr_2.4.1
[31] SummarizedExperiment_1.20.0 lazyeval_0.2.2 cli_2.2.0
[34] magrittr_2.0.1 crayon_1.3.4 memoise_2.0.0
[37] fansi_0.4.2 xml2_1.3.2 tools_4.0.3
[40] prettyunits_1.1.1 hms_1.0.0 lifecycle_0.2.0
[43] matrixStats_0.57.0 stringr_1.4.0 DelayedArray_0.16.1
[46] Biostrings_2.58.0 compiler_4.0.3 tinytex_0.29
[49] rlang_0.4.10 grid_4.0.3 RCurl_1.98-1.2
[52] rstudioapi_0.13 rappdirs_0.3.2 bitops_1.0-6
[55] DBI_1.1.1 curl_4.3 R6_2.5.0
[58] GenomicAlignments_1.26.0 dplyr_1.0.3 rtracklayer_1.50.0
[61] fastmap_1.1.0 bit_4.0.4 ProtGenerics_1.22.0
[64] readr_1.4.0 stringi_1.5.3 Rcpp_1.0.6
[67] vctrs_0.3.6 dbplyr_2.0.0 tidyselect_1.1.0
[70] xfun_0.20
Note: in the printed txi output above, I removed the $abundance and $length elements from this post to save space.