Hello, I am trying to import quant.sf files from salmon, summarized to transcript level (ignoring the transcript version). I have used tximport for this successfully before, but this time, for a reason I cannot figure out, it does not seem to ignore the transcript version.
Here is what my annotation file looks like
> head(tx2gene)
TXNAME GENEID
1 ENST00000387314 MT-TF
2 ENST00000389680 MT-RNR1
3 ENST00000387342 MT-TV
4 ENST00000387347 MT-RNR2
5 ENST00000386347 MT-TL1
6 ENST00000361390 MT-ND1
Here is what my quant.sf file looks like
# A tibble: 211,939 x 5
Name Length EffectiveLength TPM NumReads
<chr> <dbl> <dbl> <dbl> <dbl>
1 ENST00000632684.1 12 3 0 0
2 ENST00000434970.2 9 2 0 0
3 ENST00000448914.1 13 3 0 0
4 ENST00000415118.1 8 2 0 0
5 ENST00000390583.1 31 3 0 0
6 ENST00000390577.1 37 3 0 0
7 ENST00000451044.1 17 3 0 0
8 ENST00000390578.1 31 3 0 0
9 ENST00000390572.1 28 3 0 0
10 ENST00000632859.1 21 3 0 0
# ... with 211,929 more rows
txi.tx <- tximport(files, type = "salmon", txOut = TRUE, tx2gene = tx2gene, ignoreTxVersion = TRUE, ignoreAfterBar = TRUE)
txi.tx$counts[1:2,1:2]
S4 S3
ENST00000632684.1 0 0
ENST00000434970.2 0 0
ENST00000448914.1 0 0
ENST00000415118.1 0 0
Any pointers to what I may be missing will be greatly appreciated. Thanks, Kavitha
P.S.
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server >= 2012 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.8.3 rjson_0.2.20 readr_1.3.1 rhdf5_2.28.0 biomaRt_2.40.4 tximport_1.12.3
loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 pillar_1.4.2 compiler_3.6.1 prettyunits_1.0.2 bitops_1.0-6 tools_3.6.1 progress_1.2.2
[8] zeallot_0.1.0 digest_0.6.20 bit_1.1-14 jsonlite_1.6 RSQLite_2.1.2 memoise_1.1.0 tibble_2.1.3
[15] pkgconfig_2.0.2 rlang_0.4.0 cli_1.1.0 DBI_1.0.0 rstudioapi_0.10 curl_4.1 parallel_3.6.1
[22] stringr_1.4.0 httr_1.4.1 vctrs_0.2.0 S4Vectors_0.22.1 IRanges_2.18.2 hms_0.5.1 tidyselect_0.2.5
[29] stats4_3.6.1 bit64_0.9-7 glue_1.3.1 Biobase_2.44.0 R6_2.4.0 fansi_0.4.0 AnnotationDbi_1.46.1
[36] XML_3.98-1.20 purrr_0.3.2 Rhdf5lib_1.6.1 blob_1.2.0 magrittr_1.5 backports_1.1.4 BiocGenerics_0.30.0
[43] assertthat_0.2.1 utf8_1.1.4 stringi_1.4.3 RCurl_1.95-4.12 crayon_1.3.4
I tried that first, in fact. It fails then too.
Why do you set txOut=TRUE here if you want gene-level output?
Hello Michael, I am trying to summarize it to transcript level, not gene level... Oh! I see why you are asking.
I had summarized to gene level earlier and was just trying to see whether the counts add up. Please feel free to ignore the summarizeToGene code :)
I have edited my question to show what txi.tx$counts looks like.
OK, I was confused by that code and thought that was the error you were asking about.
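The ignoreTxVersion and ignoreAfterBar arguments do not modify the row names of the output. They only help match transcript IDs against tx2gene during summarization, so with txOut=TRUE the versioned names are kept. This is expected behavior.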
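If unversioned row names are wanted in the transcript-level output, one option is to rename the rows after import. The following is a minimal sketch in base R, using a small toy matrix as a stand-in for txi.tx$counts; the helper name strip_tx_version is hypothetical and not part of tximport.

```r
# Toy stand-in for txi.tx$counts: rows carry the versioned IDs from quant.sf
counts <- matrix(
  0, nrow = 3, ncol = 2,
  dimnames = list(
    c("ENST00000632684.1", "ENST00000434970.2", "ENST00000448914.1"),
    c("S4", "S3")
  )
)

# Hypothetical helper: drop a trailing ".<digits>" version suffix, if present
strip_tx_version <- function(ids) sub("\\.[0-9]+$", "", ids)

rownames(counts) <- strip_tx_version(rownames(counts))
rownames(counts)
```

On a real tximport result the same renaming would presumably be applied to the row names of txi.tx$counts, txi.tx$abundance and txi.tx$length so that the three matrices stay consistent.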
Oh! I see. It would be really helpful if this could be stated explicitly in future releases of the vignette :) (Just a thought). Thanks :)
Good point. I just added this.