Missing transcript
@6db15c42
Japan

Hello, I'm trying to run tximport on my salmon output. The generated tx2gene file has 1,000+ genes, but when I ran tximport it reported missing transcripts, and only the RNAs are readable (all the other genes are missing). I could not find the reason why tximport produces this output. How do I make my other genes visible? Thanks


library(GenomicFeatures)
gff_file <- "tn2-sequence.gff3"
file.exists(gff_file)
txdb <- makeTxDbFromGFF(gff_file)
keytypes(txdb)
columns(txdb)

# map transcript names to gene IDs
k <- keys(txdb, keytype = "TXNAME")
tx_map <- AnnotationDbi::select(txdb, keys = k,
                                columns = "GENEID", keytype = "TXNAME")
view(tx_map)
tx2gene <- tx_map
write.csv(tx2gene, file = "tx2gene.csv", row.names = FALSE, quote = FALSE)
view(tx2gene)

-- tx2gene has 1278 obs. of 2 variables

## load transcript abundances -------
library(tximport)
txi <- tximport(files = sample_files, type = "salmon",
                tx2gene = tx2gene, ignoreTxVersion = TRUE)
view(txi$counts)

# results 
reading in files with read_tsv
1 2 3 4 5 
removing duplicated transcript rows from tx2gene
transcripts missing from tx2gene: 1545
summarizing abundance
summarizing counts
summarizing length

sessionInfo( )
R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_0.5.2          stringr_1.5.0          dplyr_1.0.10          
 [4] purrr_1.0.0            readr_2.1.3            tidyr_1.2.1           
 [7] tibble_3.1.8           ggplot2_3.4.0          tidyverse_1.3.2       
[10] BiocManager_1.30.19    tximport_1.26.1        GenomicFeatures_1.50.3
[13] AnnotationDbi_1.60.0   Biobase_2.58.0         GenomicRanges_1.50.2  
[16] GenomeInfoDb_1.34.6    IRanges_2.32.0         S4Vectors_0.36.1      
[19] BiocGenerics_0.44.0    readxl_1.4.1           magrittr_2.0.3        

loaded via a namespace (and not attached):
 [1] bitops_1.0-7                matrixStats_0.63.0         
 [3] fs_1.5.2                    lubridate_1.9.0            
 [5] bit64_4.0.5                 filelock_1.0.2             
 [7] progress_1.2.2              httr_1.4.4                 
 [9] tools_4.2.2                 backports_1.4.1            
[11] utf8_1.2.2                  R6_2.5.1                   
[13] DBI_1.1.3                   colorspace_2.0-3           
[15] withr_2.5.0                 tidyselect_1.2.0           
[17] prettyunits_1.1.1           bit_4.0.5                  
[19] curl_4.3.3                  compiler_4.2.2             
[21] rvest_1.0.3                 cli_3.5.0                  
[23] xml2_1.3.3                  DelayedArray_0.24.0        
[25] rtracklayer_1.58.0          scales_1.2.1               
[27] rappdirs_0.3.3              digest_0.6.31              
[29] Rsamtools_2.14.0            XVector_0.38.0             
[31] pkgconfig_2.0.3             MatrixGenerics_1.10.0      
[33] dbplyr_2.2.1                fastmap_1.1.0              
[35] rlang_1.0.6                 rstudioapi_0.14            
[37] RSQLite_2.2.20              BiocIO_1.8.0               
[39] generics_0.1.3              jsonlite_1.8.4             
[41] vroom_1.6.0                 BiocParallel_1.32.5        
[43] googlesheets4_1.0.1         RCurl_1.98-1.9             
[45] GenomeInfoDbData_1.2.9      Matrix_1.5-3               
[47] Rcpp_1.0.9                  munsell_0.5.0              
[49] fansi_1.0.3                 lifecycle_1.0.3            
[51] stringi_1.7.8               yaml_2.3.6                 
[53] SummarizedExperiment_1.28.0 zlibbioc_1.44.0            
[55] BiocFileCache_2.6.0         grid_4.2.2                 
[57] blob_1.2.3                  parallel_4.2.2             
[59] crayon_1.5.2                lattice_0.20-45            
[61] Biostrings_2.66.0           haven_2.5.1                
[63] hms_1.1.2                   KEGGREST_1.38.0            
[65] pillar_1.8.1                rjson_0.2.21               
[67] codetools_0.2-18            biomaRt_2.54.0             
[69] reprex_2.0.2                XML_3.99-0.13              
[71] glue_1.6.2                  modelr_0.1.10              
[73] data.table_1.14.6           tzdb_0.3.0                 
[75] png_0.1-8                   vctrs_0.5.1                
[77] cellranger_1.1.0            gtable_0.3.1               
[79] assertthat_0.2.1            cachem_1.0.6               
[81] broom_1.0.2                 restfulr_0.0.15            
[83] googledrive_2.0.0           gargle_1.2.1               
[85] GenomicAlignments_1.34.0    memoise_2.0.1              
[87] timechange_0.1.1            ellipsis_0.3.2
DESeq2 tximport salmon • 1.1k views
ATpoint ★ 4.6k
@atpoint-13662
Germany

This is not a tximport problem. If transcripts are missing, then there is a mismatch between your salmon index (or the FASTA you used as reference) and this GFF file, but that is upstream of tximport.
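
One way to see such a mismatch is to compare the transcript names salmon wrote to quant.sf with the TXNAME column of tx2gene. The following is only a minimal sketch in base R: the IDs are made-up toy values, and `quant_names`, `in_both`, `only_salmon` and `only_tx2gene` are hypothetical names used for illustration. With real data, the names would come from the `Name` column of a quant.sf file and from the `tx2gene` object built in the question.

```r
# Toy stand-ins (hypothetical IDs) for the Name column of a salmon quant.sf
# and for the tx2gene table; replace them with the real objects.
quant_names <- c("rna-XM_0001.1", "rna-XM_0002.1",
                 "gene-abc|extra", "rna-NR_0003.1")
tx2gene <- data.frame(
  TXNAME = c("rna-XM_0001.1", "rna-XM_0002.1",
             "rna-NR_0003.1", "rna-XM_0004.1"),
  GENEID = c("gene-A", "gene-B", "gene-C", "gene-D"),
  stringsAsFactors = FALSE
)

# With real data the salmon names could be read like this (assumption:
# sample_files holds the paths to the quant.sf files, as in the question):
# quant_names <- read.delim(sample_files[1])$Name

# Transcripts known to both sides, and those present on one side only
in_both      <- intersect(quant_names, tx2gene$TXNAME)
only_salmon  <- setdiff(quant_names, tx2gene$TXNAME)
only_tx2gene <- setdiff(tx2gene$TXNAME, quant_names)

cat("matched:", length(in_both), "\n")
cat("in salmon only:", length(only_salmon), "\n")
cat("in tx2gene only:", length(only_tx2gene), "\n")

# Looking at a few unmatched names usually shows how the two differ
# (extra prefixes, version suffixes, '|'-separated fields, and so on).
print(head(only_salmon))
```

If the "in salmon only" set is large, the names in the reference FASTA and in the GFF probably follow different conventions, and the fix belongs in how the index or the tx2gene table is built rather than in the tximport call.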
