tximport for kallisto on hg38 v87
@jarod_v6liberoit-6654
Italy

I am trying to import kallisto results into R with tximport.

I quantified against Ensembl release 87, but the annotation package I found on Bioconductor is release 86 (EnsDb.Hsapiens.v86).

$ cat abundance.tsv |head
target_id    length    eff_length    est_counts    tpm
ENST00000448914.1    13    12    0    0
ENST00000631435.1    12    11    0    0
ENST00000632684.1    12    11    0    0
ENST00000434970.2    9    8    0    0
ENST00000415118.1    8    7    0    0
ENST00000633010.1    16    15    0    0

 

> Tx <- transcripts(txdb, return.type = "DataFrame")
> tx2gene <- as.data.frame(Tx[, c("tx_id", "gene_id")])
> txi <- tximport(files, type = "kallisto", tx2gene = tx2gene, reader = read_tsv)
reading in files
1 Parsed with column specification:
cols(
  target_id = col_character(),
  length = col_integer(),
  eff_length = col_double(),
  est_counts = col_double(),
  tpm = col_double()
)

Error in summarizeToGene(txi, tx2gene, ignoreTxVersion, countsFromAbundance) :

  None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=it_IT.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=it_IT.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=it_IT.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] readr_1.1.0                tximport_1.0.3             EnsDb.Hsapiens.v86_2.1.0  
 [4] BiocInstaller_1.22.3       ensembldb_1.4.7            GenomicFeatures_1.24.5    
 [7] AnnotationDbi_1.34.4       tximportData_1.0.2         VariantAnnotation_1.18.7  
[10] Rsamtools_1.24.0           Biostrings_2.40.2          XVector_0.12.1            
[13] SummarizedExperiment_1.2.3 Biobase_2.32.0             GenomicRanges_1.24.3      
[16] GenomeInfoDb_1.8.7         IRanges_2.6.1              S4Vectors_0.10.3          
[19] BiocGenerics_0.18.0        circlize_0.3.9            

loaded via a namespace (and not attached):
 [1] shape_1.4.2                   colorspace_1.3-2              htmltools_0.3.5              
 [4] rtracklayer_1.32.2            yaml_2.1.14                   base64enc_0.1-3              
 [7] interactiveDisplayBase_1.10.3 XML_3.98-1.5                  DBI_0.6-1                    
[10] BiocParallel_1.6.6            plyr_1.8.4                    stringr_1.2.0                
[13] zlibbioc_1.18.0               munsell_0.4.3                 gtable_0.2.0                 
[16] GlobalOptions_0.0.10          QoRTs_1.1.8                   memoise_1.0.0                
[19] evaluate_0.10                 knitr_1.15.1                  biomaRt_2.28.0               
[22] httpuv_1.3.3                  Rcpp_0.12.9                   xtable_1.8-2                 
[25] backports_1.0.5               scales_0.4.1                  BSgenome_1.40.1              
[28] jsonlite_1.2                  mime_0.5                      AnnotationHub_2.4.2          
[31] hms_0.3                       digest_0.6.12                 stringi_1.1.2                
[34] dplyr_0.5.0                   shiny_1.0.0                   grid_3.3.2                   
[37] rprojroot_1.2                 tools_3.3.2                   bitops_1.0-6                 
[40] magrittr_1.5                  tibble_1.2                    RCurl_1.95-4.8               
[43] lazyeval_0.2.0                RSQLite_1.1-2                 assertthat_0.1               
[46] rmarkdown_1.3                 httr_1.2.1                    R6_2.2.0                     
[49] GenomicAlignments_1.8.4

 

 

@james-w-macdonald-5106
United States

There are likely to be some changes between releases 86 and 87 (otherwise Ensembl wouldn't release a new version!), but they won't affect ALL of the genes. The problem you are facing is that the Ensembl transcript IDs in your kallisto output include the transcript version (e.g., ENST00000448914.1), whereas the annotation package does not include the version (e.g., it will be ENST00000448914), so nothing matches until you strip off the version suffix.
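
A minimal base-R sketch of the mismatch and the fix, using a few IDs from the abundance.tsv excerpt in the question. The vector names `target_id` and `tx2gene_ids` are made up for illustration, and `tx2gene_ids` stands in for the first column of tx2gene:

```r
# Transcript IDs as they appear in the kallisto output (with version suffix).
target_id <- c("ENST00000448914.1", "ENST00000631435.1", "ENST00000434970.2")

# Stand-in for the first column of tx2gene: unversioned IDs,
# as in the annotation package.
tx2gene_ids <- c("ENST00000448914", "ENST00000631435", "ENST00000434970")

# Before stripping, no ID matches, which is the situation the error describes.
n_before <- sum(target_id %in% tx2gene_ids)

# Remove a trailing ".<digits>" version suffix.
stripped <- sub("\\.[0-9]+$", "", target_id)

# After stripping, the IDs can be matched against tx2gene.
n_after <- sum(stripped %in% tx2gene_ids)
```

Running the same `%in%` check on the real `target_id` column and the real tx2gene is a quick way to confirm that the version suffix, and not the release difference, is what breaks the import.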

You can also get a GTF for Ensembl 87 from AnnotationHub and make your own TxDb from it, if you want the annotation to match the Ensembl release you quantified against.
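
A rough sketch of that route, not a tested recipe. The helper name `make_tx2gene_from_hub`, the search terms, and the choice of the first hit are all assumptions; the hub query needs network access, and the hits should be inspected by hand before one is picked:

```r
# Attach these first:
#   library(AnnotationHub); library(GenomicFeatures)

# Hypothetical helper: build a transcript-to-gene table from an Ensembl GTF
# fetched through AnnotationHub. Nothing is downloaded until it is called.
make_tx2gene_from_hub <- function(release = "87") {
  ah <- AnnotationHub()
  # The search terms are a guess; print `hits` and check the titles.
  hits <- query(ah, c("Homo sapiens", "GTF", release))
  gtf <- hits[[1]]  # assumption: the first hit is the wanted GTF (a GRanges)
  txdb <- makeTxDbFromGRanges(gtf)
  # Transcript names first, gene IDs second, as tximport expects.
  k <- keys(txdb, keytype = "TXNAME")
  select(txdb, keys = k, columns = "GENEID", keytype = "TXNAME")
}

# tx2gene <- make_tx2gene_from_hub("87")
```

Transcript names taken from an Ensembl GTF may still lack the version suffix, so matching the release does not by itself remove the need to strip versions from the kallisto IDs.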


Note that tximport has an argument that strips the version number for you (ignoreTxVersion, which also shows up in the summarizeToGene call in your error message). Jarod, see ?tximport
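
A sketch of the call with that argument set, assuming the installed tximport accepts `ignoreTxVersion` as the error message suggests. The wrapper `import_kallisto` is hypothetical; `files` and `tx2gene` are the objects from the question:

```r
# Hypothetical helper around the call from the question.
import_kallisto <- function(files, tx2gene) {
  tximport::tximport(
    files,
    type = "kallisto",
    tx2gene = tx2gene,
    ignoreTxVersion = TRUE  # drop ".1", ".2", ... from transcript IDs before matching
  )
}

# txi <- import_kallisto(files, tx2gene)
```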
