Question

tximport for kallisto on hg38 v87

jarod_v6@libero.it ▴ 40

@jarod_v6liberoit-6654

Italy

I am trying to import kallisto results into R.

I quantified against Ensembl release 87, but on Bioconductor I could only find the annotation package for release 86.

$ cat abundance.tsv |head
target_id    length    eff_length    est_counts    tpm
ENST00000448914.1    13    12    0    0
ENST00000631435.1    12    11    0    0
ENST00000632684.1    12    11    0    0
ENST00000434970.2    9    8    0    0
ENST00000415118.1    8    7    0    0
ENST00000633010.1    16    15    0    0

Tx <- transcripts(txdb, return.type = "DataFrame")
tx2gene <- as.data.frame(Tx[, c("tx_id", "gene_id")])
txi <- tximport(files, type = "kallisto", tx2gene = tx2gene, reader = read_tsv)
reading in files
1 Parsed with column specification:
cols(
  target_id = col_character(),
  length = col_integer(),
  eff_length = col_double(),
  est_counts = col_double(),
  tpm = col_double()
)

Error in summarizeToGene(txi, tx2gene, ignoreTxVersion, countsFromAbundance) :

  None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.

sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=it_IT.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=it_IT.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=it_IT.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] readr_1.1.0                tximport_1.0.3             EnsDb.Hsapiens.v86_2.1.0  
 [4] BiocInstaller_1.22.3       ensembldb_1.4.7            GenomicFeatures_1.24.5    
 [7] AnnotationDbi_1.34.4       tximportData_1.0.2         VariantAnnotation_1.18.7  
[10] Rsamtools_1.24.0           Biostrings_2.40.2          XVector_0.12.1            
[13] SummarizedExperiment_1.2.3 Biobase_2.32.0             GenomicRanges_1.24.3      
[16] GenomeInfoDb_1.8.7         IRanges_2.6.1              S4Vectors_0.10.3          
[19] BiocGenerics_0.18.0        circlize_0.3.9            

loaded via a namespace (and not attached):
 [1] shape_1.4.2                   colorspace_1.3-2              htmltools_0.3.5              
 [4] rtracklayer_1.32.2            yaml_2.1.14                   base64enc_0.1-3              
 [7] interactiveDisplayBase_1.10.3 XML_3.98-1.5                  DBI_0.6-1                    
[10] BiocParallel_1.6.6            plyr_1.8.4                    stringr_1.2.0                
[13] zlibbioc_1.18.0               munsell_0.4.3                 gtable_0.2.0                 
[16] GlobalOptions_0.0.10          QoRTs_1.1.8                   memoise_1.0.0                
[19] evaluate_0.10                 knitr_1.15.1                  biomaRt_2.28.0               
[22] httpuv_1.3.3                  Rcpp_0.12.9                   xtable_1.8-2                 
[25] backports_1.0.5               scales_0.4.1                  BSgenome_1.40.1              
[28] jsonlite_1.2                  mime_0.5                      AnnotationHub_2.4.2          
[31] hms_0.3                       digest_0.6.12                 stringi_1.1.2                
[34] dplyr_0.5.0                   shiny_1.0.0                   grid_3.3.2                   
[37] rprojroot_1.2                 tools_3.3.2                   bitops_1.0-6                 
[40] magrittr_1.5                  tibble_1.2                    RCurl_1.95-4.8               
[43] lazyeval_0.2.0                RSQLite_1.1-2                 assertthat_0.1               
[46] rmarkdown_1.3                 httr_1.2.1                    R6_2.2.0                     
[49] GenomicAlignments_1.8.4

updated 8.1 years ago by James W. MacDonald 68k • written 8.1 years ago by jarod_v6@libero.it ▴ 40

score 0 · Answer 1 · 2017-04-07

James W. MacDonald 68k

@james-w-macdonald-5106

United States

There are likely to be some changes between releases 86 and 87 (otherwise Ensembl wouldn't release a new version!), but they won't affect ALL of the genes. The problem you are facing is that your Ensembl transcript IDs include the transcript version (e.g., ENST00000448914.1), whereas the annotation package omits it (e.g., ENST00000448914), so none of the IDs match. You need to strip off the version suffixes before matching.
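As a minimal sketch of that mismatch, the base R snippet below strips a trailing version suffix from a few of the IDs shown in the question and checks them against a toy tx2gene table. The gene IDs in the table are placeholders invented for illustration, not real annotations.

```r
# Transcript IDs as they appear in the kallisto abundance.tsv from the question
target_id <- c("ENST00000448914.1", "ENST00000631435.1", "ENST00000434970.2")

# Toy tx2gene table keyed by unversioned IDs, as the annotation package provides them.
# The gene_id values are placeholders (assumption), not real Ensembl gene IDs.
tx2gene <- data.frame(
  tx_id   = c("ENST00000448914", "ENST00000631435", "ENST00000434970"),
  gene_id = c("geneA", "geneB", "geneC"),
  stringsAsFactors = FALSE
)

# Before stripping: count how many versioned IDs are found in tx2gene
n_before <- sum(target_id %in% tx2gene$tx_id)

# Remove a trailing ".<digits>" version suffix from each ID
stripped <- sub("\\.[0-9]+$", "", target_id)

# After stripping: count how many IDs are found in tx2gene
n_after <- sum(stripped %in% tx2gene$tx_id)

print(c(before = n_before, after = n_after))
```

With the versioned IDs nothing matches, which is the situation the tximport error describes; once the suffix is removed, the same IDs line up with the annotation.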

You can also get a GTF for Ensembl 87 from AnnotationHub and make your own TxDb from it, if you want the annotation to match the Ensembl release you quantified against.
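One way to sketch the TxDb route, assuming the release 87 GTF has already been downloaded to a local file: the function name make_tx2gene and the gtf_path argument are hypothetical, and the snippet uses makeTxDbFromGFF() from GenomicFeatures, which is attached in the session above (in recent Bioconductor releases this function is provided by the txdbmaker package instead). Retrieving the file through AnnotationHub is not shown.

```r
# Sketch only: build a transcript-to-gene table from an Ensembl GTF.
# `gtf_path` is a hypothetical path to a locally saved release 87 GTF.
make_tx2gene <- function(gtf_path) {
  # Parse the GTF into a TxDb object
  txdb <- GenomicFeatures::makeTxDbFromGFF(gtf_path, format = "gtf")

  # Collect every transcript name, then look up the gene ID for each one
  tx_names <- AnnotationDbi::keys(txdb, keytype = "TXNAME")
  tx2gene <- AnnotationDbi::select(txdb, keys = tx_names,
                                   columns = "GENEID", keytype = "TXNAME")

  # tximport expects the transcript ID in the first column and the gene ID in the second
  tx2gene[, c("TXNAME", "GENEID")]
}
```

Ensembl GTFs generally store the transcript version as a separate attribute, so the TXNAME values produced this way may still lack the ".1" style suffix; check a few IDs against abundance.tsv before assuming they match.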

written 8.1 years ago by James W. MacDonald 68k

Note that tximport has an argument, ignoreTxVersion, that strips the version number for you. Jarod, see ?tximport
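A minimal sketch of that call, assuming the argument meant here is ignoreTxVersion (the name that appears in the summarizeToGene() signature in the error message above). The wrapper name import_kallisto is hypothetical; files and tx2gene are the objects from the question.

```r
# Sketch only: wrap the tximport call so the version suffix is ignored when
# matching transcript IDs against the first column of tx2gene.
# `files` is the vector of kallisto abundance.tsv paths, `tx2gene` the
# two-column transcript-to-gene table, both as defined in the question.
import_kallisto <- function(files, tx2gene) {
  tximport::tximport(
    files,
    type = "kallisto",
    tx2gene = tx2gene,
    ignoreTxVersion = TRUE  # match ENST00000448914.1 against ENST00000448914
  )
}
```

Usage would be txi <- import_kallisto(files, tx2gene). The reader argument from the original call is left at its default here.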

written 8.1 years ago by Michael Love 43k