I have 20 paired-end libraries pseudomapped to the rat transcriptome with Salmon, and I wish to create a tx2gene data frame via the ensembldb package for downstream analysis in DESeq2, as recommended here:
Note: if you are using an Ensembl transcriptome, the easiest way to create the tx2gene data.frame is to use the ensembldb packages. The annotation packages can be found by version number, and use the pattern EnsDb.Hsapiens.vXX. The transcripts function can be used with return.type="DataFrame", in order to obtain something like the df object constructed in the code chunk above. See the ensembldb package vignette for more details.
However, while biocLite("EnsDb.Hsapiens.v75") works fine, biocLite("EnsDb.Rnorvegicus.v89") returns: Warning message: package 'EnsDb.Rnorvegicus.v89' is not available (for R version 3.4.0)
Is this a case of trying to use the wrong tool, i.e., do these recommendations apply to human data but not to other species, or is it some other issue? Would BioMart help?
Thanks! Aaron
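For reference, the approach described in the quoted note can be sketched roughly as follows. This is a minimal sketch, not the vignette's own code: the helper name make_tx2gene and the toy table are made up for illustration, and the real call is only run if the human package mentioned in the note (EnsDb.Hsapiens.v75) happens to be installed. For another species, an EnsDb object for that species would have to be substituted.

```r
# Hypothetical helper: reduce a transcript annotation table to the two
# columns a tx2gene table needs (transcript ID first, gene ID second).
make_tx2gene <- function(tx) {
  tx <- as.data.frame(tx)
  tx2gene <- unique(tx[, c("tx_id", "gene_id")])
  rownames(tx2gene) <- NULL
  tx2gene
}

# Real use: only runs when ensembldb and the human annotation package
# from the quoted note are installed.
if (requireNamespace("ensembldb", quietly = TRUE) &&
    requireNamespace("EnsDb.Hsapiens.v75", quietly = TRUE)) {
  library(ensembldb)
  library(EnsDb.Hsapiens.v75)
  tx <- transcripts(EnsDb.Hsapiens.v75,
                    columns = c("tx_id", "gene_id"),
                    return.type = "DataFrame")
  tx2gene <- make_tx2gene(tx)
  print(head(tx2gene))
}

# Toy table (made-up IDs) with the same column names, to show the shape
# of the result without needing any annotation package.
toy <- data.frame(
  tx_id      = c("ENST0001", "ENST0002", "ENST0003"),
  gene_id    = c("ENSG01", "ENSG01", "ENSG02"),
  tx_biotype = c("protein_coding", "protein_coding", "lincRNA"),
  stringsAsFactors = FALSE
)
toy_tx2gene <- make_tx2gene(toy)
print(toy_tx2gene)
```

The resulting two-column data frame is what would then be passed as the tx2gene argument downstream.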
Hi, I have a question. The transcript name in kallisto's output is
Do you know how to solve this problem? Thanks a lot for your time.
You have to remove the transcript version number from the transcript IDs (i.e. the .1). Just be sure that the Ensembl version of the EnsDb you are using and the version that was used for kallisto match. A fast way to remove them is e.g.
top_table$tx_id <- sub("\\.[0-9]*$", "", top_table$tx_id)
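To see what that substitution does, here is a small sketch on a toy vector. The IDs and the toy tx2gene table are made up for illustration; only the sub() call is taken from the line above.

```r
# Made-up transcript IDs, two with a version suffix and one without.
tx_ids <- c("ENSRNOT00000000001.1",
            "ENSRNOT00000000002.12",
            "ENSRNOT00000000003")

# Drop a trailing ".<digits>" if present; IDs without a suffix are unchanged.
stripped <- sub("\\.[0-9]*$", "", tx_ids)
print(stripped)

# Toy tx2gene table keyed on unversioned IDs, as an EnsDb would provide.
toy_tx2gene <- data.frame(
  tx_id   = c("ENSRNOT00000000001", "ENSRNOT00000000002", "ENSRNOT00000000003"),
  gene_id = c("ENSRNOG01", "ENSRNOG01", "ENSRNOG02"),
  stringsAsFactors = FALSE
)

# Before stripping, only the unversioned ID matches; afterwards all do.
matched_before <- sum(tx_ids %in% toy_tx2gene$tx_id)
matched_after  <- sum(stripped %in% toy_tx2gene$tx_id)
```

Comparing matched_before and matched_after like this is a quick sanity check that the IDs in your quantification output line up with the annotation.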
FYI: Johannes' solution is automagically performed within the tximport function when specifying the argument ignoreTxVersion = TRUE (default = FALSE). Check the help page for ?tximport. You can ignore version numbers