Using tximport for kallisto aligned TOIL data
Nicholas • 0
@3611f731

I am attempting to do differential gene expression analysis on kallisto-aligned data from the TOIL project, and I want to use tximport to summarize the transcript-level data to the gene level. The abundance and count files are each a single matrix with ENST transcript IDs as rows and sample names as columns. How can I use tximport to summarize these transcripts to the gene level, given that the data is not in the usual kallisto format of one output file per sample? If tximport cannot be used, how should I summarize the transcript IDs to gene names?

kallisto tximport TOIL • 270 views

recount2 does not use kallisto, does it? And recount does offer gene-level counts, so what's the point?


Sorry, I misspoke. The data was from the TOIL project and was aligned using kallisto. It was accessed from the UCSC XENA browser. I updated my post with these corrections.

@james-w-macdonald-5106

If you just have the transcript data, you probably don't need (and maybe cannot use) tximport. Instead you could just do the naive thing and average the transcripts for each gene. You will first have to construct a data.frame that has transcripts in one column and genes in another. This isn't trivial because Ensembl is always updating things, so unless you know the transcriptome version used, you will have to iterate through various EnsDb versions on the AnnotationHub in order to find the right one. Here is some semi-fake code to illustrate what I mean:

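library(AnnotationHub)
library(ensembldb)
library(limma)
## 'toil' stands in for the TOIL transcript-by-sample matrix (ENST IDs as rownames)
hub <- AnnotationHub()
z <- query(hub, c("homo sapiens","ensdb"))
## Now let's assume that the most recent one is AH123456 (it's not - this is fake code after all)
ensdb <- hub[["AH123456"]]
## fraction of the TOIL transcripts that are present in this Ensembl version
mean(rownames(toil) %in% keys(ensdb, "TXID"))
## keep doing that with different versions of Ensembl until you get to a sufficiently high percentage
## of transcripts, where 'sufficiently high' is up to you
mapper <- select(ensdb, keys = rownames(toil), columns = "GENEID", keytype = "TXID")
## select() may not return a row for every key, so line the gene IDs up with the matrix rows
gene_id <- mapper$GENEID[match(rownames(toil), mapper$TXID)]
keep <- !is.na(gene_id)
## average the transcripts within each gene (ID must be the gene, not the transcript)
averaged_data <- avereps(toil[keep, , drop = FALSE], ID = gene_id[keep])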
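
To make the collapsing step concrete, here is a minimal base-R sketch on made-up data. Everything in it is an assumption for illustration: the IDs and values are invented, `toil` and `tx2gene` are hypothetical stand-ins for the real matrix and the EnsDb-derived mapping, and the `.N` version suffix on the row names may or may not apply to your copy of the data. It shows the match-rate check, then both averaging (as above) and summing, since summing is the other common choice for counts.

```r
## Toy transcript-by-sample matrix; IDs and values are invented for illustration
toil <- matrix(
  c(10, 20,
    30, 40,
     5, 15,
     7,  9),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("ENST0001.1", "ENST0002.3", "ENST0003.2", "ENST0004.1"),
                  c("sampleA", "sampleB"))
)

## Hypothetical transcript-to-gene table (in practice built from the EnsDb)
tx2gene <- data.frame(
  TXID   = c("ENST0001", "ENST0002", "ENST0003"),
  GENEID = c("ENSG0001", "ENSG0001", "ENSG0002"),
  stringsAsFactors = FALSE
)

## Assumption: row names carry a ".N" version suffix that the mapping table lacks
tx_id <- sub("\\.[0-9]+$", "", rownames(toil))

## Fraction of transcripts that can be mapped with this annotation version
match_rate <- mean(tx_id %in% tx2gene$TXID)

## Line the gene IDs up with the matrix rows and drop unmapped transcripts
gene_id <- tx2gene$GENEID[match(tx_id, tx2gene$TXID)]
keep <- !is.na(gene_id)

## Sum the transcripts within each gene
gene_sum <- rowsum(toil[keep, , drop = FALSE], group = gene_id[keep])

## Average instead: divide each gene's sum by its number of mapped transcripts
n_tx <- as.vector(table(gene_id[keep])[rownames(gene_sum)])
gene_mean <- gene_sum / n_tx
```

If the matrix holds log-transformed values, summing only makes sense after transforming back to the original scale.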
