Hi,
I have performed abundance estimation using RSEM, which output genes.results and isoforms.results. I would like to import the isoform results into the DESeq2 pipeline. I followed the tximport pipeline for RSEM as follows, but when I checked the rownames, it gave me gene names instead of transcript names:
rsem.files=list.files(".","*.isoforms.result")
txi.rsem=tximport(rsem.files, type = "rsem",txOut = T)
Could anyone please point out what mistake I made?
Thanks in advance.
Hi Michael,
Thank you very much for your reply. I am working on a non-model organism with a de novo assembled transcriptome. I use the Trinity suite to align reads and estimate the count data. Trinity can estimate counts using RSEM or kallisto, depending on the user's choice. With kallisto it generates abundance.tsv.isoforms and abundance.tsv.genes, and with RSEM it generates genes.results and isoforms.results. The estimated/expected counts in the isoform and gene files are almost the same, except for those transcripts that have isoforms.
So my question is: can I use these isoform files in the tximport and DESeq2 pipeline for further analysis? Also, after importing the kallisto or RSEM raw counts into DESeq2, is it advisable to rlog-transform them and use them for generating plots?
You can do whatever you like with the quantifications. Variance stabilization is a good idea for calculating sample distances or ordination plots like PCA or MDS. You can read in the matrix from the isoforms table using base R functions for RSEM, and txOut with tximport.
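A minimal base R sketch of the "read in the matrix" route mentioned above, assuming the standard RSEM isoforms.results columns (transcript_id, gene_id, length, effective_length, expected_count, TPM). The two toy files, the sample names t_0h_1 and t_0h_2, and the helper read_rsem_column are made up for illustration so the snippet runs on its own; they are not part of tximport or RSEM.

```r
# Toy RSEM isoform tables written to temporary files so the example is runnable.
# Real input would be the *.isoforms.results files produced by RSEM.
make_toy <- function(counts) {
  path <- tempfile(fileext = ".isoforms.results")
  tab <- data.frame(
    transcript_id    = c("TR1_i1", "TR1_i2", "TR2_i1"),
    gene_id          = c("TR1", "TR1", "TR2"),
    length           = c(1000, 800, 1500),
    effective_length = c(850, 650, 1350),
    expected_count   = counts,
    TPM              = counts / sum(counts) * 1e6,  # placeholder values, not real TPM
    stringsAsFactors = FALSE
  )
  write.table(tab, path, sep = "\t", quote = FALSE, row.names = FALSE)
  path
}
files <- c(t_0h_1 = make_toy(c(10, 5, 20)), t_0h_2 = make_toy(c(12, 0, 30)))

# Hypothetical helper: pull one column from every file and bind the columns
# into a transcripts-by-samples matrix.
read_rsem_column <- function(files, column) {
  tabs <- lapply(files, read.delim, stringsAsFactors = FALSE)
  ids <- tabs[[1]]$transcript_id
  # All files must list the same transcripts in the same order.
  stopifnot(all(vapply(tabs, function(x) identical(x$transcript_id, ids), logical(1))))
  mat <- do.call(cbind, lapply(tabs, function(x) x[[column]]))
  dimnames(mat) <- list(ids, names(files))
  mat
}

counts <- read_rsem_column(files, "expected_count")
print(counts)
```

RSEM expected counts are generally not integers, so a matrix built this way would need round() before being passed to DESeqDataSetFromMatrix.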
Hi
rsem_isoform=list.files("~/Trinity_kallisto_RSEM/RSEM/","*.isoforms.results")
names(rsem_isoform)="t_0h_1"
tximport(rsem_isoform, type = "rsem",txOut = T)
reading in files with read_tsv
1 Error: all(c(geneIdCol, abundanceCol, lengthCol) %in% names(raw)) is not TRUE
When I went through the tximport code, I saw that it is written in such a way that, for the "rsem" option, it recognizes only the column "gene_id". So when I deleted the gene_id column and renamed transcript_id to gene_id, it worked.
Am I doing the right thing? I also tried tx.Out=T, but I still get the same error.
Instead of changing the column names, you should use the tximport arguments: geneIdCol, txIdCol, abundanceCol, countsCol, and lengthCol. If txOut=TRUE, then geneIdCol will be ignored so you can put anything.
We currently support only RSEM's gene-level counts with type="rsem". Supporting both would take more effort, and I haven't had time to write it so that the function handles this automatically. The user can always specify the columns above so that it works, though. Note that for txOut=TRUE it simply cbinds the columns into matrices, so there's not much to it.
Hi, I tried the suggested method of specifying the column names:
txi.rsem_isoform <- tximport(files = files_isoform,tx2gene = tx2gene,type = "rsem", txOut = TRUE, geneIdCol = "gene_id", txIdCol = "transcript_id", countsCol = "expected_count", lengthCol = "effective_length", abundanceCol = "TPM")
Still, I face the same problem as Deena. In fact, I observed that it doesn't even matter which column I assign to which argument; the result is always the same.
For example, the command below gave the same results, which is a bit strange.
txi.rsem_isoform <- tximport(files = files_isoform,tx2gene = tx2gene,type = "rsem", txOut = TRUE, geneIdCol = "expected_count",txIdCol = "transcript_id", countsCol = "gene_id",lengthCol = "effective_length", abundanceCol = "TPM")
Can you upgrade to the latest version of Bioconductor and tximport? RSEM txp level support was just added to this release.
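With a release that includes transcript-level RSEM support, the import might look like the sketch below. It assumes a tximport version that has the txIn argument; the toy files and sample names are made up so the snippet is self-contained, and the tximport call is skipped if the package is not installed.

```r
# Write two tiny RSEM-style isoform tables to temporary files (made-up data).
write_toy <- function(counts) {
  path <- tempfile(fileext = ".isoforms.results")
  tab <- data.frame(
    transcript_id    = c("TR1_i1", "TR1_i2", "TR2_i1"),
    gene_id          = c("TR1", "TR1", "TR2"),
    length           = c(1000, 800, 1500),
    effective_length = c(850, 650, 1350),
    expected_count   = counts,
    TPM              = counts / sum(counts) * 1e6,  # placeholder values, not real TPM
    stringsAsFactors = FALSE
  )
  write.table(tab, path, sep = "\t", quote = FALSE, row.names = FALSE)
  path
}
# Names on the file vector become the sample (column) names of the result.
files <- c(t_0h_1 = write_toy(c(10, 5, 20)), t_0h_2 = write_toy(c(12, 0, 30)))

txi <- NULL
if (requireNamespace("tximport", quietly = TRUE)) {
  # txIn = TRUE: the input files are transcript-level (isoforms.results).
  # txOut = TRUE: keep transcripts, do not summarize to genes, so no tx2gene is needed.
  txi <- tximport::tximport(files, type = "rsem", txIn = TRUE, txOut = TRUE)
  print(head(rownames(txi$counts)))
}
```

If the import works as intended, the rownames of txi$counts should be the transcript IDs rather than the gene IDs.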