Question

Obtaining transcript names in RSEM

0

deena ▴ 20

@deena-7415

Last seen 7.6 years ago

Germany

Hi,

I have performed abundance estimation using RSEM, which output the genes.results and isoforms.results files. I would like to import the isoforms results into the DESeq2 pipeline. I followed the tximport pipeline for RSEM as shown below, but when I checked the rownames, I got gene names instead of transcript names:

rsem.files=list.files(".","*.isoforms.result")

txi.rsem=tximport(rsem.files, type = "rsem",txOut = T)

Could anyone please tell me what mistake I made?

Thanks in advance.

tximport rsem • 2.6k views

ADD COMMENT • link updated 7.6 years ago by Michael Love 43k • written 7.6 years ago by deena ▴ 20

score 0 · Answer 1 · 2017-09-22

0

Michael Love 43k

@mikelove

Last seen 2 days ago

United States

The tximport to DESeq2 pipeline described in the tximport vignette is designed for gene level analysis.

ADD COMMENT • link 7.6 years ago Michael Love 43k

0

Hi Michael,

Thank you very much for your reply. I am working on a non-model organism for which I have a de novo assembled transcriptome. I use the Trinity suite to align reads and estimate the count data. Trinity can estimate counts using either RSEM or kallisto, depending on the user's choice. With kallisto, it generates abundance.tsv.isoforms and abundance.tsv.genes; with RSEM, it generates genes.results and isoforms.results. The estimated/expected counts in the isoform and gene files are almost the same, except for those transcripts that have isoforms.

So my question is: can I use these isoforms files in the tximport and DESeq2 pipeline for further analysis? Also, after importing the kallisto or RSEM raw counts into DESeq2, is it advisable to rlog-transform them and use them for generating plots?

ADD REPLY • link 7.6 years ago deena ▴ 20

1

You can do whatever you like with the quantifications. Variance stabilization is a good idea for calculating sample distances or ordination plots like PCA or MDS. You can read in the matrix from the isoforms table using base R functions for RSEM, and txOut with tximport.

ADD REPLY • link 7.6 years ago Michael Love 43k

0

Hi

I tried importing the RSEM.isoform.results files into R as described in the vignette. My RSEM.isoform.result file has the columns transcript_id, gene_id, effective_length, expected_count, TPM, FPKM, and IsoPct. When I tried to import it, I got the following error:

rsem_isoform=list.files("~/Trinity_kallisto_RSEM/RSEM/","*.isoforms.results")

names(rsem_isoform)="t_0h_1"

tximport(rsem_isoform, type = "rsem",txOut = T)

reading in files with read_tsv
1 Error: all(c(geneIdCol, abundanceCol, lengthCol) %in% names(raw)) is not TRUE

When I went through the tximport code, I saw that it is written in such a way that, with the "rsem" option, it only recognizes the column "gene_id". So when I deleted the gene_id column and renamed transcript_id to gene_id, it worked.

Am I doing the right thing? I also tried tx.Out=T, but I still get the same error.

ADD REPLY • link 7.6 years ago deena ▴ 20

1

Instead of changing the column names, you should use the tximport arguments: geneIdCol, txIdCol, abundanceCol, countsCol, and lengthCol. If txOut=TRUE, then geneIdCol will be ignored so you can put anything.

Currently we only support RSEM's gene-level counts with type="rsem". It would take more effort to support both, and I haven't had time to write the code so that the function handles this automatically. The user can always specify the above columns so that it works, though. Note that for txOut=TRUE it is simply cbind'ing the columns into matrices, so there's not much to it.
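To illustrate the last point, here is a minimal base R sketch of what binding the columns into matrices amounts to. It is not tximport's actual implementation. The two temporary files, the sample names, the transcript IDs, and all values are made up for illustration; the column names follow the isoforms.results header quoted earlier in the thread.

```r
# Hypothetical example data: two tiny RSEM-style isoforms.results tables.
make_rsem_file <- function(counts, tpm) {
  tab <- data.frame(
    transcript_id    = c("TRINITY_DN1_c0_g1_i1", "TRINITY_DN1_c0_g1_i2",
                         "TRINITY_DN2_c0_g1_i1"),
    gene_id          = c("TRINITY_DN1_c0_g1", "TRINITY_DN1_c0_g1",
                         "TRINITY_DN2_c0_g1"),
    effective_length = c(850.5, 1200, 430.2),
    expected_count   = counts,
    TPM              = tpm
  )
  path <- tempfile(fileext = ".isoforms.results")
  write.table(tab, path, sep = "\t", quote = FALSE, row.names = FALSE)
  path
}

# Named vector of file paths; the names become the matrix column names.
files <- c(
  sample_A = make_rsem_file(c(10, 5.5, 0), c(100, 50, 0)),
  sample_B = make_rsem_file(c(20, 0, 7), c(150, 0, 60))
)

# Read each table with base R.
tabs <- lapply(files, read.delim, stringsAsFactors = FALSE)

# All files must list the same transcripts in the same order before binding.
same_order <- sapply(tabs, function(x) {
  identical(x$transcript_id, tabs[[1]]$transcript_id)
})
stopifnot(all(same_order))

# Bind one column per sample into a transcripts-by-samples matrix.
bind_column <- function(column) {
  mat <- sapply(tabs, function(x) x[[column]])
  rownames(mat) <- tabs[[1]]$transcript_id
  mat
}

txi_like <- list(
  abundance = bind_column("TPM"),
  counts    = bind_column("expected_count"),
  length    = bind_column("effective_length")
)

# Row names are transcript IDs, not gene IDs.
rownames(txi_like$counts)
```

Because the row names come from the transcript_id column, the resulting matrices are indexed by transcript rather than by gene.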

ADD REPLY • link 7.6 years ago Michael Love 43k

0

Hi, I tried the suggested method, specifying the column names:

txi.rsem_isoform <- tximport(files = files_isoform,tx2gene = tx2gene,type = "rsem", txOut = TRUE, geneIdCol = "gene_id", txIdCol = "transcript_id", countsCol = "expected_count", lengthCol = "effective_length", abundanceCol = "TPM")

Still, I face the same problem as Deena. In fact, I observed that it doesn't even matter which column I assign to which argument. The result is always the same.

For example, the command below gave the same result, which is a bit strange.

txi.rsem_isoform <- tximport(files = files_isoform,tx2gene = tx2gene,type = "rsem", txOut = TRUE, geneIdCol = "expected_count",txIdCol = "transcript_id", countsCol = "gene_id",lengthCol = "effective_length", abundanceCol = "TPM")

ADD REPLY • link 6.9 years ago shreygandhi1990 ▴ 10

0

Can you upgrade to the latest version of Bioconductor and tximport? RSEM transcript-level support was just added in this release.
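For readers on a release that includes this support, a transcript-level import might look roughly like the sketch below. It assumes that tximport is installed and that txIn = TRUE together with txOut = TRUE is the appropriate combination for isoforms.results files, as the package documentation describes. The two-transcript table written here is made up, with hypothetical IDs and values, only so that the example is self-contained.

```r
# Hypothetical, made-up isoforms.results table (IDs and values are invented).
tab <- data.frame(
  transcript_id    = c("tx1", "tx2"),
  gene_id          = c("g1", "g1"),
  length           = c(1000, 1500),
  effective_length = c(850, 1350),
  expected_count   = c(10, 5),
  TPM              = c(600000, 400000),
  FPKM             = c(300000, 200000),
  IsoPct           = c(60, 40)
)
path <- file.path(tempdir(), "t_0h_1.isoforms.results")
write.table(tab, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Named vector of files; the name is used as the sample (column) name.
files <- c(t_0h_1 = path)

if (requireNamespace("tximport", quietly = TRUE)) {
  # txIn = TRUE: the input files are transcript-level.
  # txOut = TRUE: keep transcript-level output instead of summarizing to genes.
  txi <- tximport::tximport(files, type = "rsem", txIn = TRUE, txOut = TRUE)
  print(rownames(txi$counts))
} else {
  message("tximport is not installed; skipping the import")
}
```

If the import succeeds, the row names of txi$counts should be the transcript IDs from the file rather than gene IDs.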

ADD REPLY • link 6.9 years ago Michael Love 43k