Question

Importing RSEM transcript-level data using tximport

Patrick Kimes ▴ 10

@patrick-kimes-6796

Boston, MA, USA

I'd like to read in transcript-level results from RSEM using the tximport package, but this does not appear to be supported.

The tximport vignette only describes importing sample.genes.results (gene-level) data, and the tximport::tximport function is hard-coded to set txIn = FALSE when type = "rsem" (see here). Also, in the same block of code, abundance results are read from the FPKM column of the RSEM output and not the TPM column. This appears to be inconsistent with the TPMs read in for Salmon and Kallisto.

Any thoughts on why these decisions were made? The changes needed to (1) allow transcript-level RSEM results and (2) use TPMs instead of FPKMs seem fairly quick.

tximport rsem • 4.4k views

updated 7.3 years ago by Michael Love 43k • written 7.3 years ago by Patrick Kimes ▴ 10

score 2 · Accepted Answer · 2017-11-29

hi Patrick,

Initially, tximport was written to help users summarize transcript-level measurements to the gene level, calculate the appropriate gene-level offset for average effective transcript length, and provide a uniform way to do this and to arrange/name the matrices, so that downstream packages could be run in a particular way (with benchmarking and a Methods write-up behind it) rather than in an ad hoc manner (e.g. ignoring the bias-corrected effective lengths and just using the counts). As RSEM does its own summarization to the gene level (nearly the same as we do, with minor differences when a subset of samples have TPM=0 for the gene), I didn't code up defaults for importing transcript-level measurements (type="rsem" and txOut=TRUE). A few people have now asked for this, so I think I should add it when I find the time. First I need to put some example data in tximportData, so that I have some examples / tests for this.
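
To make the gene-level summarization described above concrete, here is a minimal base-R sketch for a single sample. It is a simplified illustration, not tximport's actual code: the toy table and its values are made up, and the column names are assumed to follow RSEM's isoforms.results layout. Abundances and counts are summed per gene, and the gene-level length is taken as the abundance-weighted average of the transcripts' effective lengths.

```r
# Toy transcript-level table for one sample (all values are made up)
tx <- data.frame(
  transcript_id    = c("tx1", "tx2", "tx3"),
  gene_id          = c("geneA", "geneA", "geneB"),
  TPM              = c(100, 300, 50),
  expected_count   = c(20, 90, 10),
  effective_length = c(1000, 2000, 1500)
)

# Sum abundance and counts over the transcripts of each gene
gene_tpm    <- rowsum(tx$TPM, tx$gene_id)
gene_counts <- rowsum(tx$expected_count, tx$gene_id)

# Gene-level length: average effective length weighted by abundance
gene_length <- rowsum(tx$TPM * tx$effective_length, tx$gene_id) / gene_tpm

print(cbind(TPM = gene_tpm[, 1],
            counts = gene_counts[, 1],
            length = gene_length[, 1]))
```

The weighted length is what feeds the gene-level offset: a gene whose expression shifts toward longer isoforms in some samples gets a larger average length in those samples. Note that a gene with TPM=0 in this sample would give 0/0 here, which is the kind of edge case mentioned above.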

Note: you can always import any kind of table by manually specifying the column arguments: geneIdCol, txIdCol, abundanceCol, countsCol, lengthCol.
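
As a rough sketch of that manual route for transcript-level RSEM output: the call below uses type = "none" together with the column arguments listed above. The column names are assumed to match RSEM's isoforms.results header, and the two input files are tiny made-up stand-ins written to a temporary directory so the example can run without real RSEM output; make_fake_rsem is a hypothetical helper that exists only for this illustration.

```r
library(tximport)

# Hypothetical helper: writes a tiny stand-in file with the column layout of
# RSEM's isoforms.results (all values are made up).
make_fake_rsem <- function(path, tpm) {
  tab <- data.frame(
    transcript_id    = c("tx1", "tx2"),
    gene_id          = c("geneA", "geneA"),
    length           = c(1200, 2200),
    effective_length = c(1000, 2000),
    expected_count   = c(20, 90),
    TPM              = tpm,
    FPKM             = tpm / 2,            # arbitrary placeholder values
    IsoPct           = 100 * tpm / sum(tpm)
  )
  write.table(tab, path, sep = "\t", quote = FALSE, row.names = FALSE)
  path
}

# Named vector of files; the names become the column names of the matrices
files <- c(
  sampleA = make_fake_rsem(file.path(tempdir(), "sampleA.isoforms.results"),
                           c(4e5, 6e5)),
  sampleB = make_fake_rsem(file.path(tempdir(), "sampleB.isoforms.results"),
                           c(7e5, 3e5))
)

# Import at transcript level, naming each column explicitly and taking
# abundance from the TPM column rather than FPKM
txi <- tximport(files,
                type         = "none",
                txIn         = TRUE,
                txOut        = TRUE,
                txIdCol      = "transcript_id",
                abundanceCol = "TPM",
                countsCol    = "expected_count",
                lengthCol    = "effective_length",
                importer     = read.delim)

head(txi$abundance)
```

With real data, files would instead point at each sample's isoforms.results file. Because txOut = TRUE keeps the output at transcript level, geneIdCol and tx2gene are not needed in this sketch.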

Re: FPKM vs TPM, I don't know why my original code used the FPKM column. I agree it makes more sense to use TPM.
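
For reference, the two units differ only by a per-sample rescaling: TPM is FPKM divided by the sample's FPKM total, times one million. A small sketch with made-up values (fpkm_to_tpm is a name introduced here for illustration, not a tximport function):

```r
# Made-up FPKM values for three transcripts in one sample
fpkm <- c(tx1 = 10, tx2 = 30, tx3 = 60)

# Rescale so that the values in a sample sum to one million
fpkm_to_tpm <- function(fpkm) fpkm / sum(fpkm) * 1e6

tpm <- fpkm_to_tpm(fpkm)
print(tpm)
print(sum(tpm))
```

Since the scaling factor depends on each sample's FPKM total, FPKM values are not on the same footing across samples, whereas TPMs always sum to one million within a sample. That is one reason reading the TPM column is more consistent with what is imported for Salmon and Kallisto.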

For both of these, I'll put it on my to-do list to make the changes in devel, but first I'll need to generate the isoform-level output files in tximportData so I can test against them. In the meantime, you can use the arguments listed above.