Importing RSEM transcript-level data using tximport
@patrick-kimes-6796
Boston, MA, USA

I'd like to read in transcript-level results from RSEM using the tximport package, but this does not appear to be supported.

The tximport vignette only describes importing sample.genes.results (gene-level) data, and the tximport::tximport function is hard-coded to set txIn = FALSE when type = "rsem" (see here). Also, in the same block of code, abundance results are read from the FPKM column of the RSEM output and not the TPM column. This appears to be inconsistent with the TPMs read in for Salmon and Kallisto.

Any thoughts on why these decisions were made? The changes to (1) allow transcript-level RSEM results and (2) use TPMs instead of FPKMs seem fairly quick.

@mikelove
United States

Hi Patrick,

Initially, tximport was written to help users summarize transcript-level measurements to the gene level, calculate the appropriate gene-level offset from the average effective transcript length, and provide a uniform way to do this and to arrange/name the matrices, so that downstream packages could be run in a particular way (with benchmarking and a Methods write-up behind it) rather than in an ad hoc manner (e.g. ignoring the bias-corrected effective lengths and just using the counts). As RSEM does its own summarization to the gene level (nearly the same as we do, with minor differences when a subset of samples have TPM=0 for the gene), I didn't code up defaults for the import of transcript-level measurements (type="rsem" and txOut=TRUE). A few people have now asked for this, so I think I should add it when I find the time. First I need to put some example data in tximportData so that I have some examples / tests for this.

Note: you can always import any kind of tables by manually specifying the arguments: geneIdCol, txIdCol, abundanceCol, countsCol, lengthCol.
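As a rough sketch of that manual route (not taken from the tximport docs): the snippet below writes two toy files laid out like RSEM's `*.isoforms.results` output and imports them with `type = "none"`. The sample names, transcript IDs, and counts are made up for illustration, and the choice of `effective_length` for `lengthCol` and `read.delim` as the `importer` are assumptions on my part. The import only runs if tximport is installed.

```r
# Write a toy file with the column layout of an RSEM *.isoforms.results file.
# All values here are hypothetical and only serve to make the example runnable.
write_toy_isoforms <- function(path, counts) {
  eff_len <- c(900, 1400, 400)
  rate <- counts / eff_len                      # reads per effective base
  toy <- data.frame(
    transcript_id    = c("tx1", "tx2", "tx3"),
    gene_id          = c("geneA", "geneA", "geneB"),
    length           = c(1000, 1500, 500),
    effective_length = eff_len,
    expected_count   = counts,
    TPM              = rate / sum(rate) * 1e6,
    FPKM             = rate * 1e9 / sum(counts)
  )
  # IsoPct: each transcript's share of its gene's TPM, in percent
  toy$IsoPct <- 100 * toy$TPM / ave(toy$TPM, toy$gene_id, FUN = sum)
  write.table(toy, path, sep = "\t", quote = FALSE, row.names = FALSE)
}

# Named vector of files: the names become the column names of the matrices.
files <- c(
  sampleA = file.path(tempdir(), "sampleA.isoforms.results"),
  sampleB = file.path(tempdir(), "sampleB.isoforms.results")
)
write_toy_isoforms(files[["sampleA"]], counts = c(100, 250, 50))
write_toy_isoforms(files[["sampleB"]], counts = c(80, 300, 20))

if (requireNamespace("tximport", quietly = TRUE)) {
  # type = "none" means no defaults are filled in, so every column is named here.
  txi <- tximport::tximport(
    files,
    type         = "none",
    txIn         = TRUE,    # input files are transcript-level
    txOut        = TRUE,    # keep transcript-level output (no gene summarization)
    txIdCol      = "transcript_id",
    abundanceCol = "TPM",
    countsCol    = "expected_count",
    lengthCol    = "effective_length",
    importer     = read.delim
  )
  print(txi$abundance)
}
```

Pointing `abundanceCol` at "TPM" rather than "FPKM" also sidesteps the FPKM issue raised in the question.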

Re: FPKM vs TPM, I don't know why my original code used the FPKM column. I agree it makes more sense to use TPM.
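For context on why the column choice matters: within a single sample, TPM is just FPKM rescaled so that the values sum to one million, so the two differ by a sample-specific constant. A minimal base-R illustration with made-up FPKM values:

```r
# Hypothetical FPKM values for three transcripts in one sample
fpkm <- c(tx1 = 12.5, tx2 = 30.0, tx3 = 7.5)

# TPM is FPKM divided by the sample's FPKM total, times one million
tpm <- fpkm / sum(fpkm) * 1e6

print(tpm)
print(sum(tpm))  # TPM values sum to one million within a sample
```

Because the scaling factor `sum(fpkm)` varies from sample to sample, FPKM values from RSEM are not on the same scale as the TPMs imported from Salmon or kallisto.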

For both of these, I'll put it on my to-do list to make the changes in devel, but first I'll need to generate the isoform-level output files in tximportData so I can test against them. In the meantime you can use the arguments listed above.


I should have read the docs more closely - thanks for pointing me to the manual options! (Also, sorry - I probably shouldn't have assumed that these changes would be "fairly quick". Thanks for the hard work.)


I just pushed new quantifications for RSEM, Salmon and kallisto to tximportData, so I'll have something to test on when I add transcript-level import for RSEM.


Added transcript-level import for RSEM in version 1.7.3:

https://github.com/mikelove/tximport/commit/f80fcaac7411ae590688c237088072a313772668
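With a tximport version that includes this change, the built-in route should look roughly like the sketch below, which sets `txIn = TRUE` and `txOut = TRUE` alongside `type = "rsem"`. The toy file and its values are hypothetical stand-ins for a real `*.isoforms.results` file, and the call only runs if a recent enough tximport is installed.

```r
# Toy stand-in for one RSEM *.isoforms.results file (hypothetical values)
eff_len <- c(900, 1400, 400)
counts  <- c(100, 250, 50)
rate    <- counts / eff_len
toy <- data.frame(
  transcript_id    = c("tx1", "tx2", "tx3"),
  gene_id          = c("geneA", "geneA", "geneB"),
  length           = c(1000, 1500, 500),
  effective_length = eff_len,
  expected_count   = counts,
  TPM              = rate / sum(rate) * 1e6,
  FPKM             = rate * 1e9 / sum(counts),
  IsoPct           = c(38.36, 61.64, 100)
)
rsem_files <- c(sampleA = file.path(tempdir(), "toy_sampleA.isoforms.results"))
write.table(toy, rsem_files[["sampleA"]], sep = "\t",
            quote = FALSE, row.names = FALSE)

if (requireNamespace("tximport", quietly = TRUE) &&
    packageVersion("tximport") >= "1.7.3") {
  # Transcript-level in, transcript-level out: no tx2gene table is needed
  txi_rsem <- tximport::tximport(rsem_files, type = "rsem",
                                 txIn = TRUE, txOut = TRUE)
  print(txi_rsem$abundance)
}
```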
