Question

Gene TPM validity with STAR only quantification

SciencyUsagi (United Kingdom)

Hi all,

I aligned reads with STAR against the Ensembl genome using --quantMode GeneCounts. I re-organised the ReadsPerGene.out.tab files and extracted the unstranded counts (column 2) to build a count matrix, which I used for DEG analysis with DESeq2. I also wanted TPMs as input for ssGSEA, so I computed them with the following formula:

t( t(counts.mat / gene.length) * 1e6 / colSums(counts.mat / gene.length) )
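For readers working outside R, here is a minimal Python sketch of the same two steps (build the count matrix, then apply the formula above). It assumes each ReadsPerGene.out.tab has already been read into a list of lines and that `gene_length` is in the same order as the matrix rows; the helper names `parse_reads_per_gene`, `build_count_matrix` and `counts_to_tpm` are hypothetical. Field index 1 (0-based) is STAR's unstranded column; fields 2 and 3 hold the strand-specific counts, and which of those suits a stranded library depends on the protocol.

```python
import numpy as np

# Summary counters STAR writes at the top of ReadsPerGene.out.tab; they are not genes.
STAR_SUMMARY_ROWS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}


def parse_reads_per_gene(lines, column=1):
    """Map gene_id -> count from the lines of one ReadsPerGene.out.tab.

    column is a 0-based field index: 1 = unstranded, 2 = 1st read strand,
    3 = 2nd read strand.
    """
    counts = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines and the four summary rows.
        if not fields[0] or fields[0] in STAR_SUMMARY_ROWS:
            continue
        counts[fields[0]] = int(fields[column])
    return counts


def build_count_matrix(samples, column=1):
    """samples: dict sample_name -> lines of that sample's ReadsPerGene.out.tab.

    Returns (gene_ids, sample_names, counts), counts being a genes x samples array.
    """
    names = list(samples)
    parsed = [parse_reads_per_gene(samples[n], column) for n in names]
    genes = list(parsed[0])
    # Every sample should have been quantified against the same annotation.
    for name, counts in zip(names, parsed):
        if set(counts) != set(genes):
            raise ValueError(f"gene set differs in sample {name!r}")
    matrix = np.array([[counts[g] for counts in parsed] for g in genes], dtype=float)
    return genes, names, matrix


def counts_to_tpm(counts, gene_length):
    """TPM from a genes x samples count matrix and one length per gene.

    Same arithmetic as the R one-liner: divide each row by its gene length,
    then rescale each sample (column) so it sums to 1e6.
    """
    counts = np.asarray(counts, dtype=float)
    length = np.asarray(gene_length, dtype=float)
    if length.shape != (counts.shape[0],):
        raise ValueError("need exactly one length per gene (row)")
    rate = counts / length[:, None]  # counts per unit length, per gene and sample
    return rate / rate.sum(axis=0) * 1e6  # column-wise rescale to 1e6
```

Because every sample is rescaled to sum to 1e6, the length unit (bp or kb) cancels out. Like the R formula, this uses one fixed length per gene for every sample.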

I estimated gene length with the ensembldb::lengthOf function, where:

"the length is the sum of the lengths of all exons of a transcript or a gene. In the latter case the exons are first reduced so that the length corresponds to the part of the genomic sequence covered by the exons."
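The "reduce, then sum" step in that description can be illustrated with a small interval-merging function. This is a plain-Python analogue written for illustration, not ensembldb's code; `union_exon_length` is a hypothetical name, and it assumes all exons belong to one gene on a single chromosome and strand, given as 1-based inclusive (start, end) pairs as in Ensembl annotations.

```python
def union_exon_length(exons):
    """Number of bases covered by the union of exon intervals.

    exons: iterable of (start, end) pairs, 1-based and inclusive. Overlapping
    or abutting exons (e.g. from different transcripts) are merged first, so
    shared sequence is counted only once.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(exons):
        if cur_end is None or start > cur_end + 1:
            # Gap before this exon: close the current merged block.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total
```

For exons (100, 200), (150, 300), (400, 450) and (451, 460) the merged blocks are 100-300 and 400-460, i.e. 201 + 61 = 262 bp.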

The ssGSEA results on these TPMs were quite consistent with observations in the literature, but can the approach I took be considered valid?

Thanks.

Tags: RNASeq

Written 2.1 years ago by SciencyUsagi

Are all exons expressed equally in your samples? I don't see how people can generate TPMs without knowing the proportions of the different transcripts in their samples.

Reply by swbarnes2, 2.1 years ago
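To make this point concrete: if a gene's transcripts differ in length, the length that turns read counts into relative molecule counts is the mean transcript length weighted by each isoform's share of the gene's molecules, and that share can change from sample to sample. The sketch below uses a hypothetical two-isoform gene and a hypothetical helper `weighted_gene_length`; it only illustrates the arithmetic and does not estimate the proportions themselves.

```python
def weighted_gene_length(transcript_lengths, transcript_fractions):
    """Molecule-weighted mean transcript length for one gene in one sample.

    transcript_fractions: share of the gene's transcript molecules coming from
    each isoform (must sum to 1). Expected reads scale with sum(n_i * l_i), so
    dividing reads by this length gives a value proportional to sum(n_i).
    """
    if len(transcript_lengths) != len(transcript_fractions):
        raise ValueError("need one fraction per transcript")
    if abs(sum(transcript_fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(l * f for l, f in zip(transcript_lengths, transcript_fractions))
```

With isoforms of 1000 and 3000 bp, a sample where 90% of the molecules are the short isoform gives roughly 1200 bp, while a 10:90 mix gives roughly 2800 bp. A single union-of-exons length treats both samples the same, which is why the isoform proportions matter for TPM.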

I highly doubt it. We ran the pipeline our bioinformatician gave us, and it is clearly wrong. Glad I questioned it.

Reply by SciencyUsagi, 2.1 years ago