Question

Convert Normalized RNA-Seq counts to TPM after Batch Correction

0

Entering edit mode

rrcutler ▴ 70

@rrcutler-10667

Last seen 5.1 years ago

Hello all,

I have performed batch correction using SVA using normalized counts generated from DESeq2. I then used a function in Limma in order to adjust log normalized counts in so that I can output these batch corrected counts to do analysis that does not involve differential expression. Now I have batch corrected counts that are normalized among my samples. However, I am interested in comparing counts between genes and thus need to convert this to TPM.

Code:

# getting normalized counts
dat  <- counts(dds, normalized = TRUE)

# filtering out normalized counts less than 1

idx  <- rowMeans(dat) > 1
dat  <- dat[idx, ]

# since we only have an adjustment variable, batch, we use this as our variable of interest
mod  <- model.matrix(~ batch, sTable)
# setting the null model to the intercept
mod0 <- model.matrix(~ 1, sTable)
# running sva
svseq <- svaseq(dat, mod, mod0)

# get the sva covariates
sva_covar <- svseq$sv

# Adjust normalized counts
counts_deseq_sva <- removeBatchEffect(log(counts(dds, normalized = TRUE)+1), covariates = sva_covar)

I have 3 questions:

Does it make sense to be using removeBatchEffects() as I have? I happen to know the batch covariates, so it may be easier to use the "batch" parameter in the removeBatchEffects()... however, that does not fix my problem of outputting normalized counts.
Since DESeq2 has already normalized for read depth between samples, if I just divide each read count by it's respective gene length, would this be a similar measure to RPKM?
Is there a better way of getting batch corrected TPM counts?

Many thanks,

-R

sva deseq2 limma • 7.3k views

ADD COMMENT • link updated 2.6 years ago by eve.olszanowski • 0 • written 7.0 years ago by rrcutler ▴ 70

score 3 · Accepted Answer · 2018-04-09

3

Entering edit mode

Michael Love 43k

@mikelove

Last seen 5 days ago

United States

To answer #3, if you want batch corrected TPM, I think I would just stay on the abundance scale the whole time (estimating mean shifts and removing them on the log2 scale, and working with log2 TPM for plotting or whatever else you need). Note that I would not force the abundances to sum to 1 million at the end, because this is not a great normalization for practical purposes (see e.g. Bullard 2010, or Robinson 2010, or others).

As far I use and am familiar with SVA, you shouldn't provide the batch to `mod`. The whole point is to get SVA to estimate numerical variables that capture the shifts seen across batches, so either you have the batches and you just remove mean shifts associated with that categorical variable, or you provide SVA with a biological variable and it finds low rank structure that is orthogonal to the biological variable.

ADD COMMENT • link 7.0 years ago Michael Love 43k

0

Entering edit mode

Hi Michael,

Thanks for you reply, I'll try your first suggestion. Could you expand on why TPM is not a great normalization?

I am providing the batch to `mod` as I just have a single group of the same condition where I wish to remove the batch in between them. Is there a better way of using SVA for this purpose?

Thanks,

-R

ADD REPLY • link 7.0 years ago rrcutler ▴ 70

0

Entering edit mode

I’ve provided links to literate covering why, for comparing across samples, it’s suboptimal to require each sample to have the same sum.

I’ll leave recommended SVA usage with respect to known batches to the package authors.

ADD REPLY • link 7.0 years ago Michael Love 43k

0

Entering edit mode

Hi Michael,

I think you forgot the links you were mentioning.

-R

ADD REPLY • link 7.0 years ago rrcutler ▴ 70

1

Entering edit mode

Oh I meant “links” like pointers, google the citations “Bullard 2010” and “Robinson 2010”

ADD REPLY • link 7.0 years ago Michael Love 43k

0

Entering edit mode

Hi Michael. If I already have the data in TPMs, is it correct to apply limma::removeBatchEffect on this table? I want to plot in a same scale several genes related to a GO pathway, so that is why I am aiming to use TPMs. But I am not sure if this is the correct approach.

Thanks in advance!

ADD REPLY • link 2.6 years ago eve.olszanowski • 0

0

Entering edit mode

Sure, but I would remove batch effects on log2 scale. You can convert back to TPM after.