Hello all,
I have performed batch correction using SVA using normalized counts generated from DESeq2. I then used a function in Limma in order to adjust log normalized counts in so that I can output these batch corrected counts to do analysis that does not involve differential expression. Now I have batch corrected counts that are normalized among my samples. However, I am interested in comparing counts between genes and thus need to convert this to TPM.
Code:
# getting normalized counts dat <- counts(dds, normalized = TRUE) # filtering out normalized counts less than 1 idx <- rowMeans(dat) > 1 dat <- dat[idx, ] # since we only have an adjustment variable, batch, we use this as our variable of interest mod <- model.matrix(~ batch, sTable) # setting the null model to the intercept mod0 <- model.matrix(~ 1, sTable) # running sva svseq <- svaseq(dat, mod, mod0) # get the sva covariates sva_covar <- svseq$sv # Adjust normalized counts counts_deseq_sva <- removeBatchEffect(log(counts(dds, normalized = TRUE)+1), covariates = sva_covar)
I have 3 questions:
- Does it make sense to be using removeBatchEffects() as I have? I happen to know the batch covariates, so it may be easier to use the "batch" parameter in the removeBatchEffects()... however, that does not fix my problem of outputting normalized counts.
- Since DESeq2 has already normalized for read depth between samples, if I just divide each read count by it's respective gene length, would this be a similar measure to RPKM?
- Is there a better way of getting batch corrected TPM counts?
Many thanks,
-R
Hi Michael,
Thanks for you reply, I'll try your first suggestion. Could you expand on why TPM is not a great normalization?
I am providing the batch to `mod` as I just have a single group of the same condition where I wish to remove the batch in between them. Is there a better way of using SVA for this purpose?
Thanks,
-R
I’ve provided links to literate covering why, for comparing across samples, it’s suboptimal to require each sample to have the same sum.
I’ll leave recommended SVA usage with respect to known batches to the package authors.
Hi Michael,
I think you forgot the links you were mentioning.
-R
Oh I meant “links” like pointers, google the citations “Bullard 2010” and “Robinson 2010”
Hi Michael. If I already have the data in TPMs, is it correct to apply limma::removeBatchEffect on this table? I want to plot in a same scale several genes related to a GO pathway, so that is why I am aiming to use TPMs. But I am not sure if this is the correct approach.
Thanks in advance!
Sure, but I would remove batch effects on log2 scale. You can convert back to TPM after.
OK thanks!