There are a number of aspects of your post that need addressing, so let's take them one at a time.
The first is the switch from SCESet to SingleCellExperiment. This happened a while ago, motivated by the superiority of the SummarizedExperiment class as a general data container in terms of stability, flexibility and usability. From a user perspective, this simply involves changing the constructor call (from newSCESet() to SingleCellExperiment()) and the various accessors (e.g., fData() to rowData(), pData() to colData()). Not particularly hard, and it also allows you to interface with any SummarizedExperiment-compatible packages, e.g., iSEE, DESeq2.
As for TMM normalization: we've known for a while that this is a poor choice of normalization method for single-cell RNA-seq data with lots of zeroes; see https://doi.org/10.1186/s13059-016-0947-7 for a study of this. (Similar criticisms apply to DESeq's default normalization.) Thus, we no longer recommend using TMM normalization and have removed all functions that do so. I would suggest using alternatives like scran::computeSumFactors(); see the simpleSingleCell workflow for how it's done. That said, if you insist on using TMM, you can simply call edgeR::calcNormFactors directly on your count matrix and multiply the result by the library sizes to get the "TMM size factors". The multiplication is important because calcNormFactors alone will only yield the normalization factors; these need to be scaled by the library sizes to obtain the size factors (yes, there's a difference between these two terms!).
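A minimal sketch of that multiplication, written in plain Python rather than R so that it stands alone. Here norm_factors is a hypothetical stand-in for what edgeR::calcNormFactors would return for three cells, and the counts are made up. In this toy example the normalization factors are all close to 1 and, on their own, ignore the two- and four-fold differences in library size; only after multiplying by the library sizes do you get size factors.

```python
# Sketch: turning TMM normalization factors into size factors.
# Assumption: `norm_factors` stands in for the per-cell values that
# edgeR::calcNormFactors() would return; all numbers are made up.

def library_sizes(counts):
    """Total count per cell; `counts` is a list of per-cell count lists."""
    return [sum(cell) for cell in counts]

def tmm_size_factors(norm_factors, lib_sizes):
    """Size factor = normalization factor x library size, per cell."""
    return [nf * ls for nf, ls in zip(norm_factors, lib_sizes)]

counts = [
    [10, 0, 5, 85],    # cell 1
    [20, 2, 0, 178],   # cell 2
    [5, 1, 4, 40],     # cell 3
]
norm_factors = [1.05, 0.98, 0.97]  # hypothetical calcNormFactors() output

lib = library_sizes(counts)               # [100, 200, 50]
sf = tmm_size_factors(norm_factors, lib)  # approximately [105.0, 196.0, 48.5]
```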
The situation with normalizeExprs is a bit more complicated because it tries to do three things at once: TMM normalization, log-transformation and batch correction. I didn't write this function, but I hated it. It doesn't have a single purpose; it's just cobbled together from three separate functions that might as well be called separately. Separate calls would require a bit more writing, but at least the user (and the reader of the code) understands what is happening. A reader seeing a call to normalizeExprs() would find it hard to figure out what the function does. If we had to use a single function, it should instead be called:
calcTMMFactorsAndNormalizeAndRemoveBatchEffects
... which we can all agree is a stupid name. I deprecated normalizeExprs() because it was better for users to be explicit about what they wanted to do and call the relevant functions directly.
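To make the "separate calls" point concrete, here is a plain-Python sketch of the three steps as three explicit functions. It is not scater's code. The size factors are taken as given (for example, TMM-derived as described above), the log step assumes log2(count / centred size factor + 1), and remove_batch_means is a hypothetical, deliberately simple per-gene batch mean-centering that stands in for a proper batch-correction method.

```python
import math
from statistics import mean

# Step 1 (assumed done elsewhere): one size factor per cell, e.g.
# TMM normalization factor x library size. Values below are made up.

def centre_size_factors(sf):
    """Scale size factors so that their mean is 1."""
    m = mean(sf)
    return [s / m for s in sf]

def log_normalize(counts, sf, pseudo_count=1.0):
    """Step 2: log2(count / centred size factor + pseudo_count).

    `counts` is a list of cells, each a list of per-gene counts.
    """
    centred = centre_size_factors(sf)
    return [[math.log2(c / s + pseudo_count) for c in cell]
            for cell, s in zip(counts, centred)]

def remove_batch_means(logexpr, batches):
    """Step 3 (hypothetical, simplistic): for each gene, shift every
    batch so that its mean equals the mean over all cells."""
    corrected = [row[:] for row in logexpr]
    n_genes = len(logexpr[0])
    for g in range(n_genes):
        overall = mean(row[g] for row in logexpr)
        for b in set(batches):
            idx = [i for i, label in enumerate(batches) if label == b]
            shift = mean(logexpr[i][g] for i in idx) - overall
            for i in idx:
                corrected[i][g] -= shift
    return corrected

# Usage: each step is a separate, explicit call.
counts = [[10, 0, 5, 85], [20, 2, 0, 178], [5, 1, 4, 40], [8, 3, 2, 37]]
size_factors = [105.0, 196.0, 48.5, 50.0]   # made-up step-1 output
batches = ["A", "A", "B", "B"]               # made-up batch labels

logexpr = log_normalize(counts, size_factors)
corrected = remove_batch_means(logexpr, batches)
```

Because each function does one thing, a reader can see exactly which transformation is applied at which point, and any step can be swapped out (for instance, using size factors from scran::computeSumFactors() instead) without touching the others.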
Hey Aaron,
Thank you for your answer. I will use calcNormFactors in scater. After this line, I need to multiply my SumFactors with my counts to normalize, right?
Or, after computeSumFactors, doesn't the normalize(sce) command do the job directly? I am working on unique barcoded single-cell RNA-seq.
Thanks again.
For your first question: get your terminology right, otherwise this discussion will be very confusing. calcNormFactors is from edgeR. It returns TMM normalization factors, one per cell. These need to be multiplied by the library size for each cell to obtain the size factors. You can then save the size factors into the SingleCellExperiment object with sizeFactors(sce) <- tmm.size.factors, and run normalize to compute log-transformed normalized expression values.

For your second question, I'm not sure what you're actually asking. Running computeSumFactors will compute the size factors and store them in the SingleCellExperiment object (assuming that the input was also an SCE object). Running normalize will then compute log-transformed normalized expression values.

Hi Aaron,
I have a similar situation where I am trying to reproduce earlier results for which the normalizeExprs command was used.
Previous code:
Commands using the edgeR TMM method (Approach 1):
I've also tried using the normalise command (Approach 2):
I get the same expression values using approaches 1 and 2, but they do not match the expression values from the "normaliseExprs" command.
Could you please tell me if I missed anything?
Thanks
Sharvari
The computeSumFactors call in your previous code does nothing, as normalizeExprs will simply overwrite any computed size factors with TMM-derived size factors.

Approach 1 should be identical to what normalizeExprs used to do in BioC 3.7. (I assume you're comparing the "logcounts" output across the different approaches.) If this is yielding different output from your previous code, then I don't know why; I would suggest you double-check your inputs.

In any case, it doesn't matter, because approach 2 is the correct thing to do anyway. I don't see how you could possibly get the same results from approaches 1 and 2; they should not yield the same results in real data.
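One way to see why identical output from two normalization methods is suspicious: if size factors are centred to a mean of 1 before the log-transformation, the log-expression values do not change when every size factor is multiplied by the same constant, but they do change when the relative factors differ between cells. So two methods give the same log-expression values only if their size factors are proportional across cells. The plain-Python sketch below checks this, assuming the log2(count / centred size factor + 1) transform; all size-factor values are made up for illustration.

```python
import math
from statistics import mean

def log_normalize(counts, sf, pseudo_count=1.0):
    """log2(count / centred size factor + pseudo_count), per cell."""
    m = mean(sf)
    return [[math.log2(c / (s / m) + pseudo_count) for c in cell]
            for cell, s in zip(counts, sf)]

def max_abs_diff(x, y):
    """Largest absolute element-wise difference between two matrices."""
    return max(abs(u - v) for rx, ry in zip(x, y) for u, v in zip(rx, ry))

counts = [[10, 0, 5, 85], [20, 2, 0, 178], [5, 1, 4, 40]]
sf_a = [105.0, 196.0, 48.5]        # hypothetical size factors, method A
sf_b = [s * 0.01 for s in sf_a]    # same factors, rescaled by a constant
sf_c = [0.9, 2.1, 0.5]             # hypothetical, NOT proportional to sf_a

out_a = log_normalize(counts, sf_a)
out_b = log_normalize(counts, sf_b)
out_c = log_normalize(counts, sf_c)

print(max_abs_diff(out_a, out_b))  # close to 0: only the global scale differed
print(max_abs_diff(out_a, out_c))  # clearly > 0: relative factors differ
```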