Hello,
I've been using MAST for analysis of single-cell qPCR data, and I'm familiar with its use for "traditional" single-cell RNAseq data, where reads from the full lengths of transcripts are converted to digital gene expression (via counts). I was wondering if anyone had considered potential issues with using MAST on single-cell data from platforms like 10X or DropSeq, where counts are estimated using UMIs but only from either the 3' or 5' end of transcripts (and never with any data from elsewhere in a transcript). DropSeq approaches give you a raw UMI count, and the recommended workflow is to first filter unexpressed genes, then normalize the gene-specific UMI counts by the median number of UMIs obtained from each cell, and finally log-transform the gene/cell matrix (this all seems very similar to what we would do with RSEM or edgeR).
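To make the steps above concrete, here is a minimal Python sketch of how I read them. It is not taken from any DropSeq or MAST code. The function name `normalize_umi_counts` is made up for illustration, and two choices are my own assumptions: each cell's counts are divided by that cell's total UMIs and rescaled to the median total across cells, and the log step is log2(1 + x).

```python
import math
from statistics import median


def normalize_umi_counts(counts):
    """Filter, normalize, and log-transform a gene-by-cell UMI table.

    counts: dict mapping gene name -> list of raw UMI counts, one per cell
            (all lists in the same cell order). Hypothetical input layout.
    """
    # Step 1: drop genes with zero UMIs in every cell.
    expressed = {g: row for g, row in counts.items() if any(c > 0 for c in row)}
    if not expressed:
        return {}

    # Step 2: total UMIs per cell, and the median of those totals.
    n_cells = len(next(iter(expressed.values())))
    totals = [sum(row[j] for row in expressed.values()) for j in range(n_cells)]
    scale = median(totals)

    # Step 3: rescale each cell to the median depth (assumption), then
    # apply log2(1 + x) (assumption; the source only says "log").
    return {
        g: [
            math.log2(1 + c * scale / totals[j]) if totals[j] > 0 else 0.0
            for j, c in enumerate(row)
        ]
        for g, row in expressed.items()
    }


if __name__ == "__main__":
    toy = {"geneA": [1, 3], "geneB": [0, 0], "geneC": [1, 1]}
    print(normalize_umi_counts(toy))
```

In the toy table, `geneB` is dropped as unexpressed, and zeros in the remaining genes stay at exactly zero after the transform, which is what MAST's discrete/continuous split relies on.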
From my perspective I can't see any obvious issue here, but I wanted to know if anyone else had any thoughts on whether this sort of data might for some reason (perhaps related to the UMI approach, the 5'/3' specific sequencing, or this particular normalization approach) violate assumptions underlying the MAST framework.
Thanks for reading!
Hello Andrew,
As a follow-up question: is it technically okay to apply MAST to log2(CPM+1) data? And how do I determine which normalization method to use in general?
Thanks!
Technically, the issue is the quality of the normality assumption in the continuous portion of the model. In my experience the non-zero component of log2(1+CPM) has appeared pretty symmetric for droplet technologies, but you could evaluate this yourself, either informally with plots or formally with tests for symmetry. As the number of cells increases (typical with droplet technologies), the importance of normality decreases because of the central limit theorem. In independent evaluations, MAST has been shown to maintain its advertised (nominal) level in a range of scenarios, for instance Soneson and Robinson 2018 (https://www.nature.com/articles/nmeth.4612/).
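As a rough illustration of the informal check, the Python sketch below computes log2(1+CPM) for one gene and then the sample skewness of its non-zero values; a value near 0 suggests symmetry. The helper names `log2_cpm` and `nonzero_skewness` are made up for this example and are not part of MAST, and skewness is only one possible summary, not a formal test.

```python
import math


def log2_cpm(gene_counts, cell_totals):
    """log2(1 + counts per million) for one gene across cells.

    gene_counts: raw counts for the gene, one per cell.
    cell_totals: total counts per cell (same order, all > 0).
    """
    return [math.log2(1 + 1e6 * c / t) for c, t in zip(gene_counts, cell_totals)]


def nonzero_skewness(values):
    """Sample skewness (third standardized moment) of the non-zero values.

    Returns None when fewer than three non-zero values are available.
    """
    nz = [v for v in values if v > 0]
    n = len(nz)
    if n < 3:
        return None
    mean = sum(nz) / n
    m2 = sum((v - mean) ** 2 for v in nz) / n
    m3 = sum((v - mean) ** 3 for v in nz) / n
    if m2 == 0:
        return 0.0  # all non-zero values identical
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    # Toy numbers only, not real data.
    counts = [0, 2, 5, 0, 3, 9, 1]
    totals = [4000, 5200, 6100, 3900, 4800, 7000, 4500]
    print(nonzero_skewness(log2_cpm(counts, totals)))
```

Running this gene by gene and looking at the distribution of skewness values across genes, alongside a few histograms, would give a quick picture of how reasonable the symmetry assumption is for a given dataset.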