Hi, I have a single-cell methylation dataset, and I wonder how I can get appropriate values from the edgeR `glmQLFit` object for downstream clustering. This is what I've run:
```r
design0 <- model.matrix(~0 + sample + celltype, data=file.df)
design <- modelMatrixMeth(design0)
y <- estimateDisp(y, design=design, trend="none")
fit1 <- glmQLFit(y, design, robust=TRUE)
```
Then I got cell-type-specific sites using `glmQLFTest(fit1, contrast=contr)` for each cell type, where `contr` is a one-vs-others contrast like the one in the single-cell pseudobulk analysis in the edgeR User's Guide.
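For reference, here is a minimal sketch of one way such a one-vs-others contrast could be written for the `design0` above. It assumes `celltype` is a factor whose first level is the reference (absorbed into the `sample` columns), so each `celltype<level>` coefficient is that level minus the reference, and that `modelMatrixMeth()` keeps the `design0` column names. The helper `one_vs_others` and the toy coefficient names are hypothetical; this is not the User's Guide code itself.

```r
# Hypothetical helper: contrast for "target cell type vs the mean of all others"
# under a design like model.matrix(~0 + sample + celltype). Assumes the first
# celltype level is the reference, so its effect is 0 and each "celltype<level>"
# coefficient is that level minus the reference.
one_vs_others <- function(coef_names, celltype_levels, target) {
  K <- length(celltype_levels)
  contr <- setNames(numeric(length(coef_names)), coef_names)
  # Coefficients for the non-reference levels, e.g. "celltypeB"
  ct_coefs <- paste0("celltype", celltype_levels[-1])
  stopifnot(all(ct_coefs %in% coef_names), target %in% celltype_levels)
  # Every other cell type enters the mean with weight -1/(K-1);
  # the reference level has no coefficient, so it contributes nothing here
  contr[ct_coefs] <- -1 / (K - 1)
  if (target != celltype_levels[1]) {
    contr[paste0("celltype", target)] <- 1
  }
  # Sample (and per-library) columns stay at 0
  contr
}

# Toy coefficient names (hypothetical), 3 cell types A (reference), B, C
coefs <- c("lib1", "lib2", "sampleS1", "sampleS2", "celltypeB", "celltypeC")
one_vs_others(coefs, c("A", "B", "C"), "B")
```

With the real fit this would be called as something like `one_vs_others(colnames(design), levels(file.df$celltype), "B")`, with "B" standing in for an actual level, and the result passed to `glmQLFTest(fit1, contrast=...)`.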
I then extracted the significant cell-type-specific sites and their logFC values and drew a heatmap, but the plot doesn't show a clear pattern. It seems the values were not adjusted for cell type very well, and the logFCs of some statistically significant sites with sparse counts are not shrunk. Is there an alternative metric for visualization? Thanks.
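One alternative to plotting logFC is to plot per-library M-values (log2 of methylated over unmethylated counts, with a pseudo-count), centre them per site, and cut a hierarchical clustering of the sites into modules. Below is a minimal base-R sketch, assuming the count columns come in (Me, Un) pairs per library, as in the edgeR bisulfite workflows that use `modelMatrixMeth()`. The helper `m_values` and the toy counts are hypothetical; with real data `counts` would be `y$counts`.

```r
# Hypothetical helper: per-library M-values from a count matrix whose
# columns alternate methylated (Me) and unmethylated (Un) counts.
m_values <- function(counts, prior.count = 2) {
  stopifnot(ncol(counts) %% 2 == 0)
  me <- counts[, seq(1, ncol(counts), by = 2), drop = FALSE]
  un <- counts[, seq(2, ncol(counts), by = 2), drop = FALSE]
  log2(me + prior.count) - log2(un + prior.count)
}

# Toy counts: 4 sites x 3 libraries, columns lib1.Me, lib1.Un, lib2.Me, ...
counts <- matrix(c(20,  2,  3, 25, 18,  1,
                   22,  3,  2, 20, 19,  2,
                    1, 30, 28,  2,  2, 24,
                    2, 26, 25,  1,  1, 22),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("site", 1:4), NULL))

M <- m_values(counts)
# Centre each site so the heatmap shows relative methylation across libraries
Mc <- M - rowMeans(M)
# Cluster sites on the centred values and cut the tree into modules
hc <- hclust(dist(Mc))
modules <- cutree(hc, k = 2)
modules
```

`heatmap(Mc, scale = "none")` would then draw the centred values; the number of modules `k` is a free choice in this sketch.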
Thank you for your response! Yes, clustering using M-values showed that most of the variation comes from samples and cell types.
To get the sites, I first used the pseudobulk data to call all possible sites, then partitioned the counts into individual cell types. Some clusters have many cells whereas others have only dozens, so the counts at many sites in the small clusters are inevitably sparse. I ended up using `prior.count=4`, and the logFC values are indeed shrunk more than before. The thing is, I expected to see modules that are highly active in particular cell types, like distinct "blocks" popping out in the heatmap, but that's just not happening with my data. I guess it could just be how this dataset rolls.
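To see why a larger prior count mostly affects the sparse sites, here is a toy calculation with a plain pseudo-count. It is only a qualitative illustration: edgeR scales its prior count by library size rather than adding a fixed number, and `site_m` is a hypothetical helper.

```r
# Hypothetical helper: M-value of one site with a plain pseudo-count
site_m <- function(me, un, prior.count) {
  log2(me + prior.count) - log2(un + prior.count)
}

# A sparse site (2 Me vs 0 Un reads) and a deeper one (40 Me vs 10 Un reads)
for (pc in c(0.5, 4)) {
  cat(sprintf("prior.count = %g: sparse %.2f, deep %.2f\n",
              pc, site_m(2, 0, pc), site_m(40, 10, pc)))
}
```

Going from a pseudo-count of 0.5 to 4, the sparse site's value drops from about 2.3 to about 0.6, while the deeper site only moves from about 1.9 to about 1.7.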