Question

most correlated genes in SingleR



lirongrossmann ▴ 50

@lirongrossmann-23954


Hi,

Is there a way in SingleR to find out, for each cell in the test data, which genes in the test cell are most correlated with the genes in the predicted cell from the reference dataset? For example, for a cell X in the test data that is predicted to be cell Y in the reference data, which genes in X are highly correlated with Y?

Thanks! Liron

singlecell singleR • 1.8k views

updated 4.4 years ago by Aaron Lun ★ 28k • written 4.5 years ago by lirongrossmann ▴ 50

score 0 · Answer 1 · 2020-08-25

Given a single cell, it is not possible to determine the "most highly correlated gene". For one cell X, one reference Y and one gene, we only have a single pair of observations, so there's nothing to compute a correlation on. In fact, SingleR doesn't use the concept of per-gene correlation at all, even when we're talking about populations of cells; its scores are based on correlations computed across genes, between each test cell and the reference samples.

I suspect you are instead asking "which genes are driving the correlation between X and Y?" SingleR doesn't have a formal way of breaking down the statistics in this manner. The SingleR book describes some diagnostic plots based on marker gene expression, which may be sufficient. You could also look at which marker genes for Y are most highly expressed in X, which should be a good heuristic for identifying the contributing genes.
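To make the heuristic concrete, here is a minimal sketch in plain Python rather than SingleR code. The expression values, gene names and the marker list for Y are all made up for illustration; in practice you would take them from your own test cell and from the marker genes used for label Y. The helper `top_expressed_markers` is a hypothetical name, not part of any package.

```python
# Hypothetical toy data: log-expression values of test cell X, keyed by gene.
x_expr = {"CD3E": 2.9, "CD19": 0.0, "MS4A1": 0.1, "IL7R": 1.8, "CD8A": 0.0, "LYZ": 0.3}

# Hypothetical marker genes for the predicted label Y (assumed to be known
# already; nothing here computes them).
markers_y = ["CD3E", "IL7R", "CD8A", "MS4A1"]


def top_expressed_markers(expr, markers, n=3):
    """Return the n markers with the highest expression in the cell."""
    # Ignore markers that were not measured in the test cell.
    present = [g for g in markers if g in expr]
    ordered = sorted(present, key=lambda g: expr[g], reverse=True)
    return [(g, expr[g]) for g in ordered[:n]]


top = top_expressed_markers(x_expr, markers_y)
print(top)
```

This only ranks Y's markers by their expression in X. It says nothing about whether those genes separate Y from other labels, so treat it as a quick sanity check rather than a decomposition of the score.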

(For completeness, I would answer the above question by removing one marker gene at a time, repeating the assignment and examining the difference in the scores for the initial label Y. Large drops in the score indicate that the gene is very important for that cell's assignment to Y. However, there would be a lot of genes to go through, which is a pain; the qualitative diagnostics are fast and probably good enough for most purposes.)
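The leave-one-out idea can be sketched as follows, again in plain Python with made-up data. This is a simplification and not what SingleR does internally: it uses a single reference profile for Y and a plain Spearman correlation as the "score", whereas the real procedure would rerun the whole assignment against all reference samples. The function names `ranks`, `spearman` and `leave_one_out_drops` are hypothetical helpers written for this example.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one profile is constant
    return cov / (vx * vy) ** 0.5


def leave_one_out_drops(genes, x, y):
    """Drop in correlation when each gene is removed from the marker set."""
    full = spearman(x, y)
    drops = {}
    for i, g in enumerate(genes):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        drops[g] = full - spearman(x_rest, y_rest)
    return full, drops


# Hypothetical marker genes with expression in test cell X and reference Y.
genes = ["G1", "G2", "G3", "G4", "G5"]
x = [5.0, 3.0, 0.5, 2.0, 0.0]
y = [4.5, 1.0, 0.2, 2.5, 3.0]

full, drops = leave_one_out_drops(genes, x, y)
# Genes with the largest positive drop contribute most to the correlation.
for g in sorted(drops, key=drops.get, reverse=True):
    print(g, round(drops[g], 3))
```

A negative drop means the correlation goes up when that gene is removed, i.e. the gene was working against the match between X and Y.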