Dunn Index for Clusters from scRNA-Seq
@hamza_karakurt-17704
Turkey

Hello everyone, I want to try the Dunn index to validate my clustering results from scRNA-Seq data. I know the Dunn index ranges from 0 to infinity and that higher values mean better clustering. I applied it to scRNA-Seq data for which I built a graph with the buildSNNGraph() function of the scran package and clustered it with the Louvain algorithm from the igraph package. I tried a range of k values and want to score them. Most of the Dunn indices are between 0.08 and 0.1. Can I use these values to compare my clustering results, or is the Dunn index only suitable for distance-based clustering methods rather than graph-based ones?

I know the modularity function can be used in such cases, but I saw that modularity decreases as k increases in the buildSNNGraph and buildKNNGraph functions, so I wanted to use a different method.

Thank you in advance

scRNA-Seq clustering scater scran dunn index • 1.7k views

Cross-posted on Biostars: https://www.biostars.org/p/368557/

Aaron Lun ★ 28k
@alun
The city by the bay

To answer your immediate question: I don't see an inherent problem with using the Dunn index to assess separation of clusters, provided you're willing to do all those distance calculations. But keep in mind that Louvain, like several other community detection methods in igraph, attempts to maximize the modularity, not the Dunn index. If a graph-based clustering strategy gives you a higher modularity but a lower Dunn index, you can hardly say that it performs poorly - it's just doing its job.

You also don't mention what flavor of Dunn index is being used. If you're using the one that involves computing the ratio of the minimum inter-cluster distance to the maximum intra-cluster distance, I'd say that this is far too conservative to be useful in single-cell data. A single misassigned cell is enough to make your index very small, even if the rest of the clustering is fine.
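To make the sensitivity concrete, here is a minimal sketch of that flavor of the index (minimum inter-cluster distance divided by maximum intra-cluster distance) in plain NumPy. It is not taken from scran, igraph, or any particular Dunn index package; the function name `dunn_index` and the toy coordinates are made up for illustration.

```python
import numpy as np

def dunn_index(points, labels):
    """Min inter-cluster distance divided by max intra-cluster distance."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Full pairwise Euclidean distance matrix; O(n^2) memory, so this is
    # only practical for small toy examples like the one below.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    same = labels[:, None] == labels[None, :]
    max_diameter = dist[same].max()     # largest distance within any cluster
    min_separation = dist[~same].min()  # smallest distance between clusters
    return min_separation / max_diameter

# Two tight, well-separated toy "clusters" of four points each.
points = [(0, 0), (1, 0), (0, 1), (1, 1),
          (10, 0), (11, 0), (10, 1), (11, 1)]
clean_labels = [0, 0, 0, 0, 1, 1, 1, 1]
# Same data, but the point at (10, 0) is assigned to the wrong cluster.
noisy_labels = [0, 0, 0, 0, 0, 1, 1, 1]

dunn_clean = dunn_index(points, clean_labels)
dunn_noisy = dunn_index(points, noisy_labels)
print(f"clean: {dunn_clean:.4f}, one misassigned point: {dunn_noisy:.4f}")
```

In this toy case the clean labelling gives 9/sqrt(2), about 6.36, while moving a single point to the wrong cluster gives 1/sqrt(101), about 0.0995, even though the other seven assignments are unchanged.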

> I want to try the Dunn index to validate my clustering results from scRNA-Seq data.

Don't use the word "validation". Validation implies that there is some kind of truth to be found, but this isn't really the purpose of clustering, as we have already discussed. Currently, all that you're doing is to evaluate the separation between clusters, which is fine and useful but is a long way from establishing truth. If you want to "validate" something, you should be performing functional experiments to demonstrate that your clusters correspond to cells that have different biological behaviour.

> I tried a range of k values and want to score them.

Or you could just pick one and see if it's useful. Clustering doesn't have to be perfect, it just has to be good enough for downstream interpretation.

> I know the modularity function can be used in such cases, but I saw that modularity decreases as k increases in the buildSNNGraph and buildKNNGraph functions, so I wanted to use a different method.

This is a natural consequence of increasing the number of connections in the graph. I would say that this is a feature rather than a bug, because changing the connectivity lets us adjust the granularity of the clusters: a larger k gives a more interconnected graph and broader clusters, while a smaller k gives finer ones. In this manner, we can adjust the resolution as desired if there are too few/many clusters for further examination.
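The effect of the number of connections on modularity can be illustrated with a rough sketch in plain NumPy. It makes several simplifying assumptions: it uses an unweighted, symmetrized k-nearest-neighbour graph (loosely analogous to what buildKNNGraph produces, without the shared-neighbour weighting of buildSNNGraph), simulated two-dimensional data, and a fixed two-group partition rather than re-running Louvain for each k. The helper names `knn_adjacency` and `modularity` are hypothetical.

```python
import numpy as np

def knn_adjacency(points, k):
    """Symmetric, unweighted k-nearest-neighbour adjacency matrix."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    n = len(points)
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = True
    return adj | adj.T  # keep an edge if either endpoint chose the other

def modularity(adj, labels):
    """Newman modularity Q of a fixed partition on an undirected graph."""
    adj = adj.astype(float)
    labels = np.asarray(labels)
    degree = adj.sum(axis=1)
    two_m = degree.sum()  # twice the number of edges
    same = labels[:, None] == labels[None, :]
    expected = np.outer(degree, degree) / two_m
    return ((adj - expected) * same).sum() / two_m

# Simulated data: two groups of 30 points, centred 4 units apart.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(30, 2))
group_b = rng.normal(loc=[4.0, 0.0], scale=1.0, size=(30, 2))
points = np.vstack([group_a, group_b])
labels = np.repeat([0, 1], 30)

scores = {k: modularity(knn_adjacency(points, k), labels)
          for k in (5, 10, 20, 40)}
for k, q in scores.items():
    print(f"k={k:2d}  modularity={q:.3f}")
```

With a small k almost every edge stays inside a group, so the modularity of the partition is close to its ceiling of 0.5 for two equal groups. At k=40 each point has only 29 possible same-group neighbours, so at least 11 of its edges must cross to the other group and the modularity of the very same partition is necessarily lower.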
