# Finding an Optimal Number of Clusters With Elbow Method

**Introduction**

Clustering algorithms like **K-Means** or **hierarchical clustering** require the number of clusters up front, but choosing the best number is a headache. You can run the algorithm on your points, but you have to guess how many clusters you want, or how many separable clusters the data actually contains. It's easy when you have **2D points**: you can visualize them and see how many clusters to take. Here is what you would see.

By looking at this image, you can assume **3 is the optimal number of clusters**, **but what if your points are not 2D?** How can you find out which number is a good choice for clustering? It turns out there is a way to find an optimal number for those clustering algorithms that require the number of clusters up front. So what is it?

**Clustering Metrics**

Before describing the method for finding the optimal number of clusters, let's define some metrics that we will use in the algorithm.

Let's assume we've already run a clustering algorithm with a certain number of clusters, so we know which cluster every point belongs to. We can now define some metric functions over this clustering.

**Cost Function**

First, we need to find the center of every cluster, which we call the centroid. After finding the centroids, we compute the distance between each point and the centroid of its cluster, and take the mean of these distances.

The cost function depends on the points and the centroids of every cluster. We want to minimize this function, because a small mean distance between points and their centroids is exactly the situation we are looking for.
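The cost function described above can be sketched as a small NumPy helper. The function name `cluster_cost` and the `(points, labels)` interface are assumptions for illustration; the article does not prescribe an implementation.

```python
import numpy as np

def cluster_cost(points, labels):
    """Mean distance from each point to the centroid of its cluster.

    points: array of shape (n, d); labels: array of shape (n,)
    (hypothetical helper matching the cost function in the text).
    """
    total = 0.0
    for k in np.unique(labels):
        members = points[labels == k]
        centroid = members.mean(axis=0)          # center of this cluster
        total += np.linalg.norm(members - centroid, axis=1).sum()
    return total / len(points)                   # mean over all points
```

For two tight, well-separated clusters this value is small; for a bad clustering it grows.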

**Radius Function**

As we've already defined centroids for every cluster, we can use them to define another metric, called the **radius**. Instead of taking the mean distance between the points and their centroids, we take the maximum distance, inside every cluster. In other words, for each cluster we look for the point that is farther from its centroid than any other point in the same cluster.
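A sketch of the radius metric, under the same assumed `(points, labels)` interface as before (the name `cluster_radius` is hypothetical):

```python
import numpy as np

def cluster_radius(points, labels):
    """Maximum distance from any point to its own cluster's centroid."""
    radius = 0.0
    for k in np.unique(labels):
        members = points[labels == k]
        centroid = members.mean(axis=0)
        # the worst (farthest) point in this cluster
        radius = max(radius, np.linalg.norm(members - centroid, axis=1).max())
    return radius
```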

**Diameter Function**

It's almost the same function as the radius, but it computes the maximum distance not between the points and their centroid, but between all pairs of points inside the cluster. In other words, it finds the two points of the same cluster that are farthest apart and takes that distance.
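The diameter metric can be sketched the same way; the pairwise-distance computation below is one straightforward (quadratic-memory) way to do it, and `cluster_diameter` is again a hypothetical name:

```python
import numpy as np

def cluster_diameter(points, labels):
    """Maximum distance between any two points of the same cluster."""
    diameter = 0.0
    for k in np.unique(labels):
        members = points[labels == k]
        # all pairwise difference vectors within the cluster: (m, m, d)
        diffs = members[:, None, :] - members[None, :, :]
        diameter = max(diameter, np.linalg.norm(diffs, axis=-1).max())
    return diameter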

**Variation Function**

Instead of taking the maximum distance between the points, or between the points and their centroids, we can take the variation of each cluster. If you take a look at the cost function, it is almost the variation function: we just need to use squared distances in the cost function and put the result under a **square root**.
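One reading of that description is a root-mean-square distance to the centroids. This sketch assumes that reading, with the hypothetical name `cluster_variation`:

```python
import numpy as np

def cluster_variation(points, labels):
    """Square root of the mean squared distance to cluster centroids
    (one interpretation of 'the cost function under a square root')."""
    total = 0.0
    for k in np.unique(labels):
        members = points[labels == k]
        centroid = members.mean(axis=0)
        total += ((members - centroid) ** 2).sum()  # squared distances
    return np.sqrt(total / len(points))
```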

Now that we've defined a few metric functions, it's clear that we want them to be small after clustering. In other words, **a good choice for the number of clusters should keep all of these metrics low** (though, as we'll see, we can't simply take their minimum).

Now let's take a look at the Elbow method, which finds the optimal number of clusters using one of these metric functions, or any combination of them.

**Elbow Method**

This method is simple. You just need to run your clustering algorithm multiple times with different numbers of clusters and evaluate one of the metric functions described above. Here is how it looks:

1. **First, you need to choose your metric function**
2. **Then run the clustering algorithm for k = {1, 2, 3, ... n}**
3. **Collect and plot all values of the metric function for every k**
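The steps above can be sketched with scikit-learn's `KMeans`, whose `inertia_` attribute is the sum of squared distances to the centroids (a variant of the cost function described earlier). The blob data here is synthetic, chosen so that the elbow should appear at k = 3:

```python
import numpy as np
from sklearn.cluster import KMeans

# Three well-separated synthetic blobs of 50 points each.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=center, scale=0.3, size=(50, 2))
    for center in [(0, 0), (5, 5), (0, 5)]
])

# Step 2-3: run K-Means for k = 1..7 and collect the metric.
inertias = []
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
    inertias.append(km.inertia_)  # sum of squared distances to centroids

# To plot the elbow curve (uncomment if matplotlib is available):
# import matplotlib.pyplot as plt
# plt.plot(range(1, 8), inertias, marker="o")
# plt.xlabel("number of clusters k"); plt.ylabel("inertia"); plt.show()
```

On this data the inertia falls sharply up to k = 3 and only slightly afterwards, which is exactly the "break" in the curve the method looks for.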

Now let's look at the curve.

**The best value for the number of clusters is the place, where the curve breaks.**

It's the value circled in red: 3 is the optimal choice for the clustering algorithm. You can use this method not only with K-Means or hierarchical clustering, but with any other algorithm that requires the number of clusters up front.

One more thing: the elbow curve keeps going down as you increase the number of clusters. If you remember how we defined the metric functions, it's clear that increasing the number of clusters decreases their values.

In the extreme, you can take the number of clusters equal to N, the number of points you have, and all the metrics drop to 0.
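A quick sanity check of that extreme case, reusing the mean-distance cost from earlier (computed inline here so the snippet stands alone):

```python
import numpy as np

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
labels = np.arange(len(points))  # every point is its own cluster

cost = 0.0
for k in np.unique(labels):
    members = points[labels == k]
    centroid = members.mean(axis=0)  # coincides with the single point
    cost += np.linalg.norm(members - centroid, axis=1).sum()
cost /= len(points)

print(cost)  # prints 0.0: each point sits exactly on its own centroid
```

This is why the curve's minimum alone is useless, and the break (the "elbow") is what matters.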