This post continues our series that takes you from the basics of machine learning (ML) to its advanced features.
In this post, #2 of the series, we will examine what unsupervised learning is and what makes it such a significant branch of ML.
What is it?
Last time, when we discussed supervised learning, we learned that we essentially give the computer an explicit problem and it solves it for us.
For the house price prediction problem we discussed, for example, we can tell the machine: “Hey, generate a prediction function that estimates the cost of a house with such-and-such features, based on this real-world data I’m providing you, which contains the prices and features of houses sold on the market”.
That was supervised learning. In contrast, in unsupervised learning, we do not tell the computer to explicitly solve a problem — like the name suggests, we’re not at all “supervising” the computer in terms of making it solve a particular problem.
Instead, we are providing it with data and basically telling it “Okay, here’s the data. Now do something with it”.
When we ourselves don’t know whether a specific trend or problem exists in the data, we use unsupervised learning to let the computer discover patterns or interesting correlations.
Let’s look at some other differences between supervised and unsupervised learning to get a better picture.
Unsupervised vs. Supervised
Beyond the overall goal of the algorithm, unsupervised learning also differs in the data that is provided to it. Unlike the labeled data set that was fed to the supervised algorithm, unsupervised learning algorithms receive unlabeled data: the algorithm does not know the output of each of the data points.
Second, unsupervised learning isn’t used to provide a fixed output for an input value. It won’t output “$200,214” for a particular house for example.
So what does it do?
As we discussed earlier, it tries to find patterns and trends in the data, and one of the important usages of this pattern-finding feature is that it can be used to ‘cluster’ data.
Consider the diagram below.
As you may have discerned yourself, the above data can be arranged into two distinct clusters.
This is one of the most widely used applications of unsupervised learning. Given a set of features for data points plotted on a graph, how do you determine which ones fall into a particular group?
This problem is addressed by the k-means clustering algorithm.
Basically, what this algorithm does is divide the given input into k clusters. Hence, once the algorithm is done running, each data point is assigned a label, or an output. So it takes unlabeled data and then returns labeled data.
In addition, it sets centers for each of the clusters. In the diagram above, you can see each data group has its own center. By placing centers where the data of one group is concentrated, it can then determine whether a surrounding data point belongs to a particular group by comparing the distances from that point to the k centers. Whichever center is closest to that point, that’s the cluster the point becomes a part of.
Of course, this is an iterative algorithm. This means that with each iteration (one run of the algorithm’s main loop), the coordinates of the centers may change, and depending on this, points may switch clusters.
Consider the excellent animation below, which demonstrates this as the number of iterations increases:
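To make the two steps concrete, here is a minimal sketch of k-means in plain NumPy. The function name `k_means` and the toy data are my own for illustration; real projects would typically use a library implementation instead.

```python
import numpy as np

def k_means(points, k, iterations=10, seed=0):
    """Cluster `points` (an n x d array) into k groups."""
    rng = np.random.default_rng(seed)
    # Start with k randomly chosen data points as the initial centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        # Assignment step: label each point with its nearest center.
        distances = np.linalg.norm(points[:, None] - centers[None, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers

# Two obvious blobs of 2-D points.
data = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                 [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])
labels, centers = k_means(data, k=2)
```

Notice that the output is exactly what we described: each point gets a label (its cluster index), so unlabeled data goes in and labeled data comes out.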
You can see that the number of points in the blue cluster starts off small, but then starts to increase dramatically, and as the final iteration steps are reached, the increase diminishes.
In this algorithm, the value of k is chosen beforehand. Typically, several candidate values of k are chosen, and the algorithm is run for each of them, producing metrics (results) for each run.
There are numerous techniques for evaluating these metrics against one another, but that is a bit too detailed for now (when we start implementing these algorithms in code, in a few posts, we’ll discuss them in more detail).
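As a small taste of what “metrics for each run” can mean, here is a sketch using scikit-learn’s `KMeans`, whose `inertia_` attribute gives the sum of squared distances from each point to its assigned center. The toy data is my own; the idea (often called the “elbow method”) is that inertia drops sharply until k matches the true number of groups, then flattens out.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious blobs of 2-D points.
data = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                 [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])

# Run k-means for several candidate values of k and record the
# inertia (sum of squared point-to-center distances) for each run.
inertias = []
for k in range(1, 4):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    inertias.append(model.inertia_)
```

For this data, the drop from k=1 to k=2 is large (the two blobs get separated), while going from k=2 to k=3 barely helps, which suggests k=2 is the right choice.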
If you’re really curious, here’s an interesting article.
Another use of unsupervised learning is detecting anomalies in factories, where test data is collected for each product being manufactured. This data is then fed to the unsupervised learning algorithm to determine which products had anomalies, or defects, in them.
This is important in industries like aerospace engineering where a reliable way of objectively testing the expensive products for defects is needed. If they ship a faulty multi-million dollar plane by mistake, it would be extremely disastrous for them!
Anomaly detection works somewhat like supervised learning, but it falls under unsupervised learning for a few reasons.
If you think about it, it can be framed as a clustering problem with 2 clusters: “good” items, which cluster together, and “defective” or anomalous items, which fall far away from the “good” ones.
There may be more clusters, but that gives you a picture of it (there may be “needs rechecking”, “sub par performance”, “definitely something wrong”, etc.).
However, anomaly detection systems are implemented in a different way, using Gaussian distribution curves. There are other major differences too, but they may be overwhelming at this point. Just keep the above example in mind and that should clear things up for now.
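To give a flavor of the Gaussian approach, here is a minimal sketch under simplifying assumptions: we fit a Gaussian to each feature of known-good samples (treating features as independent), then flag any product whose estimated density falls below a hand-picked threshold. The function names, the toy measurements, and the threshold `epsilon` are all made up for illustration.

```python
import numpy as np

def fit_gaussian(x):
    """Estimate the mean and variance of each feature from normal samples."""
    return x.mean(axis=0), x.var(axis=0)

def density(x, mu, var):
    """Product of per-feature Gaussian densities (independence assumed)."""
    p = np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return p.prod(axis=1)

# Test measurements for six products; the last one is way off.
measurements = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.1],
                         [1.0, 1.1], [0.9, 0.9], [5.0, 5.0]])

# Fit the Gaussian on the first five (known-good) products.
mu, var = fit_gaussian(measurements[:5])
p = density(measurements, mu, var)

epsilon = 1e-3            # threshold chosen by hand for this toy example
anomalies = p < epsilon   # True for products far from the "good" cluster
```

This mirrors the clustering picture above: “good” items sit where the density is high, and anomalies live out in the low-density tails.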
Right now, in the beginning of the series, we are trying to build a wide, superficial knowledge of all of the major topics before we dive right in.
Neural Networks
And then, of course, there are the famous neural networks that you hear about everywhere.
Neural networks are widely used in supervised learning to solve a variety of problems. However, the topic is so large and expansive that it deserves an entire post of its own, and this post is already getting quite long.
Cocktail Party Problem (Non-Clustering)
Unsupervised learning algorithms may also be able to solve problems that do not require clustering.
The cocktail party problem: imagine an audio file of a person speaking while music plays in the background.
What if you want to extract a new audio file of just the person speaking? Or, if the music happens to be really good, what if you want to extract just the music and remove the person’s voice?
Unsupervised learning can be used here to separate the mixed signals and create two different audio files, one for each sound. This applies to other situations too, e.g., when there are two people speaking, or even more than two.
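One classic unsupervised technique for this kind of separation is independent component analysis (ICA). Below is a minimal sketch using scikit-learn’s `FastICA` on synthetic stand-ins (a sine wave for the “voice” and a square wave for the “music”); the mixing matrix and signals are invented for illustration, not real audio.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two toy "source" signals: a sine wave stands in for the voice,
# a square wave stands in for the music.
t = np.linspace(0, 8, 2000)
voice = np.sin(2 * t)
music = np.sign(np.sin(3 * t))
sources = np.c_[voice, music]

# Each "microphone" hears a different mixture of the two sources.
mixing = np.array([[1.0, 0.5],
                   [0.5, 1.0]])
recorded = sources @ mixing.T

# FastICA tries to recover the original independent signals
# from the mixtures alone -- no labels needed.
ica = FastICA(n_components=2, random_state=0)
recovered = ica.fit_transform(recorded)
```

ICA recovers the sources only up to scale, sign, and ordering, but that is usually fine: you can still write each recovered component out as its own audio file.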
Wow! We’ve finished the basics of unsupervised learning.
In the next post, we will discuss and evaluate programming languages for implementing neural networks, and then we will implement our first machine learning algorithm!