There is a key difference between supervised and unsupervised learning algorithms, and I'm going to walk through a number of examples in this section that will hopefully help clarify it. At a high level, the difference comes down to the question you're asking. If you remember back to the questions we ask when working with machine learning algorithms, with classification you're asking what something is, such as when we asked whether a particular email was spam or not. With regression you're asking what something should be, like when we asked what a player's salary should be based on their performance compared with the rest of the league.
In unsupervised learning we have a very different type of goal. We don't have a definitive end answer in mind. Instead, what we're looking to answer is the question "what group does this belong to?" Let's build out a little example of this. I'm going to build a small graph right here, and let's try to answer the question of what makes a great student at Devcamp. We have a learning management system that has a number of different machine learning algorithms built into it.
One of the key components we look at is student assessment and performance. We try to see which students are doing very well and which students are struggling, not because we want to judge them one way or the other, but because we want to give the students who are struggling more help so that they can succeed, and give the students who are doing exceptionally well more content so they can go even faster.
Being able to use this type of algorithm, an unsupervised algorithm and more specifically a clustering algorithm, gives us the knowledge to look at the data and say that this student deserves extra assistance or that this student needs to be challenged more. That's a very important component when it comes to deciding between supervised and unsupervised learning: if you're trying to figure out a grouping, then you may be looking for an unsupervised learning algorithm.
Let's take a look at an example on the board right here. Imagine that we have our learning management system, and each one of the dots I'm going to draw represents a student. So we may have a student here, a student here, a student here, and a student here, and these all belong inside of this specific cluster.
Now, let's imagine these are students that are performing very well. They're logging into the system every day and they're acing their quizzes. The system can tell this without any labels that we're piping in. We're not saying this student is better than another one, and that's the entire point when it comes to unsupervised algorithms. As a developer, you don't have a lot of say in what the system is going to output. You simply have a way of grouping items. When we look at the students in this group, we can tell they're doing very well, because you can look at the data and see they're getting all A's and they're just killing it.
Now we have this other set of students over here. These students are struggling a little bit. They are not logging into the system regularly, their grades aren't as good when they do log in, and they're not turning in their projects as consistently. When a new student comes in, we want to analyze their data to see whether they are in group A or in group B. So let's say we have this other student that comes in, and they are a little triangle.
When they come in, the algorithm is going to run, go through all of their data, and work out which cluster they belong to based on everything that's getting piped in. Let's say the student belongs right here, which means that in this graph they're at this position. All of this is based on the variables that we give to the algorithm. Then this cluster is going to grow a little bit so that it includes this new user.
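To make the board example a little more concrete, here is a minimal sketch in Python. It uses k-means, which is just one common clustering algorithm and not necessarily the one the learning management system uses. The two features (logins per week and average grade), the sample values, the starting centroids, and the helper names `nearest`, `mean`, and `k_means` are all assumptions made up for illustration.

```python
import math

# Hypothetical student records: (logins per week, average grade out of 100).
# Note that no labels such as "good" or "struggling" are provided.
students = [
    (6.0, 94.0), (7.0, 91.0), (5.0, 96.0), (6.0, 89.0),
    (1.0, 62.0), (2.0, 58.0), (1.0, 70.0), (2.0, 65.0),
]


def nearest(point, centroids):
    """Return the index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def mean(points):
    """Return the coordinate-wise average of a non-empty list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def k_means(points, centroids, max_iterations=100):
    """Alternate between assigning points to centroids and recomputing centroids."""
    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # Assignments have stopped changing.
        centroids = updated
    return centroids, clusters


# Start from two arbitrary students as the initial centroids (an assumption).
centroids, clusters = k_means(students, [students[0], students[-1]])

# A new student (the "triangle") arrives and joins the closest cluster.
new_student = (5.0, 88.0)
group = nearest(new_student, centroids)
clusters[group].append(new_student)
print(group, len(clusters[group]))
```

With this made-up data, the new student lands in the first group, the one with frequent logins and high grades, and that cluster grows from four students to five. In practice you would usually scale the features first so that one variable, such as the grade, does not dominate the distance calculation.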
Now this is the type of algorithm that is used in all kinds of different industries. Imagine a scenario where you're working for Facebook and you're trying to decide the estimated interests of new users that are coming onto the system.
You want to see all of the different groups they're in and track what kinds of things they may be interested in, because that's how you're going to give them their recommendations. There are a number of great case studies that we're going to be able to go through, and algorithms that we're going to be able to dissect, that will show us how we can build these types of clusters.