The main reason I wanted to cover the Gaussian in this section is that you're going to hear many different developers use the term, and you're going to read books and documentation that reference it quite a bit. I think it helps to hear it explained if you've never come across it before.
But really, all it represents is a bell-shaped curve. You may also have heard it called the normal distribution. So whenever you see a curve on a graph that looks something like this,
one that rises smoothly up to a single peak in the middle and then comes back down symmetrically on the other side, that's a Gaussian distribution. It's actually one of the most fundamental concepts in statistics and mathematics because it appears in so many different places in the world.
And when you see this, you can think of the area under each of these curves as your data. The part on the left-hand side of the curve represents a certain percentage of your data, and the part on the right-hand side represents another percentage. How spread out the curve is determines its shape: a narrow curve like this one shoots up almost to 1.0, while a wider one only peaks somewhere in the middle.
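To make those left-hand and right-hand percentages concrete, here's a minimal sketch using Python's built-in `statistics.NormalDist` (Python 3.8+). The mean of 0, the standard deviation of 1, the cut point `x = 1.0`, and the three sigma values in the loop are arbitrary choices for illustration, not values taken from the graph. The share of data to the left of a point comes from the cumulative distribution function (`cdf`), and the loop shows that a smaller standard deviation produces a taller peak.

```python
from statistics import NormalDist

# Illustrative curve: mean 0, standard deviation 1 (arbitrary choices).
dist = NormalDist(mu=0.0, sigma=1.0)

# The area under the curve to the left of x is the fraction of data below x (the CDF).
x = 1.0
left = dist.cdf(x)
right = 1.0 - left
print(f"left of {x}: {left:.4f}, right of {x}: {right:.4f}")

# The peak height depends only on the standard deviation: narrower curves are taller.
for sigma in (0.4, 1.0, 2.0):
    peak = NormalDist(mu=0.0, sigma=sigma).pdf(0.0)
    print(f"sigma={sigma}: peak height {peak:.4f}")
```

With a standard deviation of 0.4, the peak comes out just under 1.0, which is the kind of tall, narrow curve that reaches almost to the top of the graph.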
To build a Gaussian distribution, you use two parameters: the mean and the standard deviation. You pipe those into a function, it generates the curve for you, and those two values determine its shape. The mean sets where the peak sits, and the standard deviation sets how wide or narrow the curve is.
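Here's a minimal sketch of that "pipe the mean and standard deviation into a function" step, writing the normal probability density formula out by hand in Python. The function name `gaussian_pdf`, the mean of 50, and the standard deviation of 10 are hypothetical, chosen only for illustration. The last line cross-checks the hand-written version against the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Height of the Gaussian curve at x:
    exp(-(x - mean)^2 / (2 * std^2)) / (std * sqrt(2 * pi))."""
    if std <= 0:
        raise ValueError("std must be positive")
    z = (x - mean) / std  # distance from the mean in standard deviations
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

# Hypothetical parameters: the mean sets the peak's location, the std its width.
mean, std = 50.0, 10.0
points = [20, 35, 50, 65, 80]
curve = [gaussian_pdf(x, mean, std) for x in points]
for x, y in zip(points, curve):
    print(f"x={x:>3}: {y:.5f}")

# Cross-check against the standard library's implementation.
reference = NormalDist(mu=mean, sigma=std)
assert all(math.isclose(y, reference.pdf(x)) for x, y in zip(points, curve))
```

The values peak at the mean (x = 50) and fall off equally on both sides, which is the bell shape described above.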
Now I want to give a real-world example of this, and it's really less about machine learning than about ordinary statistics. I was talking with one of the VPs of Marketing at Pepsi, and he said they use the Gaussian distribution when they're preparing a new marketing campaign. Because Pepsi has been around for so long, they can predict fairly accurately how people are going to respond to their marketing campaigns.
He said it was interesting that, of all the people who see their marketing campaigns, somewhere around 5 percent absolutely love them. It doesn't really matter what the campaign is. These people simply love Pepsi, and because of that, they're going to love the campaign.
On the other side, he said he always sees around 5 percent of people who hate them no matter what. Pepsi could be giving out free winning lottery tickets, and there would still be 5 percent of people who despise the campaign just because they don't like Pepsi.
So he said his goal is to completely ignore the 5 percent on the left and the 5 percent on the right and focus on the 90 percent in the middle. That fits perfectly with the Gaussian distribution, where the majority of people fall in the middle of the curve and the numbers drop off sharply on both the left and right-hand sides.
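Here's a rough sketch of where those 5 percent cutoffs fall, assuming audience reactions follow a standard normal curve. That's an assumption for illustration only; the VP didn't share actual figures. `inv_cdf` from Python's `statistics.NormalDist` finds the point that has a given share of the area to its left.

```python
from statistics import NormalDist

# Assumption: reactions follow a standard normal curve (mean 0, std 1).
dist = NormalDist()

# Cut points that leave 5% of people in each tail.
lower = dist.inv_cdf(0.05)
upper = dist.inv_cdf(0.95)
middle = dist.cdf(upper) - dist.cdf(lower)

print(f"ignore below {lower:.3f} and above {upper:.3f} standard deviations")
print(f"share of people in between: {middle:.1%}")
```

Whatever the real mean and standard deviation of the reactions turn out to be, on a normal curve the middle 90 percent sits within about 1.645 standard deviations on either side of the mean.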
And the two sides are even, just like he saw in their marketing programs, and those are programs with a large amount of historical data: millions, even tens of millions, of responses from social media and other channels. So it's a great case study both in analyzing historical data and in seeing how well that data fits this type of Gaussian distribution.