- Read Tutorial
- Watch Guide Video
I want to talk about various strategies you should have when it comes to building your own machine learning and data science models.
And I'm going to look at one of my favorite case studies for this, which is Google, specifically Google very early on in its development. Google was at the forefront of modern machine learning, and I absolutely love the strategies the company took, so I thought it would be something we could learn from.
In the book "In the Plex," Steven Levy's account of Google, we learn some of the ways Google discovered how to perfect its search engine. Because Google's goal was to classify the entire world of knowledge and make it completely searchable, you might think they would start with big, broad queries and build huge systems; you just picture them going very big very fast. But they actually took the exact opposite approach.
Very early on, in the early 2000s, they started noticing some odd behavior in their search indexes: a lot of their results were being populated by spam posts. The case study they kept coming back to was the word "university." That word had an entire first page of results filled with nothing but sales pages and spam, and no actual universities were listed when you typed it in.
Larry Page and Sergey Brin thought that was wrong. They believed the system should be able to give the correct result: typing the word "university" into Google should return actual university results.
Now, that was just one example of how spam was starting to overrun the entire set of search results. It wasn't limited to the word "university," but that was the specific word the developers targeted.
So instead of stressing over trying to fix the entire world of spam, Google took a very targeted approach. They simply cared about the word "university," and so they started altering their algorithms and building out their own machine learning models to get "university" to return the right results. It took them a long time to make the changes they needed, and the end result was the PageRank algorithm, which is still used today.
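To make the PageRank idea concrete, here is a minimal sketch of its power-iteration form. The toy link graph, page names, and damping factor below are all illustrative assumptions, not Google's actual data or implementation; the point is that a page many other pages link to (a real university site) outranks a page that only promotes itself (spam).

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its rank (ranks sum to 1).
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # A dangling page spreads its rank evenly to all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
            else:
                # Otherwise rank flows equally along each outbound link.
                for target in outgoing:
                    new_rank[target] += damping * rank[page] / len(outgoing)
        rank = new_rank
    return rank

# Hypothetical toy graph: several pages link to the university page,
# while the spam page only links to itself.
graph = {
    "university": ["blog"],
    "blog": ["university"],
    "news": ["university"],
    "spam": ["spam"],
}
ranks = pagerank(graph)
```

Running this, the "university" page ends up with a higher rank than the self-linking "spam" page, which is exactly the behavior the fix for the "university" query needed.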
When I read that, I thought it was fascinating, because so many times I personally, along with many other developers I know, hear of some machine learning task, say building out a recommendation engine, and immediately think of building a model that works for all use cases.
However, what I've found is that a much better strategy is to first take a very small, targeted approach, just like Google focused on the "university" keyword. Pick out one particular case study for your own machine learning algorithms and try to get that one example to work.
After that works, then you can broaden out, and what you may discover is that the one case study you worked on has a domino effect: the same approaches you used for your example will work for a broader set of cases. I know that may sound like common sense, and it is common sense; that's kind of the point. However, I know that I, and many of the developers I work with, often try to jump to the perfect solution too fast. Instead of trying to fix and build one type of machine learning approach, we try to build the entire system, and that can become stressful and very error-prone.
When you take and dissect one example, and take very small steps as you build out your system, you'll find that you actually learn more about the data and about what you really need to implement.
Going back to the example from the machine learning guide, where I talked about building out the recommendation engine for DevCamp that creates recommended articles: when I did that, I took this same approach. I didn't care about the thousand-plus articles on DevCamp, or even the tens of thousands in our third-party library. I didn't try to create a solution that worked for all of them right away.
Instead, I picked out one article: the Python article I talked about, with all of those different keywords like list and extend. I told myself that once I could get the right article generated as the recommended article, that would tell me I was on the right track. So I didn't overwhelm myself with trying to go through thousands of articles and analyze all of that data.
Instead, I built out my prediction model for that single article, and then I continued to work on the algorithm, combining a number of machine learning algorithms until it gave me the correct result. That approach worked very well for me, and after I got that one example working, I was able to roll it out across the rest of the application.
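The single-example validation loop described above can be sketched in a few lines. This is not the actual DevCamp implementation; it is a minimal word-overlap cosine-similarity recommender with hypothetical article data, showing how one known-good test case (the Python list article) can validate the whole approach before scaling it up.

```python
import math
from collections import Counter

def similarity(text_a, text_b):
    """Cosine similarity between two texts as word-count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(target_text, articles):
    """Return the title of the article most similar to the target text."""
    return max(articles, key=lambda title: similarity(target_text, articles[title]))

# Hypothetical articles standing in for the real DevCamp library.
articles = {
    "Python lists: append and extend": "python list append extend methods tutorial",
    "CSS grid layout basics": "css grid layout columns rows design",
    "Deploying Rails to Heroku": "rails deployment heroku server setup",
}

# The one case study: a Python article about list methods. If the
# recommender surfaces the Python list article here, we're on the
# right track and can broaden to the full article set.
target = "a guide to python list methods such as extend"
best = recommend(target, articles)
```

Here `best` comes back as the Python list article, which is the single pass/fail signal the strategy calls for: get one example right, then broaden.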
So in summary, the goal of this guide was to give you a mental framework for how to approach building out machine learning algorithms and those types of implementations. Many times, if you simply look at the broad scope and try to take in all of the different cases that need to be accounted for, it can be pretty intimidating, and that can lead to all kinds of issues.
It can lead to rushing and building the wrong kind of implementation. It can cause procrastination, because the concept seems so big you don't even know where to start. But if you take very small, targeted chunks and make those case studies work, you'll find that it leads to building out a fuller set of solutions.