## Uncharted Waters

Jan 26 2017   3:02PM GMT

# Machine Learning in plain English

Profile: Matt Heusser

Tags:
Artificial intelligence
Machine learning

If you’ve been listening to the internet lately, you’ve probably heard about Machine Learning and Artificial Intelligence. The conversations I’ve had have gone a bit like this:

Other Person: (entire category of problem or job) is going to go away with the rise of Machine Learning.

Matt: That’s interesting. What algorithm would you use to solve the problem? How would you encode it?

Other Person: That’s the beauty part. The Artificial Intelligence teaches itself!

Wait, what?

On a bad day, AI can seem a bit like magic. Think of the thing that fills in the question mark in the classic South Park skit: “Step 1 – steal underpants. Step 2 – ?. Step 3 – Profit!”

Of course, there is no magic. AI and Machine Learning can only do the same thing a human would do, just much faster, many more times, over a much larger data set.

Let’s talk about how that works, starting with a machine learning algorithm you can perform by hand.

## Newton’s Method

A bit like the children’s game of “hot and cold”, Newton’s Method starts with a guess, plugs it into a function, then adjusts that number toward the correct answer. If you’ve got five minutes and want to see it worked by hand, you can watch it in action here:

Obviously, a computer could solve this much more quickly.
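To make “guess, plug in, adjust” concrete, here is a minimal sketch of Newton’s Method used to find a square root. The function name `newton_sqrt`, the starting guess, and the tolerance are all assumptions chosen for illustration, and it expects a positive target.

```python
def newton_sqrt(target, guess=1.0, tolerance=1e-10, max_steps=50):
    """Approximate the square root of a positive `target` with Newton's Method."""
    for _ in range(max_steps):
        error = guess * guess - target   # f(x) = x^2 - target: how far off are we?
        if abs(error) < tolerance:       # close enough, stop adjusting
            break
        slope = 2 * guess                # f'(x) = 2x
        guess = guess - error / slope    # move the guess toward the answer
    return guess


print(newton_sqrt(2.0))  # a value very close to 1.41421356
```

Each pass through the loop is one round of “hot and cold”: measure the error, then use it to pick a better guess.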

That basic method of “guess, check, and adjust” is the guts of machine learning. The classic example of this is the poker hands program, which trains software to recognize poker hands. You can think of it like Newton’s Method: the software makes a guess, checks to see if that guess is correct, then uses that information to improve the next guess. Repeat for a data set with twenty-four thousand samples, each including the correct rank of the hand, and the computer can predict with reasonable accuracy.

The actual program to grade the hands is under one hundred lines of code; you can read it yourself.

## Lessons From a Poker Grader

If you look at the code, it doesn’t actually explain how to do the training. Instead, the program reads in the file from disk, parses it into the proper arrays, then calls the clf.fit() function, where the real magic happens. The fit() function is where the program is trained. Once the program is trained, the programmer can pass a hand to clf.predict() and get a prediction. From there, the programmer runs through another file, comparing the predictions for a second set of sample hands to the correct answers, to see if the program is smart enough. If the ratio of correct guesses is too low, you can always find or create more sample data. There are only 311 million possible poker hands, and that is counting every order in which the five cards can be dealt.
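The train, predict, and score workflow can be sketched without any library at all. The class below is a toy stand-in, not the classifier the poker program uses: its `fit()` simply memorizes the samples, and its `predict()` returns the label of the closest memorized sample. The class name, the `accuracy` helper, and the tiny data set are all hypothetical.

```python
class NearestNeighborClassifier:
    """Toy classifier: fit() memorizes samples, predict() finds the closest one."""

    def fit(self, samples, labels):
        self.samples = list(samples)
        self.labels = list(labels)
        return self

    def predict(self, sample):
        def distance(other):
            # Squared straight-line distance between two samples
            return sum((a - b) ** 2 for a, b in zip(sample, other))

        best = min(range(len(self.samples)),
                   key=lambda i: distance(self.samples[i]))
        return self.labels[best]


def accuracy(clf, samples, labels):
    """Fraction of samples whose prediction matches the known answer."""
    correct = sum(1 for s, y in zip(samples, labels) if clf.predict(s) == y)
    return correct / len(labels)


# Made-up training data: pairs of numbers, each with a known label
train_x = [(1, 1), (1, 2), (2, 1), (8, 9), (9, 8), (9, 9)]
train_y = ["low", "low", "low", "high", "high", "high"]

# A second, separate set used only to check the trained program
test_x = [(0, 1), (2, 2), (8, 8), (10, 9)]
test_y = ["low", "low", "high", "high"]

clf = NearestNeighborClassifier().fit(train_x, train_y)
score = accuracy(clf, test_x, test_y)
print(score)
```

A real library classifier does far more work inside `fit()`, but the shape of the program around it is the same: train on one file, grade against another.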

Most applications of machine learning have an infinite possibility set. For example, you might look at a list of people with symptoms for a specific kind of cancer. Half the group has symptoms and cancer; the other half has some symptoms but no cancer. Finding the symptom or two most closely correlated with cancer is easy. The challenge is looking at sets of symptoms and trying to predict whether a person with a given set is likely to have cancer or not. Our data could even include ten people with the same symptoms, nine of whom had cancer and one who did not.
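The “nine out of ten” case hints at why the output is a likelihood rather than a yes or no. As a rough sketch, assuming records arrive as (symptom set, outcome) pairs, you could count outcomes per symptom set like this. The symptom names and the records are made up for illustration, and `outcome_rates` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def outcome_rates(records):
    """Map each symptom set to the fraction of people with it who had the condition."""
    counts = defaultdict(Counter)
    for symptoms, had_condition in records:
        counts[frozenset(symptoms)][had_condition] += 1
    return {
        symptoms: tally[True] / (tally[True] + tally[False])
        for symptoms, tally in counts.items()
    }


# Made-up records: ten people share one symptom set, nine of them had the condition
records = [({"fatigue", "cough"}, True)] * 9 + [({"fatigue", "cough"}, False)]
records += [({"cough"}, False)] * 4 + [({"cough"}, True)]

rates = outcome_rates(records)
print(rates[frozenset({"fatigue", "cough"})])  # 0.9
```

Simple counting only works for symptom sets that already appear in the data. The point of a trained model is to make a reasonable guess for combinations it has never seen.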

This sort of analysis is guessing, but it can be more accurate, faster, and cover a larger data set than a human can by looking, or than the lookup table that appeared in your doctor’s desk reference twenty years ago.

Those examples are what is called supervised machine learning. Given a combination and an outcome, get the computer to guess an answer and compare.

With unsupervised machine learning, we give no pre-formed pattern. There is no cancer to find, no poker hands to grade. Instead, the computer finds the patterns and clumps them into groups for us to name and analyze. Where a supervised machine learning algorithm at Facebook might separate pictures-with-faces-to-be-tagged from those without, an unsupervised algorithm would come back with clumps, which humans might translate as “animals”, “landscapes”, “people”, “modern art”, “finger-over-lens”, and so on.
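One common way to do that clumping is k-means clustering, sketched below in plain Python. This is an illustration of the idea, not a claim about what any particular company runs. The points, the starting centers, and the function name `k_means` are assumptions.

```python
def k_means(points, centers, rounds=10):
    """Group 2-D points around `centers`; returns (final centers, groups)."""
    for _ in range(rounds):
        # Step 1: put every point in the group of its nearest center
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2,
            )
            groups[nearest].append(p)
        # Step 2: move each center to the average of its group (keep it if empty)
        centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups


# Made-up, unlabeled points and two arbitrary starting centers
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, groups = k_means(points, centers=[(0, 0), (10, 10)])
print(groups)
```

Notice that no labels go in and none come out. The algorithm returns two clumps, and it is still up to a human to decide what to call them.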

## Lessons For Software

If people are talking about analyzing large data sets for insights (monitoring tools, databases, project management data, and so on), they might be on to something. Vague ideas of eliminating jobs because “Machine Learning”, not so much. There might be a niche space for machine learning in academic projects, medical software, and other very low-level software, but most people aren’t working in that space.

For a little more on machine learning, check out Four Factors for Machine Learning, which I co-wrote with Peter Varhol, or Constructing Effective Neural Networks. ML is just a small piece of AI, which we might have to cover soon.
