From the course: Artificial Intelligence Foundations: Thinking Machines

Perceptrons

- You might be wondering why machine learning took so long to catch on. After all, it was in 1959 when Arthur Samuel created his revolutionary checkers program. At the time it seemed like machine learning had the wind at its back. It was ready to become the dominant form of artificial intelligence. Yet what actually happened is that machine learning took a backseat to other innovations, such as symbolic systems. It wasn't until the late 1980s and early 1990s that researchers started thinking again about machine learning. The rise, fall, and rise again of machine learning is both sad and interesting. It shows how just a few researchers were instrumental in building out early AI. Back in 1958, a Cornell professor named Frank Rosenblatt created an early version of an artificial neural network. Except instead of using nodes and neurons, he used the term perceptrons for his network. He argued that if you tied together enough of these perceptrons, you could create a complex form of machine intelligence. Rosenblatt thought that these perceptrons were the most promising path to AI. He built a machine called the Mark 1 Perceptron. It tied together thousands of these perceptrons into an early neural network. It had small cameras and was designed to learn how to tell the difference between two images. Unfortunately, it took thousands of tries, and even then the Mark 1 still had a hard time distinguishing even basic images. While Rosenblatt was working on his Mark 1, an MIT professor named Marvin Minsky was pushing hard for the symbolic systems approach. In 1969, Minsky co-authored a book with Seymour Papert called "Perceptrons." In it, he argued decisively against Rosenblatt's approach to artificial neural networks. A few years after the book was published, Rosenblatt died in a boating accident. Without Rosenblatt to defend perceptrons, much of the funding for his approach dried up. Minsky later dedicated the work to his onetime rival, but it was too late. 
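The perceptron idea above can be sketched in a few lines of Python. This is an illustrative toy, not the Mark 1's actual wiring: a single unit with adjustable weights learns the logical AND pattern, and the function names, learning rate, and epoch count are my own choices.

```python
# A minimal sketch of Rosenblatt's perceptron learning rule: one unit,
# a step activation, and a weight nudge whenever the output is wrong.
# The task (logical AND) and all parameters are illustrative.

def train_perceptron(samples, labels, epochs=10, lr=1.0):
    """Learn weights and a bias with the classic perceptron update."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            # Step activation: fire (1) if the weighted sum exceeds 0.
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            output = 1 if activation > 0 else 0
            error = target - output
            # Nudge each weight toward the correct answer.
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Logical AND is linearly separable, so a single unit can learn it.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])  # [0, 0, 0, 1]
```

A single unit like this can only learn linearly separable patterns, which is exactly the limitation Minsky and Papert highlighted in their book.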
Perceptrons and artificial neural networks languished for nearly a decade. One of the biggest challenges with perceptrons was that Rosenblatt's Mark 1 didn't include a hidden layer, which is a key part of how artificial neural networks learn and adapt. This was partly because the Mark 1 was a hardware device, so the single layer was hard-coded to classify only a few patterns. In the mid-1980s, a Carnegie Mellon professor named Geoff Hinton created a new version of an artificial neural network based on Rosenblatt's perceptron. Except his version included several hidden layers. This allowed his artificial neural network to work on much more complicated patterns. Still, these early artificial neural networks struggled. They were slow, and they had to go through the problem several times before they could learn and improve. It wasn't until the 1990s that Hinton started working in a new field called deep learning. They called it deep learning because there are many hidden layers. These additional layers created a large gap between the input and the output. This added capacity gave the artificial neural network a lot more room to match patterns. There were also new ways to help these networks learn. Newer techniques such as backpropagation could make sure that all the nodes spread their knowledge more quickly. Backpropagation goes back through the network and helps strengthen many of the neural connections. That helps these larger networks learn new things. These deep learning networks also clustered their neurons to help identify patterns. Clustering allowed the network to create categories and then sort new information into these categories. Let's say you wanted to use a deep learning network to identify pictures of cats. You could give the network a picture of anything, and it would tell you whether there's a cat in that picture. To train the network, you'd feed a few million pictures with cats into the network. 
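The hidden layer and backpropagation ideas above can be sketched in Python. This is a hedged illustration, not Hinton's historical setup: the layer sizes, learning rate, random seed, and the XOR task (a classic pattern a single-layer perceptron cannot learn) are all my own assumptions.

```python
# A minimal sketch of a network with one hidden layer, trained by
# backpropagation: the error is sent back through the network and each
# connection is adjusted in proportion to its contribution.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR: a pattern a single-layer perceptron cannot learn.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Weights for input -> hidden (4 units) and hidden -> output.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def forward():
    hidden = sigmoid(X @ W1 + b1)
    return hidden, sigmoid(hidden @ W2 + b2)

initial_error = float(np.mean((forward()[1] - y) ** 2))

lr = 0.5
for _ in range(10000):
    hidden, out = forward()
    # Backward pass: propagate the error from the output layer back
    # through the hidden layer, strengthening or weakening connections.
    d_out = (out - y) * out * (1 - out)
    d_hidden = (d_out @ W2.T) * hidden * (1 - hidden)
    W2 -= lr * hidden.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_hidden
    b1 -= lr * d_hidden.sum(axis=0)

final_error = float(np.mean((forward()[1] - y) ** 2))
print(final_error < initial_error)  # training reduced the error
```

With a successful run, the outputs round to the XOR pattern [0, 1, 1, 0]; the key point is that the hidden layer gives the network the capacity a single layer lacks.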
It would, on its own, start to think about the pictures and categories. That way it could quickly break down each new picture as it enters the network. In some ways, clustering is pattern matching for patterns. It can find certain categories of patterns that help the network identify the image. It might notice that most animals have fur, or most landscapes have a blue sky. Then each time it looked at a new image, the network could discard anything it didn't recognize as an animal or a landscape. Then the network could focus completely on identifying the remaining photographs that might include a mountain or even a cat.
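The clustering idea above can be sketched with a simple k-means grouping. This is an illustrative toy, assuming two-dimensional points rather than real image features; the function names and data are hypothetical.

```python
# A simple sketch of clustering: group similar feature vectors into
# categories, then sort each new item into its nearest category.
# The points here stand in for image features and are illustrative.
import random

def dist2(a, b):
    """Squared distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k=2, iterations=20, seed=42):
    """Find k cluster centers by repeated assign-and-average."""
    random.seed(seed)
    centers = random.sample(points, k)
    for _ in range(iterations):
        # Assign each point to its nearest center (its "category").
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centers[i] = tuple(sum(c) / len(cluster)
                                   for c in zip(*cluster))
    return centers

def categorize(p, centers):
    """Sort a new point into the nearest existing category."""
    return min(range(len(centers)), key=lambda i: dist2(p, centers[i]))

# Two obviously separate groups of points.
points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centers = kmeans(points)
a = categorize((0.5, 0.5), centers)
b = categorize((9.5, 9.5), centers)
print(a != b)  # True: the two new points land in different categories
```

Once the categories exist, sorting a new item is just a nearest-center lookup, which is what lets the network quickly discard images that don't fit a category it cares about.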
