Deep Learning – what on earth does that mean?
In my last post, I demystified a variety of buzzwords, and explained that Deep Learning is a subset of Machine Learning. This post explores the world of deep learning for non-mathematicians (just like myself). In doing so it:
- Touches on convolutional neural networks;
- Explains the impact Deep Learning is having on Cognitive Computing;
- Outlines a few examples of Cognitive Computing (Deep Learning) in action.
Starting with Artificial Neural Networks
To understand Deep Learning, you must first understand a little about Artificial Neural Networks. Don’t worry: as I am not a data scientist, I will not try to describe the mathematics behind it all. That means no talk beyond this sentence of weighting, activation functions and more.
Deep Learning normally revolves around the use of Artificial Neural Networks with more than one hidden layer. More on what a hidden layer is shortly. The theory is that the more hidden layers you have, the more precisely you can isolate specific regions of data to classify things.
Luis Serrano has a great image which I will use to explain the concept of isolating regions to help you classify data. I have included it below and the explanation will refer back to it.
In this image Luis shows inputs consisting of 2 values (x and y). The historical training data we have tells us if the result of a set of x and y values is blue or red (arbitrary outcome). To show that visually you can plot those values as shown above if you visualize an X and Y axis.
The aim of this network is to detect blue outcomes. To do that, the computer builds two different regression models (in this case), each giving the probability that a given point is blue or red. The fit is rarely perfect, which is why in the image you can see both lines result in some errors. The computer will optimize to reduce those errors as much as possible, though.
By using two regressions the computer can then combine the probability of something being blue from two models (with some other magic) to come up with the region shown on the very right. That third region dissects the data more precisely and will help deliver a more accurate probability of something being blue or not.
In other words, Artificial Neural Networks work to separate and classify data so that when new data is presented you can classify it.
In the example above, this means that given a new x and y value which has never been seen before, you can push it through the network. The network will deliver the probability of it being a blue outcome with a high degree of confidence. Because the choice is binary in this example, a very low probability of blue means the outcome is red. This is a pretty simple example but it illustrates the concept.
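For readers who prefer code to pictures, here is a minimal sketch of the idea above: two simple models each draw a line, and a third step combines their probabilities into a tighter blue region. All the numbers are hypothetical and hand-picked for illustration (a real network would learn them from the training data), and the function names are my own, not anything from Luis's material.

```python
import math


def sigmoid(z):
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def line_model(x, y, wx, wy, bias):
    """One regression model: probability that (x, y) is on the blue side of a line."""
    return sigmoid(wx * x + wy * y + bias)


def probability_blue(x, y):
    # Hypothetical, hand-picked numbers; training would normally find these.
    p1 = line_model(x, y, wx=6.0, wy=0.0, bias=-6.0)  # leans blue when x is above 1
    p2 = line_model(x, y, wx=0.0, wy=6.0, bias=-6.0)  # leans blue when y is above 1
    # Combine the two opinions: the result is high only when both models agree.
    return sigmoid(10.0 * p1 + 10.0 * p2 - 15.0)


print(probability_blue(3.0, 3.0))  # both models say blue, so this is close to 1
print(probability_blue(3.0, 0.0))  # only one model says blue, so this is close to 0
```

Neither line on its own isolates the top-right corner, but the combination does, which is the "third region" idea from the image.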
A caveat to that point on many layers: those deep in this space can have long discussions about how many layers make sense. It is possible to have hundreds of hidden layers. Many argue that doing so does not always make a massive difference to the accuracy and may in fact even reduce it. The truth is that the answer depends on what you are doing. As with all modelling, trying a few different approaches and then selecting the best after comparison is the way forward.
What is clear is that:
- The more hidden layers and nodes you have the more computation power you need for training of the model and each subsequent execution.
- The more data you want to provide in training the model the more computation power will be required. On the flip side the more data you have the more accurate the model will be in general. Models will also improve over time as they get more and more data.
These two things have led to the rise of GPU-based processing in the world of Deep Learning. GPU-based processing allows for parallel execution on large numbers of relatively cheap processors, which helps especially when training an artificial neural network with many hidden layers and a lot of input data.
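To make the cost point a little more concrete, here is a rough sketch that counts how many numbers the computer has to tune in a fully connected network. It assumes one number per connection plus one per receiving node, which is the usual textbook setup; the function name and the layer sizes are purely illustrative.

```python
def count_tunable_numbers(layer_sizes):
    """Count the numbers to tune in a fully connected network.

    layer_sizes lists the nodes in each layer, input layer first, output last.
    """
    total = 0
    for nodes_in, nodes_out in zip(layer_sizes, layer_sizes[1:]):
        total += nodes_in * nodes_out  # one number per connection between layers
        total += nodes_out             # one extra number per receiving node
    return total


shallow = count_tunable_numbers([4, 5, 3])          # one small hidden layer
deep = count_tunable_numbers([4, 50, 50, 50, 3])    # three wider hidden layers

print(shallow)  # 43
print(deep)     # 5503
```

Every one of those numbers is adjusted again and again during training, for every batch of training data, which is why more layers, more nodes and more data all push up the compute bill.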
Pictorial Representation of an Artificial Neural Network
OK. So let's step back and look at a pictorial representation of a simple Artificial Neural Network. It is broadly composed of three things: an input layer, one or more hidden layers and an output layer. You could see those same three layers in the sample from Luis I used earlier.
- On the left-hand side, you see the Input Layer. In this case 4 things are input to the network. Each input node connects to every node in the first hidden layer. In the example from Luis used earlier we had 2 inputs in the input layer being the x and y values.
- Then there is generally one, or two, hidden layers with a number of nodes that are all connected to every input and every output node. Normally we tell the algorithm how many nodes we want in the hidden layer but the computer handles the modelling based on the data. This is the self learning part of things and where a network can change over time as more data is fed in (so that a better ultimate classification happens). Each node is modelling something based on the input (so probability of being blue using regression models in the example I used from Luis). Essentially the network at this point is “splitting” and “classifying” data.
- On the right-hand side, you see the output layer with several output nodes. Each of those nodes will deliver an output which helps classify the input. If the network output was different breeds of dogs, and we were looking for 3 specific breeds, then the model above would share the probability of the input being each specific breed of dog. Those probabilities are then fed to an application that can either present them all or simply use the highest probability to drive a decision. In the example from Luis it was simply the probability of the input being blue.
In this artificial neural network all hidden nodes are connected to their predecessor and successor with edges. This sort of Neural Network is known as a Fully Connected Network. There are ways to do something called pruning so that you do not have everything connected. For this blog I am going to skip over that. Any pruning would normally be done only after you have trained the model to improve accuracy and performance.
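The fully connected layout described above (4 inputs, one hidden layer, 3 outputs) can be sketched in a few lines. This is only an illustration under several assumptions of mine: the hidden layer has 5 nodes, the connection values are random and untrained, the breed labels are made up, and I use two common building blocks (ReLU and softmax) that the post does not go into.

```python
import math
import random


def dense_layer(inputs, weights, biases):
    """Fully connected layer: every input feeds every node in this layer."""
    return [
        sum(w * value for w, value in zip(node_weights, inputs)) + bias
        for node_weights, bias in zip(weights, biases)
    ]


def relu(values):
    """Keep positive signals, zero out negative ones."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw output scores into probabilities that sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(rng, n_inputs, n_nodes):
    """Untrained layer: random connection values, zero biases."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_nodes)]
    return weights, [0.0] * n_nodes


rng = random.Random(0)
hidden_w, hidden_b = random_layer(rng, n_inputs=4, n_nodes=5)  # input -> hidden
output_w, output_b = random_layer(rng, n_inputs=5, n_nodes=3)  # hidden -> output

BREEDS = ["labrador", "poodle", "beagle"]  # hypothetical output labels


def classify(features):
    hidden = relu(dense_layer(features, hidden_w, hidden_b))
    probabilities = softmax(dense_layer(hidden, output_w, output_b))
    best = max(range(len(BREEDS)), key=lambda i: probabilities[i])
    return probabilities, BREEDS[best]


probabilities, guess = classify([0.2, 0.7, 0.1, 0.9])
print(probabilities, guess)
```

Because nothing here has been trained, the probabilities are meaningless; the point is only to show the shape of the thing: inputs flow left to right, and the application on the end picks the highest probability.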
Now that you understand what an Artificial Neural Network looks like, and what it is trying to do, it is easy to imagine that a Deep Neural Network is very similar. The main difference is it normally contains 2 or more hidden layers (some people think you need more than 2 hidden layers for your model to be said to be a Deep Learning model). Deep Learning opens up new possibilities to uniquely classify things with very high levels of accuracy. This is perhaps most evident in the rapidly developing world of speech, text and visual classification.
Some real world Neural Networks
There is a great blog post that looks at AlphaGo in detail. That is the Narrow AI that was designed by Google to win at the game of Go. One part of that shows the Deep Learning Neural Network behind how it mastered the game after it played millions of games. The network contains very few layers as you can see. It ultimately comes up with just 6 outputs which were used to make the decision on what to do based on the game situation.
Autonomous cars also get a lot of attention nowadays but it is a very similar concept. Check out this blog from David Simpleton which shows how he built a self-driving model car. Below is the image of his Neural Network. It was more complicated, with many hidden layers. Ultimately though, it provided outputs giving the probability of what the car should do next.
Eagle-eyed readers might have spotted the word convolution in the AlphaGo network image. That is because these two examples are using something called Convolutional Neural Networks. This is a type of neural network created back in the 80s that has proven to work very well in handling speech, text and images. Both examples I have shown above are using visual data!
What is a Convolutional Neural Network?
In short, Convolutional Neural Networks break down images into smaller parts (convolution) and then try to identify specific features using filters that look for specific patterns in the image. This builds something called a convolution layer, which can be shrunk using something called pooling. What is found is then used in a fully connected network to determine what is in the image. If you really want to understand it, take a look at this video from Brandon Rohrer, which I think explains it really well. It would take me a long time and a lot of words to explain it in this post.
If you have made it this far you are through the hard part and you are on your way to understanding a lot of the smarts behind Narrow AI today.
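As a very small sketch of those two steps, the code below slides a filter over a tiny image and then pools the result. The 5x5 image and the 2x2 filter are made up for illustration; real networks use much larger images and learn their filters during training rather than having them written by hand.

```python
def convolve(image, kernel):
    """Slide the filter over the image and score how well each patch matches it."""
    k = len(kernel)
    feature_map = []
    for r in range(len(image) - k + 1):
        row = []
        for c in range(len(image[0]) - k + 1):
            score = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            row.append(score)
        feature_map.append(row)
    return feature_map


def max_pool(feature_map, size=2):
    """Shrink the feature map by keeping the largest value in each size x size block."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            block = [feature_map[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(max(block))
        pooled.append(row)
    return pooled


# A tiny 5x5 "image": 1 is a bright pixel, 0 is dark. It contains a vertical line.
image = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

# Hypothetical filter that responds to a dark-to-bright step from left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve(image, vertical_edge)
pooled = max_pool(feature_map)

print(feature_map[0])  # [0, 2, -2, 0]
print(pooled)          # [[2, 0], [2, 0]]
```

The high score of 2 marks where the filter found its pattern, and pooling keeps that signal while shrinking a 4x4 map down to 2x2. Those pooled values are the sort of thing that gets handed to the fully connected network at the end.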
Ultimately you just need to remember that we are using these networks to classify things via a variety of methods.
Impact on Cognitive Computing
I plan to write a whole blog on the importance of cognitive computing but for now you should be clear that Deep Learning is behind many of the advancements we are seeing with Cognitive Computing today.
Cognitive Computing deals with enabling computers to interact with us in a humanlike manner. That means having them able to understand images, understand speech, understand text etc. and reciprocate accordingly.
Convolutional Neural Networks have been the key to that progress along with the vast quantities of data and the massive compute we now have available. To succeed in Cognitive Computing you need THREE things (assuming you have people and tools to build the model).
- You need a great deal of input data to train a good network. For example, to build a great network that can recognize objects you need thousands and thousands of images that contain that object, plus as many that do not. There is a great TED talk by Fei-Fei Li which explains why.
- You need a lot of compute power to make that training happen in a sensible time period and to continue to power the evolution of the model over time.
- You need compute power to use the trained model in your application.
It is clear to me that, for most, it will make sense to consume cognitive computing as a service which you can embed into your applications. Developing a lot of this in-house will be difficult given the training data and compute needed to make it accurate. This may change depending on how much access to training data is opened up over time.
Today we see companies using things like the Microsoft Cognitive Services in their applications to add “narrow” artificial intelligence. There are basically two things.
- Pre-trained models, being continuously updated, that people can exploit as a service without any knowledge of all we have covered in this blog. Examples might be visual recognition of common objects/celebrities or speech and text recognition.
- Black box services you can use to train models to detect specific things which may be proprietary to you. Examples might be visual recognition of specific objects/people or identifying someone from their own speech.
These two capabilities offer maximum flexibility. The first set are hard to replicate yourself and the second set provides you with a quick means to develop the Narrow AI services you need without needing an army of skilled data scientists.
Below are two examples of Cognitive Services in action which have introduced narrow AI to specific applications/business processes.
- Uber has introduced Real-Time ID Check, an additional security feature that periodically prompts drivers to share a selfie with Uber before they go online to start accepting ride requests. Real-Time ID Check uses Microsoft Cognitive Services intelligence to instantly compare the selfie to the driver’s photo on file. If the two photos don’t match, the driver’s account can be temporarily deactivated while Uber looks into the situation.
This feature prevents fraud and protects drivers’ accounts from being compromised. It also protects riders by building in another layer of accountability to the Uber app to let passengers know that the right person is behind the wheel.
- McDonald's is using Microsoft Cognitive Services to help with understanding orders at drive-through restaurants. To do that they use a service which translates speech into text.
Beyond these two examples we are starting to see the infusion of narrow AI services into all sorts of applications. Often people have no idea they are getting the benefits of AI. For many of us that is the way it will always be: AI helping to empower each and every one of us and each and every organization to achieve and understand more.
This blog has demystified the buzz around Deep Learning. It has shown that Deep Learning is nothing more than an Artificial Neural Network with two or more hidden layers. It has explained that Convolutional Neural Networks are fuelling the Cognitive Computing boom we see today. Finally, the post shared a few cases where narrow AI has been infused into applications and business processes.
I am sure there are many experts out there and I would love to hear if you would change anything in this post. For the rest, please share whether this post helped, or how I can try to simplify further to clear up any confusion.