As we discussed in our previous blog post, an Artificial Neural Network (ANN), usually called a “neural net” (NN), is a learning algorithm inspired by the biological neural networks within our brains and central nervous systems. Modern computation, especially machine learning in the service of developing Artificial Intelligence, is often structured as an interconnected group of artificial neurons that process information using a connectionist approach. Modern Neural Nets are non-linear statistical data modelling tools. They are usually used to model complex relationships between inputs and outputs, to find patterns in data, or to capture the statistical structure of an unknown joint probability distribution between observed variables, especially in the domain of Big Data Analytics.
What Are Perceptrons?
To take a deep dive into the exciting and paradigm-shifting world of Neural Nets, we must first look at their fundamental building blocks. Just as the human central nervous system can be broken down into basic constituent units called Neurons, an Artificial Neural Network can be broken down into its most basic units, called Perceptrons, each of which is capable of performing an independent function for supervised learning.
Just as human neurons perform a specific set of functions, a perceptron takes in multiple inputs, each with its own weight, adds a bias, and sums the result to check whether the signal reaches the threshold required to perform the assigned task. Curiously enough, in human brains, a single neuron will pass a message to another neuron across the interface between them (the synapse) if the sum of weighted input signals from one or more neurons (summation) is great enough (exceeds a threshold) to cause the message transmission. When the threshold is exceeded and the message is passed along to the next neuron, this is called activation.
The summation process can be mathematically complex. Each neuron’s input signal is actually a weighted combination of potentially many input signals, and the weighting means that each input can have a different influence on any subsequent calculations, and ultimately on the final output of the entire network.
In addition, each neuron applies a function or transformation to the weighted inputs, which means that the combined weighted input signal is transformed mathematically before evaluating whether the activation threshold has been exceeded. The function applied to the combined weighted input can be either linear or nonlinear. These input signals can originate in many ways, with our senses being some of the most important, as well as the ingestion of gases (breathing), liquids (drinking), and solids (eating), for example. A single neuron may receive hundreds of thousands of input signals at once that undergo the summation process to determine if the message gets passed along, and ultimately causes the brain to instruct actions, memory recollection, and so on.
The ‘thinking’ or processing that our brain carries out, and the subsequent instructions given to our muscles, organs, and body are the result of these neural networks in action. In addition, the brain’s neural networks continuously change and update themselves in many ways, including modifications to the amount of weighting applied between neurons. This happens as a direct result of learning and experience. Given this, it’s a natural assumption that for a computing machine to replicate the brain’s functionality and capabilities, including being ‘intelligent’, it must successfully implement a computer-based or artificial version of this network of neurons.
This idea gave rise to an advanced statistical technique, the Artificial Neural Network, which is based on the mathematical structure and function of Perceptrons as shown in the diagram.
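To make the perceptron idea concrete, here is a minimal sketch in Python. It assumes a simple step activation with a threshold of zero and trains on the logical AND function using the classic perceptron learning rule. The function names, learning rate, and number of epochs are our own illustrative choices rather than part of any particular library.

```python
def perceptron_output(inputs, weights, bias, threshold=0.0):
    """Fire (return 1) if the weighted sum plus bias exceeds the threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum > threshold else 0


def train_perceptron(samples, learning_rate=0.1, epochs=20):
    """Adjust weights and bias with the perceptron learning rule."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # error is +1, 0, or -1 depending on how the prediction missed
            error = target - perceptron_output(inputs, weights, bias)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Training data for logical AND: ((input_1, input_2), target)
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(and_samples)
predictions = [perceptron_output(x, weights, bias) for x, _ in and_samples]
print(weights, bias, predictions)
```

Because AND is linearly separable, a single perceptron is enough here, and with these settings the predictions should match the targets. A problem such as XOR would not work with one perceptron, which is one motivation for stacking them into layers.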
Mathematical Architecture of Neural Nets
As established in the previous section, Neural Nets are statistical models directly inspired by, and partially modelled on biological neural networks. They are capable of modelling and processing nonlinear relationships between inputs and outputs in parallel. The related algorithms are part of the broader field of machine learning, and can be used in many applications as discussed.
Neural Networks are characterized by containing adaptive weights along paths between neurons that can be tuned by a learning algorithm that learns from observed data in order to improve the model. In addition to the learning algorithm itself, one must choose an appropriate cost function.
The cost function measures how far the model’s outputs are from the desired outputs, and it is what the learning algorithm minimizes to find the optimal solution to the problem being solved. This involves determining the best values for all of the tunable model parameters, with the adaptive weights on the paths between neurons being the primary target. Algorithm tuning parameters such as the learning rate also need good values, though these are typically set by the practitioner rather than learned. The weights are usually found through optimization techniques such as gradient descent or stochastic gradient descent. These techniques try to bring the ANN solution as close as possible to the optimal solution, which, when successful, means that the ANN is able to solve the intended problem with high performance.
Architecturally, a Neural Net is modelled using layers of artificial neurons, or computational units able to receive input and apply an activation function along with a threshold to determine if messages are passed along. In a simple model, the first layer is the input layer, followed by one hidden layer, and lastly by an output layer. Each layer can contain one or more neurons. These models can become increasingly complex, with greater abstraction and problem-solving capability, by increasing the number of hidden layers, the number of neurons in any given layer, and/or the number of paths between neurons. Note that the chance of overfitting also rises with model complexity. The various layers are shown in the diagram below.
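As a rough illustration of a cost function being minimized by gradient descent, the sketch below fits a single linear neuron to a handful of points. The mean squared error cost, the toy data (generated from y = 2x + 1), the learning rate, and the function names are all assumptions made for this example.

```python
def mean_squared_error(weight, bias, data):
    """Cost: average squared gap between prediction and target."""
    return sum((weight * x + bias - y) ** 2 for x, y in data) / len(data)


def fit_linear_neuron(data, learning_rate=0.05, steps=2000):
    """Plain (batch) gradient descent on the mean squared error."""
    weight, bias = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Partial derivatives of the cost with respect to weight and bias
        grad_w = sum(2 * (weight * x + bias - y) * x for x, y in data) / n
        grad_b = sum(2 * (weight * x + bias - y) for x, y in data) / n
        # Step against the gradient, scaled by the learning rate
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


# Toy data generated from y = 2x + 1 (an assumption for this sketch)
data = [(x, 2 * x + 1) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

weight, bias = fit_linear_neuron(data)
final_cost = mean_squared_error(weight, bias, data)
print(weight, bias, final_cost)
```

The learning rate is fixed by hand here. Stochastic gradient descent follows the same idea but estimates the gradient from one sample or a small batch at a time instead of the full data set.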
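The layered structure described above can be sketched as a forward pass through a tiny network with three inputs, one hidden layer of two neurons, and a single output neuron. The sigmoid activation and all of the weight and bias values below are hypothetical, chosen only to show how data flows from layer to layer.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Compute one layer; each row of weights belongs to one neuron."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def network_forward(inputs, layers):
    """Feed the inputs through every layer in order."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# Hypothetical parameters: (weights, biases) per layer
hidden_layer = ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1])
output_layer = ([[1.0, -1.0]], [0.0])

output = network_forward([1.0, 0.5, -1.0], [hidden_layer, output_layer])
print(output)
```

Adding hidden layers or neurons only means adding entries to the list of layers, which is also where the extra capacity (and the extra risk of overfitting) comes from.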
Model architecture and tuning are therefore major components of ANN techniques, in addition to the actual learning algorithms themselves. All of these characteristics of an ANN can have significant impact on the performance of the model. Additionally, models are characterized and tuneable by the activation function used to convert a neuron’s weighted input to its output activation. There are many different types of transformations that can be used as the activation function, and a discussion of them is out of scope for this article.
The abstraction of the output as a result of the transformations of input data through neurons and layers is a form of distributed representation, as contrasted with local representation. The meaning represented by a single artificial neuron, for example, is a form of local representation. The meaning of the entire network, however, is a form of distributed representation due to the many transformations across neurons and layers.
One thing worth noting is that while ANNs are extremely powerful, they can also be very complex and are considered black box algorithms, which means that their inner workings are very difficult to understand and explain. The decision to employ ANNs to solve a problem should therefore be made with that in mind.
Enter Deep Learning
Deep learning, while sounding fancy, is actually just a term to describe certain types of neural networks and related algorithms that often consume very raw input data. They process this data through many layers of nonlinear transformations in order to calculate a target output.
Unsupervised feature extraction is also an area where deep learning excels. Feature extraction is when an algorithm is able to automatically derive or construct meaningful features of the data to be used for further learning, generalization, and understanding. In most other machine learning approaches, the burden is traditionally on the data scientist or programmer to carry out the feature extraction process, along with feature selection and engineering. Feature extraction usually involves some amount of dimensionality reduction as well, which means reducing the number of input features and the amount of data required to generate meaningful results. This has many benefits, which include simplification, reduced computational and memory requirements, and so on.
More generally, deep learning falls under the group of techniques known as feature learning or representation learning. As discussed so far, feature extraction is used to ‘learn’ which features to focus on and use in machine learning solutions. The machine learning algorithms themselves ‘learn’ the optimal parameters to create the best performing model. Feature learning algorithms allow a machine to both learn for a specific task using a well-suited set of features, and also learn the features themselves. In other words, these algorithms learn how to learn!
Deep learning has been used successfully in many applications, and is considered to be one of the most cutting-edge machine learning and AI techniques at the time of this writing. The associated algorithms are often used for supervised, unsupervised, and semi-supervised learning problems. For neural network-based deep learning models, the number of layers is greater than in so-called shallow learning algorithms. Shallow algorithms tend to be less complex and require more up-front knowledge of optimal features to use, which typically involves feature selection and engineering.
In contrast, deep learning algorithms rely more on optimal model selection and optimization through model tuning. They are better suited to solving problems where prior knowledge of features is less desired or necessary, and where labelled data is unavailable or not required for the primary use case. In addition to statistical techniques, neural networks and deep learning leverage concepts and techniques from signal processing as well, including nonlinear processing and/or transformations.
You may recall that a nonlinear function is one that is not characterized simply by a straight line. It therefore requires more than just a slope to model the relationship between the input, or independent variable, and the output, or dependent variable. Nonlinear functions can include polynomial, logarithmic, and exponential terms, as well as any other transformation that isn’t linear. Many phenomena observed in the physical universe are actually best modelled with nonlinear transformations. This is true as well for transformations between inputs and the target output in machine learning and AI solutions.
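One way to see why the nonlinearity matters is to compare two stacked layers with and without a nonlinear activation between them. In the sketch below, the weights, the biases, and the choice of ReLU as the nonlinearity are our own assumptions. Without the activation, the two layers collapse into a single straight line; with it, the slope changes from one region to another.

```python
def linear_layer(x, weight, bias):
    """A one-input, one-output layer with no activation."""
    return weight * x + bias


def relu(z):
    """Rectified linear unit: a simple nonlinear activation."""
    return max(0.0, z)


def stacked_linear(x):
    # Two linear layers in a row: algebraically this is just -6x - 2.5
    return linear_layer(linear_layer(x, 2.0, 1.0), -3.0, 0.5)


def stacked_with_relu(x):
    # Same two layers, but with a nonlinearity in between
    return linear_layer(relu(linear_layer(x, 2.0, 1.0)), -3.0, 0.5)


def slope(f, a, b):
    """Average slope of f between a and b."""
    return (f(b) - f(a)) / (b - a)


# A straight line has the same slope everywhere
linear_slopes = (slope(stacked_linear, -3.0, -2.0), slope(stacked_linear, 1.0, 2.0))

# With the activation, the slope differs between regions
relu_slopes = (slope(stacked_with_relu, -3.0, -2.0), slope(stacked_with_relu, 1.0, 2.0))

print(linear_slopes, relu_slopes)
```

However many purely linear layers are stacked, the result is still one linear function, so depth only adds modelling power when nonlinear transformations sit between the layers.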
A Deeper Dive
As mentioned before, Deep Learning involves transforming input data through multiple layers of Artificial Neural Network processing units, such as Perceptrons, whose weights are adjusted during training in order to fulfil a task with maximum success and to progressively increase the probability of success in future trials. The chain of transformations from input to output is known as the credit assignment path, or CAP.
The CAP value is a proxy for the measurement or concept of ‘depth’ in a deep learning model architecture. According to Wikipedia, most researchers in the field agree that deep learning has multiple nonlinear layers with a CAP greater than two, and some consider a CAP greater than ten to be very deep learning. While a detailed discussion of the many different deep-learning model architectures and learning algorithms is beyond the scope of this article, some of the more notable ones include:
- Feed-forward neural networks
- Recurrent neural networks
- Multi-layer Perceptrons (MLP)
- Convolutional neural networks
- Recursive neural networks
- Deep belief networks
- Convolutional deep belief networks
- Self-Organizing Maps
- Deep Boltzmann machines
- Stacked de-noising auto-encoders
It’s worth pointing out that due to the relative increase in complexity, deep learning and neural network algorithms can be prone to overfitting. In addition, increased model and algorithmic complexity can result in very significant computational resource and time requirements. It’s also important to consider that solutions may represent local minima as opposed to a global optimal solution. This is due to the complex nature of these models when combined with optimization techniques such as gradient descent.
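The local minimum issue is easy to reproduce on a small scale. The sketch below runs gradient descent on a one-dimensional function with two valleys. The function, starting points, learning rate, and step count are all hypothetical choices for illustration, and a real network’s cost surface has far more dimensions.

```python
def loss(x):
    """A toy cost curve with one shallow valley and one deeper valley."""
    return x**4 - 3 * x**2 + x


def loss_gradient(x):
    """Derivative of the toy cost curve."""
    return 4 * x**3 - 6 * x + 1


def descend(start, learning_rate=0.01, steps=1000):
    """Follow the negative gradient from a given starting point."""
    x = start
    for _ in range(steps):
        x -= learning_rate * loss_gradient(x)
    return x


# Same algorithm, two different starting points
x_from_right = descend(2.0)
x_from_left = descend(-2.0)

print(x_from_right, loss(x_from_right))
print(x_from_left, loss(x_from_left))
```

Starting on the right, the search settles in the shallower valley and never finds the deeper one on the left, even though nothing about the algorithm changed. Which solution we end up with depends on where the search starts.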
Given all of this, proper care must be taken when leveraging artificial intelligence algorithms to solve problems, including the selection, implementation, and performance assessment of the algorithms themselves. Even so, Neural Nets and the more complex Deep Learning techniques built on them are some of the most capable AI tools for solving very complex problems in modern society, and will continue to be developed and leveraged in the immediate future.
While a terminator-like scenario is unlikely any time soon, the progression of artificial intelligence techniques and applications will certainly revolutionize the world of tomorrow! To be a part of that revolution, stay tuned for our next blog post!