Picking up from where we left off in our previous blog article, recall that the fundamental feature of a Recurrent Neural Network (RNN) is that the network contains at least one feedback connection, so activations can flow around in a loop. This enables the network to perform temporal processing and learn sequences, e.g., sequence recognition/reproduction or temporal association/prediction.
Furthermore, Recurrent Neural Networks are among the most commonly used neural networks in Natural Language Processing because of their promising results. Applications of RNNs to language modeling follow two main approaches. We can either have the model predict sentences and learn from the errors in its predictions, or we can train the model on a particular genre of text so that it produces new text in a similar style. Both approaches have fascinating real-world applications that could transform our society as we know it!
Basic RNN Framework Architecture
As discussed in detail before, basic RNNs consist of networks of neuron-like nodes organized into successive “layers”, where each node in a given layer is connected with a directed (one-way) connection to every other node in the next successive layer. Each node (neuron) has a time-varying real-valued activation, and each connection (synapse) has a modifiable real-valued weight. Nodes are either input nodes (receiving data from outside the network), output nodes (yielding results), or hidden nodes (transforming the data on its way from input to output).
For supervised learning in discrete time settings, sequences of real-valued input vectors arrive at the input nodes, one vector at a time. At any given time-step, each non-input unit computes its current activation (result) as a nonlinear function of the weighted sum of the activations of all units that connect to it. Supervisor-given target activations can be supplied for some output units at certain time steps. For example, if the input sequence is a speech signal corresponding to a spoken digit, the final target output at the end of the sequence may be a label classifying that particular digit.
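To make the discrete-time update concrete, here is a minimal sketch in plain Python, assuming a single hidden layer with tanh as the nonlinearity. The names `rnn_forward`, `W_in`, `W_rec` and `b` are hypothetical and chosen only for illustration. Each hidden unit's new activation is a nonlinear function of the weighted sum of the current input and the previous hidden activations; the `W_rec` term is the feedback loop.

```python
import math


def rnn_forward(inputs, W_in, W_rec, b):
    """Run a simple fully recurrent layer over a sequence of input vectors.

    inputs: list of input vectors, one per time step
    W_in:   hidden-by-input weight matrix (list of rows), hypothetical name
    W_rec:  hidden-by-hidden recurrent (feedback) weight matrix
    b:      hidden bias vector
    Returns the hidden activation vector at every time step.
    """
    n_hidden = len(b)
    h = [0.0] * n_hidden  # activations before the first input arrives
    states = []
    for x in inputs:  # one input vector per time step
        new_h = []
        for i in range(n_hidden):
            # weighted sum over the current input and the previous activations
            s = b[i]
            s += sum(w * xj for w, xj in zip(W_in[i], x))
            s += sum(w * hk for w, hk in zip(W_rec[i], h))
            new_h.append(math.tanh(s))  # nonlinear activation
        h = new_h
        states.append(h)
    return states
```

For example, `rnn_forward([[1.0], [0.0], [0.0]], [[1.0]], [[0.5]], [0.0])` feeds a single pulse followed by zeros. The hidden activation is still positive at the third step, because the pulse keeps echoing through the recurrent weight.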
In reinforcement learning settings, no human instructor provides target signals. Instead, a fitness function or reward function is occasionally used to evaluate the RNN’s performance. The RNN in turn influences its own input stream through output units connected to actuators that affect the environment. For example, this setup might be used to play a game in which progress is measured by the number of points won.
Each sequence produces an error equal to the sum of the deviations of all target signals from the corresponding activations computed by the network. For a training set of numerous sequences, the total error is the sum of the errors of all individual sequences, and training aims to minimize this total error.
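The error bookkeeping above can be sketched as follows, under the assumption that “deviation” means squared difference (a common choice, not the only one). Targets may be supplied only at some time steps, marked here with `None`, as in the spoken-digit example where only the final step carries a label. The function names are hypothetical.

```python
def sequence_error(outputs, targets):
    """Error of one sequence, assuming squared deviations.

    outputs[t] is the network's output vector at step t; targets[t] is the
    target vector, or None where no target is supplied at that step.
    """
    err = 0.0
    for y, d in zip(outputs, targets):
        if d is None:  # no supervisor signal at this time step
            continue
        err += sum((dk - yk) ** 2 for dk, yk in zip(d, y))
    return err


def total_error(training_set):
    """Total error: the sum of the errors of all individual sequences.

    training_set is a list of (outputs, targets) pairs, one per sequence.
    """
    return sum(sequence_error(outputs, targets) for outputs, targets in training_set)
```

With one sequence whose only target is `[1.0]` at the last step (output `[0.9]`) and another with target `[0.0]` (output `[0.5]`), the total error is about 0.01 + 0.25 = 0.26.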
Advanced RNN Architectures
Now, with the basic framework in mind, let us delve into some more advanced RNN architectures that have been designed to solve specific problems, especially in Big Data Analytics and Natural Language Processing. Here are some of the popular ones:
1. Elman & Jordan Networks
An Elman network is a three-layer network (usually arranged horizontally) with the addition of a set of “context units”, as shown in the illustration below. In the diagram, the middle layer is called the hidden layer, and it is connected to these context units with fixed weights of one. At each time step, the input is fed forward and a learning rule (typically gradient-descent backpropagation) is applied. The fixed back-connections save a copy of the previous values of the hidden units in the context units, since these values propagate over the connections before the learning rule is applied. Thus, the network can maintain a sort of state, allowing it to perform tasks such as sequence prediction that are beyond the power of a standard multilayer perceptron.
Jordan networks are similar to Elman networks; however, their context units are fed from the output layer instead of the hidden layer. The context units in a Jordan network are also referred to as the state layer, and they have a recurrent connection to themselves. Elman and Jordan networks are also collectively known as “simple recurrent networks” (SRNs).
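The difference between the two simple recurrent networks is easiest to see in a single forward step. The sketch below shows only the forward pass (the learning rule is omitted), assumes tanh hidden units and a linear output layer, and uses hypothetical names throughout; `mu`, the strength of the Jordan state layer's self-connection, is an assumed parameter.

```python
import math


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def elman_step(x, context, W_xh, W_ch, W_hy):
    """One forward step of an Elman network (learning rule omitted)."""
    # hidden units see the current input plus the context units
    h = [math.tanh(a + c) for a, c in zip(matvec(W_xh, x), matvec(W_ch, context))]
    y = matvec(W_hy, h)  # linear output layer (an assumption)
    # fixed weight-one back-connections: copy the hidden values into the context
    new_context = list(h)
    return y, new_context


def jordan_step(x, state, W_xh, W_sh, W_hy, mu=0.5):
    """One forward step of a Jordan network; mu is a hypothetical self-connection weight."""
    h = [math.tanh(a + c) for a, c in zip(matvec(W_xh, x), matvec(W_sh, state))]
    y = matvec(W_hy, h)
    # state units are fed from the output layer and keep a recurrent self-connection
    new_state = [mu * s + o for s, o in zip(state, y)]
    return y, new_state
```

The only structural change is the source of the fed-back vector: the Elman context is a copy of the hidden layer, while the Jordan state accumulates the outputs.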
2. Hopfield Networks
A Hopfield network is a form of Recurrent Neural Network that was popularized by John Hopfield in 1982. Hopfield networks serve as content-addressable (“associative”) memory systems with binary threshold nodes, i.e., units that take the values 1 and 0 or 1 and -1. They are guaranteed to converge to a local minimum of their energy (error) function, but they will sometimes converge to a false pattern (a wrong local minimum) rather than the stored pattern (the expected local minimum). Hopfield networks also provide a model for understanding human memory.
In a Hopfield network, the weights are symmetric, which guarantees that the energy function decreases monotonically while the units follow the activation rules. A network with asymmetric weights may exhibit periodic or chaotic behaviour; however, Hopfield found that this behaviour is confined to relatively small parts of the phase space and does not impair the network’s ability to act as a content-addressable associative memory system.
A Hopfield network is initialized by setting the values of its units to the desired start pattern. Repeated updates are then performed until the network converges to an attractor pattern. Convergence is generally assured, as Hopfield proved that the attractors of this nonlinear dynamical system are stable, not periodic or chaotic as in some other systems. In the context of Hopfield networks, an attractor pattern is therefore a final stable state: a pattern in which no value changes under further updating.
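The recall procedure described above can be sketched in a few lines, assuming +1/-1 units with a threshold at zero, asynchronous updates in random order, and weights built with the outer-product (Hebbian) rule. These choices, and the names `store`, `energy` and `recall`, are assumptions for illustration rather than the only possible design. The energy `E = -1/2 * sum_ij w_ij s_i s_j` is included so that the monotone decrease can be checked.

```python
import random


def store(patterns):
    """Weights from the outer-product (Hebbian) rule: symmetric, zero diagonal."""
    n = len(patterns[0])
    return [[0.0 if i == j else sum(p[i] * p[j] for p in patterns) / n
             for j in range(n)] for i in range(n)]


def energy(W, s):
    """Hopfield energy E = -1/2 * sum_ij W[i][j] * s[i] * s[j]."""
    n = len(s)
    return -0.5 * sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def recall(W, start, max_sweeps=100, seed=0):
    """Asynchronously update +/-1 threshold units until nothing changes."""
    rng = random.Random(seed)
    s = list(start)  # initialize the units to the start pattern
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in rng.sample(range(n), n):  # one unit at a time, random order
            field = sum(W[i][j] * s[j] for j in range(n))
            new = 1 if field >= 0 else -1  # binary threshold at zero
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:  # stable: no value changes under updating
            break
    return s
```

Starting from a copy of `p = [1, -1, 1, -1, 1, -1]` with one flipped unit, `recall(store([p]), noisy)` returns `p`, and the final energy is lower than that of the noisy start. The inverted pattern `[-x for x in p]` is also a stable state of these weights, a simple instance of the false patterns mentioned above.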
Various learning rules can be used to store information in the memory of a Hopfield network. It is desirable for a learning rule to have both of the following properties:
- Local: A learning rule is local if each weight is updated using information available to neurons on either side of the connection that is associated with that particular weight.
- Incremental: New patterns can be learned without using information from the old patterns that have already been used for training. That is, when a new pattern is used for training, the new values of the weights depend only on the old values and on the new pattern.
These properties are desirable because a learning rule satisfying them is more biologically plausible. For example, since the human brain is always learning new concepts, one can reason that human learning is incremental; a learning system that was not incremental would generally be trained only once, with a huge batch of training data. In this sense, recurrent networks such as Hopfield networks draw inspiration from the human brain and aim to mimic aspects of human learning, improving at a task through repeated updates.
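The classic Hebbian rule for Hopfield networks, `w_ij += x_i * x_j / n`, is one example of a rule with both properties; it is chosen here for illustration, and other rules exist. In the sketch below, the function name `hebbian_update` is hypothetical and patterns are assumed to be +1/-1 vectors.

```python
def hebbian_update(W, x):
    """Store one more +/-1 pattern x with the Hebbian rule w_ij += x_i * x_j / n.

    Local: the change to W[i][j] uses only x[i] and x[j], the states of the
    two units that the connection joins.
    Incremental: the new weights depend only on the old weights and on x.
    """
    n = len(x)
    return [[W[i][j] + (x[i] * x[j] / n if i != j else 0.0) for j in range(n)]
            for i in range(n)]
```

Learning patterns one after another with `hebbian_update` gives the same weights as summing all the outer products at once, yet each step never revisits the earlier patterns. The resulting weights are also symmetric, matching the requirement discussed above.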
3. Bidirectional Associative Memory Networks
Bidirectional associative memory (BAM) is a type of Recurrent Neural Network introduced by Bart Kosko in 1988. There are two types of associative memory, auto-associative and hetero-associative. BAM is hetero-associative, meaning that given a pattern it can return another pattern, potentially of a different size. It is similar to the Hopfield network in that both are forms of associative memory. However, Hopfield networks return patterns of the same size as the input, whereas BAMs can return patterns of different sizes.
A BAM contains two layers of neurons, which can be denoted X and Y, and the two layers are fully connected to each other. Once the weights have been established, a pattern presented to layer X recalls its associated pattern in layer Y, and vice versa. BAMs have primarily been used in pattern recognition and related problem solving.
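A minimal sketch of the two-way recall, assuming bipolar (+1/-1) patterns, a correlation weight matrix `W = sum_k x_k^T y_k`, and a sign threshold that maps zero to +1. These are assumptions, and the helper names are hypothetical. A full BAM would alternate X-to-Y and Y-to-X passes until both layers stop changing; one pass in each direction is enough to show the idea.

```python
def bipolar_sign(v):
    """Threshold each entry to +1 or -1 (zero maps to +1, an assumption)."""
    return [1 if a >= 0 else -1 for a in v]


def bam_weights(pairs):
    """W[i][j] = sum over pairs of x[i] * y[j]; x has n entries, y has m."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    W = [[0] * m for _ in range(n)]
    for x, y in pairs:
        for i in range(n):
            for j in range(m):
                W[i][j] += x[i] * y[j]
    return W


def recall_y(W, x):
    """Layer X -> layer Y: y_j = sign(sum_i x_i * W[i][j])."""
    return bipolar_sign([sum(x[i] * W[i][j] for i in range(len(x)))
                         for j in range(len(W[0]))])


def recall_x(W, y):
    """Layer Y -> layer X: x_i = sign(sum_j W[i][j] * y_j)."""
    return bipolar_sign([sum(W[i][j] * y[j] for j in range(len(y)))
                         for i in range(len(W))])
```

With the pairs `([1, 1, -1, -1], [1, -1])` and `([1, -1, 1, -1], [-1, -1])`, each 4-unit X pattern recalls its 2-unit Y partner and vice versa. This shows the hetero-associative, different-size behaviour that distinguishes a BAM from a Hopfield network.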
4. Recursive Neural Networks
A Recursive Neural Network is a kind of deep neural network created by applying the same set of weights recursively over a structured input, traversing the structure in topological order to produce a structured prediction over variable-size input structures. These networks have been successful, for instance, in learning sequence and tree structures in Natural Language Processing, mainly continuous representations of phrases and sentences based on word embeddings. Recursive neural networks were first introduced to learn distributed representations of structure, such as logical terms, and models and general frameworks have been developed further since the 1990s.
The primary distinctive feature of Recursive Neural Networks is that they are typically trained with Stochastic Gradient Descent using Back Propagation Through Structure (BPTS), rather than the Back Propagation Through Time (BPTT) used for Recurrent Neural Networks.
Recurrent Neural Networks are related to Recursive Neural Networks, but recurrent networks always have a particular structure: that of a linear chain. Recursive Neural Networks operate on any hierarchical structure, combining child representations into parent representations, and Back Propagation Through Structure works on these hierarchies. Recurrent Neural Networks, by contrast, operate on the linear progression of time, combining the previous time step and a hidden representation into the representation for the current time step.
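The child-to-parent combination can be sketched as below, assuming binary trees written as nested 2-tuples of words, a composition function `parent = tanh(W [left; right] + b)` shared by every node, and hypothetical embeddings. This is one common form of composition, used here for illustration only. Training with Back Propagation Through Structure is not shown.

```python
import math


def compose(left, right, W, b):
    """Parent vector = tanh(W [left; right] + b); the same W and b at every node."""
    concat = left + right  # concatenate the two child vectors
    return [math.tanh(sum(w * v for w, v in zip(row, concat)) + bi)
            for row, bi in zip(W, b)]


def encode(tree, embeddings, W, b):
    """Encode a binary tree of words (nested 2-tuples) bottom-up."""
    if isinstance(tree, str):  # leaf: look up the word embedding
        return embeddings[tree]
    left, right = tree
    # children are encoded before their parent (a topological order)
    return compose(encode(left, embeddings, W, b),
                   encode(right, embeddings, W, b), W, b)
```

A left-branching tree such as `(("the", "cat"), "sat")` consumes the words strictly left to right. That linear chain resembles how a recurrent network folds each new input into its running state, which is the special case described in the paragraph above.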
5. Long Short-Term Memory Networks
Long Short-Term Memory (LSTM) is a deep learning RNN architecture that avoids the vanishing gradient problem. LSTM is normally augmented by recurrent gates called “forget” gates. LSTM prevents backpropagated errors from vanishing or exploding, so an LSTM can learn tasks that require memories of events that happened thousands or even millions of discrete time steps earlier. Problem-specific LSTM-like topologies can also be evolved. LSTMs work even given long delays between significant events and can handle signals that mix low- and high-frequency components.
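The role of the forget gate can be illustrated with a single LSTM memory cell. The sketch assumes scalar inputs and states, the common formulation with forget, input and output gates, and a hypothetical parameter dictionary `p` mapping each gate to `(w_x, w_h, b)`. The key line is the additive cell update `c = f * c_prev + i * g`: when the forget gate stays near 1, the stored value (and the error flowing back through it) is neither squashed nor amplified from step to step.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a single LSTM memory cell with a forget gate (scalar input/state).

    p is a hypothetical dict: p[name] = (w_x, w_h, b) for the gates 'f', 'i', 'o'
    and the candidate value 'g'.
    """
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("f"))    # forget gate: how much of the old cell state to keep
    i = sigmoid(pre("i"))    # input gate: how much of the candidate to write
    o = sigmoid(pre("o"))    # output gate: how much of the cell to expose
    g = math.tanh(pre("g"))  # candidate value
    c = f * c_prev + i * g   # additive update of the cell state
    h = o * math.tanh(c)
    return h, c
```

With a strongly positive forget bias and a closed input gate, a cell state of 1.0 is still above 0.99 after 100 steps of zero input; flipping the forget bias to a strongly negative value erases it in a single step.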
Many applications use stacks of LSTM RNNs and train them with Connectionist Temporal Classification (CTC) to find an RNN weight matrix that maximizes the probability of the label sequences in a training set, given the corresponding input sequences. CTC achieves both alignment and recognition. LSTMs can also learn to recognize context-sensitive languages, unlike earlier models based on hidden Markov models (HMMs) and similar concepts.
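Part of how CTC handles alignment is a fixed mapping from frame-level outputs to labels: merge consecutive repeats, then drop a special blank symbol. The sketch below shows only that mapping and the simple best-path (greedy) decoding built on it. CTC training itself sums the probabilities of all frame paths that collapse to the target, which is not shown here. The blank symbol `"-"` and the function names are assumptions.

```python
def ctc_collapse(path, blank="-"):
    """Map a frame-level path to a label sequence: merge repeats, then drop blanks."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)


def greedy_decode(frame_probs, alphabet, blank="-"):
    """Best-path decoding: most probable symbol per frame, then collapse."""
    path = [alphabet[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    return ctc_collapse(path, blank)
```

For example, `ctc_collapse("hh-e-ll-lo")` gives `"hello"`: the blank between the two `l` runs is what allows a doubled letter to survive the merging of repeats.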
6. Recurrent Multilayer Perceptron & Multiple Timescales Network
A Recurrent Multi-Layer Perceptron (RMLP) network consists of cascaded subnetworks, each of which contains multiple layers of nodes. Each subnetwork is feed-forward except for its last layer, which can have feedback connections.
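One possible reading of this structure, sketched under stated assumptions: each subnetwork runs its hidden layers purely feed-forward, and its last layer additionally receives its own outputs from the previous time step. The cascade then passes each subnetwork's output to the next one. The class and function names, the tanh units, and the exact feedback wiring are hypothetical choices for illustration.

```python
import math


def layer(W, v, b):
    """Feed-forward tanh layer: tanh(W v + b)."""
    return [math.tanh(sum(w * x for w, x in zip(row, v)) + bi) for row, bi in zip(W, b)]


class Subnetwork:
    """Hypothetical RMLP subnetwork: feed-forward hidden layers, then a last
    layer whose units also receive their own previous outputs (feedback)."""

    def __init__(self, hidden_layers, W_last, W_fb, b_last):
        self.hidden_layers = hidden_layers  # list of (W, b) pairs
        self.W_last, self.W_fb, self.b_last = W_last, W_fb, b_last
        self.prev = [0.0] * len(b_last)     # last layer's output at the previous step

    def step(self, x):
        for W, b in self.hidden_layers:     # purely feed-forward part
            x = layer(W, x, b)
        fb = [sum(w * p for w, p in zip(row, self.prev)) for row in self.W_fb]
        out = [math.tanh(sum(w * v for w, v in zip(row, x)) + f + bi)
               for row, f, bi in zip(self.W_last, fb, self.b_last)]
        self.prev = out                     # remembered for the next time step
        return out


def rmlp_step(subnets, x):
    """Cascade: each subnetwork's output is the next subnetwork's input."""
    for net in subnets:
        x = net.step(x)
    return x
```

Feeding a single pulse followed by a zero input, the cascade's output at the second step is still positive, because the last subnetwork's feedback connection carries information forward in time.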
A Multiple Timescales Recurrent Neural Network (MTRNN), on the other hand, is a neural-based computational model that can simulate the functional hierarchy of the brain through self-organization that depends on the spatial connections between neurons and on distinct types of neuron activities, each with distinct time properties. With such varied neuronal activities, continuous sequences of any set of behaviours are segmented into reusable primitives, which in turn are flexibly integrated into diverse sequential behaviours. Biological support for such a hierarchy was discussed in the memory-prediction theory of brain function by Jeff Hawkins in his book On Intelligence. Of the architectures covered here, the MTRNN is the most complex, with capabilities for contextual and non-contextual pattern recognition and Natural Language Processing.
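A common way to give units distinct time properties is a leaky-integrator update with a per-unit time constant τ: a unit with a large τ changes slowly, and a unit with a small τ reacts quickly. The sketch below assumes that formulation, tanh outputs, and an external input added to every unit; `mtrnn_step` and its arguments are hypothetical names.

```python
import math


def mtrnn_step(u, x, W, taus):
    """One update of leaky-integrator units with per-unit time constants.

    u: internal states; x: external input to each unit (an assumption)
    W: recurrent weights; taus: time constant of each unit (large = slow)
    u_i <- (1 - 1/tau_i) * u_i + (1/tau_i) * (sum_j W[i][j] * tanh(u_j) + x_i)
    """
    y = [math.tanh(v) for v in u]  # unit outputs fed back through W
    return [(1 - 1 / t) * ui + (1 / t) * (sum(w * yj for w, yj in zip(row, y)) + xi)
            for ui, row, xi, t in zip(u, W, x, taus)]
```

With two uncoupled units driven by a constant input of 1.0, a fast unit (τ = 2) reaches about 0.97 after five steps while a slow unit (τ = 50) is still below 0.1. This separation of timescales is what lets slow units encode longer-range context for the fast ones.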
Now let us present a list of advanced applications that these six types of Recurrent Neural Networks are being trained to carry out in the world today:
- Machine Translation
- Controlling Robotic Components and Arms in Industrial Automation
- Time series prediction for Finance Related Big Data Analytics
- Speech recognition
- Time series anomaly detection, which is very useful for Fraud Detection
- Rhythm learning for repetitive performance of industrial tasks
- Music composition
- Contextual and Non-contextual Grammar learning
- Handwriting recognition
- Human action recognition for Smart City Governance
- Protein Homology Detection for the Pharmaceutical industry
- Predicting subcellular localization of proteins in the Healthcare Sector
- Several prediction tasks in business process management, to optimize time and financial expenditures
- Prediction in medical care pathways based on a patient’s medical history
Click on the links below to check out some fascinating examples of content generated by RNNs:
- RNN Generated TED Talks:
- RNN Generated Music:
We have now presented a holistic, in-depth overview of the capabilities and attributes of RNNs. Stay tuned to our blog for the next breakthroughs in Machine Learning and Artificial Intelligence.