As discussed in our last blog article, Recurrent Neural Networks take as their input not just the current input example they see, but also what they perceived one or more steps back in time. Those previous time steps provide context for the current time step’s data. So RNNs analyse a representation of the current data as well as a representation of its history, and combine the two to make their decision about that data.
Recurrent Neural Networks were created in the 1980s but have only recently gained popularity, thanks to advances in network design and to cheaper computational power from graphics processing units (GPUs). They are especially useful with sequential data because each neuron or unit can use its internal memory to maintain information about previous inputs. This matters because in language, “I had washed my house” means something very different from “I had my house washed”, even though both sentences use the same words. Remembering the order in which the words arrive allows the network to gain a deeper understanding of the statement.
Let us consider the following example: “working love learning we on deep”. Did this make any sense to you? Not really. Now read this one: “We love working on deep learning”. It makes perfect sense! A small jumble in the word order made the sentence incoherent. So can we expect a neural network to make sense of the jumbled version? Not really! If a human reader is confused about what it means, a neural network is sure to have a tough time deciphering such text.
There are many everyday tasks that are completely disrupted when their sequence is disturbed. Language, as we saw earlier, is one: the sequence of words defines their meaning. Others include time-series data, where time defines the order in which events occur, and genome sequences, where a different ordering carries a different meaning. In all such cases the sequence of information determines the event itself. If we want a sensible output from such data, we need a network that has access to some prior knowledge about the data in order to understand it fully. This is where recurrent neural networks come into play.
How Are RNNs Used?
Let’s say the task is to predict the next word in a sentence. Let’s try accomplishing it using a Multi-Layered Perceptron (MLP). So, what happens in an MLP? In its simplest form, an MLP has an input layer, a hidden layer and an output layer. The input layer receives the input, the hidden layer applies its activations, and then we finally receive the output.
Now let’s take a deeper network with multiple hidden layers. Here, the input layer receives the input, the first hidden layer applies its activations, and these activations are passed to the next hidden layer, and so on through successive layers until the output is produced. Each hidden layer is characterized by its own weights and biases.
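The deep MLP described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the layer sizes, the tanh activation and the random initialisation are our own assumptions. The point to notice is that every hidden layer gets its own, separate weight matrix and bias vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes (assumed): 4 inputs, three hidden layers of 8 units, 2 outputs.
layer_sizes = [4, 8, 8, 8, 2]

# Every layer gets its OWN weight matrix and bias vector.
params = [
    (rng.normal(scale=0.1, size=(n_out, n_in)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

def mlp_forward(x, params):
    """Pass x through each hidden layer in turn, then through a linear output layer."""
    a = x
    for W, b in params[:-1]:
        a = np.tanh(W @ a + b)  # hidden-layer activation
    W_out, b_out = params[-1]
    return W_out @ a + b_out     # raw output scores

x = rng.normal(size=4)
y = mlp_forward(x, params)
print(y.shape)  # (2,)
```

Because nothing is shared between layers, each one transforms its input independently of the others, which is the limitation the next paragraphs address.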
Since each hidden layer has its own weights and biases, the layers behave independently. Now the objective is to identify the relationship between successive inputs. Can we feed the successive inputs to the hidden layers? Yes, we can!
Here, however, the weights and biases of these hidden layers are all different. Hence each layer behaves independently, and the layers cannot be combined. To combine them, we give all these hidden layers the same weights and biases.
Once every hidden layer shares the same weights and biases, we can combine them: all these hidden layers can be rolled together into a single recurrent layer.
So it is as if we were feeding every input of the sequence into the same hidden layer, one time step at a time. Because it is now a single recurrent neuron, its weights are the same at every time step. A recurrent neuron stores the state from the previous input and combines it with the current input, thereby capturing the relationship between the current input and the inputs that came before it.
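A minimal NumPy sketch of such a recurrent layer, assuming the common "vanilla" (Elman-style) update h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h). The parameter names W_xh, W_hh and b_h, the sizes and the tanh activation are illustrative choices, not something prescribed above. Unlike the MLP, a single set of parameters is reused at every time step, and the hidden state h carries information from earlier inputs forward.

```python
import numpy as np

rng = np.random.default_rng(0)

input_size, hidden_size = 3, 5  # illustrative sizes (assumed)

# ONE set of parameters, reused at every time step.
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # previous hidden -> hidden
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """Combine the current input with the state carried over from the previous step."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

def rnn_forward(xs):
    """Unroll the recurrent layer over a sequence; return the hidden state at each step."""
    h = np.zeros(hidden_size)  # no history before the first input
    states = []
    for x_t in xs:
        h = rnn_step(x_t, h)   # same W_xh, W_hh, b_h at every step
        states.append(h)
    return np.stack(states)

sequence = rng.normal(size=(4, input_size))  # 4 time steps
states = rnn_forward(sequence)
print(states.shape)  # (4, 5)
```

Because the hidden state depends on the order of the inputs, feeding the same vectors in reverse order generally produces a different final state, which is exactly the sensitivity to word order discussed at the start of this article.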
Use Cases and Applications of RNNs
The beauty of recurrent neural networks lies in their diversity of application: they can handle a wide variety of input and output types, including sequences of varying length.
We shall first highlight the broad applications of RNNs, followed by specific use cases:
- Modelling, in particular the learning of (often non-linear) dynamic systems
- Classification (feed-forward networks are also used for this)
- Combinatorial optimization
- Motion detection
- Load forecasting (as with utilities or services: predicting the load in the short term)
- Signal processing: filtering and control
Now we shall show some specific real-world applications of RNNs below:
1. Autonomous Client Preference Classification: A simple example is automatically classifying tweets, comments, messages, texts and remarks into positive or negative sentiment, with no human supervising the system. Here the input is a tweet of varying length, while the output is of a fixed type and size.
2. Autonomous Image Identification and Captioning: Suppose we have an image that needs a textual description. We have a single input, the image, and a sequence of words as output. The image may be of a fixed size, but the description can vary in length.
3. Autonomous Language Translation: Here we have some text in a particular language, say English, and we wish to translate it into French. Each language has its own semantics, and the same sentence may have a different length in each language. So both the inputs and the outputs are of varying length.
4. RNNs in NLP: RNNs have great applications in Natural Language Processing (NLP). Many people have published impressive RNN-based language models online. Such a model can be trained on a large corpus of Shakespeare’s writing and then generate new Shakespeare-style text that imitates the style of the originals surprisingly well, as in the following sample:
Alas, I think he shall be come approached and the day
When little srain would be attain’d into being never fed,
And who is but a chain and subjects of his death,
I should not sleep.
They are away these miseries, produced upon my soul,
Breaking and strongly should be buried, when I perish
The earth and thoughts of many states.
Well, your wit is in the care of side and that.
They would be ruled after this chamber, and
my fair nues begun out of the fact, to be conveyed,
Whose noble souls I’ll have the heart of the wars.
Come, sir, I will make did behold your worship.
I’ll drink it
This passage was generated by an RNN. It comes from an excellent article that goes into more depth on character-level RNNs. This type of RNN is trained on a text dataset and reads its input one character at a time. Compared with feeding in a word at a time, the remarkable thing about these networks is that they can create new words that never appeared in the text they were trained on.
This diagram, taken from the article referenced above, shows how the model would predict “hello”. It gives a good visualization of how these networks take in a word character by character and, at each step, predict the probability of each possible next character.
5. Machine Translation: Another impressive application of RNNs is machine translation. This method is interesting because it involves training two RNNs simultaneously, an encoder and a decoder. The training data consists of pairs of sentences in different languages; for example, you can feed the network an English sentence paired with its French translation. With enough training, you can give the network an English sentence and it will translate it into French! This model is called a sequence-to-sequence (Seq2Seq) model or an encoder-decoder model.
This diagram shows how information flows through an encoder-decoder model. The model in the diagram uses a word embedding layer to obtain better word representations. An embedding layer maps each word to a dense vector, and it is often initialised with vectors pre-trained by an algorithm such as GloVe or Word2Vec, which places similar words close to each other. Using an embedding layer often makes an RNN more accurate, because the vectors already encode how similar words are, so the network has less to learn from scratch.
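The character-level idea behind the “hello” example from item 4 can be sketched as follows. This toy version assumes a four-character vocabulary (h, e, l, o), one-hot inputs, a small tanh recurrent layer and a softmax output layer; the sizes and names are illustrative, and the weights are left untrained because training (backpropagation through time) is beyond the scope of this post. Feeding in “hell” yields one probability distribution over the next character per step, and the cross-entropy against the targets “ello” is the quantity training would minimise.

```python
import numpy as np

rng = np.random.default_rng(1)

vocab = ["h", "e", "l", "o"]
char_to_idx = {c: i for i, c in enumerate(vocab)}
V, H = len(vocab), 8  # vocabulary size, hidden size (illustrative)

# Untrained, randomly initialised parameters (training is omitted in this sketch).
W_xh = rng.normal(scale=0.1, size=(H, V))
W_hh = rng.normal(scale=0.1, size=(H, H))
W_hy = rng.normal(scale=0.1, size=(V, H))
b_h, b_y = np.zeros(H), np.zeros(V)

def one_hot(c):
    v = np.zeros(V)
    v[char_to_idx[c]] = 1.0
    return v

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def next_char_probs(text):
    """Feed text one character at a time; return P(next char) after each step."""
    h = np.zeros(H)
    probs = []
    for c in text:
        h = np.tanh(W_xh @ one_hot(c) + W_hh @ h + b_h)
        probs.append(softmax(W_hy @ h + b_y))
    return np.array(probs)

# After training, the inputs "hell" should predict the targets "ello".
inputs, targets = "hell", "ello"
P = next_char_probs(inputs)
loss = -np.mean([np.log(P[t, char_to_idx[c]]) for t, c in enumerate(targets)])
print(P.shape)  # (4, 4): one distribution over the vocabulary per step
print(loss)     # close to log(4) ≈ 1.386, since the untrained model is nearly uniform
```

Sampling a character from each distribution and feeding it back in as the next input is, in outline, how such a model generates text like the sample above.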
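The encoder-decoder structure from item 5 can likewise be sketched in NumPy. Everything here is a hypothetical toy: the tiny English and French vocabularies, the sizes and the random, untrained weights are our own assumptions, so the “translation” it prints is arbitrary. What it does show is the flow of information: embedding tables turn word indices into dense vectors, an encoder RNN reads the source sentence into a final hidden state, and a separate decoder RNN starts from that state and greedily emits one target word at a time until it produces an end-of-sentence token.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical toy vocabularies; real systems use tens of thousands of tokens.
src_vocab = ["we", "love", "deep", "learning"]
tgt_vocab = ["<s>", "</s>", "nous", "aimons", "l'apprentissage", "profond"]
E, H = 6, 8  # embedding size, hidden size (illustrative)

def init(*shape):
    return rng.normal(scale=0.1, size=shape)

# Embedding tables: one dense vector per word (could be initialised from GloVe/Word2Vec).
src_emb, tgt_emb = init(len(src_vocab), E), init(len(tgt_vocab), E)
# Separate RNN parameters for the encoder and the decoder.
enc_Wx, enc_Wh = init(H, E), init(H, H)
dec_Wx, dec_Wh = init(H, E), init(H, H)
W_out = init(len(tgt_vocab), H)  # decoder hidden state -> scores over target words

def encode(src_ids):
    """Read the source sentence; the final hidden state summarises it."""
    h = np.zeros(H)
    for i in src_ids:
        h = np.tanh(enc_Wx @ src_emb[i] + enc_Wh @ h)
    return h

def greedy_decode(h, max_len=10):
    """Start from the encoder's summary and emit the highest-scoring word at each step."""
    token = tgt_vocab.index("<s>")
    output = []
    for _ in range(max_len):
        h = np.tanh(dec_Wx @ tgt_emb[token] + dec_Wh @ h)
        token = int(np.argmax(W_out @ h))
        if tgt_vocab[token] == "</s>":
            break
        output.append(tgt_vocab[token])
    return output

src = [src_vocab.index(w) for w in ["we", "love", "deep", "learning"]]
translation = greedy_decode(encode(src))
print(translation)  # untrained weights, so the words are arbitrary
```

In a trained system the embedding tables and both RNNs are learned jointly from sentence pairs, and decoding often uses beam search instead of the greedy choice shown here.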
In conclusion, recurrent neural networks have become very popular in recent years, and for good reason. They are among the most effective models available for natural language processing. New applications of these models appear all the time, and it is exciting to see what researchers will come up with in the near future! So stay tuned for our next blog article, where we shall discuss some of the modern mathematical research breakthroughs in RNNs and their exciting applications!