A recurrent neural network (RNN) is a type of artificial neural network commonly used in speech recognition and natural language processing (NLP). RNNs are designed to recognize the sequential characteristics of data and use those patterns to predict the next likely scenario.
RNNs are used in deep learning and in the development of models that simulate the activity of neurons in the human brain. They are especially powerful in use cases where context is critical to predicting an outcome. RNNs are distinct from other types of artificial neural networks because they use feedback loops to process a sequence of data that informs the final output, which can itself be a sequence of data. These feedback loops allow information to persist; the effect is often described as memory.
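This feedback loop can be sketched in a few lines of Python (NumPy assumed; the weight values here are random placeholders, not learned ones). At each step, the new hidden state is computed from both the current input and the previous hidden state, so the network carries a trace of everything it has seen so far:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 4-dimensional inputs, 3-dimensional hidden state.
input_size, hidden_size = 4, 3

# Randomly initialized weights; a real network learns these from data.
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (the feedback loop)
b_h = np.zeros(hidden_size)

def rnn_step(x, h_prev):
    """One step of a vanilla RNN: the new hidden state depends on the
    current input AND the previous hidden state -- the 'memory'."""
    return np.tanh(W_xh @ x + W_hh @ h_prev + b_h)

# Feed a short sequence; the hidden state persists between steps.
h = np.zeros(hidden_size)
sequence = rng.normal(size=(5, input_size))  # 5 time steps of input
for x in sequence:
    h = rnn_step(x, h)

print(h.shape)  # a fixed-size summary of everything seen so far
```

The key detail is the `W_hh @ h_prev` term: it is the loop that feeds the previous state back in, which a purely feedforward network lacks.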
RNN use cases tend to be connected to language models, in which predicting the next letter in a word or the next word in a sentence depends on the data that comes before it. A compelling experiment involves an RNN trained on the works of Shakespeare that successfully produces Shakespeare-like prose. Writing by RNNs is a form of computational creativity, made possible by the model's grasp of grammar and semantics learned from its training set.
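A minimal sketch of the mechanics behind such a character-level model, assuming NumPy and a tiny made-up vocabulary: the weights below are untrained, so the probabilities are meaningless, but the flow -- accumulate context in the hidden state, then score every possible next character -- is the same as in a trained model.

```python
import numpy as np

rng = np.random.default_rng(1)

# A tiny hypothetical character vocabulary; a real model would use the
# full character set of its training corpus.
vocab = list("helo ")
char_to_ix = {c: i for i, c in enumerate(vocab)}
vocab_size, hidden_size = len(vocab), 8

# Untrained random weights -- enough to show the mechanics, not the learning.
W_xh = rng.normal(scale=0.1, size=(hidden_size, vocab_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(vocab_size, hidden_size))

def next_char_probs(text):
    """Run the RNN over `text` and return a probability for each
    character in the vocabulary being the next one."""
    h = np.zeros(hidden_size)
    for c in text:
        x = np.zeros(vocab_size)
        x[char_to_ix[c]] = 1.0            # one-hot encode the character
        h = np.tanh(W_xh @ x + W_hh @ h)  # context accumulates in h
    logits = W_hy @ h
    exp = np.exp(logits - logits.max())   # softmax over the vocabulary
    return exp / exp.sum()

probs = next_char_probs("hell")
print(dict(zip(vocab, probs.round(3))))
```

Training would adjust the three weight matrices so that, after seeing "hell", the probability mass concentrates on "o".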
How recurrent neural networks learn
Artificial neural networks are created with interconnected data processing components that are loosely designed to function like the human brain. They are composed of layers of artificial neurons (network nodes) that have the capability to process input and forward output to other nodes in the network. The nodes are connected by edges whose weights influence a signal's strength and the network's ultimate output.
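A single layer of this kind can be sketched as follows (NumPy assumed; the weights are arbitrary placeholders rather than learned values). Each output node sums its weighted inputs and applies an activation function:

```python
import numpy as np

rng = np.random.default_rng(2)

# Three input nodes feeding two output nodes; each edge carries a weight
# that scales the signal traveling along it.
weights = rng.normal(size=(2, 3))   # one row of edge weights per output node
bias = np.zeros(2)

def layer(inputs):
    """Each node sums its weighted inputs, adds a bias, and applies an
    activation function to produce its output signal."""
    return np.tanh(weights @ inputs + bias)

out = layer(np.array([0.5, -1.0, 2.0]))
print(out)  # one activation per output node
```

Stacking layers like this one, so each layer's output becomes the next layer's input, produces a full feedforward network.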
In some cases, artificial neural networks process information in a single direction from input to output. These "feedforward" neural networks include the convolutional neural networks that underpin image recognition systems. RNNs, on the other hand, contain loops, so information flows in two directions: forward toward the output and back into the network to inform later steps.
Like feedforward neural networks, RNNs can process data from initial input to final output. Unlike feedforward neural networks, RNNs use feedback loops throughout the computational process to loop information back into the network, connecting each input to the ones that came before it; this is what enables RNNs to process sequential and temporal data. To train these looped networks, the loops are unrolled across time steps and gradients are computed with an algorithm called backpropagation through time (BPTT).
Long Short-Term Memory units
One drawback to standard RNNs is the vanishing gradient problem: as error gradients are propagated back through many layers or time steps, they shrink toward zero, so the earliest parameters receive almost no learning signal and the network can't be trained properly. This happens with deeply layered neural networks, which are used to process complex data.
Standard RNNs that use a gradient-based learning method degrade as they get bigger and more complex. Tuning the parameters at the earliest layers effectively becomes too time-consuming and computationally expensive.
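A rough numerical illustration of why the gradient vanishes, assuming NumPy and a deliberately simplified model that ignores the activation function's derivative: when the recurrent weight matrix scales signals down at each step, the gradient shrinks geometrically as it travels back through time.

```python
import numpy as np

rng = np.random.default_rng(3)

hidden_size, steps = 3, 50

# Construct a recurrent weight matrix and force its largest singular
# value to 0.5, so every backward step can only shrink the gradient.
W_hh = rng.normal(size=(hidden_size, hidden_size))
W_hh *= 0.5 / np.linalg.norm(W_hh, 2)

# Backpropagating through T time steps multiplies the gradient by
# roughly one Jacobian per step (here just W_hh.T, a simplification).
grad = np.ones(hidden_size)
norms = []
for _ in range(steps):
    grad = W_hh.T @ grad          # one step further back in time
    norms.append(np.linalg.norm(grad))

print(norms[0], norms[-1])  # the gradient norm collapses toward zero
```

After 50 steps the gradient is numerically negligible, which is why the earliest time steps of a long sequence are effectively untrainable in a standard RNN.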
One solution to the problem is Long Short-Term Memory (LSTM) units, which were invented by computer scientists Sepp Hochreiter and Jürgen Schmidhuber in 1997. RNNs built with LSTM units maintain both a short-term hidden state and a long-term cell state, with learned gates controlling what flows between them. The gates enable the network to figure out which data is important and should be remembered and looped back into the network, and which data can be forgotten.
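One step of an LSTM cell can be sketched as follows (NumPy assumed; the weights are random placeholders and the bias terms are omitted for brevity). Sigmoid gates decide what to erase from the long-term cell state, what to write into it, and what to expose as the short-term hidden state:

```python
import numpy as np

rng = np.random.default_rng(4)

input_size, hidden_size = 4, 3

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate, each acting on the current input and the
# previous hidden state stacked together. A trained network learns these.
concat = input_size + hidden_size
W_f = rng.normal(scale=0.1, size=(hidden_size, concat))  # forget gate
W_i = rng.normal(scale=0.1, size=(hidden_size, concat))  # input gate
W_c = rng.normal(scale=0.1, size=(hidden_size, concat))  # candidate memory
W_o = rng.normal(scale=0.1, size=(hidden_size, concat))  # output gate

def lstm_step(x, h_prev, c_prev):
    """One LSTM step: gates decide what to erase from the cell state,
    what to write to it, and what to expose as the hidden state."""
    z = np.concatenate([x, h_prev])
    f = sigmoid(W_f @ z)          # per element: 0 = forget, 1 = keep
    i = sigmoid(W_i @ z)          # how much new information to write
    c_tilde = np.tanh(W_c @ z)    # candidate new memory content
    c = f * c_prev + i * c_tilde  # updated long-term cell state
    o = sigmoid(W_o @ z)          # how much of the cell to reveal
    h = o * np.tanh(c)            # short-term hidden state (the output)
    return h, c

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):
    h, c = lstm_step(x, h, c)

print(h.shape, c.shape)
```

Because the cell state `c` is updated additively (a gated sum rather than a repeated matrix multiplication), gradients can flow back through many time steps without vanishing as quickly as in a standard RNN.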