What’s the difference between CNN and RNN?
Artificial neural networks are the brains behind some of the most sophisticated applications of artificial intelligence (AI). But that doesn’t mean understanding the different types needs to be complicated. When it comes to artificial neural networks — computing systems that mimic components of the brain — there are significant differences between the types. Understanding these distinctive forms, and their nuances and varied applications, could make the difference between success and failure in your next AI and machine learning initiative.
In machine learning, each type of artificial neural network is tailored to perform certain sets of tasks. In order to explain these tasks and the best approaches to completing them, this article will introduce two types of artificial neural networks: convolutional neural networks (CNNs) and recurrent neural networks (RNNs). Read on to learn about both types, associated key terms and the real-life applications deployed today, particularly in computer vision.
What’s the difference between CNN and RNN?
The main difference between a CNN and an RNN is the ability to process temporal information — data that comes in sequences, such as a sentence. Recurrent neural networks are designed for this very purpose, while convolutional neural networks are not built to interpret temporal information effectively. As a result, CNNs and RNNs are typically used for distinct purposes, and the structures of the networks themselves differ to fit those use cases.
CNNs employ filters within convolutional layers to transform data (more on that later), whereas RNNs are predictive, reusing the activations from previous data points in the sequence to generate the next output in a series.
Once you look at the structure of both types of neural networks and understand what they are used for, the difference between a CNN and an RNN becomes clearer.
What is a recurrent neural network?
It bears repeating: Recurrent neural networks are designed to interpret temporal or sequential information. These networks use other data points in a sequence to make better predictions. They do this by taking in input and reusing the activations from earlier steps in the sequence (or, in bidirectional RNNs, from later steps as well) to influence the output.
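To make the idea of “reusing activations” concrete, here is a minimal sketch of a single recurrent step in Python with NumPy. The weights here are random placeholders rather than a trained model, and the `rnn_step` function and layer sizes are illustrative choices, not part of any particular library:

```python
import numpy as np

rng = np.random.default_rng(0)

input_size, hidden_size = 4, 3
W_x = rng.standard_normal((hidden_size, input_size))   # input-to-hidden weights
W_h = rng.standard_normal((hidden_size, hidden_size))  # hidden-to-hidden weights
b = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One recurrent step: the new hidden state depends on both the
    current input and the previous hidden state,
    h_t = tanh(W_x @ x_t + W_h @ h_prev + b)."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Process a short sequence, carrying the hidden state forward at each step.
sequence = [rng.standard_normal(input_size) for _ in range(5)]
h = np.zeros(hidden_size)
for x_t in sequence:
    h = rnn_step(x_t, h)

print(h.shape)  # (3,)
```

The key design point is the loop: the hidden state `h` computed for one element of the sequence is fed back in alongside the next element, which is exactly how earlier data points influence later outputs.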
Entity extraction in text is a great example of how data in different parts of a sequence can affect each other. With entities, the words that come before and after an entity in the sentence have a direct effect on how it is classified. To deal with temporal or sequential data, like sentences, we have to use algorithms that are designed to learn from past data and ‘future data’ in the sequence.
Take, for example, the following text: President Roosevelt was one of the most influential presidents in American history. However, Roosevelt Street in Manhattan was not named after him.
In the first sentence, Roosevelt should be labeled as a person entity, but in the second sentence it should be labeled as a street name or a location. Making these distinctions is not possible without taking into account the word before (“President”) and the word after (“Street”).

Image of RNN structure.
RNNs for autocorrect
To dive a little deeper into how an RNN works, you need look no further than autocorrect systems. At a basic level, autocorrect systems take the word you’ve typed as input. Using that input, the system makes a prediction as to whether the spelling is correct or incorrect. If the word doesn’t match any words in the database, or doesn’t fit in the context of the sentence, the system then predicts what the correct word might be. Let’s visualize how this process works using an RNN.
Image of an autocorrect system using RNN.
The RNN would take in two sources of input. The first input is the letter you’ve typed. The second input would be the activations carried over from the previous letters you typed. Let’s say you wanted to type “network,” but typed “networc” by mistake. The system takes in the activations from the previous letters, “networ,” and the current letter you’ve inputted, “c.” It then offers “k” as the correct output for the last letter.
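The letter-by-letter flow above can be sketched in a few lines of NumPy. This is an untrained toy, so its weights are random placeholders and its actual prediction is meaningless; a real autocorrect model would be trained on text before it could reliably suggest “k” after “networ.” What the sketch does show is the mechanism: each typed letter is combined with the hidden state accumulated from all the letters before it.

```python
import string
import numpy as np

rng = np.random.default_rng(42)
alphabet = string.ascii_lowercase
vocab = len(alphabet)   # 26 letters
hidden = 16

W_x = rng.standard_normal((hidden, vocab)) * 0.1    # input-to-hidden
W_h = rng.standard_normal((hidden, hidden)) * 0.1   # hidden-to-hidden
W_y = rng.standard_normal((vocab, hidden)) * 0.1    # hidden-to-output

def one_hot(ch):
    """Encode a letter as a vector with a 1 at its alphabet position."""
    v = np.zeros(vocab)
    v[alphabet.index(ch)] = 1.0
    return v

# Feed in the letters already typed; the hidden state accumulates context.
h = np.zeros(hidden)
for ch in "networ":
    h = np.tanh(W_x @ one_hot(ch) + W_h @ h)

# Scores over the alphabet for the next letter. After training, the
# highest score here would land on "k".
scores = W_y @ h
predicted = alphabet[int(np.argmax(scores))]
```

Note how the final prediction depends on every earlier letter through `h`, which is precisely the “second source of input” described above.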
This is just one simplified example of how RNNs can work for spelling correction. Today, data scientists use RNNs to do an array of incredible things: generating text, captioning images, creating music and predicting stock market fluctuations. The potential use cases are endless.
What is a convolutional neural network?
Convolutional neural networks are one of the most common types of neural networks used in computer vision to recognize objects and patterns in images. One of their defining traits is the use of filters within convolutional layers.
CNNs have unique layers called convolutional layers that separate them from RNNs and other neural networks.
Within a convolutional layer, the input is transformed before being passed to the next layer. A CNN transforms the data by using filters.
What are filters in convolutional neural networks?
A filter is simply a matrix of randomized number values like you see in the diagram below.
Image of an example 3 x 3 filter.
The number of rows and columns in the filter can vary depending on the use case and the data being processed. Within a convolutional layer, a number of filters move through the image. This process is referred to as convolving, a term borrowed from the mathematical operation of convolution, which combines two functions. The filter convolves over the pixels of the image, producing new values that are passed on to the next layer in the network.
How do filters work?
To understand how filters transform data, let’s take a look at how we can train a CNN to recognize handwritten digits. Below is an enlarged version of a 28 x 28 pixel image of the number seven from the MNIST dataset, which includes images of handwritten digits.
Image of a handwritten “7” from the MNIST dataset.
Below is the same image converted into its pixel values — numbers that describe the brightness of each pixel.
Image of a handwritten “7” from the MNIST dataset converted into its pixel values.
As the filter convolves across the image, the matrix of values in the filter lines up with a matching patch of pixel values, and the dot product of the two is taken.
Image of dot product calculation for handwritten “7” from the MNIST dataset.
The filter moves, or convolves, across the image one 3 x 3 patch of pixels at a time until all the pixels have been covered. The result of each dot product then becomes part of the input for the next layer.
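The sliding dot product described above can be written out directly. The sketch below uses a random 28 x 28 array as a stand-in for an MNIST digit and a random 3 x 3 filter (matching the untrained starting point described in the text); the `convolve2d` helper is our own illustrative function, not a library call:

```python
import numpy as np

rng = np.random.default_rng(0)

image = rng.random((28, 28))         # stand-in for a 28 x 28 MNIST digit
filt = rng.standard_normal((3, 3))   # filter values start out random

def convolve2d(img, f):
    """Slide the filter over the image; at each position, multiply the
    patch of pixel values element-wise with the filter and sum the result
    (the 'dot product' described above)."""
    fh, fw = f.shape
    out_h = img.shape[0] - fh + 1    # 26 positions fit along a 28-pixel edge
    out_w = img.shape[1] - fw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = img[i:i + fh, j:j + fw]
            out[i, j] = np.sum(patch * f)
    return out

feature_map = convolve2d(image, filt)
print(feature_map.shape)  # (26, 26)
```

The grid of dot products, often called a feature map, is what gets handed to the next layer; training then nudges the filter values so that the map responds strongly to edges, curves and other useful patterns.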
Initially, the values in the filter are randomized. As a result, the first passes or convolutions act as a training phase and the initial output isn’t very useful. After each iteration, the CNN adjusts these values automatically, using a loss function to measure how far its output is from the correct answer. As training progresses, the CNN continuously adjusts the filters. By doing so, it learns to distinguish edges, curves, textures and other patterns and features of the image.
While this is an impressive feat, in order to apply a loss function, a CNN needs to be given examples of correct output in the form of labeled training data. CNNs often benefit from transfer learning, a practice that reuses knowledge gained on one problem to tackle similar problems in the future. When transfer learning can’t be applied, many convolutional neural networks require enormous amounts of labeled data.
Apply what you’ve learned
The world of artificial intelligence is vast and still growing. Now that you know the basics about convolutional neural networks and recurrent neural networks, there are new depths to reach and new topics to explore. If you’re looking to apply what you’ve learned to your next project, get in touch with our team of experts for support.