Common Applications of Deep Learning
So far, we have gone from single-layer neural networks to multi-layer models with many hidden layers. We have also reviewed how these neural networks can serve as powerful tools for both classification and regression tasks. However, we have only scratched the surface: beyond image classification and simple regression use cases, deep learning approaches are now ascendant in fields ranging from cybersecurity and transportation to games and robotics.
Critically, deep learning hinges on one fundamental idea: given a dataset and a loss function, descend the gradient. As a result, we can generalize neural networks to solve a variety of different problems. However, specific domains present both unique challenges and useful assumptions.
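To make "descend the gradient" concrete, here is a minimal sketch that fits a straight line by gradient descent on a mean squared error loss. The toy dataset, the learning rate, the step count, and the function name fit_linear are all assumptions chosen for illustration; real networks apply the same loop to many more parameters.

```python
import numpy as np

def fit_linear(x, y, lr=0.1, steps=500):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        error = (w * x + b) - y               # prediction minus target
        grad_w = 2.0 * np.mean(error * x)     # d(loss)/dw
        grad_b = 2.0 * np.mean(error)         # d(loss)/db
        w -= lr * grad_w                      # step against the gradient
        b -= lr * grad_b
    return w, b

# Assumed toy dataset: points on the line y = 3x + 0.5
x = np.linspace(-1.0, 1.0, 50)
y = 3.0 * x + 0.5
w, b = fit_linear(x, y)
print(w, b)
```

Changing the dataset or the loss changes what is learned, but the update rule stays the same, which is why the same idea carries over to so many domains.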
In this article, we will discuss many common applications for deep learning, and highlight how neural networks have been adapted to these respective tasks.
Classification and Prediction in Challenging Domains
Neural networks excel at recognizing complex patterns in data, especially when that data is plentiful. It follows that deep learning is most commonly applied to datasets with many input features or where those features interact in complicated ways. As a result, neural networks have been wildly successful at tackling complex prediction and classification problems in domains including medicine and agriculture.
Consider the task of diagnosing a patient. While doctors may have an entire dossier of patient health records, only a select few of these data will be useful for making a correct prediction. Similarly, meaningful features can be extracted from combinations of raw data. For example, old age, a smoking habit, and a persistent cough together are much more predictive of lung cancer than those three features in isolation.
Because neural networks are so effective at feature selection and extraction, they have successfully tackled many problems in medical subdomains, and show promise in others. Practitioners use deep learning to detect abnormalities in medical scans, predict health outcomes, or even combine clinical notes and medical codes to make a diagnosis.
Agronomics, the science of agriculture, is another field where deep nets are blossoming. Agriculture represents a precarious environment: harvests can be derailed by disease, drought, or plant infestations, which emerge from the complex interactions between crops, soil, and climate. In this setting, neural networks can provide powerful forecasting and classification tools to monitor and identify these threats. Neural networks can also help farmers scale up production.
Neural networks can ingest the rich data coming from sources like moisture sensors, drone footage, satellite imagery, and climate data, and extract relevant features for soil management, crop yield prediction, classification of plant species, livestock monitoring, and drought forecasting.
Many different tasks can be described as: “Given a sequence of data, how can we predict its next item(s)?” For example, consider the task of predicting the next word in a sentence. In computer science, this specific task is referred to as language modeling.
Language generation has undergone exceptional progress since the inception of deep learning. This has largely been because there is a lot of language data available (from the complete works of Shakespeare to our text messages) and because the rules of language are very complex (so alternative approaches to deep learning are hard to come by).
(Source: OpenAI: Better Language Models and their Implications)
Some of the neural network architectures that perform the best on language modeling tasks make use of language’s sequential nature. One of the most widely applied models for sequential data is called a Recurrent Neural Network (RNN).
Rather than just concatenating our input words together, RNNs process them sequentially. We pass each input word to the model in order, and at each timestep the input is used to update the model’s hidden state. If we just want to predict the last word, we feed all but the last word into the model, then use the final hidden state to predict the next word. Alternatively, if we want to generate an entire sentence, we can feed in a starting word, update the hidden state, generate a new word, then pass that new word back into the model, and so on.
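The hidden-state update can be sketched in a few lines of NumPy. This is an illustrative, untrained model: the four-word vocabulary, the hidden size, the random weights, and the function names are all assumptions, so the predicted probabilities are meaningless. The point is only to show how each word updates the hidden state in order.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "down"]      # assumed toy vocabulary
vocab_size, hidden_size = len(vocab), 8

# Randomly initialised (untrained) parameters
W_xh = rng.normal(scale=0.1, size=(hidden_size, vocab_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden
W_hy = rng.normal(scale=0.1, size=(vocab_size, hidden_size))   # hidden -> output

def one_hot(index):
    v = np.zeros(vocab_size)
    v[index] = 1.0
    return v

def rnn_step(h, word_index):
    """Combine the previous hidden state with the current word."""
    return np.tanh(W_xh @ one_hot(word_index) + W_hh @ h)

def next_word_probs(word_indices):
    """Feed words in order, then turn the final hidden state into probabilities."""
    h = np.zeros(hidden_size)
    for idx in word_indices:
        h = rnn_step(h, idx)
    logits = W_hy @ h
    exp = np.exp(logits - logits.max())    # numerically stable softmax
    return exp / exp.sum()

probs = next_word_probs([0, 1, 2])         # "the cat sat" -> distribution over next word
print(dict(zip(vocab, probs.round(3))))
```

Because the hidden state is carried from one step to the next, feeding the same words in a different order generally produces a different prediction.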
Of course, this approach isn’t just useful for text generation. RNNs can be applied to sequential problems ranging from time series forecasting (like predicting stock prices), to even music generation.
Another popular use of neural networks is for translation between sequences. For example, to translate from Hindi to English, we can use one RNN to encode the Hindi sentence. Then we can pass this encoded representation to another RNN, which decodes the corresponding English sentence. This is called an Encoder-Decoder network.
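The following is a rough sketch of that data flow, again with untrained random weights. The vocabulary sizes, the START and END token ids, and the greedy decoding loop are assumptions made for illustration; the output tokens carry no meaning until the model is trained on paired sentences.

```python
import numpy as np

rng = np.random.default_rng(0)
src_vocab, tgt_vocab, hidden = 5, 6, 8     # assumed toy sizes
START, END = 0, 1                          # hypothetical special target tokens

def init(*shape):
    return rng.normal(scale=0.5, size=shape)

enc_in, enc_hh = init(hidden, src_vocab), init(hidden, hidden)
dec_in, dec_hh = init(hidden, tgt_vocab), init(hidden, hidden)
dec_out = init(tgt_vocab, hidden)

def encode(src_ids):
    """Encoder RNN: squeeze the whole source sentence into one hidden vector."""
    h = np.zeros(hidden)
    for i in src_ids:
        h = np.tanh(enc_in[:, i] + enc_hh @ h)
    return h

def decode(h, max_len=10):
    """Decoder RNN: start from the encoder's vector and emit tokens one by one."""
    token, output = START, []
    for _ in range(max_len):
        h = np.tanh(dec_in[:, token] + dec_hh @ h)
        token = int(np.argmax(dec_out @ h))    # greedy choice of the next token
        if token == END:
            break
        output.append(token)
    return output

translation = decode(encode([3, 1, 4]))
print(translation)
```

The only link between the two networks is the vector returned by encode, which is why the encoder can be swapped for a different kind of network as long as it produces a vector of the same size.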
Encoder-Decoder architectures are used for a variety of tasks, including language translation and summarization. We can even replace the RNN Encoder with a convolutional neural network, and utilize the resulting model for image captioning!
Any time we apply neural networks, we are sacrificing interpretability for effectiveness. This may be most true when working with sequential data, where entire sequences of information are combined into single vectors. This lack of interpretability raises a few problems: these models can secretly pick up on spurious correlations (false patterns that don’t capture the true meaning of language), or even worse, secretly encode bias from the text datasets used to train these models.
Autoencoders and Anomaly Detection
Let’s imagine we train an Encoder-Decoder architecture to encode an image into a hidden state, then decode out that very same input. If we do this, that hidden, intermediate vector of information will learn to encode the information from that input necessary for re-generating that same picture. In other words, that single intermediate vector will store the “meaning” of the input data.
Now, what happens if we make the intermediate hidden state smaller? In this case, we must compress the information in our input further, while still trying to preserve the features that matter. In order to do this, the encoder must also throw away features that don’t matter.
This is the big idea of autoencoders:
- The Encoder encodes input, and compresses it into a smaller latent representation, referred to as the Code.
- The Decoder tries to reconstruct the input.
- Our loss is the difference between our output and the original input. This is called the reconstruction loss.
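The three pieces above can be sketched with a tiny linear autoencoder trained by gradient descent. Everything here is an assumption for illustration: the synthetic dataset, the code size of 2, the learning rate, and the absence of nonlinearities. Practical autoencoders usually use deeper, nonlinear encoders and decoders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy data: 200 points in 6-D that really only vary along 2 hidden directions
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(200, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardise each feature

code_size = 2                               # size of the Code (bottleneck)
W_enc = rng.normal(scale=0.1, size=(6, code_size))
W_dec = rng.normal(scale=0.1, size=(code_size, 6))

def encode(x):
    return x @ W_enc                        # compress 6 features into the Code

def decode(code):
    return code @ W_dec                     # try to rebuild the 6 features

losses = []
lr = 0.05
for _ in range(2000):
    code = encode(X)
    error = decode(code) - X
    losses.append(np.mean(error ** 2))      # reconstruction loss
    grad_out = 2.0 * error / error.size     # d(loss)/d(reconstruction)
    grad_dec = code.T @ grad_out
    grad_enc = X.T @ (grad_out @ W_dec.T)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(losses[0], losses[-1])
```

Note that no labels appear anywhere: the input itself is the training target.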
So far in this course, we have focused on supervised learning approaches, where we train our model to map an input to a label. Autoencoders are our first example of unsupervised learning: approaches where we learn the patterns and structure of our data without labels.
Autoencoders have many uses. For one, they can be used as a preprocessing step, to compress documents and images. These smaller vectors can also be used in downstream tasks like classification, clustering, or information retrieval.
Without any additional labeled data, autoencoders can also be used for anomaly detection: the identification of rare or suspicious data points (e.g. fake documents or credit card fraud).
As we noted earlier, autoencoders throw away information not needed to reconstruct regular training data. As a result, if we give the autoencoder an anomalous data point as input, that point will differ from typical training data and will be harder to reconstruct. This means that the model will have a higher reconstruction error on it! This approach has been used to detect anomalies in datasets ranging from accounting data to brain scans.
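Here is a small sketch of thresholding on reconstruction error. To keep it short, it uses a PCA projection as a stand-in for a trained autoencoder (the best linear autoencoder spans the same subspace as PCA). The synthetic data, the 99th-percentile threshold, and the way the anomaly is constructed are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed "regular" data: points lying near a 2-D plane inside a 5-D space
basis = rng.normal(size=(2, 5))
train = rng.normal(size=(500, 2)) @ basis + 0.05 * rng.normal(size=(500, 5))

# Stand-in for a trained autoencoder: project onto the top 2 principal directions
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
components = vt[:2]

def reconstruction_error(x):
    code = (x - mean) @ components.T        # "encode"
    recon = code @ components + mean        # "decode"
    return np.sum((x - recon) ** 2, axis=-1)

# Flag anything that reconstructs worse than 99% of the training data
threshold = np.percentile(reconstruction_error(train), 99)

normal_point = rng.normal(size=2) @ basis   # lies on the plane, like the training data
anomaly = normal_point + 3.0 * vt[-1]       # pushed off the plane

print(reconstruction_error(normal_point) > threshold)   # not flagged
print(reconstruction_error(anomaly) > threshold)        # flagged
```

The threshold is a design choice: a lower percentile catches more anomalies at the cost of more false alarms on regular data.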
For many games, the best player in the world isn’t human. DeepMind’s AlphaZero dominates the best human players in both chess and Go. In Atari, Deep Q Learning has produced agents that are as good as, or better than, any human gamer.
In the reinforcement learning framework, an agent takes actions in an environment and receives rewards. These are positive when the agent does good things (e.g. scores a point) and negative when the agent does bad things (e.g. loses health or dies). Using neural networks, combined with reinforcement learning loss functions, we can teach agents complex behaviors from scratch.
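The agent, environment, and reward loop can be sketched with Q-learning on a tiny corridor. Note the simplification: this sketch stores values in a lookup table rather than a neural network, which is only practical because the environment has five states. The corridor, the reward values, and all hyperparameters are assumptions chosen for illustration.

```python
import random

N_STATES, GOAL = 5, 4          # corridor cells 0..4; reaching cell 4 ends the episode
LEFT, RIGHT = 0, 1

def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == LEFT else state + 1
    if nxt == GOAL:
        return nxt, 1.0, True      # positive reward for reaching the goal
    return nxt, -0.01, False       # small negative reward for every other step

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action] value estimates
    for _ in range(episodes):
        state = 0
        for _ in range(100):                    # cap the episode length
            if rng.random() < epsilon:          # explore occasionally
                action = rng.choice([LEFT, RIGHT])
            else:                               # otherwise act greedily
                action = max((LEFT, RIGHT), key=lambda a: q[state][a])
            nxt, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

q_table = train()
policy = [max((LEFT, RIGHT), key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(policy)   # learned action for cells 0..3 (1 means "move right")
```

Deep Q Learning keeps the same update target but replaces the table with a neural network that maps a state (such as the pixels of a game screen) to a value for each action.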
Simulation environments are bridging the divide between these games and real-world applications. For example, deep learning models are being trained via reinforcement learning to drive cars in a virtual setting. These models can then be fine-tuned in the real world!
Let’s say we want a model that generates pictures of cats. One possibility would be to use reinforcement learning. For example, we could have a human give our model a positive reward or negative reward: positive for good cat generations, negative for bad cat generations. However, getting those human labels would reduce training to a snail’s pace.
Alternatively, what if we train another model to determine whether our network is generating cat-like photos? Now, rather than just a cat-generator network, we introduce a real cat/generated cat classifier, called a discriminator network, and task this model with sorting out generated cats from real ones. This other model can then provide the training signal for the generator.
That’s the big idea behind Generative Adversarial Networks (GANs).
More formally, we train a generator network to generate images (by constantly fine-tuning its parameters) that will be classified as ‘real’ by a discriminator network. At the same time, we train our discriminator to take the latest generations from our generator, along with real images, and to classify them as real or fake.
Here’s how it all fits together, in the case of “cat generation”:
- The generator network takes in a random noise vector, and transforms it into a candidate cat image.
- The discriminator network is fed both generated images and real images.
- The discriminator is trained to differentiate generated cats from real cats.
- The generator tries to maximize how much it fools the discriminator.
And voila! GANs.
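The alternating updates can be sketched on a deliberately tiny problem, where the "images" are single numbers. In this illustrative setup, which is an assumption and not a practical GAN, the generator is a linear function of the noise, the discriminator is a logistic classifier, the gradients are derived by hand, and the real data are drawn from a normal distribution centred at 4. The sketch shows the mechanics only and makes no claim about convergence.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def generate(gen, z):
    a, b = gen
    return a * z + b                       # noise -> candidate sample

def discriminate(disc, x):
    w, c = disc
    return sigmoid(w * x + c)              # estimated probability that x is real

def disc_loss(disc, real, fake):
    return (-np.mean(np.log(discriminate(disc, real)))
            - np.mean(np.log(1.0 - discriminate(disc, fake))))

def gen_loss(gen, disc, z):
    # Low when the discriminator labels generated samples as real
    return -np.mean(np.log(discriminate(disc, generate(gen, z))))

def disc_step(disc, real, fake, lr):
    """One gradient step that makes the discriminator better at telling real from fake."""
    w, c = disc
    p_real, p_fake = discriminate(disc, real), discriminate(disc, fake)
    grad_w = -np.mean((1.0 - p_real) * real) + np.mean(p_fake * fake)
    grad_c = -np.mean(1.0 - p_real) + np.mean(p_fake)
    return (w - lr * grad_w, c - lr * grad_c)

def gen_step(gen, disc, z, lr):
    """One gradient step that makes the generator better at fooling the discriminator."""
    a, b = gen
    w, _ = disc
    p_fake = discriminate(disc, generate(gen, z))
    grad_x = -(1.0 - p_fake) * w           # d(gen_loss)/d(sample)
    return (a - lr * np.mean(grad_x * z), b - lr * np.mean(grad_x))

rng = np.random.default_rng(0)
gen, disc = (1.0, 0.0), (0.1, 0.0)         # assumed starting parameters
for _ in range(200):
    real = rng.normal(4.0, 1.0, size=64)   # assumed "real" data
    z = rng.normal(size=64)                # random noise vector
    disc = disc_step(disc, real, generate(gen, z), lr=0.05)
    gen = gen_step(gen, disc, z, lr=0.05)

print(gen, disc)
```

The generator never sees the real data directly. Its only training signal is the gradient that flows back through the discriminator.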
Excitingly, GANs can work very well, with no hand-engineered features. However, it’s important to note that training can be finicky, and selecting the right hyper-parameters is difficult. Sometimes your discriminator will get too good (and the generator won’t learn), and sometimes your generator will learn to only generate a single cat (mode collapse).
Even so, GAN-based approaches have proved wildly successful, from image editing to face generation and style transfer. GANs have also been used to generate additional samples to augment existing datasets, including in medical domains.
Yet GANs also introduce several ethical challenges. First, they have more sinister applications. Researchers have used GANs to generate seemingly genuine footage and audio of politicians, stoking fears that GANs offer dangerous tools for producing fake news content: so-called deepfakes.
There are other, less explicit dangers to GANs. Because generators are trained to replicate existing data, they can reproduce or even exacerbate biases in those datasets. For example, researchers found that Snapchat’s GAN-based filters, trained on imbalanced data, regularly whiten the skin of users.
In sum, GANs are representative of both the power and perils of neural networks. As deep learning practitioners, we should not only appreciate the utility of our models, but also be wary of the implications of our work.
In this article, we have discussed how neural networks are applied to hard classification and prediction problems, tasks involving sequential data, unsupervised anomaly detection, reinforcement learning, and GANs.
These applications represent just a sampling of the ever-developing, vibrant field of deep learning.