Supervised learning, in the context of artificial intelligence (AI) and machine learning, is an approach in which both the input data and the desired output data are provided. Input and output data are labeled for classification to provide a learning basis for future data processing. The term supervised learning comes from the idea that an algorithm is learning from a training dataset, which can be thought of as the teacher.
Supervised machine learning systems provide the learning algorithms with known quantities to support future judgments. Chatbots, self-driving cars, facial recognition programs, expert systems and robots are among the systems that may use either supervised or unsupervised learning. Supervised learning systems are mostly associated with retrieval-based AI but they may also be capable of using a generative learning model.
How does supervised learning work?
In general, supervised learning occurs when a system is given input and output variables with the intention of learning how they are mapped together, or related. The goal is to produce a mapping function accurate enough that, when new input is given, the algorithm can predict the output. This is an iterative process: each time the algorithm makes a prediction, it is corrected or given feedback, until it achieves an acceptable level of performance.
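The predict-and-correct loop described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes the mapping is a straight line, uses gradient descent on mean squared error as the feedback step, and the function name `fit_line`, the learning rate, the epoch count and the training data are all invented for the example.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y ~ w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Predict with the current parameters and measure the error per example.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Feedback step: nudge the parameters against the error gradient.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Labeled training data: each input is paired with its desired output.
# The pairs here were generated from y = 2x + 1 (an assumption for the demo).
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(train_x, train_y)
prediction = w * 5.0 + b  # output for an input that was not in the training set
print(round(w, 3), round(b, 3), round(prediction, 3))
```

With these settings the learned parameters should settle very close to the slope and intercept that generated the data, so the prediction for the unseen input lands near the value the underlying rule would give.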
Training data for supervised learning consists of a set of examples, each pairing an input object with a desired output (also referred to as the supervisory signal). For example, in an application of supervised learning for image processing, an AI system might be provided with labeled pictures of vehicles in categories such as cars or trucks. After a sufficient amount of observation, the system should be able to distinguish between and categorize unlabeled images, at which point the training is complete.
Applications of supervised learning are typically broken down into two categories: classification and regression. Classification resembles the example above: the output value is a category, such as car or truck, or true or false. In a regression problem, the output is a continuous numeric value, such as a price or a weight.
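The difference between the two categories is only in the type of output, which the following sketch tries to make concrete. It uses a k-nearest-neighbors approach (chosen here for brevity, not taken from the text above) on the same inputs twice: once with category labels and once with numeric labels. The feature values, prices and function names are invented for illustration.

```python
from collections import Counter

def nearest_targets(examples, query, k):
    """Return the targets of the k examples whose features are closest to query."""
    def squared_distance(features):
        return sum((a - b) ** 2 for a, b in zip(features, query))
    ranked = sorted(examples, key=lambda example: squared_distance(example[0]))
    return [target for _, target in ranked[:k]]

def classify(examples, query, k=3):
    """Classification: the output is a category, chosen by majority vote."""
    return Counter(nearest_targets(examples, query, k)).most_common(1)[0][0]

def regress(examples, query, k=3):
    """Regression: the output is a number, here the mean of the neighbors' values."""
    targets = nearest_targets(examples, query, k)
    return sum(targets) / len(targets)

# Invented features: (length in meters, weight in tonnes).
features = [(4.2, 1.3), (4.5, 1.5), (4.0, 1.2), (7.5, 8.0), (8.2, 9.5), (6.9, 7.0)]
categories = ["car", "car", "car", "truck", "truck", "truck"]
prices = [20000.0, 24000.0, 19000.0, 60000.0, 75000.0, 55000.0]

new_vehicle = (4.3, 1.4)
predicted_category = classify(list(zip(features, categories)), new_vehicle)
predicted_price = regress(list(zip(features, prices)), new_vehicle)
print(predicted_category, predicted_price)
```

For the sample input, the three nearest neighbors are all cars, so the classifier returns a category while the regressor returns the average of those neighbors' prices.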
Supervised learning algorithms
Common supervised machine learning algorithms are:
- Linear regression.
- Logistic regression.
- Artificial neural networks (ANN).
- Linear discriminant analysis.
- Decision trees.
- Similarity learning.
- Bayesian logic.
- Support vector machines (SVM).
- Random forests.
When choosing a supervised learning algorithm, there are a few things that should be considered. The first is the tradeoff between bias and variance: a model that is too rigid underfits the data (high bias), while one that is too flexible fits the noise in the training set (high variance). Another is the complexity of the model or function that the system is trying to learn. Additionally, the heterogeneity, accuracy, redundancy and linearity of the data should be analyzed before choosing an algorithm.
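The tension between too little and too much flexibility can be seen in a small experiment. The sketch below is an assumption-laden illustration: it uses k-nearest-neighbors regression, where a small k gives a very flexible model and a large k a very rigid one, and it runs on deliberately contrived data in which the true relationship is y = x but every training label is off by +1 or -1 in alternation. All names and values are invented.

```python
def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mean_squared_error(train, points, k):
    """Mean squared error of the k-NN model on the given (x, y) points."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in points) / len(points)

# Contrived training set: y = x plus alternating +1 / -1 label noise.
train = [(float(i), i + (1.0 if i % 2 == 0 else -1.0)) for i in range(10)]
# Noise-free held-out points that lie between the training inputs.
held_out = [(i + 0.5, i + 0.5) for i in range(9)]

# For each k, record (error on training data, error on held-out data).
errors = {
    k: (mean_squared_error(train, train, k), mean_squared_error(train, held_out, k))
    for k in (1, 2, 10)
}
for k, (train_error, held_out_error) in errors.items():
    print(k, round(train_error, 3), round(held_out_error, 3))
```

With k = 1 the model reproduces the noisy training labels exactly but does worse on held-out points, which is the too-flexible case. With k = 10 it predicts the same average everywhere and does poorly on both sets, which is the too-rigid case. The middle setting does best on held-out data here only because the noise was constructed to cancel in pairs; real data rarely behaves so neatly.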
Supervised vs. unsupervised learning
Unsupervised learning is when an algorithm is given only input data, without corresponding output values, as a training set. Unlike with supervised learning, there are no correct output values and no teacher. Instead, algorithms are left to explore the data on their own and surface interesting structure. Unsupervised learning is popular in applications of clustering, or the act of uncovering groups within data, and association, or the act of discovering rules that describe the data.
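As a contrast with the labeled examples used in supervised learning, here is a minimal clustering sketch. It uses k-means, one common clustering algorithm (the text above does not name a specific one), with hand-picked starting centroids so that the run is deterministic. The points and the function name `kmeans` are invented for illustration. Note that no labels are supplied anywhere.

```python
def kmeans(points, initial_centroids, max_iter=100):
    """Group points around centroids; returns (assignments, centroids)."""
    centroids = list(initial_centroids)
    assignments = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(
                range(len(centroids)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
            )
            for point in points
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the clustering has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [pt for pt, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids

# Unlabeled inputs only: two loose groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
assignments, centroids = kmeans(points, [points[0], points[-1]])
print(assignments, centroids)
```

The algorithm recovers the two groups from the geometry of the data alone, but it has no way to say what the groups mean. Naming them would require labels, which is exactly what supervised learning adds.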
Supervised learning models have some advantages over the unsupervised approach, but they also have limitations. The systems are more likely to make judgments that humans can relate to, for example, because humans have provided the basis for decisions. However, in the case of a retrieval-based method, supervised learning systems have trouble dealing with new information. If a system with categories for cars and trucks is presented with a bicycle, for example, the bicycle would have to be incorrectly lumped into one category or the other. If the AI system were generative, however, it might not know what the bicycle is but would be able to recognize it as belonging to a separate category.
An approach that combines both supervised and unsupervised techniques is called semi-supervised learning. In this approach, only some of the input data points are labeled with output information, and the algorithm learns from the labeled and unlabeled points together.
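One simple way to use partly labeled data is self-training, in which a model trained on the labeled points assigns provisional labels to the unlabeled ones and then learns from those as well. The sketch below is only one of several semi-supervised strategies and is not drawn from the text above. It assumes a single numeric feature (say, vehicle weight in tonnes), uses distance to the nearest labeled point as a stand-in for confidence, and all names and values are invented.

```python
def self_train(labeled, unlabeled):
    """Pseudo-label the unlabeled points one at a time, most confident first."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        # Confidence proxy: the unlabeled point closest to any labeled point
        # is treated as the safest one to label next.
        best_point, best_label, best_distance = None, None, float("inf")
        for x in remaining:
            for labeled_x, label in labeled:
                distance = abs(x - labeled_x)
                if distance < best_distance:
                    best_point, best_label, best_distance = x, label, distance
        # The newly labeled point joins the training set for later rounds.
        labeled.append((best_point, best_label))
        remaining.remove(best_point)
    return labeled

# Only two of the eight data points come with a label.
labeled = [(1.0, "car"), (10.0, "truck")]
unlabeled = [2.0, 3.0, 4.0, 7.0, 8.0, 9.0]

result = dict(self_train(labeled, unlabeled))
print(result)
```

Because each pseudo-labeled point is reused as training data, an early mistake can propagate to later points. Practical systems usually add a confidence threshold and stop labeling when no remaining point clears it.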