Reinforcement learning is a training method based on rewarding desired behaviors and punishing undesired ones. The method has been adopted in artificial intelligence (AI) as a way of directing machine learning through rewards and penalties rather than labeled examples. Reinforcement learning is used in operations research, information theory, game theory, control theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms.
Where supervised learning algorithms are typically trained on a body of known correct answers, an agent learning by reinforcement is not. Instead, a reinforcement learning agent learns from the environment in which it performs its task. First, a method of rewarding desired behaviors and punishing undesired behaviors is devised. Positive values are assigned to desired behaviors to provide positive reinforcement, and negative values are assigned to undesired behaviors as punishment.
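The interaction described above can be sketched as a simple loop in which the agent acts, the environment responds, and a numeric reward comes back. This is a minimal illustration only: the LineWorld environment, its reward values (+1.0 for reaching the goal, -0.1 per other step), and the function names are all hypothetical and chosen for brevity.

```python
import random


class LineWorld:
    """Hypothetical toy environment: positions 0..4 on a line, goal at 4."""

    def __init__(self):
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (left) or +1 (right); the ends of the line act as walls.
        self.position = max(0, min(4, self.position + action))
        done = self.position == 4
        # Assumed reward scheme: positive value for the desired outcome,
        # small negative value for every step that does not reach it.
        reward = 1.0 if done else -0.1
        return self.position, reward, done


def run_episode(env, policy, max_steps=50):
    """Run one episode and return the total reward the agent collected."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(state):
    """An untrained agent: picks a direction at random, ignoring the state."""
    return random.choice([-1, 1])


print(run_episode(LineWorld(), random_policy))
```

No correct answers are supplied anywhere in this loop; the only feedback is the reward returned by each step, which is what a learning algorithm would use to improve the policy.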
The agent is programmed to seek the maximum overall long-term reward to achieve an optimal solution. Long-term goals help prevent the agent from stalling on lesser goals while avoiding risk. Mechanisms that encourage exploration are also commonly added. Reinforcement learning problems are often modeled as Markov decision processes, in which an agent might pass up an immediate reward in order to explore; to that end, developers might add an effect, like curiosity, that aids in making discoveries.
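Two common ways to express these ideas in code are a discounted return, which scores a whole sequence of rewards rather than a single step, and epsilon-greedy action selection, which occasionally ignores the best-known action in order to explore. Neither is named in the text above; they are shown here as one possible sketch, and the discount factor gamma, the exploration rate epsilon, and the function names are assumptions.

```python
import random


def discounted_return(rewards, gamma=0.9):
    """Sum a reward sequence, weighting a reward k steps ahead by gamma**k."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def epsilon_greedy(action_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, else the best."""
    if rng.random() < epsilon:
        # Explore: pass up the best-known reward to try something else.
        return rng.randrange(len(action_values))
    # Exploit: pick the action with the highest estimated value.
    return max(range(len(action_values)), key=lambda a: action_values[a])


# A small reward now versus a larger reward two steps later.
print(discounted_return([1, 0, 0]))
print(discounted_return([0, 0, 10], gamma=0.5))
print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1))
```

A gamma close to 1 makes the agent value distant rewards almost as much as immediate ones, while a larger epsilon makes it explore more often.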
A learning algorithm playing Pac-Man might be able to move in one of four possible directions, barring obstruction. From pixel data, an agent might be given a numeric reward for the result of each unit of travel: 0 for empty space, 1 for a pellet, 2 for fruit, 3 for a power pellet, 4 for eating a ghost after a power pellet, and 5 for collecting all pellets and completing a level, with 5 points deducted for a collision with a ghost. The agent progresses from randomized play to more sophisticated play, learning the goal of getting all pellets to complete the level. Given time, an agent might even learn tactics like conserving power pellets until they are needed for self-defense.
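The reward scheme in the Pac-Man example can be written as a lookup table and fed into a learning rule. The sketch below uses tabular Q-learning, which is a swapped-in technique for illustration and not something the example specifies; a real agent learning from pixel data would need a far richer state representation. The event labels, state names, learning rate alpha, and discount factor gamma are all hypothetical.

```python
from collections import defaultdict

# Reward values from the example; the event names are hypothetical labels.
REWARDS = {
    "empty": 0,
    "pellet": 1,
    "fruit": 2,
    "power_pellet": 3,
    "ghost_eaten": 4,       # ghost caught after a power pellet
    "level_complete": 5,
    "ghost_collision": -5,  # the 5-point deduction
}

ACTIONS = ("up", "down", "left", "right")


def q_update(q_table, state, action, event, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update for a single unit of travel."""
    reward = REWARDS[event]
    # Value of the best action available from the state the agent landed in.
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]


# Unseen (state, action) pairs start with an estimated value of 0.0.
q_table = defaultdict(float)
q_update(q_table, "corridor", "right", "pellet", "junction")
q_update(q_table, "junction", "up", "ghost_collision", "start")
print(dict(q_table))
```

Repeated over many games, updates like these raise the estimated value of moves that lead to pellets and lower the value of moves that lead into ghosts, which is how play can shift from random toward deliberate.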
Because it’s based on an understanding of biological systems, reinforcement learning is a part of bio-inspired computing. As a psychological principle, reinforcement learning hails from the school of behavioral psychology.