Reinforcement learning applications provide focused models

Goal-driven AI uses trial-and-error learning methods to find optimal solutions to enterprise problems while reducing its reliance on human maintenance.

Ronald Schmelzer, Cognilytica

Published: 16 May 2019

A common measure of machine intelligence is challenging AI to play complex games against humans. Early AI programs tackled checkers, and later systems went on to beat human players at chess, Go and a wide range of multiplayer games. The thinking behind reinforcement learning (RL) is that if a computer can outwit humans by thinking, planning ahead and predicting human behavior, then machines have the capacity to learn anything. Researchers are still studying how computers learn through iteration and trial and error.

One of the simplest goal-driven problems that computers were first tasked with was finding the right path through a maze. In this goal-driven exercise, there is only one optimal solution: a single path from start to finish. Checkers proved more complex, with a number of different solutions, but it still sat within a fairly easily computable universe. Chess posed greater problems for early AI systems because of its enormous universe of potential solutions. It took the computing power of IBM's Deep Blue to win a match against Grandmaster Garry Kasparov in 1997. Beating top human players at Go, a game with a far larger space of possible positions than chess, was once considered impossible for computers, until Google's DeepMind did it. Now, Google's DeepMind division has set its sights on creating a computer program to compete against teams of humans in massive multiplayer games.

Despite all this effort, getting machines to play games at superhuman levels is not the end goal of these intelligent systems. Goal-driven systems address any scenario where the enterprise needs to achieve an outcome within the general rules of a system. As such, enterprises are starting to see profits from using RL to solve goal-driven problems and find the optimal solution among a vast number of possibilities.

Finding the optimal solution through trial and error

Some enterprise optimization challenges are like mazes, with a single path to be discovered from start to finish. Others need an overall winning strategy in a huge universe of possibilities, often with many independently acting parties that must be coordinated to solve the problem. AI vendors and researchers are finding a rich field of opportunity in applying RL and goal-driven systems to problems in the enterprise.

Process automation has been making waves in the vendor ecosystem, with companies raising significant funds and generating substantial revenue by providing software bots that automate workflow tasks in the enterprise. However, like their factory automation brethren, these software bots are limited because they are not trained to handle the inevitable variability in data, processes and systems. RL is changing this by introducing the idea of autonomous business process systems. As with a maze, there is an optimal process flow for invoicing a customer and receiving payment or for handling an IT support ticket. Instead of a human manually defining and redefining the process flows and steps, RL systems discover the optimal flow for the process and then keep iterating on that flow without human intervention.
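To make the maze analogy concrete, here is a minimal sketch of tabular Q-learning, one common RL technique, finding a route through a tiny grid by trial and error. The maze layout, reward values and training settings are illustrative assumptions, not taken from any vendor's system; a real process-flow optimizer would have a far richer set of states and actions.

```python
import random

# Hypothetical 4x4 maze: 'S' start, 'G' goal, '#' wall, '.' open cell.
MAZE = [
    "S.#.",
    ".##.",
    "....",
    "#.#G",
]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Apply an action; hitting a wall or the edge leaves the agent in place."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < 4 and 0 <= nc < 4) or MAZE[nr][nc] == "#":
        nr, nc = r, c
    done = MAZE[nr][nc] == "G"
    reward = 10.0 if done else -1.0  # assumed rewards: each move costs 1
    return (nr, nc), reward, done


def train(episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    """Learn a table of action values by repeated trial and error."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * 4 for r in range(4) for c in range(4)}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(100):  # cap the length of each attempt
            if rng.random() < epsilon:
                action = rng.randrange(4)  # explore a random move
            else:
                action = max(range(4), key=lambda a: q[state][a])  # exploit
            nxt, reward, done = step(state, action)
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=20):
    """Follow the best-known action from the start until the goal is reached."""
    state, path = (0, 0), [(0, 0)]
    for _ in range(max_steps):
        action = max(range(4), key=lambda a: q[state][a])
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path


q_table = train()
path = greedy_path(q_table)
print(path)
```

Nobody tells the agent where the walls are. The per-move penalty is what pushes it toward the shortest route, which in this layout takes six moves.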

In other use cases, RL systems are being tasked with finding the optimal use of resources in the enterprise -- from allocating servers and storage in data centers to distributing employees, contractors and finances. Just as in a game of chess, RL systems can find the best way to deploy resources for a situational "victory," based on the current situation and what they have learned from previous examples. In the near future, resource optimization in the enterprise won't be based on outdated notions of rules, human-defined process flows or settings in a database, but rather will be powered by machine learning systems that are constantly working to find the best solution. Enterprises will find that they need to use machine learning approaches just to keep up with their competition.

Other applications of RL approaches include easing traffic congestion; teaching robots to figure out for themselves how to run, jump, move and navigate their environment; optimizing chemical reactions; and supporting drug discovery and protein folding research in pharmaceuticals and life sciences.

Scenario simulation

Another place where RL and goal-driven AI systems are proving their value is in scenario simulations. There are many situations in which organizations don't want to iterate with actual resources to find the best solution, and simulating the environment using RL can help find the best solution for a given problem. For example, companies are using RL-based, goal-driven systems to suggest the best marketing or sales approach for a complicated market, or suggest the best way to position airplanes in advance of a storm without having to first move those planes.
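As a rough illustration of learning inside a simulation rather than on real customers, the sketch below runs an epsilon-greedy bandit, one of the simplest trial-and-error learners, against a simulated market. The approach names and conversion rates are invented for the example, and the companies mentioned above may well use far more sophisticated methods.

```python
import random

# Hypothetical conversion rates used only by the simulator (assumed values).
TRUE_RATES = {"email": 0.02, "webinar": 0.10, "direct_call": 0.04}


def simulate(approach, rng):
    """Simulated outcome of one contact: 1.0 for a sale, 0.0 otherwise."""
    return 1.0 if rng.random() < TRUE_RATES[approach] else 0.0


def epsilon_greedy(trials=50_000, epsilon=0.1, seed=1):
    """Estimate each approach's payoff by trying them in the simulator."""
    rng = random.Random(seed)
    approaches = list(TRUE_RATES)
    counts = {a: 0 for a in approaches}
    values = {a: 0.0 for a in approaches}  # running average reward
    for _ in range(trials):
        if rng.random() < epsilon:
            choice = rng.choice(approaches)  # explore
        else:
            choice = max(approaches, key=lambda a: values[a])  # exploit
        reward = simulate(choice, rng)
        counts[choice] += 1
        values[choice] += (reward - values[choice]) / counts[choice]
    return values, counts


values, counts = epsilon_greedy()
best = max(values, key=values.get)
print(best, counts)
```

The learner never reads `TRUE_RATES` directly. It only sees simulated outcomes, which is the point: the costly experimentation happens in the model, not in the market.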

The most notable example of machine learning-powered scenario simulation is roboadvising, which is rapidly becoming a powerful tool for money management firms. Instead of a human money manager trying to find the best allocation of savings, investments and payoff of debts to achieve a particular financial goal, AI systems run thousands of simulations that can find the best solution. These roboadvisors provide highly personalized recommendations using the specific realities of someone's assets, investments and spending patterns, rather than grouping these individuals into generalized buckets. By reducing or eliminating human financial advisers, customers also save a significant amount on management fees. The typical roboadvisor charges a flat fee of 0.2% to 0.5% of assets under management, as opposed to the typical rate of 1% to 2% charged by human financial planners.
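To put the fee ranges above in dollar terms, the short calculation below applies them to a hypothetical $100,000 portfolio. The portfolio size is an assumption made for illustration, and the sketch treats each percentage as the charge for a single billing period.

```python
# Fee ranges quoted above, in percent of assets under management.
FEE_RANGES = {
    "roboadvisor": (0.2, 0.5),
    "human planner": (1.0, 2.0),
}


def fee(assets, rate_percent):
    """Fee in dollars for one billing period at the given percentage rate."""
    return assets * rate_percent / 100.0


assets = 100_000  # hypothetical portfolio size, not from the article

fees = {
    label: (fee(assets, low), fee(assets, high))
    for label, (low, high) in FEE_RANGES.items()
}
for label, (low, high) in fees.items():
    print(f"{label}: ${low:,.2f} to ${high:,.2f}")

# Smallest and largest possible saving from choosing the roboadvisor.
min_saving = fees["human planner"][0] - fees["roboadvisor"][1]
max_saving = fees["human planner"][1] - fees["roboadvisor"][0]
print(f"saving: ${min_saving:,.2f} to ${max_saving:,.2f}")
```

On these assumptions the roboadvisor would cost $200 to $500 against $1,000 to $2,000 for a human planner, a saving of $500 to $1,800 per billing period.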

Goal-driven AI systems are also being applied to fast-paced situations to find optimal strategies for bidding in auctions or making financial market trades. After first operating offline to refine their RL models, these systems are then used in real-world situations to make trades that might seem unusual in the short term, but can offer a big potential reward in the long term.

Is RL the one algorithm to rule them all?

Proponents of RL and goal-driven approaches to machine learning claim that almost anything can be learned through trial and error. RL has proven to be remarkably good at discovering the "hidden rules" of games and other environments and at beating even the best humans at their own games.

Whereas supervised learning approaches learn from clean, well-labeled data, RL can start from a blank slate, knowing very little about the environment, and still succeed. Similarly, unsupervised learning approaches can discover patterns and structure in data, but can't do much else with that learning to address new environmental situations. Researchers are currently working on enabling RL and goal-driven approaches to transfer learning from one environment to another, or to new environments with minor alterations. This idea of adaptability is the cornerstone of artificial general intelligence (AGI) and the quest for the one algorithm that can learn anything.

Certainly DeepMind and others pursuing RL have shown that it is a powerful approach to machine learning. However, it remains to be seen if all AI can be taught to succeed through RL training. In the short term, RL and goal-driven approaches to machine learning are proving their value in the enterprise and in a wide range of operational environments.
