If you walked down the street shouting out the names of every object you saw (garbage truck! bicyclist! sycamore tree!), most people would not conclude you were smart. But if you ran an obstacle course and showed them how to navigate a series of challenges to reach the end unscathed, they would.
Most machine learning algorithms are shouting names in the street. They perform perceptual tasks that a person can do in under a second. But another kind of AI, deep reinforcement learning, is strategic. It learns how to take a series of actions in order to reach a goal. That’s powerful and smart, and it’s going to change a lot of industries.
Two industries on the cusp of AI transformations are manufacturing and supply chain. The ways we make and ship stuff are heavily dependent on groups of machines working together, and the efficiency and resiliency of those machines are the foundation of our economy and society. Without them, we can’t buy the basics we need to live and work.
Startups like Covariant, Ocado’s Kindred and Bright Machines are using machine learning and reinforcement learning to change how machines are controlled in factories and warehouses, solving extremely difficult problems such as getting robots to detect and pick objects of various sizes and shapes out of bins. They are attacking enormous markets: The industrial control and automation market was worth $152 billion last year, while logistics automation was valued at more than $50 billion.
As a technologist, you need a lot of things to make deep reinforcement learning work. The first piece to think about is how you will get your deep reinforcement learning agent to practice the skills you want it to acquire. There are only two ways — with real data or through simulations. Each approach has its own challenge: Data must be collected and cleaned, while simulations must be built and validated.
Some examples will illustrate what this means. In 2016, GoogleX advertised its robotic “arm farms,” spaces filled with robot arms that were learning to grasp items and teach others how to do the same. They were an early way for a reinforcement learning algorithm to practice its moves in a real environment and measure the success of its actions. That feedback loop is necessary for a goal-oriented algorithm to learn: It must make sequential decisions and see where they lead.
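To make that feedback loop concrete, here is a minimal sketch in Python. It is not how any arm farm was programmed. It assumes a made-up task (a corridor of five cells with a reward only at the far end) and uses tabular Q-learning, one of the simplest reinforcement learning methods, in place of a deep network. The names `step`, `train` and `greedy_policy`, and every parameter value, are illustrative assumptions.

```python
import random

# Hypothetical toy task: a corridor of 5 cells. The agent starts in cell 0
# and is rewarded only when it reaches cell 4.
N_STATES = 5
ACTIONS = (-1, +1)  # step left, step right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: act, observe the result, update the estimate."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore occasionally (or on ties); otherwise exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # The feedback: nudge the estimate toward the reward plus the
            # discounted value of the best action available afterwards.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best-known action for each non-terminal cell."""
    return [ACTIONS[0] if row[0] > row[1] else ACTIONS[1] for row in q[:-1]]
```

Calling `greedy_policy(train())` returns the move the agent has learned to prefer in each cell. Nothing tells it that "right" is correct; it finds out by making sequences of decisions and seeing where they lead.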
In many situations, it is not feasible to build the physical environment where a reinforcement learning algorithm can learn. Let’s say you want to test different strategies for routing a fleet of thousands of trucks moving goods from many factories to many retail outlets. It would be prohibitively expensive to test every possible strategy in the real world. The tests would cost money to run, and each failed run would leave many customers unhappy.
For many large systems, the only practical way to find the best sequence of actions is simulation. In those situations, you must create a digital model of the physical system you want to understand in order to generate the data reinforcement learning needs. These models are variously called digital twins, simulations and reinforcement-learning environments. In manufacturing and supply chain applications, the terms mean essentially the same thing.
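As a rough illustration of what such a digital model looks like to an algorithm, the Python sketch below simulates the stock of a single retail outlet and compares two reordering strategies without touching a real customer. Everything in it is an assumption made for the example: the class name `WarehouseSim`, the `reset`/`step` interface (loosely modeled on common reinforcement-learning environment conventions), the demand distribution and the prices. A real digital twin would be built and validated with domain experts.

```python
import random


class WarehouseSim:
    """Hypothetical toy model of one outlet's stock, not a real digital twin."""

    def __init__(self, capacity=20, horizon=30, seed=0):
        self.capacity = capacity
        self.horizon = horizon
        self.rng = random.Random(seed)

    def reset(self):
        """Start a new simulated month and return the initial stock level."""
        self.stock = self.capacity // 2
        self.day = 0
        return self.stock

    def step(self, order):
        """Advance one day: receive the order, serve demand, score the day."""
        self.stock = min(self.stock + max(order, 0), self.capacity)
        demand = self.rng.randint(0, 8)  # assumed demand distribution
        sold = min(demand, self.stock)
        self.stock -= sold
        # Assumed economics: earn 1.0 per unit sold, pay 0.1 per unit held
        # overnight, pay 2.0 per unit of unmet demand.
        reward = 1.0 * sold - 0.1 * self.stock - 2.0 * (demand - sold)
        self.day += 1
        return self.stock, reward, self.day >= self.horizon


def evaluate(policy, episodes=200, seed=0):
    """Average total reward of a policy (stock -> order size) in simulation."""
    env = WarehouseSim(seed=seed)
    total = 0.0
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            obs, reward, done = env.step(policy(obs))
            total += reward
    return total / episodes


def never_reorder(stock):
    return 0


def top_up(stock):
    return max(8 - stock, 0)  # order enough to hold 8 units
```

Comparing `evaluate(top_up)` with `evaluate(never_reorder)` shows the point of the exercise: a bad strategy fails thousands of times in software, at no cost to anyone. A reinforcement learning agent would take the place of these hand-written policies and learn its own from the same `step` feedback.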
Recreating any physical system requires domain experts who understand how the system works. This can be a problem for systems as small as a single fulfillment center for the simple reason that the people who built those systems may have left or died, and their successors have learned how to operate but not reconstruct them.
Many simulation software tools offer low-code interfaces that enable domain experts to create digital models of those physical systems. This is important, because domain expertise and software engineering skills often cannot be found in the same person.
Why would you go through all this trouble for a single algorithm? Because deep reinforcement learning consistently produces results that other machine learning and optimization tools cannot match. DeepMind famously used it to beat the world champion at the board game Go. Reinforcement learning was also integral to breakthrough results in chess and Atari games. Likewise, OpenAI trained deep reinforcement learning agents to beat the best human teams at Dota 2.
Just as deep artificial neural networks began to find business applications in the mid-2010s, after Geoffrey Hinton was hired by Google and Yann LeCun by Facebook, deep reinforcement learning will have a growing impact on industry. It will lead to step-change improvements in robotic automation and system control, on the order of what we saw with Go. It will be the best tool we have, and by a long shot.
The consequence of those gains will be immense increases in efficiency and cost savings in manufacturing products and operating supply chains, leading to decreases in carbon emissions and worksite accidents. And, to be clear, the chokepoints and challenges of the physical world are all around us. Just in the last year, our societies have been hit by multiple supply chain disruptions due to COVID, lockdowns, the Suez Canal debacle and extreme weather events.
Zooming in on COVID: Even after vaccines were developed and approved, many countries had trouble producing and distributing them quickly. These are manufacturing and supply chain problems involving situations we could not prepare for with historical data. They require simulations to predict what will happen and to work out how best to address crises when they occur, as Michael Lewis illustrated in his recent book “The Premonition.”
It is precisely this combination of constraints and novel challenges, found in factories and supply chains, that reinforcement learning and simulation can help us solve more quickly. And we are sure to face more of them in the future.