Enterprise IT Watch Blog

Feb 1 2018   12:49PM GMT

Instilling AI safety into robotics through reinforcement learning

Profile: Michael Tidmarsh

Tags:
Artificial intelligence
Robotics


Robotics image via FreeImages

By James Kobielus (@jameskobielus)

Artificial intelligence (AI) is the perfect laughingstock. Any phenomenon that takes itself as seriously as AI is just asking to be ridiculed.

What’s even funnier is when AI comes in humanoid form, as is the case with the smart robotics that are penetrating every aspect of our lives. As Bill Vorhies discussed in his recent column, robot fails can be comedic gold.

As the brains behind autonomous devices, AI can dampen the laughter only by helping devices master their assigned tasks so well, and perform them so inconspicuously, that we never give them a second thought. Where robotics is concerned, this involves the trial-and-error, statistics-driven approach known as reinforcement learning (RL). Under this approach, the robot explores the full range of available actions (moving, grappling, voicing, and so on) that may or may not contribute to achieving a desired outcome.

Depending on your point of view, humor is baked into RL’s intrinsically trial-and-error process. As a robot searches for the optimal sequence of actions to achieve its intended outcome, it will of necessity take far more counterproductive actions than productive ones. If you’re the developer doing the training, this can be a long, frustrating, and tedious process. You may need to revise the RL procedures and the robot’s algorithmic cognition countless times until they generalize to future scenarios of the type for which the mechanism is being trained.
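
To make that trial-and-error loop concrete, here is a minimal sketch of tabular Q-learning, a textbook RL algorithm, applied to a hypothetical one-dimensional “corridor” task. The environment, reward values, and hyperparameters below are illustrative assumptions, not anything drawn from a production robot:

    import random

    # Hypothetical corridor world: the robot starts at cell 0 and must reach
    # the charging dock at the last cell. Stepping the wrong way is one of
    # the counterproductive actions it must learn to stop taking.
    N_STATES = 10          # cells in the corridor
    ACTIONS = [-1, +1]     # step left, step right
    EPSILON = 0.1          # exploration rate: how often to try a random action
    ALPHA = 0.5            # learning rate
    GAMMA = 0.9            # discount factor for future reward

    # Q-table: expected long-run reward for each (state, action) pair
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    def choose_action(state):
        """Epsilon-greedy: mostly exploit what is known, sometimes explore."""
        if random.random() < EPSILON:
            return random.choice(ACTIONS)          # trial-and-error exploration
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    for episode in range(500):
        state = 0
        while state != N_STATES - 1:
            action = choose_action(state)
            next_state = min(max(state + action, 0), N_STATES - 1)
            # Reward only the desired outcome; every wasted step costs a little.
            reward = 10.0 if next_state == N_STATES - 1 else -1.0
            # Standard Q-learning update toward the best achievable future value
            best_next = max(Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
            state = next_state

    # After training, the greedy policy should march straight to the dock.
    print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)])

Early episodes wander badly (the blooper reel); only after many episodes of penalized missteps does the greedy policy settle into the direct route.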

This trial-and-error RL process may be humorous to observe in a laboratory setting. But when your AI-driven robot hasn’t been trained effectively and commits these errors in production environments, it may not be funny in the least. This is amply clear from the incidents that Vorhies cites. No one will tolerate robots that routinely smash into people, endanger passengers riding in autonomous vehicles, or order products online without their owners’ authorization.

If we can draw any lesson from these incidents, it’s that robotics developers will need to incorporate the following scenarios into their RL procedures before they release their AI-powered creations to the wider world:

  • Geospatial awareness: Real-world operating environments can be very tricky for general-purpose robots to navigate successfully. The right RL could have helped the AI algorithms in this security robot learn the range of locomotion challenges in the indoor and outdoor environments where it was designed to patrol. Equipping the robot with a built-in video camera and thermal imaging wasn’t enough. No amount of trained AI could salvage it after it had rolled over into a public fountain.
  • Collision avoidance: Robots can be as much a hazard as a helpmate in many real-world environments. This is obvious with autonomous vehicles, but it’s just as disturbing in retail, office, residential, and other environments where people might let their guard down. As demonstrated in the cited article, there’s every reason for society to build AI-driven safeguards into everyday robots so that toddlers, the disabled, and the rest of us have no need to fear that they’ll crash into us when we least expect it. Collision avoidance, a prime RL challenge, should be a standard, highly accurate algorithm in every robot (see the safety-shield sketch after this list). Very likely, lawmakers and regulators will demand this in most jurisdictions before long.
  • Authenticated agency: Robots are increasingly becoming the physical manifestations of digital agents in every aspect of our lives. The smart speakers mentioned in the cited article should have been trained to refrain from placing orders for what they mistakenly interpreted as voice-activated purchase requests, but which in fact came from a small child without parental authorization. Though this could have been handled through multifactor authentication rather than through algorithmic training, it’s clear that voice-activated robots in many environmental scenarios may need to step through complex algorithms when deciding what multifactor methods to use for strong authentication and delegated permissioning. Conceivably, RL might be used to help robots more rapidly identify the most appropriate authentication, authorization, and delegation procedures to use in environments where they serve as agents for many people trying to accomplish a diverse, dynamic range of tasks.
  • Defensive maneuvering: Robots are objects that must survive both the deliberate and the accidental assaults that other entities, human beings included, may inflict on them. The AI algorithms in this driverless shuttle bus should have been trained to take some sort of evasive action, such as veering a few feet in the opposite direction, to avoid the semi that inadvertently backed into it. Defensive maneuvering will become critical for robots deployed in transportation, public safety, and military roles. It’s also an essential capability for robotic devices to fend off the general mischief and vandalism that will certainly target them wherever they’re deployed.
  • Collaborative orchestration: Robots are increasingly deployed as orchestrated ensembles rather than as isolated helpmates. The AI algorithms in the warehouse robots in the cited article should have been trained to work harmoniously both with each other and with the many people employed in those environments. Given the huge range of potential interaction scenarios, this is a tough challenge for RL (see the toy cooperative-learning sketch after this list). But it’s an essential capability that society will demand from swarming devices of all sorts, including the drones that patrol our skies, deliver our goods, and explore environments too dangerous for humans to enter.
  • Cultural sensitivity: Robots must respect people in keeping with the norms of civilized society. That includes, as noted in this article, making sure that robots’ face recognition algorithms don’t make discriminatory, demeaning, or otherwise insensitive inferences about the human beings they encounter. This will become even more critical as we deploy robots into highly social settings where they must be trained not to offend people by, for example, using an inaccurate gender-based salutation to refer to a transgender person. These kinds of distinctions can be tricky even for actual humans to make on the fly, but that only heightens the need for RL to train AI-driven entities to avoid automated faux pas.
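
On the collision-avoidance point, one common pattern is a “safety shield”: a hand-written safety layer that can veto any action the learned policy proposes, while feeding a penalty back to the learner. The sketch below is purely illustrative; the sensor convention, velocity units, and thresholds are assumptions, not a real robotics API:

    STOP = (0.0, 0.0)  # (linear velocity in m/s, angular velocity in rad/s)

    def predicted_clearance(range_to_obstacle_m, linear_velocity_mps, horizon_s=1.0):
        """Worst-case distance left if this velocity is held for horizon_s seconds."""
        return range_to_obstacle_m - linear_velocity_mps * horizon_s

    def shielded_action(policy_action, range_to_obstacle_m, min_clearance_m=0.5):
        """Let the RL policy drive unless its action would violate the safety margin."""
        linear_velocity_mps, _angular = policy_action
        if predicted_clearance(range_to_obstacle_m, linear_velocity_mps) < min_clearance_m:
            return STOP  # hard override: never trust exploration near people
        return policy_action

    def shaped_reward(task_reward, shield_triggered, penalty=-5.0):
        """Feed each override back as a penalty so the policy learns to avoid it."""
        return task_reward + (penalty if shield_triggered else 0.0)

    proposed = (0.8, 0.1)  # the policy wants to roll forward at 0.8 m/s
    print(shielded_action(proposed, range_to_obstacle_m=0.9))  # -> (0.0, 0.0)

The design point is that the shield keeps bystanders safe while the robot is still exploring, and the shaped reward gradually teaches the policy to stop triggering it at all.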
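
And as a toy illustration of the collaborative-orchestration point, the sketch below has two warehouse robots repeatedly choose a picking lane and learn from a single shared team reward, so each comes to stay out of the other’s way. The scenario and all the numbers are invented for illustration; real orchestration problems are vastly harder:

    import random

    LANES = [0, 1]                 # two picking lanes in a toy warehouse
    EPSILON, ALPHA = 0.1, 0.2      # exploration rate and learning rate
    q = [{lane: 0.0 for lane in LANES} for _ in range(2)]  # one value table per robot

    def pick(table):
        """Epsilon-greedy lane choice for one robot."""
        if random.random() < EPSILON:
            return random.choice(LANES)
        return max(LANES, key=lambda lane: table[lane])

    for _ in range(2000):
        choices = [pick(q[0]), pick(q[1])]
        # Shared reward: crowding the same lane hurts both; splitting up pays both.
        team_reward = -1.0 if choices[0] == choices[1] else 1.0
        for robot, lane in enumerate(choices):
            q[robot][lane] += ALPHA * (team_reward - q[robot][lane])

    print(q)  # each robot should come to favor a different lane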

Controlled trial-and-error is how most robotics, edge computing, and self-driving vehicle solutions will acquire and evolve their AI smarts. To the extent that you’re capturing an AI-driven device’s RL training on video, it could prove to be the perfect “blooper reel” to show later on when your creation is a smashing success. For regulatory compliance and legal discovery purposes, this video may also help you prove that you’ve RL-trained your device in every relevant scenario, be it actual or simulated.
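
Alongside the video, a machine-readable record of each training episode is cheap to keep and far easier to search at discovery time. A minimal sketch, with field names that are assumptions rather than any compliance standard:

    import json, time

    def log_episode(path, scenario, outcome, total_reward, steps):
        """Append one training episode to a JSON Lines audit log."""
        record = {
            "timestamp": time.time(),
            "scenario": scenario,        # e.g. "crowded_lobby_patrol"
            "outcome": outcome,          # e.g. "success", "shield_override", "failure"
            "total_reward": total_reward,
            "steps": steps,
        }
        with open(path, "a") as f:
            f.write(json.dumps(record) + "\n")

    log_episode("rl_audit.jsonl", "crowded_lobby_patrol", "success", 42.0, 117)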

In the near future, a video audit log of your RL process may become required for passing muster with stakeholders who require certifications that your creations meet all reasonable “AI safety” criteria. Considering the life-or-death scenarios in which the robots of the future will serve us, this is no laughing matter.
