Enterprise IT Watch Blog

Jun 15 2017   9:50AM GMT

Training the next generation of AI-driven chatbots

Michael Tidmarsh

Artificial intelligence

By James Kobielus (@jameskobielus)

Training is the foundation of data-driven smarts. The conversational intelligence of virtual digital assistants—aka chatbots—depends on the extent to which their statistical algorithms have been trained with the most relevant, high-quality data for the task at hand.

Without frequent retraining on fresh data, even the most expertly scripted chatbot will behave like a clueless dummy. Fortunately for chatbot developers, training resources are amply available for building and tuning the smarts of your AI-driven digital assistants 24×7. If you’re building these bots into your mobile, social, e-commerce, Internet of Things, and other apps, here are the training options you should explore:

  • Build your bot a pre-trained brain: You can kickstart your chatbot development by leveraging pre-built third-party chatbot models that have been pre-trained on labeled data. Many digital assistants have already been built on open-source chatbot software and trained with open-source training data.
  • Spin up a crowdsourced chatbot training service: You may lack the resources to label your chatbot training data at the volume, velocity, and variety required. If so, consider engaging a third-party crowdsourcing service, such as Amazon Mechanical Turk, CrowdFlower, or Mighty AI, to do it for you.
  • Tap into your chatbot’s organic stream of training data: Chatbots support conversational, gestural, visual, auditory, and other interfaces that invite users to generate training data in normal operations. Depending on the devices and apps with which they’re configured, chatbots may also generate video, speech, image, emoji, sensor, geospatial, and other rich data that can be used to train convolutional, recurrent, and other deep neural networks. In addition, the ongoing interactions between edge-embedded chatbot apps and cloud services generate a rich stream of dynamically contextualized interaction data that can be used to retrain bots in real time. The user-generated data in these streams can serve as labels, so that chatbot algorithms can be dynamically retrained to improve their fitness to the designated learning task.
  • Transplant the best-trained minds of kindred chatbots into yours: As these AI-driven apps are built for a wider range of use cases, there will be an expanding pool of chatbot artifacts for jumpstarting your next digital-assistant project. If your planned chatbot’s domain is sufficiently similar to those of one or more previously deployed bots, you may use transfer learning to assess which “statistical knowledge” from prior deployments may be transplanted into the next. This is not the same as simply repurposing a pre-existing chatbot in its entirety for your next app. It involves assessing which prior chatbot development artifacts—such as training data, feature models, neural-node layering architectures, training methods, loss functions, and learning rates—may be reusable in various combinations. Check out Facebook’s recently announced ParlAI framework, which provides tools and a library for accelerating transfer learning across chatbot development projects.
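
To make the "organic stream" option above a bit more concrete, here is a minimal sketch of retraining on labeled interactions as they arrive. It assumes a toy intent classifier (multinomial naive Bayes over word counts) in place of the deep neural networks mentioned above. The class name, intent labels, and sample utterances are all hypothetical. The only point is that each labeled interaction can be folded into the model immediately, without a full batch retrain.

```python
import math
from collections import Counter, defaultdict


class OnlineIntentClassifier:
    """Toy multinomial naive Bayes, updated one labeled utterance at a time.

    Hypothetical illustration only; not taken from any chatbot framework.
    """

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing
        self.intent_counts = Counter()           # examples seen per intent
        self.word_counts = defaultdict(Counter)  # word counts per intent
        self.vocabulary = set()

    def update(self, utterance, intent):
        """Fold one labeled utterance into the model (incremental retraining)."""
        words = utterance.lower().split()
        self.intent_counts[intent] += 1
        self.word_counts[intent].update(words)
        self.vocabulary.update(words)

    def predict(self, utterance):
        """Return the most probable intent, or None if nothing has been learned."""
        if not self.intent_counts:
            return None
        words = utterance.lower().split()
        total = sum(self.intent_counts.values())
        vocab_size = max(len(self.vocabulary), 1)
        best_intent, best_score = None, -math.inf
        for intent, seen in self.intent_counts.items():
            # Log prior plus Laplace-smoothed log likelihood of each word.
            score = math.log(seen / total)
            denom = sum(self.word_counts[intent].values()) + self.smoothing * vocab_size
            for word in words:
                score += math.log((self.word_counts[intent][word] + self.smoothing) / denom)
            if score > best_score:
                best_intent, best_score = intent, score
        return best_intent


# Seed the bot with a few labeled utterances (made-up data).
bot = OnlineIntentClassifier()
bot.update("where is my order", "order_status")
bot.update("track my package", "order_status")
bot.update("i want a refund", "refund")
bot.update("give me my money back", "refund")
first_guess = bot.predict("track my order")

# Later, a user interaction supplies a label for an intent the bot had never seen.
bot.update("cancel my subscription", "cancel")
second_guess = bot.predict("cancel subscription")
```

A production bot would swap in a real model and a real labeling signal, but the loop of predicting, observing the user's response, and updating stays the same.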

Of course, composing chatbots is as much of a conversational art—akin to screenwriting or ventriloquism—as it is a data science. To do it well, developers need to ensure that all this technical wizardry is concealed by a seemingly simple, natural, friendly, fun, and useful interface.

In training your e-commerce chatbot, for example, are you ensuring that the algorithm whose loss function you’re minimizing actually delivers user satisfaction in deciding what to buy, when to buy it, at what price, and under what circumstances? This requires that, when building and tuning the algorithms that drive all this magic, you’re somehow able to train for how well a chatbot’s AI-generated “personality” meshes with each user’s own organic personality.

And that, in turn, requires that you somehow be able to train algorithms to master the fiendishly complex human capacity for chitchat. Natural language of any sort is extraordinarily complex (semantics, syntax, grammar, usage, etc.). But the chitchat variety is even more so. It often comes out of nowhere, flows unpredictably across every conceivable topic at every level, and may just as randomly dissolve, to be forgotten forever or, without warning, picked up as a remembered discussion thread the next time the conversants re-engage. Typically, chitchat is the unstructured opposite of the structurable question-answering dialogues that the likes of Watson were built to support.

From a data science standpoint, training chatbots to emulate dialogue naturalism is an approach that goes beyond merely building rule-driven conversational scripts and a deep lexicon into the software. It requires that you do the following:

  • Implement a curated corpus of chatbot training data: Developers should train chatbots from a deep, constantly refreshed, and intensively curated semantic corpus of worldly experience as expressed in natural language. They should also run ongoing A/B tests and real-world experiments to determine which design elements, including algorithms, best achieve their intended outcomes.
  • Build conversational frames for managing chatbot training data: Developers should contextualize chatbot training data within the entire real-time, historical, and predictive frame in which chatbots engage in their dialogues with users. This will require data-engineering tools and techniques for building these contexts into the metadata with which this data is persisted in Hadoop, NoSQL, and other data platforms.
  • Model chatbot training data at the appropriate dimensional level: Developers should be ready to prepare chatbot training data that can optimize dialogues taking place in increasingly high-dimensional feature spaces. This befits conversational frames that involve a growing range of unstructured data objects (streaming media, photographic images, aggregated environmental feeds, rich behavioral data, and geospatial intelligence), diverse practical subtleties (linguistic, affective, social, behavioral, etc.), staggeringly complex situational variables (randomness, vagueness, ambiguity, expectations, etc.), and an endless stream of user sensitivities (cultural affronts, frequent interruptions, overnotifications, irrelevant messages, odd chatbot voice accents, etc.).
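
As a rough illustration of the "conversational frame" idea above, the sketch below attaches real-time, historical, and predictive context to a single training utterance and serializes it as JSON, a shape that most document-oriented stores can persist. Every class and field name here is hypothetical, chosen for illustration rather than drawn from any particular platform's schema.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class ConversationalFrame:
    """Context for one training utterance (all field names are hypothetical)."""
    session_id: str
    channel: str                                      # real-time context, e.g. "mobile"
    timestamp: str                                    # ISO 8601 timestamp, UTC
    prior_turns: list = field(default_factory=list)   # historical context
    predicted_intent: str = ""                        # predictive context


@dataclass
class TrainingExample:
    utterance: str
    label: str
    frame: ConversationalFrame

    def to_document(self) -> str:
        """Serialize the example and its frame as one JSON document."""
        return json.dumps(asdict(self), sort_keys=True)


def make_example(utterance, label, session_id, channel, prior_turns, predicted_intent=""):
    """Build a training example stamped with the current UTC time."""
    frame = ConversationalFrame(
        session_id=session_id,
        channel=channel,
        timestamp=datetime.now(timezone.utc).isoformat(),
        prior_turns=list(prior_turns),
        predicted_intent=predicted_intent,
    )
    return TrainingExample(utterance=utterance, label=label, frame=frame)


# Made-up interaction: the label came from the user, the frame from the session.
example = make_example(
    utterance="actually make it two",
    label="change_quantity",
    session_id="s-001",
    channel="mobile",
    prior_turns=["add a phone case to my cart"],
    predicted_intent="checkout",
)
document = json.loads(example.to_document())
```

Keeping the frame alongside the utterance means a later training job can filter or weight examples by channel, recency, or dialogue history without rejoining separate logs.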

For a larger discussion of how you acquire and prepare training data for chatbots and other AI projects, see my recent KDNuggets column.
