Robotics image via FreeImages
By James Kobielus (@jameskobielus)
Training is the foundation of data-driven smarts. The conversational intelligence of virtual digital assistants—aka chatbots—depends on the extent to which their statistical algorithms have been trained with the most relevant, high-quality data for the task at hand.
Without frequent retraining on fresh data, even the most expertly scripted chatbot will behave like a clueless dummy. Fortunately, ample training resources are available for building and tuning the smarts of your AI-driven digital assistants 24×7. If you’re building these bots into your mobile, social, e-commerce, Internet of Things, and other apps, here are the training options you should explore:
- Build your bot a pre-trained brain: You can kickstart your chatbot development by leveraging pre-built third-party chatbot models that have been pre-trained on labeled data. Many digital assistants have already been built on open-source chatbot software and trained with open-source training data.
- Spin up a crowdsourced chatbot training service: You may lack the resources to label your chatbot training data at the volume, velocity, and variety required. If so, you may want to engage a third-party crowdsourcing service, such as Amazon Mechanical Turk, CrowdFlower, or Mighty AI, to do it for you.
- Tap into your chatbot’s organic stream of training data: Chatbots support conversational, gestural, visual, auditory, and other interfaces that invite users to generate training data in normal operations. Depending on the devices and apps with which they’re configured, chatbots may also generate video, speech, image, emoji, sensor, geospatial, and other rich data that can be used to train convolutional, recurrent, and other deep neural networks. In addition, the ongoing interactions between edge-embedded chatbot apps and cloud services generate a rich stream of dynamically contextualized interaction data that can be used to retrain bots in real time. The user-generated data in these streams can, in turn, supply the labels needed to dynamically retrain chatbot algorithms and improve their fitness for the designated learning task.
- Transplant the best-trained minds of kindred chatbots into yours: As these AI-driven apps are built for a wider range of use cases, an expanding pool of chatbot artifacts will become available for jumpstarting your next digital-assistant project. If your planned chatbot’s domain is sufficiently similar to those of one or more previously deployed bots, you may be able to use transfer learning, first assessing which “statistical knowledge” from prior deployments can be transplanted into the new one. This is not the same as simply repurposing a pre-existing chatbot in its entirety for your next app. It involves assessing which prior chatbot development artifacts (such as training data, feature models, neural-node layering architectures, training methods, loss functions, and learning rates) may be reusable in various combinations. Check out Facebook’s recently announced ParlAI framework, which provides tools and a library for accelerating transfer learning across chatbot development projects.
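To make the transfer-learning idea concrete, here is a minimal Python sketch. It assumes a toy bag-of-words logistic-regression intent classifier rather than a deep neural network, and the functions (`featurize`, `predict`, `train`) and the retail and travel utterances are hypothetical illustrations, not part of ParlAI or any other framework. The transfer step here is simply a warm start: the new bot’s model begins from weights learned in a related domain and is then fine-tuned on a much smaller labeled set.

```python
import math
from typing import Dict, List, Optional, Tuple

# (utterance, label) where label 1 means the hypothetical "purchase" intent is present.
Example = Tuple[str, int]


def featurize(text: str) -> Dict[str, float]:
    """Bag-of-words token counts plus a constant bias feature."""
    feats: Dict[str, float] = {"__bias__": 1.0}
    for token in text.lower().split():
        feats[token] = feats.get(token, 0.0) + 1.0
    return feats


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights: Dict[str, float], text: str) -> float:
    """Probability that `text` carries the intent under a logistic-regression model."""
    z = sum(weights.get(f, 0.0) * v for f, v in featurize(text).items())
    return sigmoid(z)


def train(data: List[Example], weights: Optional[Dict[str, float]] = None,
          lr: float = 0.5, epochs: int = 50) -> Dict[str, float]:
    """Stochastic gradient descent on log loss.

    Passing `weights` warm-starts from a previously trained model (the transfer
    step); passing None trains from scratch. The input dict is copied, not mutated.
    """
    w: Dict[str, float] = dict(weights) if weights else {}
    for _ in range(epochs):
        for text, label in data:
            error = predict(w, text) - label  # gradient of log loss w.r.t. the logit
            for f, v in featurize(text).items():
                w[f] = w.get(f, 0.0) - lr * error * v
    return w


# Hypothetical source domain: a retail bot with more labeled utterances...
source_data: List[Example] = [
    ("i want to buy a jacket", 1),
    ("please order these shoes for me", 1),
    ("buy this shirt now", 1),
    ("what is your return policy", 0),
    ("where is my package", 0),
    ("tell me a joke", 0),
]
# ...and a hypothetical new travel bot with only two labeled utterances.
target_data: List[Example] = [
    ("i want to buy a flight to boston", 1),
    ("what is the baggage policy", 0),
]

source_model = train(source_data)
# Fine-tune from the source weights with a smaller learning rate and fewer epochs.
transfer_model = train(target_data, weights=source_model, lr=0.1, epochs=10)
# Baseline for comparison: the same target data with no transferred knowledge.
scratch_model = train(target_data)
```

Because the travel data never mentions “please” or “order,” the scratch-trained model has no weights for those tokens, whereas the transferred model carries them over from the retail bot. Warm starting with a lower learning rate is one common design choice; with neural networks, the analogous move is to reuse (and often freeze) lower layers and retrain only the top ones.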
Of course, composing chatbots is as much of a conversational art—akin to screenwriting or ventriloquism—as it is a data science. To do it well, developers need to ensure that all this technical wizardry is concealed by a seemingly simple, natural, friendly, fun, and useful interface.
In training your e-commerce chatbot, for example, are you ensuring that the algorithm whose loss function you’re minimizing actually leaves users satisfied as they decide what to buy, when to buy it, at what price, and under what circumstances? That requires that, when building and tuning the algorithms that drive all this magic, you’re somehow able to train for how well a chatbot’s AI-generated “personality” meshes with each user’s own organic personality.
And that, in turn, requires that you somehow be able to train algorithms to master the fiendishly complex human capacity for chitchat. Natural language of any sort is extraordinarily complex (semantics, syntax, grammar, usage, etc.). But the chitchat variety is even more so. It often comes out of nowhere, flows unpredictably across every conceivable topic at every level, and may just as randomly dissolve, to be forgotten forever or, without warning, picked up as a remembered discussion thread the next time the conversants re-engage. Typically, chitchat is the unstructured opposite of the structured question-answering dialogues that systems such as IBM Watson were built to support.
From a data science standpoint, training chatbots to emulate natural dialogue goes beyond merely building rule-driven conversational scripts and a deep lexicon into the software. Specifically, you need to do the following:
- Implement a curated corpus of chatbot training data: Developers should train chatbots on a deep, constantly refreshed, and intensively curated semantic corpus of worldly experience as expressed in natural language. They should also run ongoing A/B tests and real-world experiments to determine which design elements, including algorithms, best achieve the intended outcomes.
- Build conversational frames for managing chatbot training data: Developers should contextualize chatbot training data within the entire real-time, historical, and predictive frame in which chatbots engage in their dialogues with users. This will require data-engineering tools and techniques for encoding these contexts in the metadata that accompanies this data when it is persisted in Hadoop, NoSQL, and other data platforms.
- Model chatbot training data at the appropriate dimensional level: Developers should be ready to prepare chatbot training data that can optimize dialogues taking place in increasingly high-dimensional feature spaces. Such dimensionality befits conversational frames that involve a growing range of unstructured data objects (streaming media, photographic images, aggregated environmental feeds, rich behavioral data, and geospatial intelligence), diverse practical subtleties (linguistic, affective, social, behavioral, etc.), staggeringly complex situational variables (randomness, vagueness, ambiguity, expectations, etc.), and an endless stream of user sensitivities (e.g., cultural affronts, frequent interruptions, overnotifications, irrelevant messages, odd accents in a chatbot’s synthesized voice, etc.).
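The A/B testing recommended above can be illustrated with a minimal Python sketch. It assumes each conversation is scored with a binary success outcome (a hypothetical metric, such as whether the user completed a purchase) and compares two chatbot variants with a two-proportion z-test, which is one standard choice among several. The `VariantResult` class, the function name, and the counts are illustrative assumptions, not figures from any real deployment.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class VariantResult:
    """Aggregate outcome counts for one chatbot variant in an A/B test."""
    name: str
    conversations: int  # conversations routed to this variant
    successes: int      # conversations scored as successful (hypothetical metric)

    @property
    def rate(self) -> float:
        return self.successes / self.conversations


def two_proportion_z_test(a: VariantResult, b: VariantResult) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: a and b have equal success rates.

    Uses the pooled normal approximation, which assumes reasonably large samples.
    """
    pooled = (a.successes + b.successes) / (a.conversations + b.conversations)
    se = math.sqrt(pooled * (1 - pooled) * (1 / a.conversations + 1 / b.conversations))
    if se == 0:
        # Both variants all-success or all-failure: no evidence of a difference.
        return 0.0, 1.0
    z = (b.rate - a.rate) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) under H0
    return z, p_value


control = VariantResult("current dialogue model", conversations=1000, successes=620)
candidate = VariantResult("retrained dialogue model", conversations=1000, successes=680)
z, p = two_proportion_z_test(control, candidate)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With 620 versus 680 successes out of 1,000 conversations each, z comes out at roughly 2.8, past the conventional 1.96 cutoff for a two-sided test at the 5% level. In practice, you would fix the sample size in advance rather than stopping as soon as the p-value dips, because repeatedly peeking at results inflates the false-positive rate.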
For a larger discussion of how to acquire and prepare training data for chatbots and other AI projects, check out my recent KDnuggets column.
Cisco image via FreeImages
Do you believe CCIE and CCDE certifications still have value? Check out the details behind Cisco’s latest recertification program in this week’s roundup.
1. Cisco recertification program gets flexible, but cost is high – Antone Gonsalves (SearchNetworking)
The latest Cisco recertification program adds continuing education as an option for keeping CCIE and CCDE certifications current. But the cost of courses will make the offering expensive.
2. Researchers port EternalBlue exploit to Windows 10 – Michael Heller (SearchSecurity)
The EternalBlue exploit behind the WannaCry ransomware attacks has been successfully ported to an older version of Windows 10, but newer versions of the OS are protected.
3. New edition of Windows 10 to take on big data – Ramin Edmond (SearchEnterpriseDesktop)
Microsoft revealed plans to release a new edition of Windows 10 that can optimize PCs to deal with intensive data workloads. That could be a big help to companies that rely on machine learning data and more.
4. NHL puts 100 years of hockey history into SAP HANA database – Jim O’Donnell (SearchSAP)
The National Hockey League is using the SAP HANA database on HANA Enterprise Cloud to power a statistics site that encompasses the entire league history.
5. Quorum onQ appliance gives ransomware recovery a ‘shot’ – Paul Crocetti (SearchDisasterRecovery)
Using snapshots and server-level restores, the new Quorum recovery appliance adds another layer of protection and helps organizations rebuild within minutes of a ransomware attack.
HPE image via FreeImages
How do you see the future of HPE shaping up? Check out the areas the company will be focusing on going forward in this week’s roundup.
1. Future of HPE sits at the edge with IoT in a post-cloud world – Robert Gates (SearchDataCenter)
Hewlett Packard Enterprise will focus on areas where the public cloud can’t get the job done, such as multicloud technology and internet of things projects at the edge.
2. Microsoft makes Skype messaging more competitive – Antone Gonsalves (SearchUnifiedCommunications)
The latest Microsoft Skype messaging features bring needed improvements to the consumer app. But analysts say Skype for Business is unlikely to get the same attention this year.
3. GDPR breach notification rule could complicate compliance – Peter Loshin (SearchSecurity)
Don’t forget the huge fines: When it comes to the new 72-hour GDPR breach notification rule, the cost of compliance must be weighed against harsh GDPR penalties.
4. More VMware analytics coming to EUC products, CTO says – Eddie Lockhart (SearchVirtualDesktop)
The Citrix vs. VMware rivalry is expanding into analytics. VMware’s CTO of end-user computing says analytics will help IT better secure applications and data.
5. Veeam management changes accompany product transformation – Paul Crocetti (SearchDataBackup)
Veeam, once focused primarily on virtual machines, is widening support for other platforms as the company’s revenue increases and its management team transforms.
Cyber image via FreeImages
How will the federal government combat cybersecurity gaps? Check out what the ‘cyber czar’ said at a recent event in this week’s roundup.
1. Cyber czar says government will manage IT like an enterprise does – Nicole Laskowski (SearchCIO)
Trump administration ‘cyber czar’ Rob Joyce outlined how the government plans to combat cybersecurity gaps at an event in Boston.
2. Samba vulnerability brings WannaCry fears to Linux/Unix – Michael Heller (SearchSecurity)
A widespread Samba vulnerability has raised the possibility of attacks similar to WannaCry hitting Linux and Unix systems, but mitigation options are available.
3. 8×8, RingCentral tie for top spot in UCaaS providers report – Katherine Finnell (SearchUnifiedCommunications)
In UC news, 8×8 and RingCentral tie as leading UCaaS providers in an IHS Markit report, while RingCentral announces a new webinar service that supports up to 3,000 attendees.
4. Google IoT service checks box years after AWS, Azure – Trevor Jones (SearchCloudComputing)
Google’s new IoT service plays catch-up with AWS, Azure and others, but its analytics prowess could help close the gap in a nascent market.
5. Citrix Analytics Service targets IT security market with AI – Ramin Edmond (SearchEnterpriseDesktop)
IT pros need more visibility into what users are doing, and Citrix’s new analytics service aims to provide just that.
Virus image via FreeImages
What do you think about how Microsoft remediated the EternalBlue vulnerability? Find out why experts believe it was poorly handled in this week’s roundup.
1. Vulnerability remediation of WannaCry flaw raises concerns – Michael Heller (SearchSecurity)
Between patch delays and NSA disclosure issues, experts said the vulnerability remediation for WannaCry was poorly handled and caused more damage.
2. Toshiba deal reflects Mitel’s UCC market strategy – Luke O’Neill (SearchUnifiedCommunications)
As it acquires Toshiba’s unified communications assets, Mitel says it is keenly focused on the UCC market, expanding its customer base and moving organizations to the cloud.
3. IT has eye on Citrix Cloud, Microsoft at Synergy – Ramin Edmond (SearchVirtualDesktop)
IT experts can’t wait to see what Citrix has in store for them at Synergy next week in Orlando, Fla. News about Citrix Cloud and the Microsoft relationship is top of mind.
4. Red Hat exec talks open source strategies, innovation and VMware – Fred Churchville (SearchMicroservices)
In this Q&A, Red Hat’s Craig Muzilla explains the ideas behind the company’s open source strategy, the value of the approach and what happened to virtual machines.
5. WannaCry ransomware attack shows value of data backups – Sonia Lelii (SearchDataBackup)
WannaCry and other ransomware attacks can be thwarted, but it takes proper data protection practices beforehand, as well as close monitoring of your data.
White House image via FreeImages
What do you make of President Trump’s cyber executive order? Find out why it’s receiving mixed reviews in this week’s roundup.
1. Trump cyber executive order focuses on cyber-risk management – Michael Heller (SearchSecurity)
The Trump cyber executive order arrived with a focus on cyber-risk management and reports, but key details on implementing changes are missing.
2. Windows 10 Fall Creators Update loaded with cross-platform features – Ramin Edmond (SearchEnterpriseDesktop)
New features in the upcoming Windows 10 Fall Creators Update, announced this week at Microsoft Build, will help developers to create more consistent cross-platform apps and users to work better between different devices.
3. Containers, cloud, fast networking and IoT still top IT trends – Stephen J. Bigelow (SearchDataCenter)
From containers and faster networking to IoT and edge computing, these trends continue to define the role of IT pros in future data centers and the skill sets they need to hone.
4. Jobs in data science may seem glamorous, but they require dirty work – Ed Burns (SearchBusinessAnalytics)
The role of a data scientist is often seen as one of today’s most glamorous and exciting jobs, but behind the glitz and acclaim is a lot of toil and hard work.
5. Dell EMC World 2017: HCI, startup investments, Nautilus – Dave Raffo (SearchStorage)
This year’s Dell EMC World served as a coming-out party for Dell Technologies Capital, ‘Project Nautilus’ and IoT storage, and as a showcase for current products.
Flash storage image via FreeImages
What should we expect from Dell EMC World 2017? Find out why flash will take center stage in this week’s roundup.
1. Dell EMC World 2017: Flash, HCI and the cloud take center stage – Dave Raffo (SearchStorage)
This year’s Dell EMC World will feature plenty of product releases as the mega-vendor beefs up emerging technologies while continuing its pre-merger platforms.
2. NATO cyberwar games show the U.S. needs more practice – Michael Heller (SearchSecurity)
The NATO Locked Shields cyberwar games saw the U.S. team named most improved, but experts say the U.S. still needs more practice.
3. Viptela the latest Cisco acquisition in $610M deal – Chuck Moozakis (SearchSDN)
SD-WAN vendor Viptela is the latest Cisco acquisition, as the networking vendor takes steps to fortify its cloud services portfolio.
4. Next-gen technologies displayed at SAP Innovation Center – Jim O’Donnell (SearchSAP)
The new SAP Next-Gen program and innovation center opened with a look at SAP’s vision for next-gen technologies, including machine learning, blockchain, IoT, VR and 3D printing.
5. Signs point to cloud future at Red Hat Summit 2017 – Jason Sparapani (SearchCIO)
The open source software company is transitioning from its Linux roots to cloud services.
CEO image via FreeImages
Is Citrix exploring a sale? Find out what its CEO said in this week’s roundup.
1. Citrix CEO addresses sale ‘rumors’ – Ramin Edmond, Alyssa Provazza and Colin Steele (SearchVirtualDesktop)
In a sit-down interview, Citrix CEO Kirill Tatarinov reacts to reports that the company is exploring a sale. And he discusses the effects of recent acquisitions and partnerships.
2. AWS promises to be GDPR compliant by May 2018 deadline – Peter Loshin (SearchSecurity)
Amazon promises that all AWS cloud services will be GDPR compliant before enforcement of the new EU data privacy regulation starts in 2018, and it is offering customers assistance.
3. Changes to Microsoft Office licenses rub IT the wrong way – Ramin Edmond (SearchEnterpriseDesktop)
Business versions of Skype, OneDrive and Outlook won’t be part of on-premises Microsoft Office licenses, leaving those shops looking for alternatives.
4. Flash storage market shifting, Kaminario CEO says – Carol Sliwa (SearchSolidStateStorage)
Dani Golan of Kaminario sees NVMe as the next big disruptor in the flash market and says Kaminario focuses on cloud and SaaS, while legacy vendors fight over shrinking IT budgets.
5. DevSecOps, or how to build safer software so much faster – Valerie Silverthorne (SearchSoftwareQuality)
DevOps can help develop software faster, but that’s not making it any safer. DevSecOps is an effort to bring security into the mix. Here are some ways to get started.
Windows image via FreeImages
Do you approve of the Windows 10 Creators Update? Find out why many IT professionals were disappointed in this week’s roundup.
1. Windows 10 Creators Update features fall short for IT – Ramin Edmond (SearchEnterpriseDesktop)
IT pros hoped for more advanced security tools and other new Windows 10 Creators Update features, but they were underwhelmed by what they actually received.
2. IT shops map journey from VMware Cloud Foundation to IBM Cloud, AWS – Robert Gates (SearchDataCenter)
More than 1,000 users have moved their VMware infrastructure to IBM’s public cloud using Cloud Foundation, while others await the general availability of a much-publicized AWS partnership.
3. Stuxnet worm flaw still the most exploited after seven years – Michael Heller (SearchSecurity)
Security researchers say the vulnerability behind the infamous Stuxnet worm is still the most exploited in the world, seven years after being patched.
4. ONUG Spring 2017 conference issues include barriers to cloud adoption – Jennifer English (SearchSDN)
The ONUG Spring 2017 conference includes sessions with market-leading cloud providers, like Amazon and Microsoft, addressing barriers to enterprise cloud adoption.
5. SQL Server 2017 makes Python a first-class citizen for analytics – Jack Vaughan (SearchSQLServer)
Python is no outsider at Microsoft. It will ride with R as the company’s SQL Server 2017 platform moves to its second CTP. Analytics is a big part of what is new.
Money image via FreeImages
Last week, Gartner revised its global IT spending forecast for 2017. Check out the revisions in this week’s roundup.
1. Gartner cuts 2017 IT spending forecast, sees shift to ‘digital business’ – Mekhala Roy (SearchCIO)
As companies embrace their digital future, the shift in IT spending to software plows on.
2. Hyper-converged systems will underpin hybrid cloud, says Dell EMC exec – Robert Gates (SearchDataCenter)
Enterprises will increasingly fill data centers with rack-scale hyper-converged infrastructure as the basis for hybrid clouds, says the CTO of Dell EMC’s converged platforms division.
3. IBM AIX users look forward to a cloudy future – Ed Scannell (SearchCloudComputing)
IT organizations that rely on AIX-based applications still whirring away may feel the cloud computing age is passing them by — but help is on the way.
4. Microsoft unveils update guide, fixes Hyper-V on April Patch Tuesday – Dan Cagen (SearchWindowsServer)
Microsoft dropped its security bulletin format on April Patch Tuesday and switched to its Security Update Guide. Windows Server admins should be aware of a Hyper-V host patch.
5. U.S. election hacking not an act of cyberwarfare, experts say – Michael Heller (SearchCloudSecurity)
The government needs a better definition of an act of cyberwarfare, says ex-CIA Director Michael Hayden, because he doesn’t think the U.S. election hacking qualifies.