<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Enterprise IT Consultant Views on Technologies and Trends &#187; unstructured data</title>
	<atom:link href="http://itknowledgeexchange.techtarget.com/enterprise-IT-tech-trends/tag/unstructured-data/feed/" rel="self" type="application/rss+xml" />
	<link>http://itknowledgeexchange.techtarget.com/enterprise-IT-tech-trends</link>
	<description>Everything from Mainframes to Cloud</description>
	<lastBuildDate>Fri, 10 May 2013 20:03:12 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>Websense Data Leak Prevention expected to gain more traction in Enterprises</title>
		<link>http://itknowledgeexchange.techtarget.com/enterprise-IT-tech-trends/websense-data-leak-prevention-expected-to-gain-more-traction-in-enterprises/</link>
		<comments>http://itknowledgeexchange.techtarget.com/enterprise-IT-tech-trends/websense-data-leak-prevention-expected-to-gain-more-traction-in-enterprises/#comments</comments>
		<pubDate>Mon, 16 May 2011 09:31:41 +0000</pubDate>
		<dc:creator>Sasirekha R</dc:creator>
				<category><![CDATA[DLP]]></category>
		<category><![CDATA[Enterprise]]></category>
		<category><![CDATA[Forrester]]></category>
		<category><![CDATA[Gartner]]></category>
		<category><![CDATA[Prevention]]></category>
		<category><![CDATA[unstructured data]]></category>
		<category><![CDATA[WebSense]]></category>

		<guid isPermaLink="false">http://itknowledgeexchange.techtarget.com/enterprise-IT-tech-trends/websense-data-leak-prevention-expected-to-gain-more-traction-in-enterprises/</guid>
		<description><![CDATA[Websense Data Leak Prevention expected to gain more traction in Enterprises According to security experts, it costs organizations several million dollars per incident of data loss or theft. Even if the incident per se is innocent &#8211; like employee sending customer data to their personal emails so that they can work from home, the repercussions [...]]]></description>
				<content:encoded><![CDATA[<p><strong>Websense Data Leak Prevention expected to gain more traction in Enterprises</strong></p>
<p>According to security experts, a single incident of data loss or theft can cost an organization several million dollars. Even if the incident itself is innocent &#8211; such as an employee sending customer data to a personal email account in order to work from home &#8211; the repercussions can be significant. In addition to the direct monetary loss, any data leak or loss brings negative publicity and puts the organization&#8217;s reputation at stake.</p>
<p>As reported by the Ponemon Institute, the average cost of a data breach in 2010 was $214 per compromised record, with an average organizational cost of $7.2 million. The cost factors include detection, escalation, notification, response, and lost business. This cost is expected to keep rising, especially with new regulations emerging in almost all domains.</p>
<p>Content-aware data loss prevention (DLP) solutions enable organizations to enforce effective business practices for storing and transmitting sensitive data and to avoid data loss or leakage. <span id="more-313"></span>According to Forrester, financial information, non-public personal information, personal health information and intellectual property are the four areas where enterprises are implementing content-aware DLP solutions.</p>
<p>Content-aware DLP solutions differ from mere authorization tools, which control access based on roles or other rules, in their ability to classify the information contained in an object (email, file, packet, database etc.). DLP products can inspect information while it is in storage, in use or in transit, and can intercept multiple channels (email, HTTP, FTP, file shares, printers, USB, portable media, databases, IMs).</p>
<p>Websense, which offers data loss prevention solutions, has been identified as a leader by both Gartner and Forrester for the past three to four years in providing <em>unified Web, data and email content security solutions</em>. Websense&#8217;s goal is to protect essential information &#8211; such as financial information, databases, and employee records &#8211; in all locations, including email, websites, PCs, laptops, USB drives, and printers.</p>
<p>Websense, with its simple interface, can be installed in a few hours. It simplifies data security by providing one unified console and over 1,000 built-in policy rules, and by enabling DLP for Web or email with a single mouse click.</p>
<p>Websense supports the creation of policies that classify, tag, encrypt, alert, report, log and so on, and that take effect automatically when the relevant events occur. A simple policy that alerts employees and prevents them from forwarding customer data to their personal email accounts could save the organization from a data leak. Typically these DLP solutions are intentionally visible (unlike firewalls), and this visibility can be used to educate employees about the inappropriateness of their actions. In one of its case studies, Websense highlights that this employee education on policy violations resulted in an immediate 50% decline in alerts.</p>
<p>Websense provides DLP solutions in the cloud as well as through SaaS. It offers both subscription pricing and perpetual licensing. Gartner points out that Websense offers comprehensive capabilities in all three functional areas &#8211; network, discovery and endpoint.</p>
<p>Websense uses its patented <a href="http://www.websense.com/content/PreciseID.aspx">PreciseID technology</a> to conduct 24&#215;7 deep analysis of all content &#8211; web, email, data and applications &#8211; in real time. When anything suspicious pops up, PreciseID, in addition to raising an alert, isolates the would-be invader and prevents a zero-hour attack (one that tries to exploit vulnerabilities before the developers become aware of them).</p>
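<p>To make the idea of a content-aware policy concrete, here is a minimal sketch in Python. It is not Websense&#8217;s API or rule format: the pattern names, the list of personal webmail domains and the functions <code>classify</code> and <code>evaluate_email</code> are all hypothetical, and real DLP products use far richer classifiers than two regular expressions.</p>

```python
import re

# Hypothetical patterns standing in for "customer data"; illustrative only.
SENSITIVE_PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Assumed list of personal webmail domains; an organization would maintain its own.
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com"}


def classify(text):
    """Return the sorted names of all sensitive patterns found in the text."""
    return sorted(
        name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(text)
    )


def evaluate_email(recipient, body):
    """Block and alert when sensitive content is addressed to a personal domain."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    findings = classify(body)
    if findings and domain in PERSONAL_DOMAINS:
        return {"action": "block", "alert": True, "findings": findings}
    return {"action": "allow", "alert": False, "findings": findings}
```

<p>Under these assumptions, calling <code>evaluate_email("jane@gmail.com", "Card: 4111 1111 1111 1111")</code> yields a block decision with an alert, while the same message sent to a corporate address is allowed. The alert is the hook for the employee education described above.</p>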
<p><a href="http://www.websense.com/content/ThreatSeeker.aspx">ThreatSeeker Network</a> provides the intelligence that underlies Essential Information Protection by delivering real-time reputation analysis and expanded behavioral analysis. Websense augmented its ThreatSeeker technology with organically developed and acquired email security, hosted security, and data loss prevention technologies from SurfControl and Port Authority. Websense then added dedicated content and email security specialists to the Websense Security Labs team of researchers. The result is a network of technology and human intelligence that creates an adaptive feedback network that uses more than 50 million real-time data collecting systems to parse one billion pieces of content daily.</p>
<p>PreciseID works with the <a href="http://www.websense.com/content/ThreatSeeker.aspx">Websense ThreatSeeker Network</a> to deliver deep content control, which enables Websense Data Security Suite to accurately secure confidential data, efficiently prevent information leaks, and ultimately protect who and what go where and how.</p>
<p><a href="http://www.websense.com/content/DataSecuritySuite.aspx">Websense Data Security Suite</a> enables organizations to:</p>
<ul>
<li>enforce business and regulatory policies across multiple channels of communication;</li>
<li>craft policies from pre-built templates to help adherence to specific regulations;</li>
<li>inform employees of actions that can lead to regulatory issues; and</li>
<li>document efforts to help demonstrate regulatory compliance with simple management and reporting.</li>
</ul>
<p>While Websense doesn&#8217;t yet have the visibility of Symantec and McAfee, it matches Symantec&#8217;s DLP solutions nearly feature-for-feature, at a much lower price. According to Forrester, most enterprises want &#8220;DLP express&#8221; products to help solve regulatory and toxic data problems without complex integration challenges or high prices, and Websense is the vendor best positioned to cross the chasm into the mass market.</p>
]]></content:encoded>
			<wfw:commentRss>http://itknowledgeexchange.techtarget.com/enterprise-IT-tech-trends/websense-data-leak-prevention-expected-to-gain-more-traction-in-enterprises/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Autonomy hailed as Market Leader in Message Archiving</title>
		<link>http://itknowledgeexchange.techtarget.com/enterprise-IT-tech-trends/autonomy-hailed-as-market-leader-in-message-archiving/</link>
		<comments>http://itknowledgeexchange.techtarget.com/enterprise-IT-tech-trends/autonomy-hailed-as-market-leader-in-message-archiving/#comments</comments>
		<pubDate>Mon, 16 May 2011 05:00:40 +0000</pubDate>
		<dc:creator>Sasirekha R</dc:creator>
				<category><![CDATA[Appliance]]></category>
		<category><![CDATA[Archive]]></category>
		<category><![CDATA[Autonomy]]></category>
		<category><![CDATA[cost saving]]></category>
		<category><![CDATA[Forrester]]></category>
		<category><![CDATA[information]]></category>
		<category><![CDATA[IT Appliances]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[unstructured data]]></category>

		<guid isPermaLink="false">http://itknowledgeexchange.techtarget.com/enterprise-IT-tech-trends/autonomy-hailed-as-market-leader-in-message-archiving/</guid>
		<description><![CDATA[Autonomy Consolidated Archive hailed as Market Leader Human-friendly information in the forms of documents, web pages, presentations, videos, emails, phone conversations and IMs now form around 80% of the information available and their volume doubles almost every month (Gartner). Autonomy &#8211; founded in 1996 &#8211; is the market leader in the provision of software that [...]]]></description>
				<content:encoded><![CDATA[<p><strong>Autonomy Consolidated Archive hailed as Market Leader</strong></p>
<p style="text-align: left">Human-friendly information in the form of documents, web pages, presentations, videos, emails, phone conversations and IMs now makes up around 80% of the information available, and its volume doubles almost every month (Gartner). Autonomy &#8211; founded in 1996 &#8211; is the market leader in the provision of software that automates the analysis of unstructured data, whether in the form of text, audio, images or video. Autonomy aims to capture and process interactions and to extract pertinent information for compliance and risk management, customer service operations, customer intelligence and so on.</p>
<p>Autonomy has been expanding its functionality through acquisitions, including Zantaz message archiving and CA&#8217;s information governance assets. Autonomy&#8217;s technologies are horizontal, and it offers products across a wide range of domains.<span id="more-309"></span></p>
<p>The vast growth in enterprise content leads to higher storage costs, operational issues and compliance risks. Message archiving solutions originally emerged in response to legal requirements &#8211; such as those of the SEC and NASD &#8211; and were mostly used by financial services firms. The benefits of cost savings, simplified retrieval from the archive, less operational content and so on then helped to improve adoption. Today, message archiving products that offer a holistic solution handling a wide range of content and application types are fast becoming a necessity for enterprises across domains.</p>
<p>The key highlights of Autonomy Consolidated Archive are:</p>
<ul>
<li>Autonomy Consolidated Archive offers the most complete product that proactively automates the full spectrum of consolidated archiving, eDiscovery, analytics and real-time policy management.</li>
<li>The product has capabilities for message capture, retention management, eDiscovery, storage management, ability to handle a wide range of content and application types, flexible deployment models and proven security and scalability.</li>
<li>Traditionally focused on Enterprise Search, Autonomy offers powerful and integrated eDiscovery functionality that is designed to mitigate the organization&#8217;s risk from non-compliance related to messages and other electronic content.</li>
</ul>
<p>Autonomy is hailed as the market leader by Forrester as well as Gartner, and its product growth over the past two to three years has been tremendous. In Forrester&#8217;s report (Q1 2011), Autonomy Consolidated Archive received top marks for message capture, range of content types, message management and supervision. According to Forrester, &#8220;(Autonomy&#8217;s) strong set of flexible message archiving offerings and first-class legal risk mitigation support contribute to its leadership position.&#8221;</p>
<p>According to Mike Sullivan, CEO of Autonomy Protect, &#8220;Only Autonomy can automatically apply governance policies by understanding the meaning of all types of human friendly information, including audio, video and social media content. With Autonomy Consolidated Archive as the anchoring technology, organizations can drive their critical compliance, eDiscovery, and Records Management initiatives directly from the archive, and make the data available via an on-premise, cloud-based, hybrid, or appliance-based approach.&#8221;</p>
<p>The acquisition and integration trend is expected to continue, keeping Autonomy&#8217;s product in the lead with better-integrated content handling and governance capabilities, as well as archiving options ranging from cloud to appliance-based.</p>
]]></content:encoded>
			<wfw:commentRss>http://itknowledgeexchange.techtarget.com/enterprise-IT-tech-trends/autonomy-hailed-as-market-leader-in-message-archiving/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>UIMA &#8211; Search and Analytics exploiting Unstructured Information</title>
		<link>http://itknowledgeexchange.techtarget.com/enterprise-IT-tech-trends/mining-unstructured-information-using-uima/</link>
		<comments>http://itknowledgeexchange.techtarget.com/enterprise-IT-tech-trends/mining-unstructured-information-using-uima/#comments</comments>
		<pubDate>Mon, 13 Dec 2010 07:19:41 +0000</pubDate>
		<dc:creator>Sasirekha R</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[Apache]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Internet]]></category>
		<category><![CDATA[RDF]]></category>
		<category><![CDATA[search features]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[UIMA]]></category>
		<category><![CDATA[unstructured data]]></category>

		<guid isPermaLink="false">http://itknowledgeexchange.techtarget.com/enterprise-IT-tech-trends/?p=150</guid>
		<description><![CDATA[Mining Unstructured Information using UIMA Vast amount of knowledge is available as natural language text &#8211; web documents, reference books, encyclopedias, dictionaries, textbooks, technical reports, contracts, novels etc. Add to it the growing volumes of images, audio and video. Undisputedly unstructured information is the largest, most current and fastest growing source of knowledge. It is [...]]]></description>
				<content:encoded><![CDATA[<p><strong><span style="color: #993300">Mining Unstructured Information using UIMA</span></strong></p>
<p>A vast amount of knowledge is available as natural language text &#8211; web documents, reference books, encyclopedias, dictionaries, textbooks, technical reports, contracts, novels etc. Add to it the growing volumes of images, audio and video. Undisputedly, unstructured information is the largest, most current and fastest growing source of knowledge. It is &#8220;unstructured&#8221; because it lacks the explicit semantics (or structure) that computer applications typically need in order to process it.</p>
<p>Unstructured Information Management Architecture (UIMA) is a framework for finding latent meaning, relationships and relevant facts from unstructured text. UIMA is useful for building analytic applications that analyze large volumes of unstructured information to discover relevant knowledge.<span id="more-150"></span></p>
<p>Unstructured information must become &#8220;structured&#8221; so that the applications can interpret it correctly. A typical UIM application would take plain text as input and identify entities (persons, places, organizations etc.) and relationships. UIMA standardizes semantic search and content analytics, providing a common method for meaningfully accessing data contained in text such as e-mails, blog entries, news feeds, and notes, as well as in <em>audio recordings, images, and video</em>.</p>
<p>Originally developed by IBM, UIMA is now a top-level open source project at Apache. In March 2009, UIMA was approved as an OASIS standard. Hopefully these trends will translate into more UIMA compliance from third-party vendors.</p>
<p>UIMA&#8217;s objective to support interoperability among analytics is divided into four design goals:</p>
<p><strong>• Data Representation</strong> &#8211; Support common representation of <em>artifacts</em> (the unstructured information) and <em>artifact metadata</em> (results from analysis).</p>
<p><strong>• Data Modeling and Interchange</strong> &#8211; Support the platform-independent interchange of <em>analysis</em> data in a form that facilitates a formal modeling approach and alignment with existing standards.</p>
<p><strong>• Discovery, Reuse and Composition</strong> &#8211; Support the discovery, reuse and composition of independently developed analytics.</p>
<p><strong>• Service-Level Interoperability</strong> &#8211; Support concrete interoperability of independently developed <em>analytics</em> (software for analysis) based on a common service description.</p>
<p>The seven elements of the UIMA specification are:</p>
<p>1. <strong>Common Analysis Structure (CAS)</strong> &#8211; Common data structure shared by all UIMA analytics to represent the artifact and the artifact metadata. The CAS is an object graph. The CAS representation can be easily elaborated for specific domains.</p>
<p>2. <strong>Type System Model</strong> &#8211; A collection of inter-related type definitions. Every object in a CAS must be associated with a type. The UIMA Type-System is a declarative language for defining object models. Type Systems are user-defined. Each type definition declares the attributes of the type and describes valid fillers for its attributes. Types can be single-valued or multi-valued, or constrained to a legal range of values depending on the needs of the application. UIMA adopts Ecore as the type system representation, due to the alignment with standards and the availability of EMF tooling.</p>
<p>3. <strong>Base Type System</strong> &#8211; Standard definition of commonly used, domain-independent types. This establishes a basic level of interoperability. The most significant parts of the Base Type System are the <em>Annotation</em> and <em>Sofa (Subject of Analysis)</em> types.</p>
<p>4. <strong>Abstract Interfaces</strong> &#8211; Defines the standard component types and operations that UIMA services implement. <em>Processing Element (PE)</em> is the supertype of all UIMA components. The PE interface defines getMetadata() and setConfigurationParameters(). Analyzer, CAS Multiplier and Flow Controller are its subtypes. An Analyzer (the most common) processes a CAS and possibly updates its contents. A CAS Multiplier processes a CAS and possibly creates new CASes &#8211; for example, by dividing a CAS into pieces or merging multiple CASes. A Flow Controller determines the route CASes take through multiple analytics.</p>
<p>5. <strong>Behavioural Metadata</strong> &#8211; Declaratively describes what the analytic does &#8211; for example, what types of CASes it can process, what elements in a CAS it analyzes and what effects it may have on CAS contents. Analytics are not required to declare behavioural metadata, but without it an application using the analytic cannot assume anything about the analytic&#8217;s operations.</p>
<p>6. <strong>Processing Element Metadata</strong> &#8211; Defines the structure of processing element metadata and provides an XML schema in which PEs must publish this metadata. All PEs must publish metadata which describes the analytic to support discovery and composition. PE metadata has: Identification information, Configuration parameters, Behavioural Metadata, Type System and Extensions.</p>
<p>7. <strong>WSDL Service Descriptions</strong> &#8211; Specifies a WSDL description of the UIMA interfaces and a binding to a concrete SOAP interface that compliant frameworks and services MUST implement.</p>
<p>In UIMA, the original content is not affected by the analysis process. Instead, an object graph that <em>stands off</em> from and annotates the content is produced. Stand-off annotations in UIMA allow multiple interpretations of the content, of graph complexity, to be produced, to co-exist, to overlap and to be retracted. Typically an analytic generates an in-line XML, XMI or RDF document from the UIMA representation.</p>
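<p>The stand-off idea can be sketched in a few lines of Python. This is only an illustration, not the UIMA API: the <code>Cas</code> and <code>Annotation</code> classes and their method names are hypothetical simplifications. The point it shows is that annotations refer to character offsets in the Sofa, so the Sofa text itself is never edited, and annotations can overlap or be retracted freely.</p>

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """A stand-off annotation: a typed label over a character span of the Sofa."""
    type_name: str
    begin: int
    end: int


@dataclass
class Cas:
    """Simplified stand-in for a CAS: the unmodified Sofa plus its annotations."""
    sofa: str
    annotations: list = field(default_factory=list)

    def annotate(self, type_name, begin, end):
        """Record a new annotation; the Sofa text is left untouched."""
        annotation = Annotation(type_name, begin, end)
        self.annotations.append(annotation)
        return annotation

    def covered_text(self, annotation):
        """Return the slice of the Sofa that the annotation points at."""
        return self.sofa[annotation.begin:annotation.end]

    def retract(self, annotation):
        """Withdraw an interpretation without affecting the content."""
        self.annotations.remove(annotation)


cas = Cas("IBM donated UIMA to Apache.")
org = cas.annotate("Organization", 0, 3)
cas.annotate("Sentence", 0, len(cas.sofa))  # overlaps the Organization span
```

<p>Here the two annotations co-exist over the same text, and retracting one leaves both the other annotation and the original sentence as they were.</p>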
<p>According to Apache, &#8220;UIMA is, by itself, an empty framework. Its purpose is to enable a world-wide, diverse community to develop inter-operable, often complex analytic components, and allow them to be combined and run together, with framework supplied scaled-out and remoting as needed&#8221;.</p>
<p>The Apache site (<a href="http://uima.apache.org/">http://uima.apache.org/</a>) provides the framework, components and infrastructure. The frameworks (available in Java as well as C++) run the components. The framework provides a common platform for unstructured analytics, enabling reuse of analysis components &#8211; annotators, parsers and consumers.</p>
<p>UIMA annotators do the real work of extracting structured information from unstructured data. The Apache site itself provides a list of annotators, such as the Regular Expression Annotator, which detects email addresses, URLs, phone numbers, zip codes or any other entity based on regular expressions and concepts, and the Dictionary Annotator, which creates annotations based on word lists. In addition, UIMA annotators, including natural language processors from various vendors, can be downloaded from the web (some of them are listed at <a href="http://uima.apache.org/external-resources.html">http://uima.apache.org/external-resources.html</a>).</p>
<p>A full analysis task for a search or intelligence application is a multi-stage process. As UIMA defines a common, standard interface, annotators from multiple vendors can be made to work together. The UIMA application can use the annotators without finding out how they work internally. The UIMA framework can take care of the integration and orchestration of the annotators.</p>
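<p>As a rough illustration of why a common interface matters, the sketch below chains two toy annotators, loosely modelled on the regular expression and dictionary annotators mentioned above. It does not use the Apache UIMA classes: the class names, the <code>process</code> and <code>get_metadata</code> methods, the dictionary-based CAS and the fixed-order <code>run_pipeline</code> function are all assumptions made for this example.</p>

```python
import re


class RegexAnnotator:
    """Adds an annotation for every match of a regular expression."""

    def __init__(self, type_name, pattern):
        self.type_name = type_name
        self.pattern = re.compile(pattern)

    def get_metadata(self):
        return {"name": "RegexAnnotator", "produces": [self.type_name]}

    def process(self, cas):
        for match in self.pattern.finditer(cas["sofa"]):
            cas["annotations"].append((self.type_name, match.start(), match.end()))


class DictionaryAnnotator:
    """Adds an annotation for every whole-word occurrence of a listed word."""

    def __init__(self, type_name, words):
        self.type_name = type_name
        self.words = list(words)

    def get_metadata(self):
        return {"name": "DictionaryAnnotator", "produces": [self.type_name]}

    def process(self, cas):
        for word in self.words:
            for match in re.finditer(r"\b" + re.escape(word) + r"\b", cas["sofa"]):
                cas["annotations"].append((self.type_name, match.start(), match.end()))


def run_pipeline(text, annotators):
    """Fixed-order flow: pass one CAS through each annotator in turn."""
    cas = {"sofa": text, "annotations": []}
    for annotator in annotators:
        annotator.process(cas)
    return cas


pipeline = [
    RegexAnnotator("EmailAddress", r"[\w.+-]+@[\w-]+\.[\w.]+"),
    DictionaryAnnotator("Vendor", ["IBM", "Apache"]),
]
result = run_pipeline("Mail dev@uima.apache.org about the IBM annotators.", pipeline)
```

<p>The pipeline only relies on each annotator exposing <code>process</code>, so a component from another vendor could be dropped into the list without the application knowing how it works internally.</p>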
<p>The Apache site also provides tools for either creating new interoperable text analytics modules or enabling existing text analytics investments to operate within the framework.</p>
<p>IBM has empowered its products and services with UIMA, creating a channel for third-party vendors to deploy their text and multi-modal analytics in larger integrated solutions. IBM OmniFind <a href="http://www-306.ibm.com/software/data/enterprise-search/omnifind-enterprise/" target="_top">Enterprise Edition</a> provides UIMA for building full-text and semantic search indexes, and <a href="http://www-306.ibm.com/software/data/enterprise-search/omnifind-analytics/" target="_top">Analytics Edition</a> deploys UIMA for information extraction and text analysis.</p>
<p>Semantic Search applications can benefit the most by using UIMA framework and UIMA components for:</p>
<ul>
<li>Identifying the language of the specific document</li>
<li>Language dependent linguistic processing (tokenization, lemmatization, and even speech detection).</li>
<li>Analyzing the text contents for entity and relation detection.</li>
</ul>
<p>Business Intelligence or Government Intelligence is another major area which can use UIMA. Sample applications include:</p>
<ul>
<li>Defect Detection and Early Warning System (gain insight from service and maintenance records)</li>
<li>Customer support and self-service (analyzing the call center logs, emails etc.)</li>
<li>Public image monitoring (finding out from internet forums and discussions the image pertaining to a product or company).</li>
<li>Insurance Fraud analysis (identifying hidden relationships and patterns from claims documents).</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://itknowledgeexchange.techtarget.com/enterprise-IT-tech-trends/mining-unstructured-information-using-uima/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>IBM&#8217;s Jeopardy! Challenge &#8211; Human vs. Machine Contest</title>
		<link>http://itknowledgeexchange.techtarget.com/enterprise-IT-tech-trends/ibms-jeopardy-challenge-human-vs-machine-contest/</link>
		<comments>http://itknowledgeexchange.techtarget.com/enterprise-IT-tech-trends/ibms-jeopardy-challenge-human-vs-machine-contest/#comments</comments>
		<pubDate>Tue, 07 Dec 2010 15:54:29 +0000</pubDate>
		<dc:creator>Sasirekha R</dc:creator>
				<category><![CDATA[Apache]]></category>
		<category><![CDATA[DeepQA]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Jeopardy!]]></category>
		<category><![CDATA[Natural Language]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[UIMA]]></category>
		<category><![CDATA[unstructured data]]></category>

		<guid isPermaLink="false">http://itknowledgeexchange.techtarget.com/enterprise-IT-tech-trends/?p=144</guid>
		<description><![CDATA[IBM&#8217;s Jeopardy! Challenge &#8211; Human vs. Machine Contest IBM is working on a computing system, code-named &#8220;Watson&#8221;, which can understand and answer complex questions expressed in natural language. The officials from Jeopardy! and IBM have announced that they will produce a human vs. machine contest on their renowned quiz show (ref. http://www.nytimes.com/2010/06/20/magazine/20Computer-t.htm). What makes this [...]]]></description>
				<content:encoded><![CDATA[<p><strong>IBM&#8217;s Jeopardy! Challenge &#8211; Human vs. Machine Contest</strong></p>
<p>IBM is working on a computing system, code-named &#8220;Watson&#8221;, which can understand and answer complex questions expressed in natural language.</p>
<p>Officials from Jeopardy! and IBM have announced that they will produce a human vs. machine contest on the renowned quiz show (ref. <a href="http://www.nytimes.com/2010/06/20/magazine/20Computer-t.html?partner=rss&amp;emc=rss&amp;src=ig">http://www.nytimes.com/2010/06/20/magazine/20Computer-t.html</a>).</p>
<p>What makes this interesting is that Jeopardy!<span id="more-144"></span></p>
<ul>
<li>demands knowledge of a broad range of topics including history, literature, politics, film, pop culture and science</li>
<li>poses clues that involve irony, riddles, subtle meaning and other complexities at which humans excel</li>
<li>requires contestants to answer at speed.</li>
</ul>
<p>Watson is designed to rival the human mind&#8217;s ability to understand the actual meaning behind words, distinguish between relevant and irrelevant content, and ultimately, demonstrate confidence to deliver precise final answers. IBM says that the Star Trek Computer, a powerful and fluent conversational agent, is the driving vision for Watson.</p>
<p>Watson is an application of advanced Natural Language Processing, Information Retrieval, Knowledge Representation and Reasoning, and Machine Learning technologies to the field of open-domain question answering.</p>
<p>Watson is built on IBM&#8217;s <em>DeepQA</em> technology for hypothesis generation, massive evidence gathering, analysis, and scoring. Unlike document search that takes a keyword and returns multiple documents, DeepQA technology:</p>
<ul>
<li>Takes a question in natural language,</li>
<li>Understands it in greater detail, and</li>
<li>Returns a precise answer.</li>
</ul>
<p>Watson is expected to run on a massively parallel high-performance computing platform like BlueGene so that the levels of accuracy, confidence and speed required by the Jeopardy! Challenge are achievable.</p>
<p>Like the human contestants on Jeopardy!, Watson will be self-contained and have no external connections (i.e., no internet or any other external source). This obviously translates to a vast amount of data stored in Watson. Most of the data is in natural language, and some structured and semi-structured data is included to help interpret the text and refine the answers. Watson is supposed to be like any human contestant &#8211; one who has read a lot of books and is able to relate to the question and find the right answer in real time.</p>
<p>The Jeopardy! contest is only part of the big picture. The aim is to build a computer system that operates in <strong>human terms:</strong></p>
<ul>
<li>Understand complex information requirements, as people would express them &#8211; in natural language questions or interactive dialogs.</li>
<li>Retrieve information available as natural language text &#8211; web documents, reference books, encyclopedias, dictionaries, textbooks, technical reports, novels etc.</li>
<li>Synthesize, integrate and rapidly reason over the knowledge and</li>
<li>Deliver precise, meaningful response &#8211; in natural language.</li>
</ul>
<p>Using DeepQA technology, end users should be able to enter a question in natural language form (typed, not yet spoken, since speech would involve voice recognition), and the system will sift through a vast amount of information (in various formats and from various sources) and return a ranked list of the most compelling, precise answers. Along with the answers, the supporting evidence on which they were based would be given, so that the user can verify the correctness of the answers and select the most suitable one.</p>
<p>Right now what is being considered is a hybrid approach:</p>
<p>1. Build effective and adaptable open-domain QA systems using advanced NLP, Information Retrieval and Machine Learning to interpret and reason over huge volumes of widely accessible naturally encoded knowledge (unstructured).  The difficulty in this is the inability to prove the answer is correct.</p>
<p>2. The confidence level (on the correctness of the answer) can be built based on a combination of reasoning methods that operate on automatically extracted entities, relations, available structured data (say in traditional databases) and semi-structured knowledge (say from Semantic Web).</p>
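<p>The hybrid approach can be pictured with a small Python sketch. It is emphatically not IBM&#8217;s DeepQA algorithm: the evidence source names, the weights, the bias, the logistic combination and the sample candidates are all invented for illustration. It only shows the general shape of the idea, namely that each candidate answer carries evidence from several kinds of sources and the evidence is combined into a single confidence used for ranking.</p>

```python
import math

# Hypothetical weights per evidence source; a real system would learn such values.
WEIGHTS = {"text_passage": 1.5, "structured_db": 2.0, "semantic_web": 1.0}
BIAS = -2.0


def confidence(evidence):
    """Combine per-source evidence scores (each between 0 and 1) with a logistic function."""
    z = BIAS + sum(WEIGHTS[source] * score for source, score in evidence.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank(candidates):
    """Return (answer, confidence, evidence) tuples, most confident first."""
    scored = [
        (answer, confidence(evidence), evidence)
        for answer, evidence in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up candidate answers and evidence scores, for illustration only.
candidates = {
    "Chicago": {"text_passage": 0.9, "structured_db": 0.8},
    "Toronto": {"text_passage": 0.4, "semantic_web": 0.2},
}
ranked = rank(candidates)
```

<p>Each entry in <code>ranked</code> keeps its evidence alongside the confidence, which mirrors the goal of letting the user inspect why an answer was preferred.</p>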
<p>Customer Relationship Management, Regulatory Compliance, Contact Centers, Help Desks, Web Self-Service, Business Intelligence, etc. are some of the applications that can benefit from DeepQA technology.</p>
<p>DeepQA uses UIMA (Unstructured Information Management Architecture), the framework for building applications that perform deep analysis on unstructured content, including natural language text, speech, images and video. Watson uses UIMA-AS (UIMA on asynchronous messaging) as its principal infrastructure for assembling, scaling-out and deploying all its analytic components.</p>
<p>Originally developed by IBM, UIMA is now open source (Apache) and is also an OASIS standard. I plan to elaborate on UIMA in a later post.</p>
]]></content:encoded>
			<wfw:commentRss>http://itknowledgeexchange.techtarget.com/enterprise-IT-tech-trends/ibms-jeopardy-challenge-human-vs-machine-contest/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
