“What is Content Analytics?, Alex”
“The technology behind Watson represents the future of data management and analytics. In the real world, this technology will help us uncover insights in everything from traffic to healthcare.”
- John Cohn, IBM Fellow, IBM Systems and Technology Group
How can the same technology used to play Jeopardy! give you better business insight?
Why Watson matters
You have to start by understanding that IBM Watson DeepQA is the world’s most advanced question answering machine. It uncovers answers by understanding the meaning buried in the context of a natural language question. By combining advanced Natural Language Processing (NLP) and DeepQA automatic question answering technology, Watson represents the future of content and data management, analytics, and systems design. IBM Watson leverages core content analysis, along with a number of other advanced technologies, to arrive at a single, precise answer within a very short period of time. The business applications for this technology are limitless, starting with clinical healthcare, customer care and government intelligence. I covered the technology side of Watson in my previous posting 10 Things You Need to Know About the Technology Behind Watson.
Amazingly, Watson works like the human brain to analyze the content of a Jeopardy! question. First, it tries to understand the question to determine what is being asked, which means analyzing the natural language text. Next, it tries to find reasoned answers by analyzing a wide variety of disparate content, mostly in the form of natural language documents. Finally, Watson assigns each candidate answer a confidence rating that reflects the relative likelihood that it is correct.
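To make those three stages concrete, here is a minimal Python sketch. It is a toy illustration only, not how Watson or DeepQA is actually implemented: the function names, the stopword list, the three evidence passages and the keyword-overlap confidence score are all assumptions made up for this example.

```python
import re
from dataclasses import dataclass

# Words too common to help match a clue against evidence (illustrative list).
STOPWORDS = {"a", "an", "the", "this", "was", "when", "in", "of", "on"}

# Hypothetical evidence passages, each paired with the answer it supports.
PASSAGES = [
    ("Lyndon B. Johnson",
     "Lyndon B. Johnson was U.S. President when 60 Minutes premiered in 1968"),
    ("Richard Nixon", "Richard Nixon became U.S. President in 1969"),
    ("Mike Wallace", "Mike Wallace was a correspondent on 60 Minutes"),
]


@dataclass
class Candidate:
    answer: str
    confidence: float  # fraction of clue keywords matched, 0.0 to 1.0


def content_words(text):
    """Lowercase the text and keep the words that are not stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def analyze_question(clue):
    """Stage 1: reduce the clue to the keywords that describe what is asked."""
    return content_words(clue)


def find_candidates(keywords, passages):
    """Stage 2: keep every passage that shares at least one keyword."""
    candidates = []
    for answer, text in passages:
        words = content_words(text)
        if keywords & words:
            candidates.append((answer, words))
    return candidates


def score_candidates(keywords, candidates):
    """Stage 3: rate each candidate by keyword overlap, best first."""
    scored = [
        Candidate(answer, len(keywords & words) / len(keywords))
        for answer, words in candidates
    ]
    return sorted(scored, key=lambda c: c.confidence, reverse=True)


def answer(clue, passages=PASSAGES):
    """Run all three stages and return the top candidate, or None."""
    keywords = analyze_question(clue)
    ranked = score_candidates(keywords, find_candidates(keywords, passages))
    return ranked[0] if ranked else None


best = answer("When 60 Minutes premiered, this man was U.S. President.")
print(best.answer, round(best.confidence, 2))
```

Counting shared keywords is far cruder than the evidence scoring a real question answering system would need, but it shows the shape of the flow: understand the question, gather candidates, then rank them by confidence.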
A great example of the challenge is described by Stephen Baker in his book Final Jeopardy: Man vs. Machine and the Quest to Know Everything: ‘When 60 Minutes premiered, this man was U.S. President.’ Traditionally it’s been difficult for a computer to understand what ‘premiered’ means and that it’s associated with a date. To a computer, ‘premiere’ could also mean ‘premier’. Is the question about a person’s title or a production opening? Then it has to figure out the date when an entity called ‘60 Minutes’ premiered, and then find out who was the ‘U.S. President’ at that time. In short, it requires a ton of contextual understanding.
I am not talking about search here. This is far beyond what search tools can do. A recent Forrester report, Take Control Of Your Content, states that 45% of the US workforce spends three or more hours a week just searching for information. This is completely inefficient. See my previous posting Goodbye Search … It’s About Finding Answers … Enter Watson vs. Jeopardy! for more on this topic.
Natural Language Processing (NLP) can be leveraged in any situation where text is involved. Besides answering questions, it can help improve enterprise search results or even develop an understanding of the insight hidden in the content itself. Watson leverages the power of NLP as the cornerstone to translate interactions between computers and human (natural) languages.
NLP involves a series of steps that make text understandable (or computable). A critical step, lexical analysis, is the process of converting a sequence of characters into a set of tokens. Subsequent steps leverage these tokens to perform entity extraction (people, places, things), concept identification (person A belongs to organization B) and the annotation of documents with this and other information. A feature of IBM Content Analytics (known as LanguageWare) performs the lexical analysis function in Watson as part of natural language processing.
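The first few of those steps can be sketched in a few lines of Python. This is a simplified illustration, not LanguageWare or the IBM Content Analytics API: the regular-expression tokenizer, the tiny `GAZETTEER` lookup table, the entity type labels and the function names are all assumptions chosen for the example.

```python
import re

# Hypothetical lookup table mapping known names to entity types.
GAZETTEER = {
    "ibm": "ORGANIZATION",
    "watson": "THING",
    "jeopardy": "THING",
}


def tokenize(text):
    """Lexical analysis: turn characters into (token, start, end) tuples."""
    # A token is either a run of word characters or one punctuation mark.
    return [
        (m.group(), m.start(), m.end())
        for m in re.finditer(r"\w+|[^\w\s]", text)
    ]


def extract_entities(tokens):
    """Entity extraction: label every token found in the gazetteer."""
    return [
        {"text": tok, "type": GAZETTEER[tok.lower()], "start": start, "end": end}
        for tok, start, end in tokens
        if tok.lower() in GAZETTEER
    ]


def annotate(text):
    """Annotation: attach the tokens and entities to the document."""
    tokens = tokenize(text)
    return {
        "text": text,
        "tokens": [tok for tok, _, _ in tokens],
        "entities": extract_entities(tokens),
    }


doc = annotate("IBM built Watson to play Jeopardy!")
print(doc["tokens"])
for entity in doc["entities"]:
    print(entity["text"], entity["type"], entity["start"], entity["end"])
```

Keeping the character offsets with each token is the design choice worth noticing: later steps can then annotate the original document in place rather than working on a detached list of words.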
Why this matters to your business
Jeopardy! poses a similar set of contextual information challenges as those found in the business world today:
- Over 80 percent of information being stored is unstructured (that is, text based).
- Understanding that 80-plus percent isn’t simple. As in Jeopardy!, subtle meaning, irony, riddles, acronyms, abbreviations and other complexities make it hard to derive meaning and insight, and they present computing challenges not found with structured data. This is where natural language processing (NLP) comes in.
The same core NLP technology used in Watson is available now to deliver business value by unlocking the insights trapped in the massive amounts of unstructured information spread across your many systems and formats. Understanding the content, context and value of this unstructured information presents an enormous opportunity for your business. This is already being done in a number of industries by leveraging IBM Content Analytics.
IBM Content Analytics (ICA) itself is a platform to derive rapid insight. It can transform raw information into business insight quickly without building models or deploying complex systems, enabling all knowledge workers to derive insight in hours or days … not weeks or months. It helps address industry specific problems such as healthcare treatment effectiveness, fraud detection, product defect detection, public safety concerns, customer satisfaction and churn, crime and terrorism prevention and more. Here are some actual customer examples:
Healthcare Research – Like most healthcare providers, BJC Healthcare had a treasure trove of historical information trapped in unstructured clinical notes and diagnostic reports containing essential information for the study of disease progression, treatment effectiveness and long-term outcomes. Their existing Biomedical Informatics (BMI) resources were disjointed and non-interoperable, available only to a small fraction of researchers, and frequently redundant, with no capability to tap into the wealth of research information trapped in unstructured clinical notes, diagnostic reports and the like.
With IBM Content Analytics, BJC and university researchers are now able to analyze unstructured information to answer key questions that were previously unanswerable. Questions like: Does the patient smoke? How often and for how long? If smoke free, for how long? What home medications is the patient taking? What is the patient sent home with? What was the diagnosis and what procedures were performed on the patient? BJC now has deeper insight into medical information and can uncover trends and patterns within their content, to provide better healthcare to their patients.
Customer Satisfaction – Identifying customer satisfaction trends about products, services and personnel is critical to most businesses. The Hertz Corporation and Mindshare Technologies, a leading provider of enterprise feedback solutions, are using IBM Content Analytics software to examine customer survey data, including text messages. The goal is to better identify car and equipment rental performance levels, pinpoint problems and make the adjustments needed to improve customer satisfaction.
By using IBM Content Analytics, companies like Hertz can drive new marketing campaigns or modify their products and services to meet the demands of their customers. “Hertz gathers an amazing amount of customer insight daily, including thousands of comments from web surveys, emails and text messages. We wanted to leverage this insight at both the strategic level and the local level to drive operational improvements,” said Joe Eckroth, Chief Information Officer, the Hertz Corporation.
For more information about ICA at Hertz: http://www-03.ibm.com/press/us/en/pressrelease/32859.wss
Research Analytics – To North Carolina State University, the essence of a university is more than education – it is the advancement and dissemination of knowledge in all its forms. One of the main issues faced by NC State was dealing with the vast number of data sources available to them. The university sought a solution to efficiently mine and analyze vast quantities of data to better identify companies that could bring NC State’s research to the public. The objective was a solution designed to parse the content of thousands of unstructured information sources, perform data and text analytics and produce a focused set of useful results.
Using IBM Content Analytics, NC State was able to reduce the time needed to find target companies from months to days. The result is the identification of new commercialization opportunities, with tests yielding a 300 percent increase in the number of candidates. By obtaining insight into their extensive content sources, NC State’s Office of Technology Transfer was able to find more effective ways to license technologies created through research conducted at the university. “What makes the solution so powerful is its ability to go beyond conventional online search methods by factoring context into its results.” – Billy Houghteling, executive director, NC State Office of Technology Transfer.
For more information about ICA at NC State: http://www-01.ibm.com/software/success/cssdb.nsf/CS/SSAO-8DFLBX?OpenDocument&Site=software&cty=en_us
You can put the technology of tomorrow to work for you today by leveraging the same IBM Content Analytics capability that helps power Watson. To learn more about all the IBM ECM products utilizing Watson technology, please visit these sites:
IBM Content Analytics: http://www-01.ibm.com/software/data/content-management/analytics/
IBM Classification Module: http://www-01.ibm.com/software/data/content-management/classification/
IBM eDiscovery Analyzer: http://www-01.ibm.com/software/data/content-management/products/ediscovery-analyzer/
IBM OmniFind Enterprise Edition: http://www-01.ibm.com/software/data/enterprise-search/omnifind-enterprise/
I’ll be at the Jeopardy! viewing party in Washington, DC on February 15th and 16th … hope to see you there. In the meantime, leave me your thoughts and questions below.