Text analytics is the process of analyzing unstructured text, extracting relevant information, and transforming it into useful business intelligence. Text analytics processes can be performed manually, but the amount of text-based data available to companies today makes it increasingly important to use intelligent, automated solutions.
Why is text analytics important?
Emails, online reviews, tweets, call center agent notes, and the vast array of other written feedback all hold insight into customer wants and needs, but only if you can unlock it. Text analytics is the way to extract meaning from this unstructured text and to uncover patterns and themes.
Several text analytics use cases exist:
- Case management—for example, insurance claims assessment, healthcare patient records and crime-related interviews and reports
- Competitor analysis
- Fault management and field-service optimization
- Legal e-discovery in litigation cases
- Media coverage analysis
- Pharmaceutical drug trial improvement
- Sentiment analytics
- Voice of the customer
A well-understood process for text analytics includes the following steps:
- Extracting raw text
- Tokenizing the text—that is, breaking it down into words and phrases
- Detecting term boundaries
- Detecting sentence boundaries
- Tagging parts of speech, that is, labeling each word as a noun, a verb and so on
- Tagging named entities so that they are identified—for example, a person, a company, a place, a gene, a disease, a product and so on
- Parsing—for example, extracting facts and entities from the tagged text
- Extracting knowledge to understand concepts such as a personal injury within an accident claim
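As a rough illustration of the steps above, here is a minimal Python sketch that uses only the standard library. It covers sentence-boundary detection, tokenizing, a toy named-entity lookup, and a toy concept check for an injury mention. The function names, the regular expressions, the entity lookup table, and the list of injury terms are all assumptions made for this example. Part-of-speech tagging and full parsing are left out because they normally depend on a trained model or a large lexicon.

```python
import re

# Naive, hypothetical rules for illustration only; production systems
# normally rely on trained language models rather than hand-written patterns.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
TOKEN = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?|[^\sA-Za-z0-9]")


def split_sentences(text):
    """Split text wherever '.', '!' or '?' is followed by whitespace."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def tokenize(sentence):
    """Break a sentence into word tokens and single punctuation marks."""
    return TOKEN.findall(sentence)


def tag_entities(tokens, lookup):
    """Label each token using a small lookup table; 'O' means no entity.

    Only single-word entities are handled in this sketch.
    """
    return [(tok, lookup.get(tok.lower(), "O")) for tok in tokens]


def mentions_concept(tokens, concept_terms):
    """Return True if any token belongs to the given set of concept terms."""
    return any(tok.lower() in concept_terms for tok in tokens)


if __name__ == "__main__":
    claim = "The driver hurt his neck. Acme Insurance paid the claim!"
    entity_lookup = {"acme": "COMPANY"}               # hypothetical table
    injury_terms = {"hurt", "injured", "injury", "whiplash"}  # hypothetical

    for sentence in split_sentences(claim):
        tokens = tokenize(sentence)
        print(tokens)
        print(tag_entities(tokens, entity_lookup))
        print("injury mentioned:", mentions_concept(tokens, injury_terms))
```

A keyword lookup like this will miss paraphrases and misread negations such as "was not hurt", which is one reason the output of automated steps still needs human review.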
An array of techniques may be employed to derive meaning from text. The most accurate and powerful method is an intelligent, trained human being reading the text and interpreting its meaning, although it is also the slowest and the most costly. Ideally, the reader is trained in qualitative research techniques and understands the industry and contextual framework of the text. A well-trained qualitative researcher can extract extraordinary understanding and insight from text. In a typical project, the qualitative researcher might read hundreds of paragraphs to analyze the text, develop hypotheses, draw conclusions, and write a report. This type of analysis is subject to the risks of bias and misinterpretation on the part of the qualitative researcher, but these limitations are always with us, regardless of method. The power of the human mind cannot be equaled by any software or any computer system. Decision Analyst’s highly trained qualitative researchers are experts at understanding text.
Content Analysis or Open-End Coding
The history of text analytics traces back to World War II and the development of “content analysis” by governmental intelligence services. Intelligence analysts would read documents, magazines, records, dispatches, etc., and assign numeric codes to different topics, concepts, or ideas. By summing up these numeric codes, the analyst could quantify the different concepts or ideas and track them over time. This approach was further developed by the survey research industry after the war. Today, as then, open-end questions in surveys are analyzed by someone reading the textual answers and assigning numeric codes. These codes are then summarized in tables, so that the analyst has a quantitative sense of what people are saying. This remains a powerful method of text mining or text analytics. It leverages the power of the human mind to discern subtleties and context.
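The summing of numeric codes described above can be sketched in a few lines of Python. This is only an illustration: the function name, the period labels, and the code frame (1 for price, 2 for service) are hypothetical, and the coding itself is assumed to have been done by a human reader beforehand.

```python
from collections import Counter, defaultdict


def tally_codes_by_period(coded_records):
    """Count how often each numeric code was assigned in each period.

    coded_records is an iterable of (period, codes) pairs, where codes is
    the list of numeric codes a human coder assigned to one document.
    """
    totals = defaultdict(Counter)
    for period, codes in coded_records:
        totals[period].update(codes)
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical code frame: 1 = price, 2 = service.
    records = [
        ("2023-Q1", [1, 2]),
        ("2023-Q1", [1]),
        ("2023-Q2", [2]),
    ]
    for period, counts in sorted(tally_codes_by_period(records).items()):
        print(period, dict(counts))
```

Comparing the counts for the same code across periods gives the kind of trend tracking the paragraph describes.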
The first step is careful selection of a representative sample of respondents or responses. In surveys the sample is usually representative and comparatively small (fewer than 2,000), so all open-ended questions are coded. However, text from social media, a CRM system, or a customer complaint system might be made up of millions of customer comments. In that case, a few thousand records are selected at random, and these records are checked for duplicates, geographic distribution, etc. Then, a human being reads each and every paragraph of text and assigns numeric codes to different meanings and ideas. These codes are tabulated and statistical summaries are prepared for the analyst. This is text mining or text analytics at its apogee. Open-end coding offers the strength of numbers (statistical significance) and the intelligence of the human mind. Decision Analyst operates a large multilanguage coding facility with highly trained staff specifically for content analysis and text analytics.
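The sampling, duplicate check, and tabulation steps might look something like the following Python sketch. The function names, the fixed random seed, and the rule for spotting duplicates (identical text after lowercasing and collapsing whitespace) are assumptions for this example, not a description of any particular production system. The geographic check is omitted because it depends on metadata the text does not describe.

```python
import random
from collections import Counter


def draw_sample(records, k, seed=0):
    """Draw a simple random sample of up to k records (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(list(records), min(k, len(records)))


def drop_duplicates(comments):
    """Keep the first copy of each comment, ignoring case and extra spaces.

    Empty comments are dropped as well.
    """
    seen = set()
    unique = []
    for comment in comments:
        key = " ".join(comment.lower().split())
        if key and key not in seen:
            seen.add(key)
            unique.append(comment)
    return unique


def code_shares(coded_sample):
    """Return the percentage of comments that received each code.

    coded_sample holds one list of numeric codes per comment; a code
    repeated within one comment is counted once for that comment.
    """
    n = len(coded_sample)
    if n == 0:
        return {}
    counts = Counter(code for codes in coded_sample for code in set(codes))
    return {code: 100.0 * count / n for code, count in counts.items()}


if __name__ == "__main__":
    comments = ["Great service", "great  service ", "Too expensive", ""]
    sample = drop_duplicates(draw_sample(comments, k=3))
    print(sample)
    # Codes would be assigned by a human reader; these are made up.
    print(code_shares([[1, 2], [1], [2, 2], [3]]))
```

Because comments can receive more than one code, the percentages in such a table can add up to more than 100.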
With the explosion of keyboard-generated text related to the spread of PCs and the Internet over the past two decades, many companies are searching for automated ways to analyze large volumes of textual data. Decision Analyst offers several text-analytic services, based on different software systems, to analyze and report on textual data. These software systems are very powerful, but they cannot take the place of the thinking human brain. Their results should be thought of as approximations, as crude indicators of truth and trends, and should always be verified by other methods and other data.