Text analytics

Text analytics is the process of analyzing unstructured text, extracting relevant information, and transforming it into useful business intelligence. Text analytics processes can be performed manually, but the amount of text-based data available to companies today makes it increasingly important to use intelligent, automated solutions.


Why is text analytics important?

Emails, online reviews, tweets, call center agent notes, and the vast array of other written feedback all hold insight into customer wants and needs, but only if you can unlock it. Text analytics is the way to extract meaning from this unstructured text and to uncover patterns and themes.


Several text analytics use cases exist:

  • Case management—for example, insurance claims assessment, healthcare patient records and crime-related interviews and reports
  • Competitor analysis
  • Fault management and field-service optimization
  • Legal e-discovery in litigation cases
  • Media coverage analysis
  • Pharmaceutical drug trial improvement
  • Sentiment analytics
  • Voice of the customer

A well-understood process for text analytics includes the following steps (sketched in code after the list):

  1. Extracting raw text
  2. Tokenizing the text—that is, breaking it down into words and phrases
  3. Detecting term boundaries
  4. Detecting sentence boundaries
  5. Tagging parts of speech—words such as nouns and verbs
  6. Tagging named entities so that they are identified—for example, a person, a company, a place, a gene, a disease, a product and so on
  7. Parsing—for example, extracting facts and entities from the tagged text
  8. Extracting knowledge to understand concepts such as a personal injury within an accident claim
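
A minimal sketch of steps 1 through 7 in Python, using the open-source spaCy library (one of many possible toolkits; the example sentence is invented, and the small English model is assumed to be installed):

    # Steps 1-7 of the pipeline above, approximated with spaCy.
    # Assumes: pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")

    raw_text = "Acme Corp. filed an injury claim in Dallas after the accident."  # step 1
    doc = nlp(raw_text)  # tokenization, tagging, parsing, and NER run in one call

    for sent in doc.sents:            # steps 3-4: term and sentence boundaries
        print("Sentence:", sent.text)

    for token in doc:                 # steps 2 and 5: tokens and parts of speech
        print(token.text, token.pos_)

    for ent in doc.ents:              # step 6: named entities (person, company, place...)
        print(ent.text, ent.label_)

Step 8, extracting knowledge such as a personal injury within an accident claim, would build on these tagged entities and parse structures with domain-specific rules or models.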


Qualitative Technique

An array of techniques may be employed to derive meaning from text. The most accurate method is an intelligent, trained human being reading the text and interpreting its meaning. This is the slowest method and the most costly, but the most accurate and powerful. Ideally, the reader is trained in qualitative research techniques and understands the industry and contextual framework of the text. A well-trained qualitative researcher can extract extraordinary understanding and insight from text. In a typical project, the qualitative researcher might read hundreds of paragraphs to analyze the text, develop hypotheses, draw conclusions, and write a report. This type of analysis is subject to the risks of bias and misinterpretation on the part of the qualitative researcher, but these limitations are with us always—regardless of method. The power of the human mind cannot be equaled by any software or any computer system. Decision Analyst’s team of highly trained qualitative researchers are experts at understanding text.

Content Analysis or Open-End Coding

The history of text analytics traces back to World War II and the development of “content analysis” by governmental intelligence services. That is, intelligence analysts would read documents, magazines, records, dispatches, etc., and assign numeric codes to different topics, concepts, or ideas. By summing up these numeric codes, the analyst could quantify the different concepts or ideas, and track them over time. This approach was further developed by the survey research industry after the war. Today as then, open-end questions in surveys are analyzed by someone reading the textual answers and assigning numeric codes. These codes are then summarized in tables, so that the analyst has a quantitative sense of what people are saying. This remains a powerful method of text mining or text analytics. It leverages the power of the human mind to discern subtleties and context.


The first step is careful selection of a representative sample of respondents or responses. In surveys, the sample is usually representative and comparatively small (fewer than 2,000 respondents), so all open-ended questions are coded. In the case of social media text, a CRM system, or a customer complaint system, however, the text might consist of millions of customer comments, so the first step is the random selection of a few thousand records, which are then checked for duplicates, geographic distribution, and so on. Then a human being reads each paragraph of text and assigns numeric codes to different meanings and ideas. These codes are tabulated, and statistical summaries are prepared for the analyst. This is text mining or text analytics at its apogee: open-end coding offers the strength of numbers (statistical significance) and the intelligence of the human mind. Decision Analyst operates a large multilanguage coding facility with highly trained staff specifically for content analysis and text analytics.
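
To make the sampling and tabulation steps concrete, here is a toy sketch in Python; the numeric code book and responses are hypothetical, and real coding operations use dedicated software:

    # Tabulating open-end codes that human coders have already assigned.
    # Hypothetical code book: 10 = price complaint, 42 = praise for staff,
    # 77 = delivery delay.
    from collections import Counter
    import random

    responses = [
        {"id": 1, "codes": [10, 42]},
        {"id": 2, "codes": [10]},
        {"id": 3, "codes": [42, 77]},
    ]

    # For very large sources (social media, CRM logs), code a random sample.
    sample = random.sample(responses, k=min(2000, len(responses)))

    tally = Counter(code for r in sample for code in r["codes"])
    for code, count in tally.most_common():
        print(f"code {code}: {count} mentions ({count / len(sample):.0%} of sample)")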


Machine Text Mining or Text Analytics

With the explosion of keyboard-generated text related to the spread of PCs and the Internet over the past two decades, many companies are searching for automated ways to analyze large volumes of textual data. Decision Analyst offers several text-analytic services, based on different software systems, to analyze and report on textual data. These software systems are very powerful, but they cannot take the place of the thinking human brain. The results from these software systems should be thought of as approximations, as crude indicators of truth and trends, but the results must always be verified by other methods and other data.
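
As a small illustration of this kind of approximate, automated scoring, the sketch below uses the VADER sentiment model shipped with the open-source NLTK library (the comments are invented, and the lexicon must be downloaded once):

    # Machine sentiment scoring: quick, scalable, and approximate.
    # Assumes: pip install nltk, then nltk.download("vader_lexicon") once.
    from nltk.sentiment import SentimentIntensityAnalyzer

    sia = SentimentIntensityAnalyzer()
    comments = [
        "The claims process was quick and the agent was wonderful.",
        "Terrible service, I waited three weeks for a reply.",
    ]
    for comment in comments:
        score = sia.polarity_scores(comment)["compound"]  # -1 (negative) .. +1 (positive)
        print(f"{score:+.2f}  {comment}")

Scores like these are useful indicators of tone across millions of comments, but, as noted above, they should be verified against human-coded samples.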

 


BIG DATA IN ORGANIZATIONS

 

Big data analytics is a trending practice that many companies are adopting. Before jumping in and buying big data tools, though, organizations should first get to know the landscape.

In essence, big data analytics tools are software products that support predictive and prescriptive analytics applications running on big data computing platforms: typically, parallel processing systems based on clusters of commodity servers, scalable distributed storage, and technologies such as Hadoop and NoSQL databases.

In addition, big data analytics tools provide the framework for using data mining techniques to analyze data, discover patterns, propose analytical models to recognize and react to identified patterns, and then enhance the performance of business processes by embedding the analytical models within the corresponding operational applications.
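
As a minimal, hedged example of analytics running on such a platform, the PySpark sketch below counts word frequencies in a hypothetical file of customer comments; the same code scales from a laptop to a Hadoop cluster:

    # Word frequencies with PySpark: the classic pattern that big data
    # analytics tools generalize. Assumes pyspark is installed and a local
    # text file named comments.txt exists (the file name is hypothetical).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, explode, lower, split

    spark = SparkSession.builder.appName("word-count-sketch").getOrCreate()

    lines = spark.read.text("comments.txt")
    words = lines.select(explode(split(lower(col("value")), r"\s+")).alias("word"))
    counts = words.groupBy("word").count().orderBy(col("count").desc())
    counts.show(10)  # top 10 most frequent words

    spark.stop()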


Powering analytics: Inside big data and advanced analytics tools

A survey of the big data analytics market yields a long list of vendors. However, many of these vendors provide big data platforms and tools that support the analytics process in general, for example, data integration, data preparation, and other types of data management software. We focus here on tools that meet the following criteria:

  • They provide the analyst with advanced analytics algorithms and models.
  • They’re engineered to run on big data platforms such as Hadoop or specialty high-performance analytics systems.
  • They’re easily adaptable to use structured and unstructured data from multiple sources.
  • Their performance is capable of scaling as more data is incorporated into analytical models.
  • Their analytical models can be or already are integrated with data visualization and presentation tools.
  • They can easily be integrated with other technologies.


Big data and advanced analytics tools:

While some individuals in the organization are looking to explore and devise new predictive models, others look to embed these models within their business processes, and still others will want to understand the overall impact that these tools will have on the business. In other words, organizations that are adopting big data analytics need to accommodate a variety of user types, such as:

The data scientist, who likely performs more complex analyses involving more complex data types and is familiar with how underlying models are designed and implemented to assess inherent dependencies or biases.

The business analyst, who is likely a more casual user looking to use the tools for proactive data discovery or visualization of existing information, as well as some predictive analytics.

The business manager, who is looking to understand the models and conclusions.

IT developers, who support all the prior categories of users.


How Applications of Big Data Drive Industries:

Generally, most organizations have several goals for adopting big data projects. While the primary goal for most organizations is to enhance the customer experience, other goals include cost reduction, better targeted marketing, and making existing processes more efficient. In recent times, data breaches have also made enhanced security an important goal for big data projects. More importantly, however: where do you stand when it comes to big data? You will very likely find that you are:

  • Trying to decide whether there is true value in big data or not
  • Evaluating the size of the market opportunity
  • Developing new services and products that will utilize big data
  • Repositioning existing services and products to utilize big data, or
  • Already utilizing big data solutions.

With this in mind, having a bird’s eye view of big data and its application in different industries will help you better appreciate what your role is, or what it is likely to be in the future, in your industry or across different industries.

In this article, I shall examine 10 industry verticals that are using big data, industry-specific challenges that these industries face, and how big data solves these challenges.

[Infographic: applications of big data across industry verticals]

Conclusion:

Having gone through 10 industry verticals, including how big data plays a role in each, here are a few key takeaways:

  • There is substantial real spending around big data
  • To capitalize on big data opportunities, you need to familiarize yourself with and understand where spending is occurring
  • Match market needs with your own capabilities and solutions
  • Vertical industry expertise is key to utilizing big data effectively and efficiently

If there’s anything you’d like to add, explore, or know, feel free to comment below.

INTEGRATED BIG DATA APPLICATIONS

Integrated application systems are infrastructure systems pre-integrated with databases, application software, or both, providing appliance-like functionality for Big Data solutions, analytic platforms, or similar demands. For ISVs, there are five key reasons to consider delivering your Big Data/analytics solutions in the form of integrated application systems: benefits that can make the difference between market-moving success and tepid sales and profits.

Operational Environment:

The typical IT enterprise evolved to its current state by utilizing standards and best practices. These range from simple things like data naming conventions to more complex ones such as a well-maintained enterprise data model. New data-based implementations require best practices in organization, documentation, and governance. With new data and processes in the works, you must update documentation, standards, and best practices, and continue to improve quality.

Costs and benefits of new mainframe components typically involve software license charges. The IT organization will need to re-budget and perhaps even re-negotiate current licenses and lease agreements. As always, new hardware comes with its own requirements of power, footprint, and maintenance needs.

A Big Data implementation brings additional staff into the mix: experts on new analytics software, experts on special-purpose hardware, and others. Such experts are rare, so your organization must hire, rent, or outsource this work. How will they fit into your current organization?  How will you train current staff to grow into these positions?


Start with the Source System:

This is your core data from operational systems. Interestingly, many beginning Big Data implementations will attempt to access this data directly (or at least to store it for analysis), thereby bypassing the succeeding steps. This happens because Big Data sources have not yet been integrated into your IT architecture; indeed, these data sources may be brand new, or never accessed before.

Those who support the source data systems may not have the expertise to assist in analytics, while analytics experts may not understand the source data. Analytics accesses production data directly, so any testing or experimenting is done in a production environment.

Analyze Data Movement:

These data warehouse subsystems and processes first access data from the source systems. Some data may require transformations or ‘cleaning’. Examples include missing data or invalid data such as all zeroes for a field defined as a date. Some data must be gathered from multiple systems and merged, such as accounting data. Other data requires validation against other systems.

Data from external sources can be extremely problematic.  Consider data from an external vendor that was gathered using web pages where numbers and dates were entered in free-form text fields. This opens the possibility of non-numeric characters in numeric data fields. How can you maximize the amount of data you process, while minimizing the issues with invalid fields?  The usual answer is ‘cleansing’ logic that handles the majority of invalid fields using either calculation logic or assignment of default values.
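
A minimal sketch of such cleansing logic in Python; the field formats and default values are assumptions, chosen only to illustrate the calculation-or-default pattern:

    # Cleansing free-form numeric and date fields with defaults on failure.
    from datetime import date, datetime

    DEFAULT_AMOUNT = 0.0
    DEFAULT_DATE = date(1900, 1, 1)  # sentinel for unparseable dates

    def clean_amount(raw: str) -> float:
        """Strip common junk characters; fall back to a default on failure."""
        try:
            return float(raw.replace(",", "").replace("$", "").strip())
        except (ValueError, AttributeError):
            return DEFAULT_AMOUNT

    def clean_date(raw: str) -> date:
        """Reject all-zero or malformed dates; assign a default value."""
        for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
            try:
                return datetime.strptime(raw.strip(), fmt).date()
            except (ValueError, AttributeError):
                continue
        return DEFAULT_DATE  # e.g. '00/00/0000' ends up here

    print(clean_amount("$1,234.50"))  # 1234.5
    print(clean_date("00/00/0000"))   # 1900-01-01 (the default)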


Review Data Storage for Analytics:

This is the final point, the destination where all data is delivered. From here, we get direct access to data for analysis, perhaps by approved query tools. Some subsets of data may be loaded into data marts, while others may be extracted and sent to internal users for local analysis. Some implementations include publish-and-subscribe features or even replication of data to external sources.

Coordination between current processes and the big data process is required. IT support staff will have to investigate whether options for early use of the data are available. It may also be possible to load current data and the corresponding big data tables in parallel. Delays in loading data will impact the accuracy and availability of analytics; how much delay is acceptable is a business decision that must be made, and will differ from implementation to implementation.

Greater solution consistency:

In an integrated application system, you integrate the hardware and software for your product, so you control the environment that supports your product. That ensures your Big Data and analytics applications have all the processing, storage, and memory resources they need to deliver optimal performance on compute-intensive jobs. In short, your application runs the way it was designed, so customer satisfaction is optimized.

Better system security:

Analytics and other Big Data systems frequently deal with financial, medical or other proprietary information. By delivering your product as an integrated application system, you can build in the security tools necessary to prevent access by unauthorized users, hackers or other intruders. Your application is safer, so your customers gain confidence in your products.


Conclusion:

Big data today has scale-up and scale-out issues. Further, it often involves integration of dissimilar architectures. When we insist that we can deal with big data by simply scaling up to faster, special-purpose hardware, we are neglecting more fundamental issues.

BIG DATA IN CLOUD

The rise of cloud computing and cloud data stores has been a precursor and facilitator to the emergence of big data. Cloud computing is the commodification of computing time and data storage by means of standardized technologies.

It has significant advantages over traditional physical deployments. However, cloud platforms come in several forms and sometimes have to be integrated with traditional architectures.

This leads to a dilemma for decision makers in charge of big data projects. These projects regularly exhibit unpredictable, bursting, or immense computing power and storage needs. At the same time business stakeholders expect swift, inexpensive, and dependable products and project outcomes. This article introduces cloud computing and cloud storage, the core cloud architectures, and discusses what to look for and how to get started with cloud computing.

Horizontal scaling achieves elasticity by adding instances, with each of them serving a part of the demand. Software like Hadoop is specifically designed as a distributed system to take advantage of horizontal scaling, processing small independent tasks at massive parallel scale. Distributed systems can also serve as data stores, like NoSQL databases such as Cassandra or HBase, or file systems like Hadoop’s HDFS. Alternatives like Storm provide coordinated stream processing in near real time through a cluster of machines running complex workflows.

The interchangeability of the resources, together with distributed software design, absorbs failures and likewise allows virtual computing instances to be scaled unperturbed. Spiking or bursting demands can be accommodated just as well as seasonal peaks or continued growth.

Renting practically unlimited resources for short periods allows one-off or periodic projects at a modest expense. Data mining and web crawling are great examples. It is conceivable to crawl huge websites with millions of pages in days or hours for a few hundred dollars or less. Inexpensive tiny virtual instances with minimal CPU resources are ideal for this purpose, since most of the time spent crawling the web is spent waiting on IO. Instantiating thousands of these machines to achieve millions of requests per day is easy and often costs less than a fraction of a cent per instance-hour.
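
A hedged sketch of why CPU-light instances suffice for crawling: the work is IO-bound, so even a small machine can keep many requests in flight at once (the URLs are placeholders, and any real crawl should respect site terms, as noted below):

    # IO-bound crawling with asyncio + aiohttp: mostly waiting on the network.
    # Assumes: pip install aiohttp. The URLs here are placeholders.
    import asyncio
    import aiohttp

    URLS = [f"https://example.com/page/{i}" for i in range(100)]

    async def fetch(session: aiohttp.ClientSession, url: str) -> int:
        async with session.get(url) as resp:
            return len(await resp.text())

    async def main() -> None:
        async with aiohttp.ClientSession() as session:
            results = await asyncio.gather(*(fetch(session, u) for u in URLS),
                                           return_exceptions=True)
            pages = [r for r in results if isinstance(r, int)]
            print(f"fetched {len(pages)} pages, {sum(pages)} bytes total")

    asyncio.run(main())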

Of course, such mining operations should be mindful of the resources of the websites or application interfaces they mine, respect their terms, and not impede their service. A poorly planned data mining operation is equivalent to a denial-of-service attack. Lastly, cloud computing is naturally a good fit for storing and processing the big data accumulated from such operations.

 

BIG DATA IN ROBOTICS

Combine robots with big data analytics and you have a potent blend that could displace us from the vast majority of our jobs. Big data analytics lets us leverage large volumes of structured and unstructured, fast-moving data: real-time conversations in text, email and social media, video, photographs, data from the location sensors in our phones, and so on. Put this capability into a robot and they won’t just threaten to replace lower-skilled jobs such as assembly-line workers or supermarket personnel; they now have their eyes on doctors, pilots and journalists as well.

What we are currently seeing is a revolution that will change our lives forever. The last time we saw something comparable was during the industrial revolution, when machines delivered enormous productivity gains. But that time, the people who previously worked on farms found new job opportunities in factories. What seems to be different now is that machines will take our jobs without giving us the same level of new opportunities and employment.

Google’s self-driving car on the freeway: we were delighted to see this ‘big-data enabled machine’ out and about, and asked our driver to slow down so we could take a photograph on our mobiles. As we drove on, I talked to the driver about the car, and his reaction was, ‘Looks like Google will take my job soon!’ That conversation prompted us to write this piece.

We now have ‘intelligent’ robots and machines that leverage our ever-increasing ability to analyze enormous and unstructured datasets (what we call big data analytics) to perform human jobs. Here are just a few very real examples (and there are endless others):


Pilots: We know that autopilots have been assisting pilots for many years. However, the latest commercial airliners can now fly unaided: they can take off and land safely (arguably more safely than humans, as most air disasters come down to human error). We need only look at the military, where unmanned aircraft (so-called drones) are taking over. Fighter-jet pilots may be Air Force history soon. Drones are armed with high-resolution cameras that generate images which can be analyzed on board or transmitted via satellite to a powerful big-data engine that also monitors call logs of potential targets, movements using sensors, social media activity, etc. The big-data enabled war is on!

Doctors: Robots are already assisting surgeons in performing operations, and doctors use large-scale databases of medical information to inform their decisions. Soon, however, robots will be able to make a diagnosis and perform operations without human input. A robot could scan your body and then, drawing on the entire library of medical knowledge (as well as data on your own medical history, DNA code, etc.), make a solid diagnosis and even remove a brain tumor with better results than the best brain surgeon could achieve.

Call center worker: We all know the irritating automated answering systems in call centers that give you options and then route your call to the supposedly ‘right person’ with the skills and knowledge to help with your query. What we are now seeing is the rise of natural language systems that are able to hold a conversation with humans. IBM has developed Watson, a computer that recently challenged two of the all-time best Jeopardy! players. Without access to the Internet, Watson won the game by interpreting natural language questions and answering after analyzing its massive data memory (which included a copy of the entire Wikipedia database). This means that when you ring any call center you will always speak to the ‘right person’; only that person will be a robot!

Journalist: A company called Narrative Science recently launched a software product that can write newspaper stories about sports games directly from the games’ statistics. The same software can now be used to automatically write an overview of a company’s business performance using information available on the web. It uses algorithms to turn the information into attractive articles. You can see how newspapers of the future will use these tools to generate stories and deliver them to you with customized content, in a bespoke format, based on preferences inferred from your browser logs: what other content you read and what social media posts you share.

This development is somewhat scary as well as tremendously exciting. What is scary is the thought that a big-data enabled robot could take my job in the not-too-distant future. I am interested to hear your thoughts on this – please share this post in your network and leave a comment to generate a discussion on this important topic.

BIG DATA IN AN ORGANISATION

Big data helps an organisation treat customers more like individuals and build better long-term relationships with them.

1. Predict exactly what customers want

Remember when the shopkeeper had your loaf of bread all wrapped up and ready to go before you even told her that’s what you wanted? Providing that same service to online customers, based on their past behavior, is exactly how organizations are using big data to increase customer satisfaction and purchases.

Organizations gather a huge amount of data on customers: not only what they’ve purchased, but also what websites they visit, where they live, when they’ve contacted customer service, and whether they interact with the brand on social media. It’s a staggering amount of seemingly unrelated data (that’s why it’s called big data), but organizations that can properly mine it can offer a more personalized touch. To properly anticipate what customers want, organizations must promote the right products to the right customers on the right channel. Many organizations follow the same pattern, for example by recommending music on Spotify, movies on Netflix, or Pins on Pinterest.
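
A toy sketch of the simplest version of this idea, recommending items that co-occur in past purchases (the baskets below are invented, standing in for real purchase histories):

    # Co-occurrence recommendations from (hypothetical) purchase baskets.
    from collections import Counter
    from itertools import combinations

    baskets = [
        {"bread", "butter", "jam"},
        {"bread", "butter"},
        {"bread", "milk"},
    ]

    pair_counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            pair_counts[(a, b)] += 1

    def recommend(item, top=3):
        """Rank items by how often they were bought alongside `item`."""
        scores = Counter()
        for (a, b), n in pair_counts.items():
            if a == item:
                scores[b] += n
            elif b == item:
                scores[a] += n
        return [other for other, _ in scores.most_common(top)]

    print(recommend("bread"))  # ['butter', 'jam', 'milk']

Production recommenders at the scale of Spotify or Netflix use far richer models, but the underlying principle is the same: past behavior predicts future wants.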

2. Get customers excited about their own data

Simply giving customers huge amounts of data about themselves is not enough. Organizations need to sift through all of the data and extract the most relevant information in an easily digestible form for customers. Done right, data that makes a difference in customers’ everyday lives, whether it relates to their health and fitness or to their money, can make a real difference to a company’s return on investment.

By giving customers data that is meaningful to them, these organizations are using big data to create super fans. Once customers get hooked on their own personal data, they are more likely to keep logging in and using the product. And if everything goes according to plan, they become brand loyalists.

3. Improve customer service interactions

Organizations that are using data to improve their customer service are taking it one step further. When a customer reaches out, the agent can solve the problem more quickly and efficiently if they have the right data in front of them. They won’t have to ask the customer as many questions, because they already know the answers.

Using data to improve customer interactions is especially important now that customers have more channels than ever to connect with brands. Whether it’s a social media manager on the other end of an angry tweet or an agent answering a phone call, organizations that equip their employees with tools that provide in-depth customer data stand apart, because they provide great service that only improves with every interaction. Southwest Airlines, for example, is using speech analytics to extract rich information from live-recorded interactions between customers and personnel to better understand its customers. Organizations must ensure their customer service staff not only have the right data, but also know how to communicate the insights from that data to customers.

4. Identify customer pain points and solve them

Most organizations know what some of their customers’ pain points are. Those that dig deep into the data to understand those difficulties are improving their customers’ experience.

Take Delta. All airlines know that a top concern for travelers is lost luggage, especially on a delayed flight with missed connections involved. Delta looked deeper into its data and created a solution that removes the uncertainty about where a traveler’s bag might be. Customers can now snap a photo of their luggage tag using the “Track My Bag” feature in the Delta app, and then watch their luggage as it makes its way to its final destination. Even if a bag doesn’t make it onto the intended flight, travelers save time tracking it down. Finding a new way to put big data to work for the benefit of its travelers put Delta out front in a competitive market.

 

Importance of Big Data for Business

Big data is a term that refers to data sets, or combinations of data sets, whose size (volume), complexity (variability), and rate of growth (velocity) make them difficult to capture, manage, process, or analyze with conventional technologies and tools, such as relational databases and desktop statistics or visualization packages, within the time necessary to make them useful. While the size threshold for big data is not firmly defined and continues to change over time, most analysts and practitioners currently refer to data sets from 30-50 terabytes (10^12 bytes, or 1,000 gigabytes, per terabyte) to multiple petabytes (10^15 bytes, or 1,000 terabytes, per petabyte) as big data.

The complex nature of big data is primarily driven by the unstructured nature of much of the data generated by modern technologies: web logs, radio frequency identification (RFID), sensors embedded in devices, machinery, and vehicles, Internet searches, social networks such as Facebook, portable computers, smartphones and other cell phones, GPS devices, and call center records. In most cases, to use big data effectively it must be combined with structured data (typically from a relational database) from a more conventional business application, such as Enterprise Resource Planning (ERP) or Customer Relationship Management (CRM).

Similar to the complexity, or variability, aspect of big data, its rate of growth, or velocity, is largely due to the ubiquitous nature of modern online, real-time data capture devices, systems, and networks. The rate of growth of big data is expected to continue increasing for the foreseeable future.

Specific new big data technologies and tools have been, and continue to be, developed. Much of the new big data technology relies heavily on massively parallel processing (MPP) databases, which can distribute the processing of very large data sets concurrently across many servers.

When big data is effectively and efficiently captured, processed, and analyzed, companies gain a more complete understanding of their business, customers, products, competitors, and so on, which can lead to efficiency improvements, increased sales, lower costs, better customer service, and improved products and services.

Challenges

Understanding and Utilizing Big Data

In most industries, it is a daunting task for companies that deal with big data just to understand what data is available to be used and to determine the best use of that data based on the company’s industry, strategy, and tactics. These analyses also need to be performed on an ongoing basis, as the data landscape changes at an ever-increasing rate and as executives develop a growing appetite for analytics based on all available information.

New, Complex, and Continuously Emerging Technologies

Since much of the technology required to utilize big data is new to most organizations, they will need to learn about these new technologies at an ever-accelerating pace, and potentially engage with different technology providers and partners than they have used in the past. As with all technology, firms entering the world of big data will need to balance the business needs associated with big data against the costs of entering into and remaining engaged in big data capture, storage, processing, and analysis.

Cloud Based Solutions

A new class of business software applications has emerged whereby company data is managed and stored in data centers around the globe. While these solutions range from ERP, CRM, Document Management, Data Warehouses, and Business Intelligence to many others, the common issue remains the safekeeping and management of confidential company data. These solutions often offer companies tremendous flexibility and cost-savings opportunities compared to more traditional on-premise solutions, but they raise a new dimension related to data security and the overall management of an enterprise’s Big Data paradigm.

Privacy, Security, and Regulatory Considerations

Given the volume and complexity of big data, it is challenging for most firms to obtain a reliable grasp on the content of all of their data and to capture and secure it adequately, so that confidential and/or private business and customer data are not accessed by or disclosed to unauthorized parties. The costs of a data privacy breach can be enormous. For instance, in the health care field, class action lawsuits have been filed where the plaintiff has sought $1,000 per patient record that has been inappropriately accessed or lost. In the regulatory area, the proper storage and transmission of personally identifiable information (PII), including that contained in unstructured data such as emails, can be problematic and necessitate new and improved security measures and technologies. For companies doing business globally, there are significant differences in privacy laws between the U.S. and other countries. Lastly, it will be very important for most firms to tightly integrate their big data, data security/privacy, and regulatory functions.

Archiving and Disposal of Big Data

Since big data loses its value to current decision making over time, and since it is voluminous and varied in content and structure, it is necessary to utilize new tools, technologies, and methods to archive and delete big data without sacrificing its usefulness for current business needs.
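
A toy sketch of one common approach, age-based archive-then-delete retention, assuming data lives in date-stamped partition directories (the paths, thresholds, and partition layout are hypothetical):

    # Archive partitions past a "hot" window; delete those past retention.
    from datetime import date
    from pathlib import Path
    import shutil

    HOT_DAYS = 90      # keep in place for current analytics
    RETAIN_DAYS = 365  # delete anything older than this

    def apply_retention(root: Path, archive: Path, today: date) -> None:
        archive.mkdir(parents=True, exist_ok=True)
        for part in root.glob("dt=*"):               # e.g. a dt=2016-01-31 directory
            part_date = date.fromisoformat(part.name[3:])
            age_days = (today - part_date).days
            if age_days > RETAIN_DAYS:
                shutil.rmtree(part)                  # dispose of expired data
            elif age_days > HOT_DAYS:
                shutil.move(str(part), str(archive / part.name))  # archive it

    apply_retention(Path("/data/events"), Path("/archive/events"), date.today())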

The Need for IT, Data Analyst, and Management Resources

It is estimated that there is a need for approximately 140,000 to 190,000 additional workers with “deep analytical” expertise and 1.5 million more data-literate managers, whether retrained or hired. Any firm that undertakes a big data initiative will therefore likely need to retrain existing people or engage new people for the initiative to be successful.