Sentiment is a judgment or thought based on feeling. It plays a major role when products from different brands are built to the same quality, because sentiment can help one brand’s product win a better market than another’s. Sentiment analysis, also called opinion mining, deals with categorizing sentiment polarity. The processes involved in sentiment analysis are explained below:
Online portals like Twitter provide an API to extract data, but most other portals do not provide such a mechanism. In those cases, scripting languages are used to extract the content from the online portal. In general, the data consists of rating numbers (1 to 5 or 1 to 10) and a rating description. Extracting, analyzing and charting rating numbers is relatively easier than analyzing rating descriptions.
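As a rough illustration of script-based extraction, the sketch below pulls rating numbers and rating descriptions out of a page using only the Python standard library. The page layout, the class names "rating" and "description", and the names ReviewParser, SAMPLE_PAGE and extract_reviews are assumptions made for this example; a real portal will use its own markup.

```python
from html.parser import HTMLParser


class ReviewParser(HTMLParser):
    """Collects (rating, description) pairs from a hypothetical page layout."""

    def __init__(self):
        super().__init__()
        self.reviews = []    # finished (rating, description) pairs
        self._field = None   # field the parser is currently inside, if any
        self._current = {}   # fields gathered for the review in progress

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in ("rating", "description"):
            self._field = css_class

    def handle_data(self, data):
        if self._field and data.strip():
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        self._field = None
        # Once both fields are present, the review is complete.
        if "rating" in self._current and "description" in self._current:
            self.reviews.append(
                (int(self._current["rating"]), self._current["description"])
            )
            self._current = {}


# Made-up sample markup standing in for a downloaded review page.
SAMPLE_PAGE = """
<div class="review"><span class="rating">4</span><p class="description">Good battery life.</p></div>
<div class="review"><span class="rating">2</span><p class="description">Stopped working in a week.</p></div>
"""


def extract_reviews(html):
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    return parser.reviews


print(extract_reviews(SAMPLE_PAGE))
```

The rating numbers come out as integers ready for charting, while the descriptions remain free text that needs further processing.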
A bag of words consists of a set of standard words; those words are compared with the review data to derive a binary feature vector. However, this method is not effective on phrases, so collocations are identified with bigram functions. Bigrams help in identifying negation, because negation words occur in pairs or groups of words. During feature extraction, a spell check needs to be done to clean up the data. Part-of-speech tag identification is a key part of feature extraction.
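A minimal sketch of the binary feature vector follows, with bigrams added so that a negated phrase such as "not good" becomes a feature of its own. The vocabulary and the function names are illustrative assumptions, and the regular-expression tokenizer is a simplification; spell checking and part-of-speech tagging are left out.

```python
import re

# Illustrative vocabulary: single words plus two negation bigrams.
VOCABULARY = ["good", "bad", "battery", "not good", "not bad"]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bigrams(tokens):
    """Return adjacent word pairs joined by a space."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def binary_features(text, vocabulary=VOCABULARY):
    """Return 1 for each vocabulary term present in the text, else 0."""
    tokens = tokenize(text)
    present = set(tokens) | set(bigrams(tokens))
    return [1 if term in present else 0 for term in vocabulary]


print(binary_features("The battery is not good"))
```

With single words alone, this review would look positive because it contains "good"; the "not good" bigram feature is what lets a classifier see the negation.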
There are various methods for classifying the data, such as the Naive Bayes method. It uses Bayes’ rule to calculate the probability that a feature vector belongs to a class. This method is a little complicated, and it is hard to trace back which probabilities are causing certain classifications. The decision list is another method; it operates on a rule-based tagger and has the advantage of human readability.
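To make the Naive Bayes calculation concrete, here is a small sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written in plain Python. The training reviews, labels and function names are invented for illustration, and the multinomial variant with smoothing is one common choice rather than the only one.

```python
import math
from collections import Counter, defaultdict

# Tiny made-up training set: (tokens, label) pairs.
TRAINING = [
    (["great", "battery"], "positive"),
    (["great", "screen"], "positive"),
    (["poor", "battery"], "negative"),
    (["poor", "screen", "broken"], "negative"),
]


def train(labelled_docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in labelled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def log_scores(tokens, model):
    """Return log P(class) + sum of log P(word | class) for each class."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token in vocabulary:  # words never seen in training are skipped
                smoothed = (word_counts[label][token] + 1) / (
                    total_words + len(vocabulary)
                )
                score += math.log(smoothed)
        scores[label] = score
    return scores


def classify(tokens, model):
    """Pick the class with the highest log score."""
    scores = log_scores(tokens, model)
    return max(scores, key=scores.get)


model = train(TRAINING)
print(classify(["great", "screen"], model))
print(log_scores(["great", "screen"], model))
```

Printing the per-class log scores shows how each class was weighed for a review, but on a realistic vocabulary it becomes difficult to tell which individual word probabilities drove the result.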
Based on the sentiment analysis, the results are charted or represented in tabular form. In a simple analysis of rating numbers, the results are charted in a graph with products on the x axis and review rating numbers on the y axis. In the case of the Naive Bayes and decision list methods, the results are formatted as a table with features in one column and scores in the other columns.
OTHER CLASSIFICATION METHODS
Lexicon-based approach: in this method, the words in each review that match a predefined sentiment lexicon of negative and positive words are counted. The polarity of the review is calculated from the counts of polar words, and a polarity value such as positive, negative or neutral is assigned to it.
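The counting step can be sketched in a few lines. The two word sets below are tiny placeholder lexicons invented for this example, and the tie-breaking rule (equal counts give neutral) is an assumption; a real system would load a much larger published sentiment lexicon.

```python
import re

# Placeholder lexicons for illustration only.
POSITIVE_WORDS = {"good", "great", "excellent", "love"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "broken"}


def lexicon_polarity(review):
    """Count lexicon matches and return positive, negative or neutral."""
    tokens = re.findall(r"[a-z']+", review.lower())
    positive = sum(token in POSITIVE_WORDS for token in tokens)
    negative = sum(token in NEGATIVE_WORDS for token in tokens)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"


for text in (
    "Great screen and excellent sound",
    "Terrible battery, poor support",
    "Good camera but bad battery",
):
    print(text, "->", lexicon_polarity(text))
```

Because this sketch only counts isolated words, it does not handle negation such as "not good"; that would need bigram handling on top of the lexicon lookup.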
CHALLENGES IN SENTIMENT ANALYSIS
The challenges in sentiment analysis start right from data collection. Most of the data is free text available on HTML pages. In many cases the rating numbers and rating descriptions do not match, so a simple analysis based on rating numbers is not accurate; this leads to analyzing the rating description with various machine learning tools. Analysis using these tools is complex in nature. Since ratings are open to all customers, junk reviews are likely, and spelling mistakes are common in rating content.
NATURAL LANGUAGE PROCESSING TOOLS
Natural Language Toolkit (NLTK) – It runs on the Python platform. It provides features like tokenizing, identifying named entities and parsing.
Stanford CoreNLP Suite – It provides tools for part-of-speech tagging, grammar parsing and named entity recognition.
GATE and Apache UIMA – They help in building complex NLP workflows that integrate different processing steps.
SENTIMENT ANALYTICS TOOLS
SAS Text Analytics – It provides text analytics software to extract information from text content. It discovers patterns and trends in text using natural language processing, advanced linguistic technologies and advanced statistical modeling.
IBM Text Analytics – It converts unstructured survey text into quantitative data. It automates the categorization process to eliminate manual processing. To reduce ambiguities in human language, it uses linguistics-based technologies.
Lexalytics Text Analytics – Salience is a text analytics engine built by Lexalytics. It is helpful for social media monitoring, sentiment analysis and voice-of-the-customer surveys.
Smartlogic – It provides rule-based classification modelling and information visualization. It applies metadata and classification to deliver navigation and search experiences.