Sentiment analysis for product rating


Sentiment is a judgment or opinion based on feeling. Sentiment plays a major role when products from different brands are developed with the same quality: it can help one brand’s product win a better market than another’s. Sentiment analysis is opinion mining that deals with sentiment polarity categorization. The main steps involved in sentiment analysis are explained below.


Data collection

Online portals such as Twitter provide an API to extract data, but most other portals do not offer such a mechanism, so scripting languages are used to scrape content from them. In general, the data in these cases consists of rating numbers (1 to 5 or 1 to 10) and a rating description. Extracting, analyzing and charting rating numbers is considerably easier than analyzing the rating description.
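As a minimal sketch of the easier half of this task, the snippet below pulls rating numbers such as “4 out of 5” or “3/10” out of free-text snippets and normalises them to a single 5-point scale. The regular expression and the normalisation rule are illustrative assumptions, not a standard format.

```python
import re

def extract_ratings(snippets):
    """Pull numeric ratings like '4 out of 5' or '3/10' from review snippets."""
    pattern = re.compile(r"(\d+(?:\.\d+)?)\s*(?:out of|/)\s*(\d+)")
    ratings = []
    for text in snippets:
        match = pattern.search(text)
        if match:
            score, scale = float(match.group(1)), float(match.group(2))
            ratings.append(score / scale * 5)  # normalise everything to a 5-point scale
    return ratings

reviews = [
    "Great phone, 4 out of 5.",
    "Battery died fast - 3/10 from me.",
    "No rating given, just a rant.",
]
print(extract_ratings(reviews))  # [4.0, 1.5]
```

Reviews with no recognisable numeric rating fall through to the harder problem the rest of this section deals with: analyzing the rating description itself.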


Feature extraction

A bag of words consists of a set of standard words; the review text is compared against these words to derive a binary feature vector. However, this method is not effective on phrases, so collocations are handled with bigram features. Bigrams help in identifying negation, since negation words occur as pairs or groups of words. During feature extraction, a spell check should be performed to clean up the data. Part-of-speech tagging is also a key part of feature extraction.
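A minimal sketch of such a binary feature vector, using both unigrams and bigrams (the tiny vocabulary here is an illustrative assumption):

```python
def binary_features(review, vocabulary):
    """Binary bag-of-words vector: 1 if the vocabulary term appears, else 0."""
    tokens = review.lower().split()
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    present = set(tokens) | set(bigrams)
    return [1 if term in present else 0 for term in vocabulary]

# Unigrams plus one bigram, so negation ("not good") is captured as a unit.
vocab = ["good", "bad", "not good"]
print(binary_features("the screen is not good", vocab))  # [1, 0, 1]
```

Note how the bigram entry lets a classifier distinguish “not good” from a plain occurrence of “good”, which a unigram-only vector cannot do.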



For classifying the data there are various methods, such as Naive Bayes. It uses Bayes’ rule to calculate the probability that a feature vector belongs to a class. This method is somewhat complicated, and it is hard to trace back which probabilities caused a particular classification. The decision list is another method; it operates as a rule-based tagger and has the advantage of human readability.
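A compact sketch of a multinomial Naive Bayes classifier with Laplace smoothing, trained on a toy hand-labelled corpus (the tiny training set and the token-list input format are illustrative assumptions):

```python
from collections import Counter, defaultdict
import math

def train_nb(samples):
    """Count class and per-class word frequencies from (tokens, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def classify_nb(model, tokens):
    """Pick the class maximising log P(class) + sum of log P(word|class)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)  # Laplace smoothing
        for w in tokens:
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

model = train_nb([
    (["great", "battery"], "pos"),
    (["love", "it"], "pos"),
    (["poor", "battery"], "neg"),
    (["broke", "fast"], "neg"),
])
print(classify_nb(model, ["great", "battery"]))  # pos
```

The opacity mentioned above is visible here: the decision emerges from a sum of many log-probabilities, so no single feature “explains” the output the way a decision-list rule would.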



Based on the sentiment analysis, the results are charted or represented in tabular form. For simple rating-number analysis, the results are plotted as a graph with products on the x-axis and the review rating number on the y-axis. For the Naive Bayes and decision list methods, the results are formatted as a table with features in one column and scores in the others.



Lexicon-based approach: in this method, for each review, the matched negative and positive words from a predefined sentiment lexicon are counted. The polarity of the review is calculated by comparing the counts of polar words and assigning a polarity value (positive, negative or neutral) accordingly.
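The counting rule above can be sketched in a few lines (the tiny word lists stand in for a real sentiment lexicon and are illustrative assumptions):

```python
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def lexicon_polarity(review):
    """Count matched polar words; the sign of the difference gives the polarity."""
    tokens = review.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(lexicon_polarity("great camera but poor poor battery"))  # negative
```

Unlike the trained classifiers above, this approach needs no labelled data, only the lexicon itself, which is why it is often used as a baseline.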



The challenges in sentiment analysis start right at data collection. Most of the data is free text available on HTML pages. In many cases the rating numbers and the rating description do not match, so a simple analysis based on rating numbers alone is inaccurate; this leads to analyzing the rating description with machine learning tools, which is considerably more complex. Since ratings are open to all customers, junk reviews are likely and spelling mistakes are common in rating content.



Natural Language Toolkit (NLTK) – Runs on the Python platform. It provides features such as tokenizing, identifying named entities and parsing.

Stanford CoreNLP Suite – Provides tools for part-of-speech tagging, grammar parsing and named-entity recognition.

GATE and Apache UIMA – Help in building complex NLP workflows that integrate different processing steps.




SAS Text Analytics – Provides text analytics software to extract information from text content. It discovers patterns and trends in text using natural language processing, advanced linguistic technologies and advanced statistical modeling.

IBM Text Analytics – Converts unstructured survey text into quantitative data. It automates the categorization process to eliminate manual processing, and uses linguistics-based technologies to reduce the ambiguities of human language.

Lexalytics Text Analytics – Salience is a text analytics engine built by Lexalytics. It is helpful for social media monitoring, sentiment analysis and voice-of-the-customer surveys.

Smartlogic – Provides rule-based classification modelling and information visualization. It applies metadata and classification to deliver navigation and search experiences.




Customers today leave trails of information and data across the Internet – insights into who they are, what they like, and what they are going to buy. Like most businesses today, the automotive industry is assembling and utilizing as much of this data as it can collect. Of course, not all data is created equal. Data may be incomplete, unstructured, or outright wrong. To make matters worse, this data is not necessarily easy to gather and transform into meaningful information.


If you don’t know who your customers and best prospects are, how can you send them messages that bring them into your dealership or repair centre? Sure, you may have data on when they last visited your dealership, including a few contact details here and there, and perhaps their billing address if they have done business with you before. But, as is often the case, you are probably missing many of the details that would help you understand your customers on a much more personalized level.

You might want to know which families have children and may be ready to upgrade to a bigger vehicle, who has a teenage driver in the house, or who is interested in the outdoors. Details such as income, status, occupation, hobbies, lifestyle and age are some examples of demographics that can be used to create targeted marketing messages to which your shoppers are most likely to relate.

Specialized auto data

A few specialized data providers can supply detailed data on vehicles and their owners. Look for a data solution that includes:

  • 100% populated make, model and year, derived directly from VINs.
  • A completely populated database in which each lead record includes data such as name, address, make, model and year.
  • Premium selects such as in-market for a new vehicle, buyer demographics, segmented wealth modelling, email addresses, and full VIN.
  • Options such as engine size, fuel type, drive train, engine block, and engine cylinders.
  • Verified mailing addresses and directory-assisted, validated telephone numbers.


With the Modi government in power, there are expectations of an increased focus on reforms and a ramp-up in infrastructure. Government spending on roads and airports, together with higher GDP growth, will benefit the auto sector in general. We expect a slew of launches in both passenger cars and utility vehicles (UVs), given that competition has intensified.

Our prospect targeting improves marketing results by directing your business messages and financing offers to ready, willing and able-to-buy consumers. It drives the next generation of automotive marketing by combining deeper insights into buyer attitudes and behaviour with practical guidance on the use of direct, traditional and digital media, real-time lead scoring and advertising targeting.

Key Features and Benefits

  • Cross media – direct mail, email, online advertising and more
  • Innovative – the next generation of patented methodologies
  • Affordable – value driven
  • Effective – validation results available.

Individualized service marketing campaigns are vital to retaining customers and extracting the most customer value for dealerships. A service marketing program of this kind builds sales and profits by targeting customers in equity, or those at the end of a term, lease or warranty. Flawless Prospect uses DMS data, OEM incentives, book values and third-party data to give vehicle dealers a competitive edge by identifying current and conquest opportunities that have the highest likelihood of purchasing or servicing with a dealership. Customized alerts deliver ready-to-buy opportunities at the right time, to the right associate, in an RO dashboard that is simple and transparent. Turn current service customers into loyal, repeat sales customers by getting them into a new vehicle with a similar payment for little or no money down.


“PROSPECT DATA and PLANNING the strategy make a clear path to becoming a successful automobile company”




Applications of Data Mining

Service providers

The first example of Data Mining and Business Intelligence comes from service providers in the mobile phone and utilities industries. Mobile phone and utilities companies use Data Mining and Business Intelligence to predict ‘churn’, the term they use for when a customer leaves their company to get their phone/gas/broadband from another provider. They collate billing information, customer services interactions, website visits and other metrics to give each customer a probability score, then target offers and incentives to customers whom they perceive to be at a higher risk of churning.
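In practice such scores come from a fitted statistical model over many metrics; as a deliberately simplified sketch, the function below combines a few warning signs into a probability-like score. The features and weights are illustrative assumptions, not fitted values.

```python
def churn_risk(months_since_last_contact, support_calls, competitor_site_visits):
    """Toy churn score: a weighted sum of warning signs, capped at 1.0.

    Weights are illustrative assumptions, not values fitted to real data.
    """
    score = (0.05 * months_since_last_contact
             + 0.10 * support_calls
             + 0.15 * competitor_site_visits)
    return min(score, 1.0)

quiet_customer = churn_risk(months_since_last_contact=12, support_calls=3,
                            competitor_site_visits=0)
unhappy_customer = churn_risk(months_since_last_contact=1, support_calls=6,
                              competitor_site_visits=4)
print(quiet_customer, unhappy_customer)  # roughly 0.9 and 1.0
```

Customers above some threshold would then be targeted with retention offers, exactly as described above.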


Another example of Data Mining and Business Intelligence comes from the retail sector. Retailers segment customers into ‘Recency, Frequency, Monetary’ (RFM) groups and target marketing and promotions to those different groups. A customer who spends little but often and last did so recently will be handled differently to a customer who spent big but only once, and also some time ago. The former may receive loyalty, upsell and cross-sell offers, whereas the latter may be offered a win-back deal, for instance.
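A minimal sketch of RFM scoring on a 1–3 scale; the thresholds and the fixed reference date are illustrative assumptions (real programmes typically derive score bands from quantiles of the customer base).

```python
from datetime import date

def rfm_segment(last_purchase, orders, total_spend, today=date(2024, 1, 1)):
    """Score a customer on Recency, Frequency and Monetary value (1 = low, 3 = high)."""
    days_since = (today - last_purchase).days
    recency = 3 if days_since <= 30 else 2 if days_since <= 180 else 1
    frequency = 3 if orders >= 10 else 2 if orders >= 3 else 1
    monetary = 3 if total_spend >= 1000 else 2 if total_spend >= 200 else 1
    return recency, frequency, monetary

# A frequent recent small spender vs a single big spender long ago.
print(rfm_segment(date(2023, 12, 20), orders=12, total_spend=300))  # (3, 3, 2)
print(rfm_segment(date(2022, 6, 1), orders=1, total_spend=2500))    # (1, 1, 3)
```

The two example customers land in different RFM cells, which is what drives the different treatments (loyalty offers vs a win-back deal) described above.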


Perhaps some of the best-known examples of Data Mining and Analytics come from E-commerce sites. Many E-commerce companies use Data Mining and Business Intelligence to offer cross-sells and up-sells through their websites. One of the most famous of these is, of course, Amazon, who use sophisticated mining techniques to drive their ‘People who viewed this product also liked…’ functionality.


Supermarkets provide another good example of Data Mining and Business Intelligence in action. Famously, supermarket loyalty card programmes are usually driven mostly, if not solely, by the desire to gather comprehensive data about customers for use in data mining. One notable recent example of this was with the US retailer Target. As part of its Data Mining programme, the company developed rules to predict whether its shoppers were likely to be pregnant. By looking at the contents of their customers’ shopping baskets, they could spot customers who they thought were likely to be expecting and begin targeting promotions for nappies (diapers), cotton wool and so on. The prediction was so accurate that Target made the news by sending promotional coupons to households that did not yet realise a family member was pregnant.

Crime agencies

The use of Data Mining and Business Intelligence is not solely reserved for corporate applications, as our final example shows. Beyond corporate applications, crime prevention agencies use analytics and Data Mining to spot trends across vast amounts of data – helping with everything from where to deploy police manpower (where is crime most likely to happen, and when?), to whom to search at a border crossing (based on age/type of vehicle, number/age of occupants, border crossing history), and even which intelligence to take seriously in counter-terrorism activities.

Neural networks in data mining

A neural network is a powerful computational data model that is able to capture and represent complex input/output relationships. The motivation for the development of neural network technology stemmed from the desire to develop an artificial system that could perform “intelligent” tasks similar to those performed by the human brain.


A neural network acquires knowledge through learning.

A neural network’s knowledge is stored within inter-neuron connection strengths known as synaptic weights.

The true power and advantage of neural networks lies in their ability to represent both linear and non-linear relationships and in their ability to learn these relationships directly from the data being modeled. Traditional linear models are simply inadequate when it comes to modeling data that contains non-linear characteristics.

The most common neural network model is the Multilayer Perceptron (MLP). This type of neural network is known as a supervised network because it requires a desired output in order to learn. The goal of this type of network is to create a model that correctly maps the input to the output using historical data so that the model can then be used to produce the output when the desired output is unknown.

Consider a demonstration of a neural network learning to model the exclusive-or (XOR) data. The XOR data is repeatedly presented to the neural network. With each presentation, the error between the network output and the desired output is computed and fed back to the neural network. The neural network uses this error to adjust its weights such that the error will be decreased. This sequence of events is usually repeated until an acceptable error has been reached or until the network no longer appears to be learning.
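To show what such a network looks like once trained, here is a 2-2-1 MLP with hand-set weights that realises XOR; these weight values are an illustrative assumption (in the demonstration above, backpropagation would find weights of this kind automatically rather than having them supplied).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mlp_xor(x1, x2):
    """2-2-1 multilayer perceptron with hand-set weights realising XOR.

    Hidden unit h1 approximates OR, h2 approximates AND;
    the output unit computes (h1 AND NOT h2), i.e. XOR.
    """
    h1 = sigmoid(20 * x1 + 20 * x2 - 10)   # ~ x1 OR x2
    h2 = sigmoid(20 * x1 + 20 * x2 - 30)   # ~ x1 AND x2
    out = sigmoid(20 * h1 - 20 * h2 - 10)  # ~ h1 AND NOT h2
    return round(out)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", mlp_xor(a, b))
```

The single hidden layer is essential: XOR is not linearly separable, so no single-layer perceptron can represent it, which is exactly why it is the classic demonstration problem for MLPs.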

A good way to introduce the topic is to take a look at a typical application of neural networks. Many of today’s document scanners for the PC come with software that performs a task known as optical character recognition (OCR). OCR software allows you to scan in a printed document and then convert the scanned image into an electronic text format such as a Word document, enabling you to manipulate the text. In order to perform this conversion the software must analyze each group of pixels (0’s and 1’s) that forms a letter and produce a value that corresponds to that letter. Some of the OCR software on the market uses a neural network as the classification engine.


Consider a neural network used within an optical character recognition (OCR) application. The original document is scanned into the computer and saved as an image. The OCR software breaks the image into sub-images, each containing a single character. The sub-images are then translated from an image format into a binary format, where each 0 and 1 represents an individual pixel of the sub-image. The binary data is then fed into a neural network that has been trained to make the association between the character image data and a numeric value that corresponds to the character.


Neural networks have been successfully applied to a broad spectrum of data-intensive applications, such as:

Process Modeling and Control – Creating a neural network model for a physical plant then using that model to determine the best control settings for the plant.

Machine Diagnostics – Detect when a machine has failed so that the system can automatically shut down the machine when this occurs.

Portfolio Management – Allocate the assets in a portfolio in a way that maximizes return and minimizes risk.

Target Recognition – Military application which uses video and/or infrared image data to determine if an enemy target is present.

Medical Diagnosis – Assisting doctors with their diagnosis by analyzing the reported symptoms and/or image data such as MRIs or X-rays.

Credit Rating – Automatically assigning a company’s or individual’s credit rating based on their financial condition.

Targeted Marketing – Finding the set of demographics which have the highest response rate for a particular marketing campaign.

Voice Recognition – Transcribing spoken words into ASCII text.

Financial Forecasting – Using the historical data of a security to predict the future movement of that security.

Quality Control – Attaching a camera or sensor to the end of a production process to automatically inspect for defects.

Intelligent Searching – An internet search engine that provides the most relevant content and banner ads based on the users’ past behavior.

Fraud Detection – Detect fraudulent credit card transactions and automatically decline the charge.







Clustering is the process of breaking down a large population that has a high degree of variation and noise into smaller groups with lower variation. It is a popular data mining activity. In a poll conducted by KDnuggets, clustering was voted the 3rd most frequently used data mining technique in 2011; only decision trees and regression received more votes.

Cluster analysis is an important part of an analyst’s arsenal. One needs to master this technique as it is going to be used often in business situations. Some common applications of clustering are –

  1. Clustering customer behavior data for segmentation
  2. Clustering transaction data for fraud analysis in financial services
  3. Clustering call data to identify unusual patterns
  4. Clustering call-centre data to identify outlier performers (high and low)


Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).

Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to efficiently find them. Popular notions of clusters include groups with small distances among the cluster members, dense areas of the data space, intervals or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem. The appropriate clustering algorithm and parameter settings (including values such as the distance function to use, a density threshold or the number of expected clusters) depend on the individual data set and intended use of the results. Cluster analysis as such is not an automatic task, but an iterative process of knowledge discovery or interactive multi-objective optimization that involves trial and error. It is often necessary to modify data preprocessing and model parameters until the result achieves the desired properties.

Connectivity models: for example, hierarchical clustering builds models based on distance connectivity.

Centroid models: for example, the k-means algorithm represents each cluster by a single mean vector.

Distribution models: clusters are modeled using statistical distributions, such as multivariate normal distributions used by the Expectation-maximization algorithm.

Density models: for example, DBSCAN and OPTICS define clusters as connected dense regions in the data space.

Subspace models: in Biclustering (also known as Co-clustering or two-mode-clustering), clusters are modeled with both cluster members and relevant attributes.

Group models: some algorithms do not provide a refined model for their results and just provide the grouping information.

Graph-based models: a clique, that is, a subset of nodes in a graph such that every two nodes in the subset are connected by an edge, can be considered a prototypical form of cluster. Relaxations of the complete connectivity requirement (a fraction of the edges can be missing) are known as quasi-cliques, as in the HCS clustering algorithm.

There are also finer distinctions possible, for example:

strict partitioning clustering: here each object belongs to exactly one cluster

strict partitioning clustering with outliers: objects can also belong to no cluster, and are considered outliers.

overlapping clustering (also: alternative clustering, multi-view clustering): while usually a hard clustering, objects may belong to more than one cluster.

hierarchical clustering: objects that belong to a child cluster also belong to the parent cluster

subspace clustering: while an overlapping clustering, within a uniquely defined subspace, clusters are not expected to overlap.


k-means clustering

In centroid-based clustering, clusters are represented by a central vector, which may not necessarily be a member of the data set. When the number of clusters is fixed to k, k-means clustering gives a formal definition as an optimization problem: find the k cluster centers and assign the objects to the nearest cluster center, such that the squared distances to the cluster centers are minimized.

Most k-means-type algorithms require the number of clusters – k – to be specified in advance, which is considered to be one of the biggest drawbacks of these algorithms. Furthermore, the algorithms prefer clusters of approximately similar size, as they will always assign an object to the nearest centroid. This often leads to incorrectly cut borders between clusters (which is not surprising, as the algorithm optimizes cluster centers, not cluster borders).

K-means has a number of interesting theoretical properties. First, it partitions the data space into a structure known as a Voronoi diagram. Second, it is conceptually close to nearest neighbor classification, and as such is popular in machine learning. Third, it can be seen as a variation of model-based clustering, and Lloyd’s algorithm as a variation of the Expectation-maximization algorithm for this model.
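The iterate-assign-then-recompute structure of Lloyd’s algorithm can be sketched in pure Python for 2-D points (the toy data, fixed iteration count and seed are illustrative assumptions; production code would use a library implementation with a proper convergence test and multiple restarts).

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm: assign each point to the nearest centre, then recompute centres."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: (p[0] - centres[i][0]) ** 2
                                        + (p[1] - centres[i][1]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its cluster.
        centres = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centres[i]
            for i, c in enumerate(clusters)
        ]
    return centres

# Two obvious blobs around (0, 0) and (10, 10).
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centres = sorted(kmeans(data, k=2))
print(centres)  # two centres, one near each blob
```

Note that the returned centres are cluster means and need not coincide with any data point, matching the definition of centroid-based clustering above.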


Data Mining and the IoT

One of the most important factors for the success of the IoT is the ability of systems, services and applications to perform data mining. Why is that? I think one of the key roles of the IoT is to drive smart interactions with users (like automation and decision support). To do so, systems need to collect information about users and their context (using sensors and web resources), perform appropriate data analysis, filter the data, and either present users with the result or make smart decisions on their behalf.

Before discussing data mining and the IoT, let’s first give a short introduction to data mining. Data mining is the process of identifying patterns in (usually) large data sets. To give an example, consider an activity tracker that you carry throughout the day. Looking at the data the tracker collects, you see more activity during some evenings and on weekend mornings (since you go running during those times). You recognize this pattern by correlating the activity score with time, comparing each value in the data with the others. In effect, you group activity values into different levels (medium, high, etc.) and then record the times at which the grouped activity values occur. This procedure is called data clustering. While you can easily figure out such patterns yourself, imagine having hundreds or thousands of data entries, with not only activity values and timestamps but also duration, weather conditions, calories consumed, and so on. To deal with such problems, computer science has applied statistical techniques and built tools that allow you to perform data mining and extract useful information from the data sets. Some of the most important applications of data mining are anomaly detection, clustering, classification, feature selection, and time series prediction.


IoT and data anomaly detection

Anomaly detection can be a great feature for IoT applications. Let’s take the activity tracker example again. This time, assume that you have set a monthly or weekly goal, like losing some weight or reaching an activity or calorie-burning level. In addition to tracking your activity, the system is also able to determine your daily calorie consumption. Assume again that you go running every Tuesday and Thursday evening as well as on weekend mornings. One Tuesday you forget to go running and your daily activity falls low, while your calorie consumption stays the same. This is an anomaly for the system. If your tracking application featured data mining techniques, it would be able to remind you the next day to become more active and not to skip your run on Thursday (or worse, it could tell your friends that you are getting lazy: an interesting application for social networks combined with activity-tracking services!).
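A minimal sketch of such an anomaly check, flagging any day whose reading sits far from the mean in standard-deviation terms (the step counts and the 2-sigma threshold are illustrative assumptions):

```python
import statistics

def flag_anomalies(values, threshold=2.0):
    """Flag indices of readings more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]

# Daily step counts; index 5 is the lazy Tuesday the tracker should notice.
steps = [8200, 7900, 8100, 8300, 8000, 900, 8150]
print(flag_anomalies(steps))  # [5]
```

A real tracker would condition on the day of the week and on other signals (calories, weather) rather than a single global mean, but the principle of scoring deviation from a learned norm is the same.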

Again, this is a simple example. But consider that you track your parents’ activities (like how often they go out, when they come home, how much time they spend in a room, and so on) through motion sensors, along with their home environment conditions. If one day they go out shopping and for some reason are late, or if somebody spends too much time in the bathroom, an anomaly detection system could alert you automatically.

IoT and data clustering

As mentioned in the activity tracker example, data clustering refers to the grouping of data based on particular features and their values. It is the most common technique of unsupervised machine learning. It is called unsupervised because in other techniques, like data classification, you need to “train” the system first with data (think of early voice recognition systems where you had to train the system by calling out specific words). Data clustering, by contrast, can be applied to a new data set without really knowing much about it in advance (e.g., what kind of data it contains). The number of clusters is usually given as an input (e.g., the number of activity levels the activity data should be divided into), but there are also algorithms that can automatically sort the data in the best possible way. Data clustering may not be used directly in IoT applications, but in general it can be an intermediate step for identifying patterns in the collected data.