CLUSTERING - DATA MINING

Clustering is the process of breaking down a large population that has a high degree of variation and noise into smaller groups with lower variation. It is a popular data mining activity. In a poll conducted by KDnuggets, clustering was voted the third most frequently used data mining technique in 2011; only decision trees and regression received more votes.

Cluster analysis is an important part of an analyst’s arsenal, and one needs to master the technique as it is used often in business situations. Some common applications of clustering are:

  1. Clustering customer behavior data for segmentation
  2. Clustering transaction data for fraud analysis in financial services
  3. Clustering call data to identify unusual patterns
  4. Clustering call-centre data to identify outlier performers (high and low)


Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).

Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to find clusters efficiently. Popular notions of clusters include groups with small distances among the cluster members, dense areas of the data space, intervals or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem. The appropriate clustering algorithm and parameter settings (including values such as the distance function to use, a density threshold or the number of expected clusters) depend on the individual data set and the intended use of the results. Cluster analysis as such is not an automatic task, but an iterative process of knowledge discovery or interactive multi-objective optimization that involves trial and error. It is often necessary to modify data preprocessing and model parameters until the result achieves the desired properties. Typical cluster models include the following:

Connectivity models: for example, hierarchical clustering builds models based on distance connectivity.

Centroid models: for example, the k-means algorithm represents each cluster by a single mean vector.

Distribution models: clusters are modeled using statistical distributions, such as multivariate normal distributions used by the Expectation-maximization algorithm.

Density models: for example, DBSCAN and OPTICS define clusters as connected dense regions in the data space.

Subspace models: in Biclustering (also known as Co-clustering or two-mode-clustering), clusters are modeled with both cluster members and relevant attributes.

Group models: some algorithms do not provide a refined model for their results and just provide the grouping information.

Graph-based models: a clique, that is, a subset of nodes in a graph such that every two nodes in the subset are connected by an edge, can be considered a prototypical form of cluster. Relaxations of the complete connectivity requirement (a fraction of the edges can be missing) are known as quasi-cliques, as in the HCS clustering algorithm.
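To make the connectivity and density models above concrete, here is a minimal sketch (using scikit-learn on an assumed synthetic data set) contrasting hierarchical (agglomerative) clustering with DBSCAN; the parameter values are illustrative, not prescriptive.

```python
# A minimal sketch contrasting a connectivity model (agglomerative clustering)
# with a density model (DBSCAN) on invented synthetic data.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering, DBSCAN

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# Connectivity model: repeatedly merge the closest clusters until 3 remain.
hier = AgglomerativeClustering(n_clusters=3, linkage="ward").fit(X)

# Density model: a cluster is a connected dense region; points in sparse
# regions are labelled -1 (noise) instead of being forced into a cluster.
dens = DBSCAN(eps=0.5, min_samples=5).fit(X)

print("hierarchical cluster sizes:", np.bincount(hier.labels_))
print("DBSCAN clusters found:",
      len(set(dens.labels_)) - (1 if -1 in dens.labels_ else 0))
```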

There are also finer distinctions possible, for example:

strict partitioning clustering: here each object belongs to exactly one cluster

strict partitioning clustering with outliers: objects can also belong to no cluster, and are considered outliers.

overlapping clustering (also: alternative clustering, multi-view clustering): while usually a hard clustering, objects may belong to more than one cluster.

hierarchical clustering: objects that belong to a child cluster also belong to the parent cluster

subspace clustering: while an overlapping clustering, within a uniquely defined subspace, clusters are not expected to overlap.


k-means clustering

In centroid-based clustering, clusters are represented by a central vector, which may not necessarily be a member of the data set. When the number of clusters is fixed to k, k-means clustering gives a formal definition as an optimization problem: find the k cluster centers and assign the objects to the nearest cluster center, such that the squared distances from the cluster centers are minimized.

Most k-means-type algorithms require the number of clusters – k – to be specified in advance, which is considered one of the biggest drawbacks of these algorithms. Furthermore, the algorithms prefer clusters of approximately similar size, as they will always assign an object to the nearest centroid. This often leads to incorrectly cut borders between clusters (which is not surprising, as the algorithm optimizes cluster centers, not cluster borders).

K-means has a number of interesting theoretical properties. First, it partitions the data space into a structure known as a Voronoi diagram. Second, it is conceptually close to nearest-neighbor classification, and as such is popular in machine learning. Third, it can be seen as a variation of model-based classification, with Lloyd’s algorithm as a variation of the Expectation-maximization algorithm for this model.
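As a minimal sketch of the above, the following uses scikit-learn’s KMeans on an assumed synthetic data set; note that k must be supplied up front, and the fitted model partitions the space into Voronoi cells around the learned centroids.

```python
# A minimal k-means sketch on invented synthetic data; k is chosen in advance.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

print("cluster centers:\n", km.cluster_centers_)
print("within-cluster sum of squared distances:", km.inertia_)

# A new point is assigned to the nearest centroid (the Voronoi cell it falls in).
print("new point goes to cluster", km.predict(np.array([[0.0, 0.0]]))[0])
```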



Business Analytics, an important tool today


 

In today’s world, political and physical barriers are collapsing, opening a global market for every business and creating intense competition to sell products, capture more profit and attract more customers. Business Intelligence is the most widely used weapon organizations have for gathering information and making decisions, and the workforce is constantly forced to brainstorm its next step. This is the situation where Business Analytics plays a critical part.

Business Analytics is not only a problem solver; it also lets an organization retrieve past analyses and its day-to-day databases, which helps the organization think a step ahead in approaching business. Business Analytics today can be seen as an answer to several questions, including insight into the customer, the impact of a product on the customer, fraud, and the cost and servicing of the product that make customers buy it again. It also helps the organization make better decisions by forecasting profits.

A Business Analyst acts as a bridge between business ideas and business capabilities, shaping and reviewing meaningful changes and improvements to business processes. Often driven by ‘performance capability assessments’ or ‘feasibility studies’, the Business Analyst frequently evaluates business performance. Such reviews assess capabilities ranging from those visible to the customer through to those embedded deep in the manufacturing process.

Generally, in our technology-driven business world, a large proportion of changes and improvements relate to software systems, so the teams in the organization responsible for creating, maintaining and delivering IT systems are an essential focus. Historically this has proved a difficult relationship, with communication problems or misunderstandings that often lead to wasted effort or scrapped projects.

 

Business analysis reduces the overall cost of a project. This idea is often counter-intuitive for managers new to business analysis: at first blush, adding a business analyst and producing extra project documentation appears to be an additional cost, and if you are managing without a business analyst today and you introduce one, the cost may indeed seem to increase. However, if you focus the team on the right requirements, there should be a reduced amount of unnecessary change. There will always be some change, as implementation drives learning, but many projects are plagued by change because requirements are not well understood, and that kind of change is waste.

Stakeholder time is valuable, yet without someone in the business analyst role, stakeholders may spend excessive time in unproductive discussions. An analyst can drive a logical and efficient decision-making process, track open issues and record discussions, reducing the amount of time spent repeating past analyses and backtracking. When the business analyst is empowered to find any number of solutions to a problem, especially solutions that may not involve information technology, the analyst may actually reduce costs by finding more cost-effective solutions.

Business analysis is a critical part of any business and organization, because change is the only constant that must continually be managed. Change happens both in your target market and in the industry you belong to, and for your business to survive and succeed despite these changes, proper business analysis must be conducted at the right time. In such a fiercely competitive business environment, business analysis is critical in order to maintain competitiveness. It involves taking information gathered from various sources and analyzing it so that a forecast of future trends can be made. This helps in formulating ways to improve business processes and operations and in making smart business decisions to improve the company’s bottom line. It is essential to understand your key marketing areas to help the business increase revenue and cut excess waste.


Text Mining

Text mining handles unstructured data, extracts meaningful numeric indices from the text, and thereby makes the information contained in the text accessible to the various data mining algorithms. Information can be extracted to derive summaries for the words contained in the documents, or to compute summaries for the documents based on the words contained in them. Hence, you can analyze words, clusters of words used in documents, and so on, or you can analyze documents and determine similarities between them or how they relate to other variables of interest in the data mining project. In the most general terms, text mining will “turn text into numbers”, which can then be incorporated into other analyses, such as predictive data mining projects, the application of unsupervised learning methods, and so on.
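As a minimal sketch of “turning text into numbers”, the snippet below builds a TF-IDF matrix for a small invented corpus with scikit-learn and then compares the documents by cosine similarity; the corpus and parameters are illustrative only.

```python
# A minimal sketch: turn a small corpus into a numeric matrix with TF-IDF,
# then compare documents by cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "text mining turns unstructured text into numbers",
    "clustering groups similar documents together",
    "numbers extracted from text feed predictive models",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)  # documents x terms sparse matrix

print("vocabulary size:", len(vectorizer.vocabulary_))
print("document similarity matrix:\n", cosine_similarity(X).round(2))
```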

Information retrieval

Information retrieval deals with retrieving information from a large number of text-based documents. Some database systems are usually not present in information retrieval systems because the two handle different kinds of data. The main problem in an information retrieval system is to locate relevant documents in a document collection based on a user’s query. Such a query consists of some keywords describing an information need.

In such search problems, the user takes the initiative to pull relevant information out of a collection. This is appropriate when the user has an ad-hoc information need, i.e., a short-term need. If, however, the user has a long-term information need, the retrieval system can also take the initiative to push any newly arrived information item to the user. This kind of access to information is called information filtering, and the corresponding systems are known as filtering systems.
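A minimal sketch of the push-style information filtering described above, assuming a user profile represented simply as a set of keywords and an invented stream of incoming documents:

```python
# A minimal information-filtering sketch: newly arrived documents are pushed
# to a user whose long-term profile is a set of keywords (all data invented).
def matches_profile(document, profile_keywords):
    words = set(document.lower().split())
    return bool(words & profile_keywords)

profile = {"text", "mining", "clustering"}

incoming = [
    "New clustering algorithm announced",
    "Stock markets rally on earnings",
    "Text mining applied to call-centre data",
]

for doc in incoming:
    if matches_profile(doc, profile):
        print("push to user:", doc)
```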

Measures of text retrieval

We need to check the accuracy of a system when it retrieves a number of documents on the basis of a user’s input.

Relevant – set of documents relevant to a query

Retrieved – set of retrieved documents

{Relevant} ∩ {Retrieved} – set of documents that are relevant and retrieved

The quality of text retrieval can be measured by

  • Precision = |{Relevant} ∩ {Retrieved}| / |{Retrieved}|
  • Recall = |{Relevant} ∩ {Retrieved}| / |{Relevant}|
  • F-score – the commonly used trade-off measure: an information retrieval system often has to trade precision for recall, or vice versa.

F-score = (recall × precision) / ((recall + precision) / 2) = 2 × precision × recall / (precision + recall)
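These measures are easy to compute once the relevant and retrieved sets are known; the following sketch uses invented document ids purely for illustration.

```python
# A minimal sketch of the retrieval measures above, treating relevant and
# retrieved documents as sets of document ids (the ids are invented).
def precision(relevant, retrieved):
    return len(relevant & retrieved) / len(retrieved) if retrieved else 0.0

def recall(relevant, retrieved):
    return len(relevant & retrieved) / len(relevant) if relevant else 0.0

def f_score(relevant, retrieved):
    p, r = precision(relevant, retrieved), recall(relevant, retrieved)
    return 2 * p * r / (p + r) if (p + r) else 0.0

relevant = {1, 2, 3, 4, 5}
retrieved = {3, 4, 5, 6}

print(precision(relevant, retrieved))  # 0.75
print(recall(relevant, retrieved))     # 0.6
print(f_score(relevant, retrieved))    # ~0.667
```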

Application:

Topic tracking works by keeping user profiles and, based on the documents a user views, predicting other documents of interest to that user. Yahoo offers a free topic tracking tool (www.alerts.yahoo.com) that lets users choose keywords and notifies them when news relating to those topics becomes available. Topic tracking technology has limitations, however. For instance, if a user sets up an alert for “text mining”, they will receive several news stories on mining for minerals and very few that are actually about text mining. Some of the better text mining tools let users select particular categories of interest, or the software can even infer a user’s interests from his or her reading history and click-through data.

There are many areas where topic tracking can be applied in industry. It can be used to alert companies whenever a competitor is in the news, allowing them to keep up with competing products or changes in the market. Companies may also want to track news about their own organization and products. It can also be used in the medical field by doctors and others looking for new treatments for illnesses who wish to keep up with the latest advances. People in education can likewise use topic tracking to make sure they have the latest references for research in their area of interest.

Text summarization is tremendously useful for figuring out whether a long document meets the user’s needs and is worth reading for further information. With large texts, text summarization software processes and summarizes the document in the time it would take the user to read the first paragraph. The key to summarization is to reduce the length and detail of a document while retaining its main points and overall meaning.
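A very naive extractive summarizer along these lines can be sketched by scoring sentences with word frequencies and keeping the top-scoring ones; real summarization software is considerably more sophisticated, so treat this as illustration only.

```python
# A minimal extractive-summarization sketch: score sentences by the frequency
# of their words in the whole text and keep the two highest-scoring sentences.
from collections import Counter
import re

text = ("Text summarization reduces the length of a document. "
        "It keeps the main points and overall meaning. "
        "Large documents can be summarized faster than a person can read the "
        "first paragraph. Summaries help users decide whether a document is "
        "worth reading in full.")

sentences = re.split(r"(?<=\.)\s+", text)
word_freq = Counter(re.findall(r"[a-z]+", text.lower()))

def score(sentence):
    return sum(word_freq[w] for w in re.findall(r"[a-z]+", sentence.lower()))

summary = sorted(sentences, key=score, reverse=True)[:2]
print(" ".join(summary))
```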

OLAP

OLAP (online analytical processing) is computer processing that enables a user to easily and selectively extract and view data from different points of view. For example, a user can request that data be analyzed to display a spreadsheet showing all of a company’s beach ball products sold in Florida in the month of July, compare revenue figures with those for the same products in September, and then see a comparison of other product sales in Florida in the same time period. To facilitate this kind of analysis, OLAP data is stored in a multidimensional database. Whereas a relational database can be thought of as two-dimensional, a multidimensional database considers each data attribute (such as product, geographic sales region, and time period) as a separate “dimension.” OLAP software can locate the intersection of dimensions (all products sold in the Eastern region above a certain price during a certain time period) and display them. Attributes such as time periods can be broken down into sub attributes.

OLAP can be used for data mining or the discovery of previously undiscerned relationships between data items. An OLAP database does not need to be as large as a data warehouse, since not all transactional data is needed for trend analysis. Using Open Database Connectivity (ODBC), data can be imported from existing relational databases to create a multidimensional database for OLAP.

Two leading OLAP products are Hyperion Solutions’ Essbase and Oracle’s Express Server. OLAP products are typically designed for multiple-user environments, with the cost of the software based on the number of users.

OLAP OPERATIONS

OLAP provides a user-friendly environment for interactive data analysis. A number of OLAP data cube operations exist to materialize different views of data, allowing interactive querying and analysis of the data.

The most popular end user operations on dimensional data are:

 

 

Roll up

 

The roll-up operation (also called drill-up or aggregation operation) performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction, i.e. removing one or more dimensions.

Roll Down

 

The roll down operation (also called drill down) is the reverse of roll up. It navigates from less detailed data to more detailed data. It can be realized by either stepping down a concept hierarchy for a dimension or introducing additional dimensions.

Slicing

 

Slice performs a selection on one dimension of the given cube, thus resulting in a subcube.

Dicing

 

The dice operation defines a subcube by performing a selection on two or more dimensions.

Pivot

 

Pivot, otherwise known as rotate, changes the dimensional orientation of the cube, i.e. it rotates the data axes to view the data from different perspectives, presenting the data grouped by different dimensions.

Other OLAP operations

 

Some more OLAP operations include:

SCOPING: Restricting the view of database objects to a specified subset is called scoping. Scoping allows users to receive and update only the data values they wish to work with.

SCREENING: Screening is performed against the data or members of a dimension in order to restrict the set of data retrieved.

DRILL ACROSS: Accesses more than one fact table that is linked by common dimensions. Combines cubes that share one or more dimensions.

DRILL THROUGH: Drills down to the bottom level of a data cube, into its back-end relational tables.
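The cube operations above map naturally onto operations on a flat fact table; the sketch below mimics roll-up, slice, dice and pivot with pandas on an invented sales table (column names and figures are made up, echoing the beach-ball example earlier).

```python
# A minimal sketch of roll-up, slice, dice and pivot on an invented sales cube.
import pandas as pd

sales = pd.DataFrame({
    "product": ["beach ball", "beach ball", "umbrella", "umbrella"],
    "region":  ["Florida", "Texas", "Florida", "Texas"],
    "month":   ["July", "July", "September", "September"],
    "revenue": [1200, 800, 450, 300],
})

# Roll-up: aggregate revenue up the product dimension (drop region and month).
rollup = sales.groupby("product")["revenue"].sum()

# Slice: select on a single dimension (month == "July").
july = sales[sales["month"] == "July"]

# Dice: select on two or more dimensions.
dice = sales[(sales["region"] == "Florida") & (sales["month"] == "July")]

# Pivot: rotate the axes to view revenue by product x region.
pivoted = sales.pivot_table(values="revenue", index="product",
                            columns="region", aggfunc="sum")

print(rollup, july, dice, pivoted, sep="\n\n")
```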

 


 

Recent trends of Data mining in Banking

Banking is the lifeblood of finance in a country; a country cannot carry out transactions without banks, and the sector therefore offers a large database for analysis. The most recently adopted technology, core banking, has created even more avenues.

Banks generate a large amount of data during their everyday transactions, such as customer information, transaction details, risk profiles, credit card details, limit and collateral details, compliance and Anti Money Laundering (AML) related information, trade finance data, and SWIFT and telex messages. This data is used by banks to arrive at conclusions, make different decisions regarding customers and optimize their processes.

Data mining and banks


Data mining is the process of deriving knowledge hidden in large volumes of raw data. The knowledge must be new and not obvious, must be relevant, and must be applicable in the domain where it has been obtained.

Banking has become very competitive in today’s world, and bankers have to make smart decisions to stay in the competition.

  • In today’s highly competitive market environment customers are spoilt for choice. Banks need to be proactive in analyzing customer preferences and profiles and tune their products and services accordingly in order to retain their customer base; it appears one has to keep running just to stay in the same place (Bhambri, 2011). By segmenting customers into good customers and bad customers, banks can avoid losses before it is too late (Kazi and Ahmed, 2012). By analyzing patterns of transactions, banks can track fraudulent transactions before they affect profitability (Ogwueleka, 2011). These are highly desirable areas where data mining can help.
  • The MIS of banks contains a huge volume of data, both operational and historical. Data mining can help in using this huge volume of data to make critical decisions. Banks which adopt methods to use such available data wisely by applying data mining techniques will benefit hugely and have an advantage over those that don’t. The areas which benefit from data mining techniques include marketing, risk management, fraud prevention, customer satisfaction and anti-money-laundering; credit appraisal systems also benefit.
  • All lending activities of a bank involve a certain amount of risk; proper assessment of this risk makes the risk management process easier and also helps limit the risk of financial loss to the bank. Most important is correctly assessing the capacity of the customer to repay the loan. Use of data mining techniques makes the credit manager’s task easier: it helps the credit manager know which customers are likely to delay or default on repayment of a loan, and this advance knowledge helps the bank adopt preventive measures to avoid losses.


  • To forecast such situations, parameters such as turnover trends, balance sheet figures, utilization of credit limits and cheque return patterns are analysed. Historical default patterns help in predicting future defaults once such patterns are discovered. Data mining techniques can then be used to enhance the accuracy of credit scores and predict defaults in advance. The credit score represents the borrower’s creditworthiness. Behavioural scores are obtained from probability models of customer behaviour to forecast future behaviour in various situations. Data mining can derive this score from the borrower’s past debt-repayment behaviour by analysing the available credit history (a minimal sketch follows this list).
  • Banks are using the data available to them to offer personal loans to customers through their ATMs. This automated loan facility helps banks revive retail credit. Banks use big data analytics to analyse facts such as the customer’s personal details, work profile, income and payment capacity to decide on the customer’s creditworthiness. After analysing all this data, the bank decides the loan amount the customer is eligible for. The customer receives an offer for the loan the next time he or she uses the ATM; if interested, the customer can simply agree to the terms and conditions and type in a registered mobile number, and the loan amount is credited to the customer’s bank account.
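As referenced in the list above, a behavioural credit score of this kind can be sketched as a probability-of-default model; the snippet below uses an invented, synthetic dataset and scikit-learn’s logistic regression purely to illustrate the idea, not any bank’s actual scoring method.

```python
# A minimal sketch (invented feature names, synthetic data) of deriving a
# behavioural credit score: a logistic regression mapping past repayment
# behaviour to a probability of default.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
data = pd.DataFrame({
    "credit_limit_utilisation": rng.uniform(0, 1, n),
    "cheque_returns_last_year": rng.poisson(0.5, n),
    "months_since_last_delay":  rng.integers(0, 36, n),
})
# Synthetic label: higher utilisation and more returned cheques -> more defaults.
p = 1 / (1 + np.exp(-(3 * data["credit_limit_utilisation"]
                      + data["cheque_returns_last_year"]
                      - 0.1 * data["months_since_last_delay"] - 1.5)))
data["default"] = rng.binomial(1, p)

X_train, X_test, y_train, y_test = train_test_split(
    data.drop(columns="default"), data["default"], random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("default probability for first test customers:",
      model.predict_proba(X_test)[:3, 1].round(2))
```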

Hence usage of data mining techniques will greatly help banks in improving their business.

Consumer Analytics


Consumer analytics involves the techniques through which customers’ behavior is recorded and business decisions are taken with the help of analytical tools and techniques. One of the important objectives of consumer analytics is to cope with ever-changing consumer behavior. This is the digital age, where consumers look for more and more choices, and meeting those expectations becomes extremely difficult with conventional marketing tools and techniques. This is where big data analytics helps businesses track consumer behavior, with the help of various data mining and predictive modeling tools.

One of the applications of consumer analytics is in-store analytics, which involves mapping the consumer purchase pattern and consumer demographics: what the customer bought, why the customer bought it, why the customer did not buy a product, and so on. In-store analytics is applied in the retail business, where technologies like smart carts and RFID tags are used to map the location of products and determine their movement. This technology helps in finding which products consumers prefer the most, which products no longer sell, etc. The fashion retail industry is one of the industries where consumer analytics is used intensively, since trends in the fashion industry change even within a span of three months, so it is important to track consumer behavior closely in this industry.

Another important application of consumer analytics in the retail space is in hypermarkets like Walmart and Tesco. Here the buying behavior of each and every customer is captured, and promotions and offers are given to customers based on their purchase patterns. Hypermarkets also use various predictive techniques to estimate the future demand for a particular product. A typical example is where pregnant women are identified through their purchase patterns, and with the help of this data the demand for baby products like diapers, baby oil and baby clothes is predicted.

Sentiment analysis is another powerful tool, where customers’ feelings about a certain product or service are studied. In one instance, a mobile phone manufacturer wanted to know what people felt about 4G technology; the result of the analysis was that people viewed 4G as a feature of a smartphone, so the company positioned its mobile phone as “the best 4G phone”. Consumer analytics not only serves as a tool to predict consumer buying behavior but also enables companies to design ads that cater to customers’ needs.

With the help of web analytics it is possible to track how much time a customer spends on a company’s website, which products he or she buys often, which categories he or she is interested in, etc. Put together, these data help the company serve ads and offers for that particular product category, which is more relevant and cost-effective.

Today the trend is shifting from segment-based marketing to individual-based marketing. Mass marketing is no longer effective in today’s competitive environment with ever-changing consumer expectations, so individual-based marketing is what will help a brand sustain itself in the market. A typical example of individual marketing is in the fashion industry, where customers can choose their own outfit through a mobile application and customize it according to their needs. Once the customization is done, the customer can visit the store at the desired time to try on the outfit. This customized offering helps consumers enjoy products tailored to their own needs, rather than choosing from a set of available products. Individual customization would be impossible without big data and predictive modeling, where data from millions of customers are tracked to predict what each customer will buy.

Consumer analytics spans everything from customer profiling to what the customer is likely to buy next. It has a wide range of applications across retail stores, hypermarkets, e-tailing and much more, and its role is growing larger every day. Thus consumer analytics plays an important role in both major and minor decisions taken by a business.

DATA MINING IN PROJECT MANAGEMENT

The majority of professional activities are organized as projects. Project management is an intricate process involving many interrelated elements, and it depends on external agents that can complicate its control. Projects, as characterized by the PMBOK of the Project Management Institute (PMI, 2009), are designed to address a particular problem or need, have a short-term impact, and are unique in time and not repeatable under the same conditions. Uncertainty is a key variable associated with project management; it affects the use of resources, the estimation of time and money, and the impact of risk and quality.

Indeed, risk, uncertainty and estimation are the watchwords for a project-oriented organization. The complexity and difficulty involved are so significant that deliveries are often clearly unsatisfactory: the Chaos report and other independent studies show success rates under 32%, with deviations in time and cost several times higher than the initial estimates (Standish Group, 2010).

Conventional project management, regardless of sector, recognizes a progression of stages, for example:

  1. Initiation, where requirements are identified and assessed to determine whether it is feasible to carry out the project. Uncertainty is high because of the lack of precise information, which means that the probability of error in the assessment is high.
  2. Planning, which aims to develop the solution in greater detail by breaking the problem down into more detailed activities. This reduces uncertainty and produces estimates and forecasts. The tasks and schedule must also be defined, and the time and money needed to undertake the project must be estimated.
  3. Execution. Once tasks are clearly defined, the execution phase of the project can begin, using monitoring techniques and adjustments to the plan in order to keep the project under control. At this stage reducing uncertainty is essential; the danger of an incorrect estimate, and its impact, are much higher because there is no time left to absorb the deviations.
  4. Conclusion. Finally there is the closing phase of the project, in which results are checked to determine whether the project satisfies the needs for which it arose, together with gathering information on the problems identified and the weaknesses or strengths of the team. These are the lessons learned and must be a source of information that is stored to become the basis on which decisions are made in future projects.

Project managers have to deal with these issues with a limited set of tools, yet it has been shown that better estimation at every stage and proper post-mortem analysis are the most influential practices for continuous improvement, and that is only possible using analysis of data from past projects. During the development of projects, very different sources of data can provide information about what is happening, including delays, overloads, and so on. That data can be used for immediate analysis and adjustment, but it can be remarkably useful for improving the overall performance of the rest of the projects in which the organization is involved. A structured repository of this data is an ideal source of key information for future success. The dataset is a snapshot that characterizes the behaviour of the portfolio, to be used in post-mortem analysis to study trends and produce models that describe the behaviour of certain basic elements of the projects, such as estimating the expected risk or effort.

The data to be gathered must come from each stage of the project: initiation, planning, execution and conclusion. Several issues may arise in the data collection stage, since each project is unique by definition; the types of data, fields or indicators to be stored may differ depending on the project, thereby producing a very heterogeneous and less consistent data set.

In any case, the stage that benefits most from the application of data mining techniques is the initial planning stage. Since at this phase there is very little detailed information on the outcome of the project, the project manager may make larger errors in the estimates of costs, effort, time or risk probability.

Data mining can be useful in all stages and fields: estimating costs better, optimizing bids, assessing risks, reducing uncertainty in the duration of tasks, and so on.

Since the case of study is presented as a data mining application process, one of the most widespread methodologies has been adopted: the Cross Industry Standard Process for Data Mining (CRISP-DM), which has already been used to address similar problems.

This methodology defines the data mining life cycle. It consists of six stages and is a global process performed through iterations, with the stages interacting with one another throughout the development process.

The initial stage is defined as Business Understanding and aims to identify the objective, which is defined from a business perspective; it also requires assessing the situation and outlining a plan for the data mining project.

The following step is defined as Data Understanding and its aims are to collect and review the data. It starts with an initial dataset that is processed in order to become familiar with the data, making first contact with the problem, discovering data quality issues, identifying the first hypotheses and characterizing initial relationships.

Once the understanding step is complete, CRISP-DM proposes a further step for preparing the data for subsequent modeling. The data preparation phase contains all the activities needed to build the final dataset; its goal is to select and clean the data, and it can be performed several times. The task includes the selection of rows and attributes and data cleaning to conform to the requirements of the modeling tools used. It should be borne in mind that each modeling technique requires a specific data type or a preparation adapted to its needs. It may therefore be necessary to perform transformations on the values, such as converting numerical values to nominal ones (or vice versa), handling missing values, detecting outliers, reducing the number of variables or samples, and so on. This phase is closely related to the modeling phase that follows, and there is much interaction between the two.
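A minimal sketch of such data preparation steps with pandas is shown below; the project dataset, column names and thresholds are assumptions made for illustration, not part of the ISBSG data.

```python
# A minimal sketch of typical CRISP-DM data preparation steps on an invented
# project dataset: handling missing values, flagging outliers and converting
# a numeric value to a nominal one.
import pandas as pd

projects = pd.DataFrame({
    "effort_hours": [120, 340, None, 5000, 210],
    "team_size":    [3, 7, 4, None, 5],
    "sector":       ["banking", "retail", "banking", "telecom", None],
})

# Handle missing values: median for numeric columns, mode for nominal ones.
projects["effort_hours"] = projects["effort_hours"].fillna(projects["effort_hours"].median())
projects["team_size"] = projects["team_size"].fillna(projects["team_size"].median())
projects["sector"] = projects["sector"].fillna(projects["sector"].mode()[0])

# Flag outliers with a simple z-score rule (threshold chosen for illustration).
z = (projects["effort_hours"] - projects["effort_hours"].mean()) / projects["effort_hours"].std()
projects["effort_outlier"] = z.abs() > 1.5

# Convert a numeric value to a nominal one (discretization into bands).
projects["effort_band"] = pd.cut(projects["effort_hours"], bins=3,
                                 labels=["low", "medium", "high"])
print(projects)
```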

The next phase is the modeling stage, in which the modeling technique that best fits the study requirements should be chosen and its parameters tuned to their optimal values. Stepping back to the data preparation stage is therefore often necessary.

After the modeling stage comes the evaluation stage. The degree of confidence that qualifies a model as valid has been set from the very beginning; this stage must determine whether the business problem has been adequately resolved.

The data mining study does not finish at the evaluation stage; it must continue with a deployment plan and the subsequent monitoring and maintenance of the model’s results.

The whole procedure is iterative, forming a cycle that is repeated until the success criteria are met; that is, if the objectives are not met at the evaluation stage, another cycle must be carried out, for which a new data set must be built or new initial objectives defined.

In the methodology stage, the objectives and techniques of data collection must be settled. An existing dataset has been used, whose collection is endorsed by an internationally renowned organization, the ISBSG (International Software Benchmarking Standards Group); Release 10 has been used as the case of study.

Data mining techniques are compatible with all project management methodologies that aim to gather information and indicators at project closure for later post-mortem analysis, with the goal of continuous improvement. This application of data mining methods to project management can also be applied to a much broader sector; it is not restricted only to software projects. Data mining therefore plays a valuable role in project management.