Over recent years, companies have been capturing increasing volumes of raw data from their operational systems, holding it in data warehouses and using it to forecast trends and support decision making. Data mining systems provide the intelligence to analyse this vast quantity of raw records, extract patterns and convert the data into actionable information.
Commercial data mining has really taken off over the last decade due to several factors:
•Large volumes of data are being produced via automated data capture systems, so more sophisticated analytical software is needed in order to extract the patterns from these datasets.
•Computing power has become more affordable, enabling companies to invest in powerful data warehouse systems that provide excellent environments for data mining.
•Interest in customer relationship management (CRM) is strong and companies are realising the central importance of their customers and the value of their data.
•Commercial data mining products have become available, drawing on techniques from statistics, artificial intelligence and machine learning.
Data mining has come to be associated with large amounts of data – usually more than 100,000 records and often millions – which have thousands of associated attributes or variables. Such datasets typically occur in sectors such as financial services, retail, manufacturing, telecoms, travel, transportation and the public sector, where organisations have large customer bases and customers make multiple transactions, sometimes on a minute-by-minute basis.
Due to the size of such databases, sophisticated tools are required to discover useful patterns from the vast number of potential relationships – hence the role of data mining in helping with this task.
Over the last few years, a wider set of activities known as “advanced analytics” has emerged. Forrester Research defines this as: “Any solution that supports the identification of meaningful patterns and correlations among variables in complex, structured and unstructured, historical, and potential future data sets for the purposes of predicting future events and assessing the attractiveness of various courses of action. Advanced analytics typically incorporate such functionality as data mining, descriptive modelling, econometrics, forecasting, operations research optimisation, predictive modelling, simulations, statistics and text analytics.”
Such methods broaden the use of sophisticated analytics beyond conventional data mining into areas such as deriving new data by analysing unstructured text (text mining) or by extracting relationships between records (social network analysis).
The main types of data mining models
The process of pattern discovery when mining a dataset is known as “analytical modelling”, and its output is a data mining model. This activity involves identifying meaningful relationships between variables in the data and employing those relationships to create predictive or descriptive models. The outcome is expressed as a formula or algorithm that can calculate a score (a predicted value or probability) for each record – for instance, its likelihood of response, defection or repeat sales – according to the data values for that record.
There are two main types of data mining model:
Predictive model – a model constructed to predict a particular outcome or target variable. Commonly-used predictive modelling techniques include multiple regression (for predicting value data), logistic regression (for response prediction) and decision trees (for rule-based value or response models).
Descriptive model – a model that gives a better understanding of the data, without any single specific target variable. Commonly-used descriptive techniques include factor analysis (to extract underlying dimensions from multivariate data), cluster analysis (for grouping a customer database into segments) and association analysis (for discovering relationships between items such as retail products).
A wide range of analytical techniques is available for predictive and descriptive modelling, drawn from the worlds of statistics and machine learning. Rexer Analytics has identified the core techniques used by most data miners as regression analysis, decision trees and cluster analysis.
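To make the distinction concrete, here is a minimal Python sketch using scikit-learn that builds one model of each type: a logistic regression scoring response propensity (predictive) and a k-means cluster analysis grouping the base into segments (descriptive). The variables and data are invented purely for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Invented customer variables: spend, tenure in months, contact count.
X = rng.normal(loc=[500.0, 24.0, 3.0], scale=[200.0, 12.0, 2.0], size=(1000, 3))
# Synthetic response flag, loosely driven by spend, for demonstration only.
y = (X[:, 0] + rng.normal(0.0, 200.0, 1000) > 600).astype(int)

# Predictive model: logistic regression gives each customer a response score.
clf = LogisticRegression(max_iter=1000).fit(X, y)
response_scores = clf.predict_proba(X)[:, 1]

# Descriptive model: k-means cluster analysis groups the base into segments.
segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X))

print("Average response score:", response_scores.mean())
print("Segment sizes:", np.bincount(segments))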
Data mining v statistical models
As data miners often employ statistical techniques, such as regression analysis, it may be thought that data mining is simply a modern term for “statistical analysis”. However, this is not the case, for a number of reasons.
The development of statistical theory has its roots in the late 19th and early 20th centuries before the advent of computer technology. Methods were required for making inferences about populations based on relatively small samples. The theory was developed for testing hypotheses and measuring significance of results, taking sample size into account, since analysis at population level was not a viable possibility. At the same time, the number of records and the number of attributes for which measurements were recorded were sufficiently small to enable each variable to be examined individually and transformed as appropriate for analysis and modelling purposes.
Data mining, on the other hand, is applied to databases that typically hold an entire population of customers, together with thousands of variables that summarise their transactional behaviour, payments history, campaign responses and so on. Therefore, in any project, the data miner is no longer restricted to working with small samples – the full customer base is available if desired.
However, this requires some differences in approach, because traditional statistical methods may give misleading results when applied to a vast sample – for example, over-fitting the model or producing unhelpful results in which every variable appears to be statistically significant.
Furthermore, in data mining the dataset is liable to contain a huge number of candidate predictor attributes, such as volumes and values of transactions by product, channel, brand or period – far too many to be individually assessed and transformed manually. Data mining solutions ideally provide automated tools for selecting relevant attributes and recoding them for use in analysis.
A further key difference is that statistical analysis will aim to identify a model which is statistically significant – ie, one that outperforms a random prediction – based on a set of significant predictor variables. However, this provides no guarantee that the model will perform sufficiently well to be of business value.
Data mining goes further by including diagnostic results to indicate likely business benefits from the model. This assessment is produced by using two methods in combination:
The “hold-out” sample – prior to modelling, a random subset of data is excluded from the analysis for use in evaluating the power of the model developed on the remainder of the data. This excluded subset is known as the “hold-out” sample and is more likely to give a fair indication of model performance than the model development sample would.
Charting models – various types of tables and charts are produced in order to assess the predictive power of the model on the hold-out sample. For example, if a model has been built to predict campaign response, then a lift chart will show how response rate varies by model decile. This helps users to decide whether the model is likely to deliver enough benefit to justify its deployment, and to select the model deciles that should be targeted.
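As a minimal sketch of both methods together, on invented data: the Python code below (using scikit-learn and pandas) excludes a random 30 per cent hold-out sample before modelling, then produces a simple decile lift table from the hold-out scores.

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 3))
y = (X[:, 0] + rng.normal(0.0, 1.0, 5000) > 1.0).astype(int)  # synthetic response flag

# Exclude a random 30 per cent hold-out sample prior to modelling.
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_hold)[:, 1]

# Rank hold-out customers into ten deciles by score (decile 1 = highest scores).
eval_df = pd.DataFrame({"score": scores, "responded": y_hold})
eval_df["decile"] = 10 - pd.qcut(eval_df["score"].rank(method="first"), 10, labels=False)

# Lift over the average response rate, by model decile.
lift = eval_df.groupby("decile")["responded"].mean() / eval_df["responded"].mean()
print(lift)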
Once a data mining model has been built and evaluated on a sample dataset, it is deployed by applying the scoring algorithm to all records in the customer database. Facilities for large-scale model deployment are therefore essential.
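Deployment itself can be as simple as streaming the full customer base through the scoring algorithm. A minimal sketch, assuming a fitted model such as the one in the previous example and a hypothetical customer_base.csv file with the listed columns:

import pandas as pd

feature_cols = ["spend", "tenure_months", "contacts"]  # hypothetical column names

with open("customer_scores.csv", "w") as out:
    out.write("customer_id,score\n")
    # Score the full base in chunks rather than loading it all into memory.
    for chunk in pd.read_csv("customer_base.csv", chunksize=100_000):
        chunk["score"] = model.predict_proba(chunk[feature_cols])[:, 1]
        chunk[["customer_id", "score"]].to_csv(out, header=False, index=False)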
Both data mining and statistical analysis require that the data is organised as a simple rectangular table, where the rows represent individuals and the columns contain structured variables (eg, demographics, usage or purchasing behaviour). The variables in this dataset are structured, in the sense that each column contains either numeric or character (categorical) values coded in a consistent format.
However, an increasing amount of information is captured nowadays in an unstructured form, for example customer comments, accident reports and e-mail requests. A technique known as text mining may be used to read unstructured data and derive facts that can be represented by structured variables and included in analytic datasets.
Text mining and social network analysis
Unstructured information cannot be directly entered into a data mining tool. However, if it can be text mined in order to derive structured variables, then these can be included. Text mining is the discovery of previously unknown information or concepts from text files by an automatic extraction process. For example, text mining (in conjunction with data mining) could be used to identify that certain words used by insurance claimants had a particularly high association with a fraudulent claim. Text mining solutions typically use linguistic analysis to extract facts from unstructured text.
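A minimal sketch of the idea in Python, on invented claim descriptions: simple keyword flags stand in for the linguistic analysis a real text mining tool would perform, turning free text into structured variables that a data mining model can use.

import pandas as pd

claims = pd.DataFrame({
    "claim_id": [1, 2, 3],
    "description": [
        "Phone lost on holiday, receipt unavailable",
        "Minor rear-end collision at traffic lights",
        "Items stolen from car, no witnesses and no receipt",
    ],
})

# Invented keyword lists; real tools extract facts via linguistic analysis.
flag_terms = {
    "no_receipt": ["receipt unavailable", "no receipt"],
    "no_witness": ["no witnesses"],
}

for variable, terms in flag_terms.items():
    pattern = "|".join(terms)
    claims[variable] = claims["description"].str.lower().str.contains(pattern).astype(int)

print(claims[["claim_id", "no_receipt", "no_witness"]])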
Social network analysis (SNA) identifies groups of people who are connected together in some way, eg, they tend to interact or communicate with one another. It does this by applying network theory concepts such as “nodes” and “links” – “nodes” are the individuals within the networks, while “links” are the relationships between those individuals.
Social network analysis has been gaining traction over the last few years, as analytics users have begun to recognise that SNA metrics are correlated with customer loyalty. For example, in the mobile phone sector, SNA can identify the members of each group or “calling circle”, determine the central communicator or “key influencer” and extract various metrics about the strength of relationship within the group.
If the mobile operator is concerned with spreading marketing offers by word of mouth, then these key influencers will be the best people to inform. Likewise a good predictor of defection may be that a subscriber is in frequent contact with a person who themselves has recently defected. Therefore, SNA can extract potentially useful new variables about the size, strength and composition of each customer’s calling circle, for use in data mining projects such as churn prediction.
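The sketch below illustrates these ideas in Python using the networkx library, on an invented set of call records: it identifies the key influencer via degree centrality and derives, for each subscriber, a calling-circle size and a flag for contact with a recent defector.

import networkx as nx

# Invented call records: each pair is a calling relationship (a "link").
calls = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"),
         ("bob", "cat"), ("eve", "dan")]
defectors = {"dan"}  # subscribers known to have recently defected

G = nx.Graph(calls)  # "nodes" are the individual subscribers

centrality = nx.degree_centrality(G)
print("Key influencer:", max(centrality, key=centrality.get))

# New structured variables per subscriber, for use in churn prediction.
for node in G.nodes:
    circle_size = G.degree[node]
    contacts_defector = int(any(n in defectors for n in G.neighbors(node)))
    print(node, circle_size, contacts_defector)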
Predictive and descriptive models – how do they fit together?
Predictive models may be used to identify which individual customers are more likely to respond to marketing offers, use particular channels, pay off loans early (or late) and so on. Descriptive models, such as segmentations, may be used to identify the existence of different groups of customers, differentiable perhaps by motivations and needs, for which different marketing strategies and communication plans may be appropriate.
Both predictive and descriptive models may be developed either at individual customer level or on an area basis, as in the case of geodemographic segmentation systems such as ACORN and MOSAIC. The understanding of the business problem and intended use for the solution will drive this choice.
It is possible that the eventual business use will require each customer to be scored on multiple models, and then a decision made according to their set of scores. Use of multiple models could occur in a number of situations, for example:
•A company which sends offers to mail order shoppers may wish to target customers based on a combination of propensity to respond and predicted order value, in order to maximise the value of goods ordered.
•A phone company that uses predictive models to identify likely churners may overlay future customer value forecasts, in order to focus its retention resources on “at risk” customers who would be valuable if they stayed.
•A bank may have several alternative product offers that could be sent to each existing current account customer and will need to analyse the customer’s propensity to take each of those products in order to decide on an optimised action with the greatest predicted return. At the same time, the analysis may depend on the customer’s segment – which may determine the types of products that should be offered to someone with a given set of likely needs.
The third example situation is clearly more complex than the others – it requires a higher-level analytical solution that receives predictive and descriptive data mining model scores as its inputs and generates an optimal product offer decision for each customer. This type of solution is known as contact optimisation.
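A toy sketch of the idea in Python: per-product propensity scores (the predictive inputs) are combined with product margins, filtered by a segment-based eligibility rule (the descriptive input), and the offer with the greatest predicted return is chosen. All names and figures are invented.

products = ["savings", "loan", "credit_card"]
margin = {"savings": 40.0, "loan": 120.0, "credit_card": 60.0}

# Invented segment rule: which products suit a customer's likely needs.
eligible = {"young_saver": {"savings", "credit_card"},
            "established": {"savings", "loan", "credit_card"}}

customers = [
    {"id": 1, "segment": "young_saver",
     "propensity": {"savings": 0.20, "loan": 0.05, "credit_card": 0.10}},
    {"id": 2, "segment": "established",
     "propensity": {"savings": 0.08, "loan": 0.15, "credit_card": 0.12}},
]

for c in customers:
    # Expected return = propensity score x product margin, over eligible offers.
    returns = {p: c["propensity"][p] * margin[p]
               for p in products if p in eligible[c["segment"]]}
    best = max(returns, key=returns.get)
    print(c["id"], best, round(returns[best], 2))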
The “shelf life” of a model
The “shelf life” or likely lifetime of a model depends primarily on the extent of change over time in the relationship between the characteristics of the model and the behaviour being predicted. Where the characteristics are fairly stable, a shelf life of several years may be achieved. However, any significant shift in this relationship would probably imply that the model should be redeveloped.
For example, suppose that a mobile phone operator has been using a model developed four years ago to target smart phone offers at its subscribers. The model would have performed well for the first two or three years, when smart phones were aimed primarily at business customers. But the market has broadened to domestic customers during the last year or so. In other words, the model now needs to be redeveloped in order to target the wider audience for this product.
The best practice way to identify when a model has reached the end of its shelf life is to include a randomly-selected control group in all campaigns targeted using the model. This will enable model effectiveness to be calculated – and hence the return on investment from data mining – by comparing response rates between the target and control groups.
Any major shift or trend in model performance may therefore be identified and should be investigated – causes could range from operational errors (eg, unexpected changes to model input data) through to market shifts, as in the smart phone example. Although including random, less well targeted customers in the campaign might result in some loss of sales, this should be seen as a necessary investment and an essential part of the data mining process.
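As an illustration, the following Python sketch compares target and control response rates on invented counts and applies a two-proportion z-test – one simple way to check whether the model’s lift remains beyond random variation.

import math

target_n, target_resp = 50_000, 1_500   # customers targeted by the model
control_n, control_resp = 5_000, 60     # randomly-selected control group

p_target = target_resp / target_n       # 3.0 per cent response
p_control = control_resp / control_n    # 1.2 per cent response
print("Uplift over control: %.1fx" % (p_target / p_control))

# Two-proportion z-test: is the difference beyond random variation?
p_pool = (target_resp + control_resp) / (target_n + control_n)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / target_n + 1 / control_n))
print("z statistic: %.1f" % ((p_target - p_control) / se))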
Another reason for model redevelopment is if a new data source becomes available which will significantly improve the discriminatory power of the model. For example, if details of customer internet and email usage were obtained, these would be excellent predictors of need for a smart phone and so would justify rebuilding the model in order to include those variables.
In a more dynamic market, with constantly changing factors or effects, the model shelf life may be considerably shorter. This implies that the modelling phase may need to be considerably faster and linked to the deployment process in a more automated way, so model automation becomes a priority. Some data mining software products are particularly well suited to such requirements, and to creating models that are disposable and replaceable without great cost. These tools also enable users to build intermediate models that can be deployed quickly and act as a stop gap while more sophisticated model development is under way.
Similarly, model automation is a desirable feature if an organisation needs to develop a large number of models. For example, if a business decided that it required separate models for all combinations of six products, four channels and ten regions, then 240 separate models would have to be built.
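As an illustration of the scale involved, the Python sketch below loops over every product/channel/region combination and fits a model for each. The load_training_data function is a hypothetical stand-in for pulling each cell’s data from the warehouse; here it simply generates random data so the sketch runs end to end.

from itertools import product as combinations
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def load_training_data(prod, chan, reg):
    # Hypothetical stand-in: in practice this would query the data warehouse.
    X = rng.normal(size=(200, 3))
    y = rng.integers(0, 2, size=200)
    return X, y

products = ["product_%d" % i for i in range(6)]
channels = ["channel_%d" % i for i in range(4)]
regions = ["region_%d" % i for i in range(10)]

models = {}
for prod, chan, reg in combinations(products, channels, regions):
    X, y = load_training_data(prod, chan, reg)
    models[(prod, chan, reg)] = LogisticRegression(max_iter=1000).fit(X, y)

print(len(models), "models built")  # 6 x 4 x 10 = 240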
Data mining and related software products have been increasing in power and complexity over the last ten years, due to developments such as in-database processing, mechanisms for communicating model algorithms between tools, and systems for analysing unstructured data.
It is important to keep the business requirement and method of model deployment in mind when selecting a data mining toolset. Potential software buyers should evaluate the products that they shortlist before making purchase decisions. No single product works best in all scenarios and market sectors. Therefore users should be prepared to “mix and match” between toolsets, ensuring that models may be communicated between them as required.
Ultimately, business value is only generated when models are deployed as part of a process, which includes continuous monitoring, evaluation, learning and refinement.
A longer version of this article appeared as “An introduction to data mining and other techniques for advanced analytics” in the Journal of Direct, Data and Digital Marketing Practice, Vol. 12 No 2, October-December 2010.