Segmentation and Cluster Analysis
Segmentation sorts objects into groups of similar members. Objects can be stores, customers, or product features. Members of a group are more similar to each other than they are to members of other groups. Objects are assigned to clusters based on the variables chosen, so the result is sensitive to which variables are selected.
Common marketing problems addressed through segmentation include:
* Which of my customers have similar attitudes towards photography? How many distinct groups of customer attitudes do I have?
* Which product features are commonly ordered together? (Is there a potential for product bundling?)
* What are the groups of consumer attitudes towards casual dining restaurants?
The “best” use of segmentation in marketing is much debated. From a strategy perspective, segmentation can influence product development, communications, and organizational structure. Unlike at the operating level, answering strategic-level questions frequently does not require knowing which segment each individual customer is in.
Approaches to Segmentation
* Cluster Analysis
  * Hierarchical cluster analysis is usually used when there is a small number of cases. It produces a dendrogram (a clustering tree), which can provide additional insight.
  * Partition (disaggregate) cluster analysis is frequently used when there are more than a few dozen cases. K-means is the most common method, though we also use two-step clustering because it can automatically detect outliers.
* RFM Deciling
* Cross-tabulating (approximately) two variables, sometimes further simplified into a four-square (2x2) grid
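To make the partition approach concrete, here is a minimal k-means sketch in plain Python (Lloyd's algorithm). The function name `kmeans`, the sample points, and the choice of k = 2 are assumptions made for this illustration. A real project would normally standardize the variables first and use a statistics package rather than hand-written code.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Minimal Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct observations
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centroids, labels

# Hypothetical customers described by two already-scaled variables.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

The first three customers should land in one cluster and the last three in the other. Which cluster receives label 0 depends on the random starting points, so cluster numbers carry no meaning by themselves.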
Common pitfalls when clustering customers:
1. Not clearly defining the goal of the segmentation up front. This frequently leads to a segmentation scheme that is not optimized for the most important business problem.
2. Assuming that customers segmented using one type of variable will show actionable differences on another type of variable. Clusters that have different attitudes may have only marginally different demographics.
3. Assuming that a solution needs to be created for each segment.
* Cluster analysis is strictly an exploratory method. There is no definitive way to test whether the clusters are optimally separated, whether the correct number of clusters has been found, or whether the correct variables were used in the clustering. A fair amount of data discovery and at least one round of client revision have been our historical norm.
* Simply adding more variables can lead to over-fitting and lower the performance of a solution.
* There is no overall measure of fit for a cluster solution, nothing analogous to R-squared, that facilitates comparing multiple segmentation schemes.
* Cluster analysis cannot handle missing values. Dealing with them often requires a better understanding of the data and the use of substitution variables.
* Ideally, clusters are validated. Discriminant analysis is one commonly used and accepted industry practice. If there are enough objects, split-file validation is another approach.
* Several methods are sensitive to outliers, so data prep and discovery are critical.
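The missing-value and outlier points above can be illustrated with a small data-prep sketch. It makes two assumptions that are not in the text: missing values are filled with the median (one simple option, not the substitution-variable approach mentioned above), and outliers are flagged with Tukey's 1.5 x IQR rule. The variable `annual_spend` and its values are made up for the example.

```python
import statistics

def impute_median(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def iqr_outlier_flags(values, k=1.5):
    """Flag values more than k * IQR outside the quartiles (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]

# Hypothetical clustering variable with one missing and one extreme value.
annual_spend = [120.0, 135.0, None, 128.0, 142.0, 131.0, 2500.0, 125.0]

filled = impute_median(annual_spend)
flags = iqr_outlier_flags(filled)

for value, is_outlier in zip(filled, flags):
    print(value, "outlier" if is_outlier else "")
```

Whether a flagged customer is dropped, capped, or placed in its own segment is a business decision. The flag only marks the case for review before clustering.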
What is the M Squared Group seasoning?
* Use online panels to gather information used for clustering.
* Use factor analysis before segmentation to determine how many unique components are in the data, so that each component is represented more equally. (Especially useful with attitudinal data.)
* Bring together disparate data sources to gather more information for clustering or to report on differences between clusters.
* Build clustering models based on the behaviors or other characteristics most closely tied to revenue. If these are not well understood, large questionnaires are frequently used so that all areas (needs, attitudes, demographics, etc.) are well represented.
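The concern behind the factor analysis step is that several survey items may measure the same underlying attitude and so count more than once in the clustering. The sketch below is not factor analysis. It is a simpler correlation screen, swapped in to show the idea, that lists pairs of variables that move together. The item names, the ratings, and the 0.8 threshold are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def redundant_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 2)))
    return pairs

# Hypothetical 1-5 agreement ratings from six respondents.
survey = {
    "enjoys_editing": [5, 4, 1, 2, 5, 1],
    "buys_accessories": [5, 5, 1, 2, 4, 1],
    "price_sensitive": [2, 5, 3, 1, 4, 3],
}

pairs = redundant_pairs(survey)
print(pairs)
```

In this made-up data the first two items are strongly correlated, which suggests they reflect one underlying component and could be combined or down-weighted before clustering.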
Questions to discuss with M Squared Group
* What is the SINGLE largest problem the prospect hopes to solve through segmentation? What changes do they plan to make once they have a segmentation solution in place?
* Approximately how many objects do they want to segment? Is it important to segment the whole database?
* Are there certain customers that will be excluded from the segmentation analysis? (e.g. for a behavior cluster model you may want to exclude new customers with less than 3 months of data)
* Do they have any ideas as to which variables/data sources they plan to use to build the segmentation? (Workload increases with nominal variables. It may also be necessary to do some form of data reduction.)
* Will they want to update the segmentation scoring? Approximately how often?