Decision Trees (CART and CHAID) and Interactions
Decision trees were originally used in marketing to look for interactions among variables in survey research. They are a group of related techniques that find which variables, and which levels of those variables, do the best job of splitting objects into distinct groups based on the dependent variable. In market research, the objects are generally customers or survey respondents, and the two most widely used decision tree techniques are CART and CHAID.
The case below models which demographic variables do the best job of distinguishing between people who frequently visit fast food restaurants and those who do not.
It is difficult to test for all possible interactions in a multiple regression model, and this group of analyses does a good job of filling that void. Decision trees also have the advantage of being robust methods that are relatively insensitive to outliers and to the distribution of the data.
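As a rough illustration of how a tree picks up an interaction, the sketch below scores candidate splits with Gini impurity, the criterion commonly associated with CART. The data, the variable names (age_band, has_children), and the helper functions are all hypothetical and exist only for this example; a real tool such as SPSS or SQL Server handles far more detail (stopping rules, pruning, missing values).

```python
from collections import Counter

# Made-up toy data: (age_band, has_children, frequent_visitor).
ROWS = [
    ("18-34", "yes", 1), ("18-34", "yes", 1),
    ("18-34", "no", 1), ("18-34", "no", 1),
    ("35+", "yes", 1), ("35+", "yes", 0),
    ("35+", "no", 0), ("35+", "no", 0),
]
FEATURES = {"age_band": 0, "has_children": 1}  # column positions
TARGET = 2


def gini(rows):
    """Gini impurity of the target within a group of rows."""
    if not rows:
        return 0.0
    counts = Counter(row[TARGET] for row in rows)
    n = len(rows)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(rows, col, level):
    """Weighted impurity after splitting rows into level vs. everything else."""
    left = [r for r in rows if r[col] == level]
    right = [r for r in rows if r[col] != level]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(rows):
    """Return (impurity, variable, level) for the lowest-impurity split."""
    candidates = [
        (split_impurity(rows, col, level), name, level)
        for name, col in FEATURES.items()
        for level in sorted({r[col] for r in rows})
    ]
    return min(candidates)


# First split on the full sample, then the best split inside one branch.
print(best_split(ROWS))
older = [r for r in ROWS if r[0] == "35+"]
print(best_split(older))
```

In this toy sample the first split is on age band, and has_children only matters inside the 35+ branch. That branch-specific effect is the kind of interaction a regression model would need an explicit interaction term to capture.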
Marketing questions answered with decision trees:
*Which group of consumers is most likely to exhibit a behavior, e.g. buy a new car, respond to a cross-sell campaign, default on a loan, or become a best customer?
*Which groups of consumers showed the highest profits for an offer? (This can be done by assigning costs to each response group before modeling.)
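One simple way to read the profit question is to attach a margin and a contact cost to each segment and rank segments by the result. The sketch below does this after the fact for three hypothetical terminal nodes; the node names, counts, margin, and cost are all made up, and tools that accept costs before modeling may weigh them differently.

```python
# Made-up terminal nodes: customers contacted and responders observed.
SEGMENTS = {
    "node_A": {"customers": 1000, "responders": 100},
    "node_B": {"customers": 2000, "responders": 60},
    "node_C": {"customers": 500, "responders": 5},
}
MARGIN_PER_RESPONSE = 50  # hypothetical margin earned per responder
COST_PER_CONTACT = 2      # hypothetical cost of contacting one customer


def segment_profit(segment):
    """Margin from responders minus the cost of contacting the whole segment."""
    revenue = segment["responders"] * MARGIN_PER_RESPONSE
    cost = segment["customers"] * COST_PER_CONTACT
    return revenue - cost


profits = {name: segment_profit(seg) for name, seg in SEGMENTS.items()}
ranked = sorted(profits, key=profits.get, reverse=True)

for name in ranked:
    print(name, profits[name])
```

With these assumed numbers only one segment is profitable to contact, which is the kind of answer the question is after.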
Approaches to Decision Trees
Multivariate Approach:
*Decision Trees (SPSS or SQL Server)
Univariate Approach: Graphically displaying the results of a cross-tabulation or nested cross-tabulation.
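The univariate approach can be sketched in a few lines: count respondents in each cell of a cross-tabulation, then display the share of frequent visitors per level. The survey rows and function names below are hypothetical, and the text bar stands in for whatever chart the analyst would actually draw.

```python
from collections import Counter

# Made-up survey rows: (marital_status, visit_frequency).
RESPONSES = [
    ("married", "frequent"), ("married", "frequent"),
    ("married", "frequent"), ("married", "infrequent"),
    ("single", "frequent"), ("single", "infrequent"),
    ("single", "infrequent"), ("single", "infrequent"),
]


def crosstab(rows):
    """Count respondents in each (row level, column level) cell."""
    return Counter(rows)


def share_frequent(rows, level):
    """Share of respondents at the given level who are frequent visitors."""
    cells = crosstab(rows)
    frequent = cells[(level, "frequent")]
    total = frequent + cells[(level, "infrequent")]
    return frequent / total


def text_bar(share, width=20):
    """Render a share between 0 and 1 as a simple text bar."""
    return "#" * round(share * width)


for level in ("married", "single"):
    share = share_frequent(RESPONSES, level)
    print(f"{level:<8} {share:.0%} {text_bar(share)}")
```

A nested cross-tabulation would repeat the same counting within each level of a second variable, which is essentially what a tree does one split at a time.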
Decision trees compared to other predictive models:
*Decision trees look at variables hierarchically rather than simultaneously.
*Decision trees are easier for audiences to interpret.
*Decision tree modeling is usually faster.
*Decision trees usually produce a handful of segments (=terminal nodes), each with a given score, rather than a list of customers with individually assigned scores. In other words, there are fewer distinct modeled values when using decision trees.
*Decision trees do not assume that the dependent variable follows any given distribution (they are non-parametric models).
Common Mistakes
*Not looking at all the splits that lead to a terminal node of interest. Using the example of fast food visits again, there is a difference between married households with 2 or more children (both splits) and households with 2 or more children (just the terminal split).
*Simplifying a rule during socialization of the model.
*Assuming the default tree produced is the optimal tree.
*Not testing to see if the tree is over-fit to the sample data.
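The first mistake above is easy to see with a small example. The sketch below applies a rule two ways to a handful of made-up households: once using every split on the path to the terminal node, and once using only the terminal split. The household records and rule definitions are hypothetical.

```python
# Made-up household records.
HOUSEHOLDS = [
    {"married": True, "children": 2},
    {"married": True, "children": 3},
    {"married": True, "children": 0},
    {"married": False, "children": 2},
    {"married": False, "children": 1},
]

# Every split on the path to the node: married, then 2 or more children.
FULL_PATH = [
    lambda h: h["married"],
    lambda h: h["children"] >= 2,
]
# Only the last split, which drops the "married" condition.
TERMINAL_ONLY = FULL_PATH[-1:]


def matches(household, rule):
    """A household belongs to a node only if it passes every split in the rule."""
    return all(condition(household) for condition in rule)


full_path_count = sum(matches(h, FULL_PATH) for h in HOUSEHOLDS)
terminal_only_count = sum(matches(h, TERMINAL_ONLY) for h in HOUSEHOLDS)

print(full_path_count, terminal_only_count)
```

The shortened rule pulls in an unmarried household that the tree never placed in that node, so any score attached to the node would be applied to the wrong group.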
Potential Challenges
*Lots of recoding and exploring to modify the tree splits. Frequently the second or third best variable for a given split ends up being more actionable with little loss in model performance.
*Decision trees can handle missing values, but each method handles missing values differently. Missing values frequently make interpretation more challenging.
*Models should be validated (split-file method in addition to pruning) to ensure they are not over-fit.
*Models can be sensitive to stopping criteria, so it is preferable to see if multiple methods give similar results.
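A minimal sketch of the split-file idea follows, assuming each record has already been assigned to a segment by a tree and carries a 0/1 response flag. The records, segment labels, and function names are made up, and the split here is a deterministic alternation to keep the example reproducible; in practice the holdout is usually drawn at random and set aside before the tree is built.

```python
from collections import Counter

# Made-up records: (segment assigned by the tree, responded 0 or 1).
RECORDS = [
    ("A", 1), ("A", 1), ("A", 0), ("A", 0),
    ("B", 1), ("B", 0), ("B", 1), ("B", 0),
]


def split_file(records):
    """Alternate records between a holdout file and a training file."""
    holdout = [r for i, r in enumerate(records) if i % 2 == 0]
    train = [r for i, r in enumerate(records) if i % 2 == 1]
    return train, holdout


def rate_by_segment(records):
    """Response rate within each segment."""
    totals, hits = Counter(), Counter()
    for segment, responded in records:
        totals[segment] += 1
        hits[segment] += responded
    return {segment: hits[segment] / totals[segment] for segment in totals}


def rate_gaps(train, holdout):
    """Absolute difference between training and holdout rates per segment."""
    train_rates = rate_by_segment(train)
    holdout_rates = rate_by_segment(holdout)
    return {
        segment: abs(train_rates[segment] - holdout_rates.get(segment, 0.0))
        for segment in train_rates
    }


train, holdout = split_file(RECORDS)
print(rate_gaps(train, holdout))
```

A segment whose rate holds up on the holdout file is more believable than one whose rate changes sharply, which suggests the split was fit to noise in the sample. With samples this small the gaps are exaggerated; the example only shows the mechanics.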
What is the M Squared Group seasoning?
*Use as a form of explanatory predictive modeling, to ensure there are no major interactions before performing another modeling technique.
*Use to find optimal splits for nominal variables before recoding for regression modeling or other techniques.
*Bring together disparate data sources to gather more information before using decision trees for prediction.
Questions to discuss with M Squared Group
*What is the single largest problem the prospect hopes to solve with decision trees? What are they trying to predict or split into groups?
*How do they plan to apply the scoring?
*How important is it for the answer to make sense to a wider audience? Which is more important: the accuracy of the answer or the ability to interpret it?
*Do they have any ideas as to which variables/data sources they plan to use to separate out the variable of interest?