Published (Last): 24 May 2013
It also enables you to assess the viability of a potential product or service before taking it to market. It is a field that recognises the importance of using data to make evidence-based decisions, and many statistical and analytical methods have become popular in quantitative market research.
In our Market Research terminology blog series, we discuss a number of common terms used in market research analysis and explain what they are used for and how they relate to established statistical techniques.
What is it for?
CHAID (Chi-square Automatic Interaction Detector) analysis is an algorithm used for discovering relationships between a categorical response variable and other categorical predictor variables.
It is useful when looking for patterns in datasets with lots of categorical variables and is a convenient way of summarising the data as the relationships can be easily visualised.
In practice, CHAID is often used in direct marketing to understand how different groups of customers might respond to a campaign based on their characteristics. So suppose, for example, that we run a marketing campaign and are interested in understanding which customer characteristics are associated with the response. We might find that rural customers have a response rate of only [...]. We check to see if this difference is statistically significant and, if it is, we retain these as new leaves.
Urban homeowners may have a much higher response rate [...]. At each step, every predictor variable is considered to see if splitting the sample based on this factor leads to a statistically significant relationship with the response variable. Where there are more than two groupings for a predictor, merging of the categories is also considered to find the best discrimination. If a statistically significant difference is observed, then the most significant factor is used to make a split, which becomes the next branch in the tree.
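The category-merging idea can be illustrated with a minimal Python sketch. It assumes a binary response and uses made-up counts; the helper names `pair_p_value` and `merge_categories` and the 0.05 merge threshold are our own assumptions for illustration, and this is a simplification rather than a full CHAID implementation.

```python
import math
from itertools import combinations


def pair_p_value(counts_a, counts_b):
    """Chi-square test of independence for a 2x2 table (1 degree of freedom).

    Each argument is a (responders, non_responders) pair of counts.
    """
    a, b = counts_a
    c, d = counts_b
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the chi-square tail probability is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))


def merge_categories(counts, alpha_merge=0.05):
    """Repeatedly merge the two categories whose response rates differ least.

    counts maps a category name to (responders, non_responders).
    Merging stops when two groups remain, or when even the most similar
    pair is significantly different at alpha_merge (an assumed threshold).
    """
    groups = {(name,): tuple(c) for name, c in counts.items()}
    while len(groups) > 2:
        pair, p = max(
            ((pr, pair_p_value(groups[pr[0]], groups[pr[1]]))
             for pr in combinations(groups, 2)),
            key=lambda item: item[1],
        )
        if p <= alpha_merge:
            break
        first, second = pair
        merged = tuple(x + y for x, y in zip(groups.pop(first), groups.pop(second)))
        groups[first + second] = merged
    return groups


# Hypothetical campaign results: (responders, non-responders) per location.
location_counts = {"rural": (15, 85), "suburban": (17, 83), "urban": (30, 70)}
merged_groups = merge_categories(location_counts)
for names, (yes, no) in merged_groups.items():
    print("+".join(names), f"response rate = {yes / (yes + no):.1%}")
```

With these invented counts the rural and suburban groups respond at similar rates, so they are pooled into one category before the predictor is tested against the response.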
The process repeats to find the predictor variable on each leaf that is most significantly related to the response, branch by branch, until no further factors are found to have a statistically significant effect on the response.
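The split-selection step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the data are invented, the function names (`chi2_tail`, `chi2_p_value`, `best_split`) and the 0.05 threshold are hypothetical, and category merging and multiple-testing adjustments are left out for brevity.

```python
import math
from collections import Counter


def chi2_tail(x, df):
    """Upper-tail probability of the chi-square distribution for integer df."""
    z = x / 2
    if df % 2:   # odd df: start from Q(1/2, z) = erfc(sqrt(z))
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:        # even df: start from Q(1, z) = exp(-z)
        a, q = 1.0, math.exp(-z)
    while a < df / 2:  # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / gamma(a + 1)
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1
    return q


def chi2_p_value(table):
    """Pearson chi-square test of independence on a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, rt in zip(table, row_totals):
        for observed, ct in zip(row, col_totals):
            expected = rt * ct / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2_tail(stat, df)


def best_split(records, predictors, response, alpha=0.05):
    """Return (predictor, p_value) for the most significant split, or None."""
    best = None
    for predictor in predictors:
        counts = Counter((r[predictor], r[response]) for r in records)
        levels = sorted({level for level, _ in counts})
        outcomes = sorted({outcome for _, outcome in counts})
        if len(levels) < 2 or len(outcomes) < 2:
            continue  # nothing to test on this leaf
        table = [[counts[(lv, out)] for out in outcomes] for lv in levels]
        p = chi2_p_value(table)
        if p < alpha and (best is None or p < best[1]):
            best = (predictor, p)
    return best


# Hypothetical data: (location, tenure, responders, non-responders).
cells = [("urban", "owner", 40, 60), ("urban", "renter", 20, 80),
         ("rural", "owner", 16, 84), ("rural", "renter", 14, 86)]
records = []
for location, tenure, yes, no in cells:
    records += [{"location": location, "tenure": tenure, "responded": True}] * yes
    records += [{"location": location, "tenure": tenure, "responded": False}] * no

predictors = ["location", "tenure"]
root_split = best_split(records, predictors, "responded")
urban = [r for r in records if r["location"] == "urban"]
rural = [r for r in records if r["location"] == "rural"]
urban_split = best_split(urban, predictors, "responded")
rural_split = best_split(rural, predictors, "responded")
print(root_split, urban_split, rural_split)
```

In this invented example the root is split on location, the urban leaf is split again on tenure, and the rural leaf is left alone because no predictor reaches significance there.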
The results can be visualised with a so-called tree diagram (see below, for example). In this case, we can see that urban homeowners [...].
Figure: An example of a CHAID tree diagram showing the return rates for a direct marketing campaign for different subsets of customers.
What statistical techniques are used?
As indicated in the name, CHAID uses Chi-square tests of independence, which test for an association between two categorical variables. A statistically significant result indicates that the two variables are not independent, i.e. that there is a relationship between them. Chi-square tests are applied at each of the stages in building the CHAID tree, as described above, to ensure that each branch is associated with a statistically significant predictor of the response variable. Bonferroni corrections, or similar adjustments, are used to account for the multiple testing that takes place.
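As a rough illustration, the Python sketch below runs a Chi-square test of independence on a single 2x2 table and then applies a plain Bonferroni correction to a set of p-values. The counts and the extra p-values are invented, and the function names are hypothetical; actual CHAID software may compute its adjustment differently.

```python
import math


def chi2_test_2x2(table):
    """Chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value); the test has 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the chi-square tail probability is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical counts: rows are urban and rural customers,
# columns are responders and non-responders.
stat, p_location = chi2_test_2x2([[60, 140], [30, 170]])
print(f"chi-square = {stat:.2f}, unadjusted p = {p_location:.5f}")

# Suppose two other candidate predictors were tested on the same leaf
# (their p-values are made up for the example).
adjusted = bonferroni([p_location, 0.04, 0.2])
print("adjusted p-values:", [round(p, 4) for p in adjusted])
```

After adjustment, the second predictor (unadjusted p = 0.04) would no longer count as significant at the 5% level, which is the effect the correction is meant to have.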
Each test carries a chance of a false-positive result. The more tests that we do, the greater the chance that we will find one of these false positives (inflating the so-called Type I error), so adjustments to the p-values are used to counter this, so that stronger evidence is required to indicate a significant result. CHAID can also be applied with a continuous response variable. However, in this case F-tests rather than Chi-square tests are used. Continuous predictor variables can also be incorporated by determining cut-offs to create ordinal groups, based, for example, on particular percentiles of the variable.
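One way to create such ordinal groups is sketched below in Python, using quartile cut-offs from the standard library. The ages are invented, `percentile_bins` is a hypothetical helper, and the choice of four groups is an assumption made only for the example.

```python
import bisect
import statistics


def percentile_bins(values, n_groups=4):
    """Assign each value to an ordinal group defined by percentile cut-offs.

    Returns (cutoffs, groups), where groups[i] is the 0-based group index
    of values[i] and there are n_groups - 1 cut-offs.
    """
    cutoffs = statistics.quantiles(values, n=n_groups)
    groups = [bisect.bisect_right(cutoffs, v) for v in values]
    return cutoffs, groups


# Hypothetical customer ages.
ages = [22, 25, 31, 38, 44, 47, 53, 59, 64, 70, 76, 81]
cutoffs, age_groups = percentile_bins(ages)
print("cut-offs:", cutoffs)
print("groups:  ", age_groups)
```

The resulting group index can then be treated like any other ordinal predictor when candidate splits are tested.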
At each branch, as we split the total population, we reduce the number of observations available and with a small total sample size the individual groups can quickly become too small for reliable analysis.
Alternative methods
When we are interested in identifying groups of customers for targeted marketing where we do not have a response variable on which to base the splits in our sample, we can use other market segmentation techniques such as cluster analysis (see our recent blog on Customer segmentation for further information).
CHAID is sometimes used as an exploratory method for predictive modelling. However, a more formal multiple logistic or multinomial regression model could be applied instead. These regression models are specifically designed for analysing binary or multi-category response variables. Interaction terms could be included in the model to investigate the associations between predictors that are tested for in the CHAID algorithm, whilst allowing a wider range of possible model specifications which may well fit the data better.
Another advantage of this modelling approach is that we are able to analyse the data all-in-one rather than splitting the data into subgroups and performing multiple tests.
In particular, where a continuous response variable is of interest or there are a number of continuous predictors to consider, we would recommend performing a multiple regression analysis instead.
CHAID segmentation in SAS?
CHAID uses predictor variables and can be performed using a variety of inputs, including scales. A split is only made where a variable produces a statistically significant split in the sample. Since the sample is being repeatedly split, the technique performs best with large sample sizes.
John is a marketing manager for a large multinational company, and he wants to understand what characteristics his most satisfied and least satisfied customers share. The first predictor that CHAID uses to split the sample is the one that is most strongly associated with the response variable (highly satisfied customer or not). Here we have defined customers who are satisfied with product availability as those giving a score of 8.
Chi-square automatic interaction detection
The dependent variable for this ordinal CHAID model is online brokerage account commission dollars during the past 12 months. Predictors include proprietary client data (various account status and trading behavior variables) as well as syndicated demographic and lifestyle variables. Once the model is built using the modeling sample, we apply it to the validation sample to see how well it works on a sample other than the one on which it was built.
The tree is built by splitting an initial predictor variable into categories or ranges of values such that each category or range of values represents a statistically significant difference on the dependent variable. Additional predictors may be added under each category or value range of the first predictor, adding more branches to the segmentation model.