7.2 Practical Considerations

In the quest to locate predictive interactions, several practical considerations need to be addressed, especially if there is little or no expert knowledge to suggest which terms should be screened first. A few questions to think through before discussing search methodology are: Is it possible, with the available data, to enumerate and evaluate all possible predictive interactions? If so, should all interactions be evaluated? And should interaction terms be created before or after preprocessing the original predictors?

Let’s begin with the question of whether all of the interaction terms should be completely enumerated. To answer this question, the principles of interaction hierarchy and effect sparsity will guide the approach. While high-order interactions (three-way and above) are possible for certain problems, as previously mentioned, they likely occur infrequently (effect sparsity) and may provide only a small improvement in the predictive ability of the model (interaction hierarchy). Therefore, higher-order interactions should be screened only if expert knowledge recommends investigating specific interaction terms. What is more likely is that a fraction of the possible pairwise interaction terms contain predictively relevant information. Even though the recommendation here is to focus on searching through the pairwise interactions, it may still be practically impossible to evaluate a complete enumeration of these terms. More concretely, if there are \(p\) predictors, then there are \(p(p-1)/2\) pairwise interaction terms. As the number of predictors increases, the number of interaction terms grows quadratically, roughly as \(p^2/2\). With as few as 100 original predictors, complete enumeration requires a search of 4,950 terms. A moderate increase to 500 predictors requires us to evaluate nearly 125,000 pairwise terms! More strategic search approaches will certainly be required for data that contain even a moderate number of predictors; these will be discussed in subsequent sections.
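The counts above can be checked with a short sketch. The helper `n_pairwise` is a hypothetical name introduced only for illustration; Python's standard `math.comb` and `itertools.combinations` give the same count and the pairs themselves.

```python
from itertools import combinations
from math import comb

def n_pairwise(p):
    """Number of pairwise interaction terms among p predictors: p(p-1)/2."""
    return p * (p - 1) // 2

for p in (100, 500, 1000):
    print(p, n_pairwise(p))
# 100 4950
# 500 124750
# 1000 499500   <- doubling p from 500 to 1000 roughly quadruples the count

# math.comb(p, 2) ("p choose 2") gives the same count.
assert n_pairwise(100) == comb(100, 2)

# Three-way terms grow even faster: p(p-1)(p-2)/6.
print(comb(100, 3))  # 161700

# Enumerate the pairs themselves for a small, hypothetical set of predictor names.
names = ["x1", "x2", "x3", "x4"]
pairs = list(combinations(names, 2))
print(len(pairs))  # 6
```

The quadratic growth in pairs, and the even faster growth in three-way terms, is why complete enumeration quickly becomes impractical as \(p\) grows.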

Another practical consideration is when interaction terms should be created relative to the preprocessing steps. Recall from Chapter 6 that preprocessing steps for individual numeric predictors can include centering, scaling, dimension expansion, or dimension reduction. As shown in previous chapters, these steps help models better uncover the predictive relationships between predictors and the response. What must now be understood is whether the order of preprocessing and interaction-term creation affects the ability to find predictively important interactions. To illustrate how the order of these steps can affect the relationship of the interaction with the response, consider the interaction between the maximum remodeling ratio and the maximum stenosis by area for the stroke data (Chapter 2). For these data, the preprocessing steps were centering, scaling, and an individual transformation of each predictor. Figure 7.6 compares the distributions of the stroke groups when the interaction term is created before the preprocessing steps (a) and after the preprocessing steps (b). The box plots of the stroke groups make it clear that the interaction’s signal, captured by the shift between the group distributions, is preserved when the interaction term is created first and the preprocessing steps are applied afterward. However, the interactive predictive signal is almost completely lost when the original predictors are preprocessed prior to creating the interaction term. This case demonstrates that we should think carefully about the step at which interaction terms are created. In general, interactions are most plausible and practically interpretable on the original scales of measurement. Therefore, the interaction terms should probably be created prior to any preprocessing steps. It may also be wise to check the effect of the ordering of these steps.
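A minimal sketch of why the ordering matters, assuming only centering and scaling (not the per-predictor transformation used for the stroke data) and using synthetic, hypothetical predictors rather than the stroke measurements. Because centering and scaling are affine, standardizing the product \(x_1 x_2\) leaves it perfectly correlated with the raw product. Standardizing each predictor first yields a different feature, since \((x_1 - m_1)(x_2 - m_2) = x_1 x_2 - m_2 x_1 - m_1 x_2 + m_1 m_2\), so the main-effect terms are mixed into the "interaction." The function names `interact_then_preprocess` and `preprocess_then_interact` are illustrative only.

```python
import numpy as np

def standardize(x):
    """Center to mean 0 and scale to standard deviation 1."""
    return (x - x.mean()) / x.std()

def interact_then_preprocess(x1, x2):
    # Order (a): form the product on the original scale, then center and scale it.
    return standardize(x1 * x2)

def preprocess_then_interact(x1, x2):
    # Order (b): center and scale each predictor first, then form the product.
    return standardize(x1) * standardize(x2)

rng = np.random.default_rng(0)
# Hypothetical positive-valued predictors standing in for two measurements.
x1 = rng.uniform(1.0, 5.0, size=200)
x2 = rng.uniform(10.0, 50.0, size=200)

raw = x1 * x2
a = interact_then_preprocess(x1, x2)
b = preprocess_then_interact(x1, x2)

# Order (a) is an affine transform of the raw product: correlation is 1.
print(np.corrcoef(raw, a)[0, 1])
# Order (b) is a different feature: correlation with the raw product is well below 1.
print(np.corrcoef(raw, b)[0, 1])
```

The first printed correlation is 1 up to floating-point error, while the second is well below 1, showing that the two orderings produce genuinely different interaction features. Whether that difference helps or hurts prediction depends on the data, which is why checking both orderings can be worthwhile.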

Figure 7.6: An illustration of the impact of the order of operations for preprocessing predictors and creating interaction terms based on the stroke data. (a) Interaction terms are created, then all terms are preprocessed. (b) The predictors are first preprocessed, then the interaction terms are created from the preprocessed predictors.