7.6 Summary

Interactions between predictors are often overlooked when constructing a predictive model. One reason is that many modern modeling techniques can capture interactions implicitly, without the analyst specifying them. Another is the large number of additional terms that interactions could add to a model, especially when the analyst does not know in advance which interaction terms would help explain the outcome.

When beginning the search for interactions, expert knowledge about the system is the most valuable resource and can help narrow the search. Algorithmic approaches can also be employed. For data with a relatively small number of predictors, every pairwise interaction can be enumerated. Resampling approaches or penalized models can then be used to locate the interactions that meaningfully improve a model.
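As an illustration of complete enumeration, the following minimal sketch builds every pairwise product term for a small set of predictors. The function name `pairwise_interaction_terms`, the `a:b` naming of terms, and the toy data are assumptions made for this example and are not taken from the text.

```python
from itertools import combinations


def pairwise_interaction_terms(rows, names):
    """Return (term_names, term_rows) for all pairwise products of predictors.

    rows  : list of observations, each a list of numeric predictor values
    names : list of predictor names, in the same order as the values
    """
    # Every unordered pair of predictor positions: p * (p - 1) / 2 pairs.
    pairs = list(combinations(range(len(names)), 2))
    term_names = [f"{names[i]}:{names[j]}" for i, j in pairs]
    # One product column per pair, computed for every observation.
    term_rows = [[row[i] * row[j] for i, j in pairs] for row in rows]
    return term_names, term_rows


# Hypothetical toy data: four predictors, two observations.
names = ["x1", "x2", "x3", "x4"]
rows = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, -1.0, 2.0, 0.0],
]

term_names, term_rows = pairwise_interaction_terms(rows, names)
print(len(term_names), "interaction terms:", term_names)
print("first observation:", term_rows[0])
```

The resulting columns could then be passed, together with the main effects, to a penalized model or evaluated one at a time with resampling. With 4 predictors there are only 6 pairs, but the count grows quadratically with the number of predictors.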

As the number of predictors grows, complete enumeration becomes impractical, since p predictors produce p(p - 1)/2 possible pairwise interactions. Instead, methods that can readily identify potentially important interactions without searching the entire space should be used. These include two-stage modeling, tree-based methods, and the feasible solution algorithm.

When the search is complete, the interaction terms that are most likely to improve model performance can be added to a simpler model such as linear or logistic regression. The predictive performance can then be estimated through cross-validation and compared to that of models without these terms to determine the overall predictive improvement.
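The comparison described above can be sketched as follows, under stated assumptions: the data are simulated with a known interaction, the model is ordinary least squares fit with NumPy, and the helper `cv_rmse` is a hypothetical function written for this example. It is meant only to show the shape of the comparison, not a definitive workflow.

```python
import numpy as np


def cv_rmse(X, y, n_folds=5, seed=0):
    """Estimate RMSE of a least-squares linear model by k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    sq_errors = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Add an intercept column, fit on the training folds only.
        A_train = np.column_stack([np.ones(len(train)), X[train]])
        A_test = np.column_stack([np.ones(len(test)), X[test]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        sq_errors.append((y[test] - A_test @ coef) ** 2)
    return float(np.sqrt(np.mean(np.concatenate(sq_errors))))


# Simulated data (an assumption for illustration): the outcome depends on
# two main effects and their product.
rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = x1 + x2 + 2.0 * x1 * x2 + rng.normal(scale=0.5, size=n)

X_main = np.column_stack([x1, x2])           # main effects only
X_int = np.column_stack([x1, x2, x1 * x2])   # main effects plus interaction

# The same seed gives both models identical folds, so the difference in
# RMSE reflects the added term rather than a different data split.
rmse_main = cv_rmse(X_main, y)
rmse_int = cv_rmse(X_int, y)
print(f"RMSE without interaction: {rmse_main:.3f}")
print(f"RMSE with interaction:    {rmse_int:.3f}")
```

Evaluating both models on the same resamples is a deliberate choice here: it pairs the performance estimates so that the improvement attributable to the interaction term is not confounded with fold-to-fold variation.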