3 A Review of the Predictive Modeling Process

Before diving into specific modeling methodologies and techniques, several foundational topics should first be discussed and defined. These topics are fairly general with regard to empirical modeling and include: metrics for measuring performance for regression and classification problems, approaches for optimal data usage (including data splitting and resampling), best practices for model tuning, and recommendations for comparing model performance.

Two data sets are used in this chapter to illustrate the techniques. The first is the Ames housing price data introduced in Chapter 1. The second focuses on classifying a person’s profession based on information from an online dating site. These data are discussed in the next section.