
Feature Engineering and Selection: A Practical Approach for Predictive Models

11 Greedy Search Methods

This chapter discusses greedy search methods such as simple univariate filters and recursive feature elimination. Before proceeding, another data set is introduced that will be used throughout the chapter to demonstrate the strengths and weaknesses of these approaches.
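
As a preview of the filtering idea, the sketch below scores each predictor with a univariate t-test and keeps only those that clear a fixed significance threshold. It is a minimal illustration only: the data frame `predictors`, the two-class factor `outcome`, and the 0.05 cutoff are hypothetical placeholders, not objects from the data set introduced in this chapter.

```r
# A minimal sketch of a simple univariate filter, assuming a hypothetical
# data frame `predictors` of numeric columns and a two-class factor `outcome`.

# Score one predictor: the p-value of a two-sample t-test across the classes.
score_predictor <- function(x, y) {
  t.test(x ~ y)$p.value
}

# Score every column of the predictor data frame.
p_values <- vapply(predictors, score_predictor, numeric(1), y = outcome)

# Keep only the predictors whose univariate p-value falls below the cutoff.
keep <- names(p_values)[p_values < 0.05]
filtered <- predictors[, keep, drop = FALSE]
```

Each predictor is evaluated in isolation, which is what makes the filter "greedy": it is cheap to compute but cannot account for interactions or redundancy among predictors, a trade-off examined later in the chapter.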