11 Feature Selection
