Machine Learning

A 2-Day Seminar Taught by Kevin Grimm, Ph.D.

Machine learning has emerged as a major field of statistics and data analysis where the goal is to create reliable and flexible predictive models. These methods have gained much attention for analyzing large datasets that may be composed of several hundred variables and many thousands (perhaps millions) of participants. In these situations, machine learning algorithms attempt to identify key variables needed in the predictive model, and several techniques search for nonlinear associations and interactive effects.

While machine learning techniques have been most attractive for large datasets, the same techniques can be useful in smaller datasets for the same reasons: to create simpler and more reliable predictive models, and to search for nonlinear and interactive effects. They are also a natural follow-up to standard hypothesis-driven statistical analyses (e.g., multiple regression), offering a way to search for additional important patterns in the data.

The first day starts with an overview of machine learning and continues with an introduction to the basic techniques. Topics for day 1 include cross-validation, multiple regression, and basic variable selection methods, as well as an overview of the R statistical framework. The second day focuses on advanced variable selection methods for regression analysis. Topics include multivariate adaptive regression splines, lasso regression, classification and regression trees, bagging, and random forests. Throughout the course, participants gain experience with these methods through hands-on exercises.


This seminar will use R for the empirical examples and exercises. To participate in the hands-on exercises, you are strongly encouraged to bring a laptop computer with the most recent versions of R and RStudio installed. RStudio is a front-end for R that makes R easier to work with. Both programs are free and available for Windows, Mac, and Linux platforms.

If you’d like to take this course but are concerned that you don’t know enough R, there are excellent online resources for learning the basics.

Who should attend? 

If you want to learn how to explore your data effectively and have a strong statistical background in regression, this course is for you. You should have a good working knowledge of the principles and practice of multiple regression. Familiarity with the R programming language is also helpful.

Location, Format, and Materials

The class will meet from 9 am to 5 pm each day, with a 1-hour lunch break, at Temple University Center City, 1515 Market Street, Philadelphia, PA 19103.

Participants receive a bound manual containing detailed lecture notes (with equations and graphics), examples of computer output, and many other useful features. This manual frees participants from the distracting task of note taking.

Registration and lodging

The fee of $995 includes all course materials. The early registration fee of $895 is available until May 26.

Refund Policy

If you cancel your registration at least two weeks before the course is scheduled to begin, you are entitled to a full refund (minus a processing fee of $50). 

Lodging Reservation Instructions 

A block of guest rooms has been reserved at the Club Quarters Hotel, 1628 Chestnut Street, Philadelphia, PA at a special rate of $164 per night. The hotel is about a 5-minute walk from the seminar location. To make reservations, call 203-905-2100 during business hours and identify yourself using group code STH624. For guaranteed rate and availability, you must reserve your room no later than Monday, May 25, 2020.

If you need to make reservations after the cut-off date, you may call Club Quarters directly and ask for the “Statistical Horizons” rate (do not use the code or mention a room block) and they will try to accommodate your request.


1. Introduction to Machine Learning
     a. Introduction to machine learning
     b. Introduction to R
     c. Single predictor regression models & cross-validation
     d. Multiple regression
     e. Best subsets regression & forward selection

2. Advanced Variable Selection
     a. Multivariate adaptive regression splines & lasso regression
     b. Review of logistic regression & decision theory
     c. Classification & regression trees
     d. Bagging trees & random forests


“The Machine Learning course was fantastic. Kevin Grimm is a fabulous instructor – great pace, plenty of examples are provided, and the slides are clear and easy to follow. I would highly recommend the course to colleagues. The materials provided for exercises and practice are plenty and instructions are very clear.”
  Grettel Castro, Florida International University

“Very informative – a great class for epidemiologists who are interested in applying machine learning techniques.”
  Gretchen Bandoli, University of California, San Diego

“Dr. Grimm is an outstanding instructor. He begins with regression content that is familiar to most people and builds to the complex machine learning topics. The examples were useful and he is able to answer questions expertly.”
  Brent Small, University of South Florida

“This is an excellent introduction to supervised machine learning models. Grimm covers a considerable number of complex topics in a very efficient way. Clearly he has been teaching this material for a long time.”
  Anibal Perez-Linan, University of Notre Dame

“This course is very informative and Kevin is a really good instructor to help people with different levels of statistical and content backgrounds to get involved in the course and build on learning.”

“Kevin Grimm was an excellent instructor with many great relatable examples and funny stories. The methods were easy to follow and intuitive, and syntax was clear. Matches the high quality offered by other Statistical Horizons courses.”
  Andy Kin On Wong, University of Toronto / University Health Network

“Good intro class to machine learning concepts. Having R scripts to take home will be helpful as I try to apply it to my institutional data.”
  Andrea Borondy Kitts, Lahey Hospital & Medical Center