Machine Learning

A 4-Day Remote Seminar Taught by Kevin Grimm, Ph.D.

Machine learning has emerged as a major field of statistics and data analysis in which the goal is to build reliable and flexible predictive models. This seminar offers a thorough introduction to machine learning methods. Topics include cross-validation, multiple regression, basic variable selection methods, an overview of the R statistical framework, and advanced variable selection methods for regression analysis.

Starting August 11, we are offering this seminar as a 4-day synchronous*, remote workshop. Each day will consist of a 3-hour live morning lecture held via the free video-conferencing software Zoom. Participants are encouraged to join the lecture live but may view the recorded session later in the day if they are unable to attend at the scheduled time. Each lecture will conclude with a hands-on exercise reviewing the material covered, to be completed independently that afternoon. A final session will be held each evening as an “office hour,” where participants can review the exercise results with the instructor and ask questions.

*We understand that scheduling is difficult during this unpredictable time. If you prefer, you may take all or part of the course asynchronously. Video recordings will be made available within 24 hours of each session, so you will have access to all class discussions and exercise solutions even if you cannot participate live.


Machine learning methods have attracted considerable attention for analyzing large datasets with several hundred variables and many thousands (perhaps millions) of participants. In these settings, machine learning algorithms attempt to identify the key variables needed in a predictive model, and several techniques also search for nonlinear associations and interaction effects.

While machine learning techniques have been most attractive for large datasets, the same techniques can be useful with smaller datasets for the same reasons: to create simpler, more reliable predictive models and to search for nonlinear and interactive effects. These techniques are also a natural follow-up to standard hypothesis-driven statistical analyses (e.g., multiple regression), helping to uncover additional important patterns in the data.

The course begins with an overview of machine learning and then introduces the basic techniques. It then focuses on advanced variable selection methods for regression analysis, including multivariate adaptive regression splines, lasso regression, classification and regression trees, bagging, and random forests. Throughout the course, participants will gain experience with these methods through hands-on exercises.


This remote seminar is held via Zoom, a free video-conferencing application. Before the seminar begins, participants will receive an email with the meeting code and password needed to join.

This seminar will use R for the empirical examples and exercises. To participate in the hands-on exercises, you are strongly encouraged to use a laptop computer with the most recent versions of R and RStudio installed. RStudio is a front-end that makes R easier to work with. Both programs are free and available for Windows, Mac, and Linux.

If you’d like to take this course but are concerned that you don’t know enough R, there are excellent online resources for learning the basics.

WHO SHOULD REGISTER?

If you want to learn how to explore your data effectively and have a strong statistical background in regression, this course is for you. You should have a good working knowledge of the principles and practice of multiple regression. Familiarity with the R programming language is also helpful.


1. Introduction to Machine Learning
     a. Introduction to machine learning
     b. Introduction to R
     c. Single predictor regression models & cross-validation
     d. Multiple regression
     e. Best subsets regression & forward selection

2. Advanced Variable Selection
     a. Multivariate adaptive regression splines & lasso regression
     b. Review of logistic regression & decision theory
     c. Classification & regression trees
     d. Bagging trees & random forests

REVIEWS OF MACHINE LEARNING

“The Machine Learning course was fantastic. Kevin Grimm is a fabulous instructor – great pace, plenty of examples are provided, and the slides are clear and easy to follow. I would highly recommend the course to colleagues. The materials provided for exercises and practice are plenty and instructions are very clear.”
  Grettel Castro, Florida International University

“Very informative – a great class for epidemiologists who are interested in applying machine learning techniques.”
  Gretchen Bandoli, University of California, San Diego

“Dr. Grimm is an outstanding instructor. He begins with regression content that is familiar to most people and builds to the complex machine learning topics. The examples were useful and he is able to answer questions expertly.”
  Brent Small, University of South Florida

“This is an excellent introduction to supervised machine learning models. Grimm covers a considerable number of complex topics in a very efficient way. Clearly he has been teaching this material for a long time.”
  Anibal Perez-Linan, University of Notre Dame

“This course is very informative and Kevin is a really good instructor to help people with different levels of statistical and content backgrounds to get involved in the course and build on learning.”

“Kevin Grimm is an excellent instructor with many great relatable examples and funny stories. The methods were easy to follow and intuitive, and syntax was clear. Matches the high quality offered by other Statistical Horizons courses.”
  Andy Kin On Wong, University of Toronto / University Health Network

“Good intro class to machine learning concepts. Having R scripts to take home will be helpful as I try to apply it to my institutional data.”
  Andrea Borondy Kitts, Lahey Hospital & Medical Center