Machine Learning

A 2-Day Seminar Taught by Kevin Grimm, Ph.D.

Machine learning has emerged as a major field of statistics and data analysis where the goal is to create reliable and flexible predictive models. These methods have gained much attention for analyzing large datasets that may be composed of several hundred variables and many thousands (perhaps millions) of participants. In these situations, machine learning algorithms attempt to identify key variables needed in the predictive model, and several techniques search for nonlinear associations and interactive effects.

While machine learning techniques have been most attractive for large datasets, these same techniques can be useful in smaller datasets for the same reasons: to create simpler and more reliable predictive models, and to search for nonlinear and interactive effects. These techniques are also a natural follow-up to standard hypothesis-driven statistical analyses (e.g., multiple regression) to search for additional important patterns in the data.

The first day starts with an overview of machine learning and continues with an introduction to the basic techniques. Topics for the first day include cross-validation, multiple regression, and basic variable selection methods, as well as an overview of the R statistical framework. The second day focuses on advanced variable selection methods for regression analysis. Topics include multivariate adaptive regression splines, lasso regression, classification and regression trees, bagging, and random forests. Throughout the course, participants gain experience with these methods through hands-on exercises.


This seminar will use R for the empirical examples and exercises. To participate in the hands-on exercises, you are strongly encouraged to bring a laptop computer with the most recent versions of R and RStudio installed. RStudio is a front-end that makes R easier to work with. Both programs are free and available for Windows, Mac, and Linux platforms.

Who should attend? 

If you want to learn how to explore your data effectively and have a strong statistical background in regression, this course is for you. You should have a good working knowledge of the principles and practice of multiple regression. Familiarity with the R programming language is also helpful; a number of excellent introductory books on R, as well as many online tutorials, are available for those who are new to it.

Locations, Format, and Materials

The class will meet from 9 am to 5 pm each day, with a 1-hour lunch break, at the Holiday Inn Chicago Mart Plaza River North, 350 West Mart Center Drive, Chicago, Illinois 60654.

Participants receive a bound manual containing detailed lecture notes (with equations and graphics), examples of computer output, and many other useful features. This book frees participants from the distracting task of note-taking.

Registration and lodging

The fee of $995.00 includes all seminar materials.

Refund Policy

If you cancel your registration at least two weeks before the course is scheduled to begin, you are entitled to a full refund (minus a processing fee of $50). 

Lodging Reservation Instructions

A block of guest rooms has been reserved at the Holiday Inn Chicago Mart Plaza River North, where the seminar takes place, at a special rate of $199. To make reservations, call 855-268-0372 during business hours and identify yourself as part of the Statistical Horizons LLC group. For the guaranteed rate and availability, you must reserve your room no later than Tuesday, October 9, 2018.

We also recommend going directly to the hotel’s website or checking other online hotel sites. Pricing varies and you may be able to secure a better rate. 

Seminar Outline


1. Introduction to Machine Learning
     a. Introduction to machine learning
     b. Introduction to R
     c. Single predictor regression models & cross-validation
     d. Multiple regression
     e. Best subsets regression & forward selection

2. Advanced Variable Selection
     a. Multivariate adaptive regression splines & lasso regression
     b. Review of logistic regression & decision theory
     c. Classification & regression trees
     d. Bagging trees & random forests

Reviews


“I got a lot out of this course! Concepts were explained in ways that are easy to understand. With a conceptual understanding of machine learning approaches and R code, I feel that I could actually apply the methods to my projects today while understanding their limitations. Very worthwhile!”
  Megan Shepherd-Banigan, Duke University

“The course is really helpful. The instructor illustrated the contents well and presented clearly to students. You can tell that the instructor is really familiar with the field and he is really hands-on with the data.”
  Wei Guo, National Institute of Mental Health

“Theoretical ideas were only provided when necessary. Emphasis was on application. You walk away from the seminar with tangible information and a new skillset.”
  Taylor McLinden, BC Centre for Excellence in HIV/AIDS

“The course doesn’t just dive right into the more complex modeling techniques that are the more well-known components of machine learning (e.g. random forest), but lays the foundation for the techniques by reviewing the basic regression concepts that are important for understanding the method.”
  Melanie Schwandt, National Institutes of Health

“I’ve been hearing about machine learning for several years and wanted to get a good dose of the content. This was a perfect introduction – just enough material and examples to help me get started and plenty of new avenues to explore. Kevin was outstanding as an instructor.”
  Michael Broda, Virginia Commonwealth University

“Instructor was great, well-prepared, and knew the subject. The materials will help solve problems in my own research.”
  Igor Paploski, University of Minnesota

“One of the most interesting things I learned from this course is about data exploration. The programming codes together with the theoretical statistics is excellent.”
  Nazib M. Seidu, The University of Gothenburg