A 2-Day Seminar Taught by Kevin Grimm, Ph.D.
Machine learning has emerged as a major field of statistics and data analysis in which the goal is to create reliable and flexible predictive models. These methods have gained much attention for analyzing large datasets that may contain several hundred variables and many thousands (perhaps millions) of participants. In these situations, machine learning algorithms attempt to identify the key variables needed in the predictive model, and several techniques also search for nonlinear associations and interaction effects.
While machine learning techniques have been most attractive for large datasets, the same techniques can be useful in smaller datasets for the same reasons: to create simpler and more reliable predictive models, and to search for nonlinear and interaction effects. These techniques are also a natural follow-up to standard hypothesis-driven statistical analyses (e.g., multiple regression) as a way to search for additional important patterns in the data.
The first day starts with an overview of machine learning and continues with an introduction to the basic techniques. Topics for day 1 include cross-validation, multiple regression, and basic variable selection methods, as well as an overview of the R statistical framework. The second day focuses on advanced variable selection methods for regression analysis. Topics include multivariate adaptive regression splines, lasso regression, classification and regression trees, bagging, and random forests. Throughout the course, participants gain experience with these methods through hands-on exercises.
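As a rough illustration of the cross-validation idea listed among the day 1 topics, here is a minimal sketch of k-fold cross-validation for a single-predictor regression. It is not taken from the course materials: the seminar works in R, and this sketch uses plain Python only to stay self-contained. The function names, the simulated data, and the choice of 5 folds are all assumptions made for the example.

```python
import random
from statistics import mean


def fit_simple_regression(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def k_fold_mse(xs, ys, k=5, seed=0):
    """Average held-out mean squared error across k folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes to the same fold.
    folds = [indices[i::k] for i in range(k)]
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        a, b = fit_simple_regression([xs[i] for i in train],
                                     [ys[i] for i in train])
        fold_errors.append(
            mean((ys[i] - (a + b * xs[i])) ** 2 for i in held_out)
        )
    return mean(fold_errors)


# Simulated data (hypothetical): y = 2 + 0.5*x plus standard normal noise.
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]

cv_mse = k_fold_mse(x, y, k=5)
print(f"5-fold cross-validated MSE: {cv_mse:.3f}")
```

Because each observation is scored by a model that never saw it during fitting, the cross-validated error estimates how well the model predicts new data rather than how closely it fits the data in hand.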
This seminar will use R for the empirical examples and exercises. To participate in the hands-on exercises, you are strongly encouraged to bring a laptop computer with the most recent versions of R and RStudio installed. RStudio is a front-end that makes R easier to work with. Both programs are free and available for Windows, Mac, and Linux platforms.
If you’d like to take this course but are concerned that you don’t know enough R, there are excellent online resources for learning the basics.
Who should attend?
If you want to learn how to explore your data effectively and have a strong statistical background in regression, this course is for you. You should have a good working knowledge of the principles and practice of multiple regression. Familiarity with the R programming language is also helpful.
Location, Format, and Materials
The class will meet from 9 am to 5 pm each day (with a 1-hour lunch break) at the Holiday Inn Fort Myers Airport at Town Center, 9931 Interstate Commerce Drive, Fort Myers, FL 33913. This hotel is 5 miles from Southwest Florida International Airport in Fort Myers, and there is a complimentary hotel shuttle to and from the airport. The shuttle can also take you to and from the nearby Gulf Coast Town Center, a large open-air shopping center with numerous stores, restaurants, and a movie theater.
Although you can expect the weather to be comfortably warm (the average high in early February is 75°F), this is not a resort-type location. However, it’s about a half-hour drive to several attractive vacation areas, including Naples, Sanibel Island, and Fort Myers Beach.
Southwest Florida International Airport (RSW) in Fort Myers is served by numerous airlines with direct flights to and from most major cities in the U.S. However, demand for seats in February is quite high, so be sure to make reservations at your earliest opportunity.
Participants receive a bound manual containing detailed lecture notes (with equations and graphics), examples of computer output, and many other useful features. This manual frees participants from the distracting task of note taking.
Registration and Lodging
The fee of $995 includes all course materials.
If you cancel your registration at least two weeks before the course is scheduled to begin, you are entitled to a full refund (minus a processing fee of $50 USD).
Lodging Reservation Instructions
The Holiday Inn Fort Myers Airport at Town Center, where the seminar takes place, is currently sold out for the night of January 31. Other nearby hotel options include:
- Homewood Suites by Hilton Fort Myers Airport/FGCU, 16450 Corporate Commerce Way, Fort Myers, FL 33913 (0.5 miles from seminar)
- Drury Inn and Suites Fort Myers Airport FGCU, 9950 University Plaza Drive, Fort Myers, FL 33913 (1.2 miles from seminar)
- Courtyard by Marriott Fort Myers at I-75 and Gulf Coast Town Center, 10050 Gulf Center Dr, Fort Myers, FL 33913 (1.5 miles from seminar)
No reserved room blocks are currently available at these hotels. We recommend going directly to each hotel’s website or checking other online travel sites. Airbnb rentals may also be available in the vicinity.
Course Outline
1. Introduction to Machine Learning
a. Introduction to machine learning
b. Introduction to R
c. Single predictor regression models & cross-validation
d. Multiple regression
e. Best subsets regression & forward selection
2. Advanced Variable Selection
a. Multivariate adaptive regression splines & lasso regression
b. Review of logistic regression & decision theory
c. Classification & regression trees
d. Bagging trees & random forests
“Kevin Grimm is an excellent instructor and I learned so much over these 2 days. Great content, materials are practical and useful, and there were lots of examples including R code. This was a very enjoyable learning experience. It was my first Statistical Horizons course and I’m sure I’ll be back for more.”
Ann A. O’Connell, The Ohio State University
“This course provides a clear, broad introduction to many of the most popular machine learning methods, with ample examples to practice on your own. Kevin is an engaging speaker who relates much of his research to the course topics, making them easier to digest.”
Andy Lin, University of California, Los Angeles
“This is the third class I have taken from Statistical Horizons. Just like the other three, this class (Machine Learning) was very informative; perfectly right in the middle – not too complicated but not too simplistic. I highly recommend!”
Rachel Lovell, Case Western Reserve University
“This course was presented very clearly, effectively, and with an appropriate amount of rigor for an active statistical researcher.”
Morgan DeBusk-Lane, Chesterfield County Public Schools
“Dr. Grimm was a wonderful, helpful, and articulate instructor who clearly knows machine learning so well and is eager to help others also understand this broad statistical approach. He starts with the basics and quickly works his way up to more advanced topics – making the course suitable to a broad audience. I would highly recommend this class.”
Linzy Bohn, University of Alberta
“Great step-by-step walkthrough of many common procedures.”
Christopher Greenwood, Deakin University
“This class is truly eye opening for people who really have little to no background of machine learning. Even for statisticians, you still get the ‘oh, I didn’t think of that!’ which is super helpful to stay informed as a professional.”
Choo Phei Wee, Children’s Hospital Los Angeles
“This course would benefit those new to machine learning as well as those looking to extend their knowledge of these approaches. The course progresses from basic to advanced topics in a well-structured and accessible fashion, with clear illustrations of how the methods can be applied in real-world contexts.”
Nicholas Parr, University of Oregon
“This course is perfect for someone who is not familiar with machine learning but has basic knowledge of regression models.”
Siavash Jalal, University of California, Los Angeles
“If you already have a solid background in regression modeling, this course provides a practical approach to machine learning. A number of very useful methods are covered, from univariate regression to random forests. Most importantly, Kevin discusses the limitations to each. Great teacher presenting a valuable topic!”
Mary Anne Doyle, Irdeto