A 2-Day Seminar Taught by Kevin Grimm, Ph.D.
Machine learning has emerged as a major field of statistics and data analysis in which the goal is to create reliable and flexible predictive models. These methods have gained much attention for analyzing large datasets that may include several hundred variables and many thousands (perhaps millions) of participants. In these situations, machine learning algorithms attempt to identify the key variables needed in the predictive model, and several techniques search for nonlinear associations and interaction effects.
While machine learning techniques have been most attractive for large datasets, the same techniques can be useful in smaller datasets for the same reasons: to create simpler and more reliable predictive models, and to search for nonlinear and interaction effects. These techniques are also a natural follow-up to standard hypothesis-driven statistical analyses (e.g., multiple regression) as a way to search for additional important patterns in the data.
The first day starts with an overview of machine learning and continues with an introduction to the basic techniques. Topics for day 1 include cross-validation, multiple regression, and basic variable selection methods, as well as an overview of the R statistical framework. The second day focuses on advanced variable selection methods for regression analysis. Topics include multivariate adaptive regression splines, lasso regression, classification and regression trees, bagging, and random forests. Throughout the course, participants gain experience with these methods through hands-on exercises.
This seminar will use R for the empirical examples and exercises. To participate in the hands-on exercises, you are strongly encouraged to bring a laptop computer with the most recent versions of R and RStudio installed. RStudio is a front end for R that makes R easier to work with. Both are free and available for Windows, Mac, and Linux platforms.
If you’d like to take this course but are concerned that you don’t know enough R, there are excellent online resources for learning the basics. Here are our recommendations.
Who should attend?
If you have a desire to learn how to effectively explore your data and have a strong statistical background in regression, this course is for you. You should have a good working knowledge of the principles and practice of multiple regression. It is also helpful to have familiarity with the R programming language.
Location, format, and materials
The class will meet at Jamaica Bay Inn, 4175 Admiralty Way, Marina del Rey, CA 90292, from 9 am to 5 pm each day, with a 1-hour lunch break.
Participants receive a bound manual containing detailed lecture notes (with equations and graphics), examples of computer output, and many other useful features. This manual frees participants from the distracting task of note taking.
Registration and lodging
The fee of $995.00 includes all seminar materials.
If you cancel your registration at least two weeks before the course is scheduled to begin, you are entitled to a full refund (minus a processing fee of $50).
Lodging reservation instructions
Nearby hotel options include:
Jamaica Bay Inn (seminar location), 4175 Admiralty Way, Marina del Rey, CA 90292
Marina del Rey Marriott, 4100 Admiralty Way, Marina del Rey, CA 90292
Hilton Garden Inn Los Angeles Marina Del Rey, 4200 Admiralty Way, Marina del Rey, CA 90292
Foghorn Harbor Inn, 4140 Via Marina, Marina del Rey, CA 90292
Inn at Venice Beach, 327 W Washington Blvd, Venice, CA 90291
No reserved room blocks are currently available at these hotels. We recommend booking directly through each hotel’s website or checking other online travel sites. Airbnb rentals may also be available in the vicinity.
1. Introduction to Machine Learning
a. Introduction to machine learning
b. Introduction to R
c. Single predictor regression models & cross-validation
d. Multiple regression
e. Best subsets regression & forward selection
2. Advanced Variable Selection
a. Multivariate adaptive regression splines & lasso regression
b. Review of logistic regression & decision theory
c. Classification & regression trees
d. Bagging trees & random forests
“This workshop was extremely well organized. The instructor was very knowledgeable on the topic.”
Soyang Kwon, Northwestern University
“If you have a reasonably broad knowledge of modeling and measurement and want to extend this to include machine learning, this course definitely takes you there. Helped solidify my understanding and importance of machine learning.”
John Fava, Vanguard
“The instructor was very knowledgeable and generally very good at conveying complex concepts. His presentations were detailed and well-organized. He was generally very open to answering all questions.”
Wilson Vincent, University of California, San Francisco
“Wonderful course. Describing machine learning from a statistical perspective. Dr. Grimm is a good speaker and lecturer. He explained everything clearly. It would be perfect for anyone with some statistical and R foundation.”
Jie Wang, University of California, Los Angeles
“This was an excellent course. The instructor presents the material clearly and with several examples to demonstrate the methods. I feel able to apply these methods in my work.”
Michael Monuteaux, Boston Children’s Hospital
“Dr. Grimm is a very good instructor who clearly communicated and explained the material. His mastery of the material was evident. Good level of enthusiasm.”
Mark Boye, Eli Lilly and Company, Inc.