Machine Learning

A 4-Day Remote Seminar Taught by Kevin Grimm, Ph.D.

This seminar is currently sold out. Email info@statisticalhorizons.com to be added to the waitlist.


Machine learning has emerged as a major field of statistics and data analysis where the goal is to create reliable and flexible predictive models. This seminar offers a thorough introduction to machine learning methods. Topics covered include: cross-validation; multiple regression; basic variable selection methods; an overview of the R statistical framework; and advanced variable selection methods for regression analysis.

Starting May 24, we are offering this seminar as a 4-day synchronous*, remote workshop. Each day will consist of a 3-hour live lecture held via the free video-conferencing software Zoom. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.

Each lecture session will conclude with a hands-on exercise reviewing the content covered, to be completed on your own. Additional lab sessions will be held on Monday and Wednesday afternoons, where you can review the exercise results with the instructor and ask questions.

*We understand that scheduling is difficult during this unpredictable time. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for two weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously.


MORE DETAILS ABOUT THE COURSE CONTENT

Machine learning methods have gained much attention for analyzing large datasets that may comprise several hundred variables and many thousands (perhaps millions) of participants. In these situations, machine learning algorithms attempt to identify the key variables needed in the predictive model, and several techniques also search for nonlinear associations and interaction effects.

While machine learning techniques have been most attractive for large datasets, the same techniques can be useful in smaller datasets for the same reasons: to create simpler and more reliable predictive models, and to search for nonlinear and interaction effects. These techniques are also a natural follow-up to standard hypothesis-driven statistical analyses (e.g., multiple regression) when searching for additional important patterns in the data.

The first day introduces machine learning and cross-validation, the primary approach to model selection in machine learning. The second day focuses on variable selection algorithms for multiple regression, including lasso regression and multivariate adaptive regression splines. The third day focuses on machine learning techniques for categorical outcomes; topics include logistic regression, decision theory, the naïve Bayes classifier, k-nearest neighbors, and support vector machines. The fourth and final day focuses on recursive partitioning (classification and regression trees) and ensemble models, such as bagging, random forests, and boosting. Throughout the course, you will gain experience with these methods through hands-on exercises.


COMPUTING

This seminar will use R for the empirical examples and exercises. To participate in the hands-on exercises, you are strongly encouraged to use a computer with the most recent versions of R and RStudio installed. RStudio is a front end for R that makes R easier to work with. Both are free and available for Windows, Mac, and Linux platforms.

If you’d like to take this course but are concerned that you don’t know enough R, there are excellent online resources for learning the basics.


WHO SHOULD REGISTER?

If you have a desire to learn how to effectively explore your data and have a strong statistical background in regression, this course is for you. You should have a good working knowledge of the principles and practice of multiple regression. It is also helpful to have familiarity with the R programming language.


SEMINAR OUTLINE

Day 1: Introduction to Machine Learning & Cross-Validation

  • Introduction to machine learning
  • Introduction to R
  • Single predictor regression models & cross-validation

Day 2: Advanced Variable Selection

  • Multiple regression
  • Best subsets regression
  • Forward selection
  • Lasso regression
  • Multivariate adaptive regression splines

Day 3: Classification Techniques

  • Review of logistic regression
  • Decision theory
  • Naïve Bayes classifier
  • k-nearest neighbors
  • Support vector machines

Day 4: Recursive Partitioning & Ensemble Models

  • Classification & regression trees
  • Weights in classification & regression trees
  • Conditional inference trees
  • Bootstrap aggregation (Bagging)
  • Random forests
  • Boosting

REVIEWS OF MACHINE LEARNING

“I really enjoyed the course. Great teacher. Covers a lot of different examples and methods. Can be used and adapted to my own work very quickly. 5 star recommendation.”
  Stefan Gross, University Medicine Greifswald

“The course addresses the most important supervised machine learning algorithms and approaches in a clear and comprehensive way. Professor Kevin Grimm provides excellent and clear explanations, as well as a plethora of useful material. His expertise in latent variable modeling also allows for fruitful cross-disciplinary exchanges. I highly recommend this course to anyone interested in deepening his/her expertise in this area.”
  Enrico Perinelli, University of Trento

“I learned that machine learning can be applied in a responsible way and can be useful when analyzing social science data. In addition to clearly explaining the different techniques and R functions, Kevin addressed many of the foundational aspects behind these types of models, which helped me to appreciate their potential strengths and better understand their limitations. The take-home assignments using different datasets also helped to build my understanding of different techniques and how to apply them.”
  Michelle Maroto, University of Alberta

“The slides and homework assignments were well organized and clearly presented/communicated. The time allowed between the lecture and homework review was appropriate and useful. Four days were a good amount of time to do the workshop online and it was easier to follow via this virtual workshop format. The professor, Kevin Grimm, made an excellent teacher.”
  Oscar Coltell, Universitat Jaume I