Machine Learning

A 3-Day Remote Seminar Taught by Kevin Grimm, Ph.D.


This seminar is currently sold out. Email info@statisticalhorizons.com to be added to the waitlist.


Machine learning has emerged as a major field of statistics and data analysis in which the goal is to create reliable and flexible predictive models. This seminar offers a thorough introduction to machine learning methods. Topics covered include cross-validation, multiple regression, basic variable selection methods, an overview of the R statistical framework, and advanced variable selection methods for regression analysis.

Starting January 7, we are offering this seminar as a 3-day synchronous*, remote workshop. Each day will consist of a 4-hour live lecture held via the free video-conferencing software Zoom. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.

Each lecture session will conclude with a hands-on exercise reviewing the content covered, to be completed on your own. An additional session will be held Thursday and Friday afternoons as an “office hour”, where you can review the exercise results with the instructor and ask any questions.

*We understand that scheduling is difficult during this unpredictable time. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for two weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously. 


MORE DETAILS ABOUT THE COURSE CONTENT

Machine learning methods have gained much attention for analyzing large datasets that may contain several hundred variables and many thousands (perhaps millions) of participants. In these situations, machine learning algorithms attempt to identify the key variables needed in the predictive model, and several techniques search for nonlinear associations and interactive effects.

While machine learning techniques have been most attractive for large datasets, the same techniques can be useful in smaller datasets for the same reasons: to create simpler and more reliable predictive models, and to search for nonlinear and interactive effects. These techniques are also a natural follow-up to standard hypothesis-driven statistical analyses (e.g., multiple regression), helping to uncover additional important patterns in the data.

The course starts with an overview of machine learning and an introduction to the basic techniques. It then focuses on advanced variable selection methods for regression analysis, including multivariate adaptive regression splines, lasso regression, classification and regression trees, bagging, and random forests. Throughout the course, you will gain experience with these methods through hands-on exercises.


COMPUTING

This remote seminar is held via Zoom, a free video conferencing application. Instructions for joining a session via Zoom are available here. Before the seminar begins, you will receive an email with the meeting code and password you must use to join.  

This seminar will use R for the empirical examples and exercises. To participate in the hands-on exercises, you are strongly encouraged to use a computer with the most recent versions of R and RStudio installed. RStudio is a front-end for R that makes it easier to work with. Both programs are free and available for Windows, Mac, and Linux platforms.

If you’d like to take this course but are concerned that you don’t know enough R, there are excellent online resources for learning the basics. Here are our recommendations.


WHO SHOULD REGISTER?

If you have a desire to learn how to effectively explore your data and have a strong statistical background in regression, this course is for you. You should have a good working knowledge of the principles and practice of multiple regression. It is also helpful to have familiarity with the R programming language.


SEMINAR OUTLINE

Day 1: Introduction to Machine Learning & Variable Selection

  • Introduction to machine learning
  • Introduction to R
  • Single predictor regression models & cross-validation
  • Multiple regression
  • Best subsets regression & forward selection
  • Lasso regression

Day 2: Classification Techniques

  • Review of logistic regression
  • Decision theory
  • Naïve Bayes classifier
  • k-nearest neighbors
  • Support vector machines

Day 3: Recursive Partitioning & Ensemble Models

  • Classification & Regression Trees
  • Weights in Classification & Regression Trees
  • Conditional Inference Trees
  • Evolutionary Trees
  • Bootstrap Aggregation (Bagging)
  • Random Forests
  • Boosting

REVIEWS OF MACHINE LEARNING

“I have to admit, I’ve generally been pretty skeptical about machine learning. I took this class partly to see if my skepticism was warranted, but also because I simply wanted to go beyond some of the methods I typically apply in data analysis. I learned that machine learning can be applied in a responsible way and can be useful when analyzing social science data. In addition to clearly explaining the different techniques and R functions, Kevin addressed many of the foundational aspects behind these types of models, which helped me to appreciate their potential strengths and better understand their limitations. The take-home assignments using different datasets also helped to build my understanding of different techniques and how to apply them.”
  Michelle Maroto, University of Alberta

“The slides and homework assignments were well organized and clearly presented/communicated. The time allowed between the lecture and homework review was appropriate and useful. Four days were a good amount of time to do the workshop online and it was easier to follow via this virtual workshop format. The professor, Kevin Grimm, made an excellent teaching.”
  Oscar Coltell, Universitat Jaume I

“The course is beyond helpful. Professor Grimm obviously has a deep understanding of the material, and he is very effective at answering participant questions. The help and resources he provides definitely go beyond what I expected. I would recommend this course to anyone interested in machine learning.”
  Alex Marbut, University of Alabama

“The (supervised) Machine Learning course was very good. Prof. Kevin Grimm is an outstanding instructor, covering complex topics of the ML in a very pedagogical way.”
  Tamar Abzhandadze, University of Gothenburg