Machine Learning - Online Course
A 4-Day Livestream Seminar Taught by
Kevin Grimm
10:30am-12:30pm (convert to your local time)
1:30pm-3:00pm
NOTE: This is an introductory seminar on machine learning. If you already have substantial knowledge of machine learning and experience with implementing it, you may want to check out our seminar on Advanced Machine Learning.
Machine learning has emerged as a major field of statistics and data analysis where the goal is to create reliable and flexible predictive models. This seminar offers a thorough introduction to machine learning methods. Topics covered include: cross-validation; multiple regression; basic variable selection methods; an overview of the R statistical framework; and advanced variable selection methods for regression analysis.
Starting May 16, we are offering this seminar as a 4-day synchronous*, livestream workshop held via the free video-conferencing software Zoom. Each day will consist of two lecture sessions which include hands-on exercises, separated by a 1-hour break. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.
*We understand that finding time to participate in livestream courses can be difficult. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously.
Closed captioning is available for all live and recorded sessions. Live captions can be translated to a variety of languages including Spanish, Korean, and Italian. For more information, click here.
More details about the course content
Machine learning methods have gained much attention for analyzing large datasets that may be composed of several hundred variables and many thousands (perhaps millions) of participants. In these situations, machine learning algorithms attempt to identify key variables needed in the predictive model, and several techniques search for nonlinear associations and interactive effects.
While machine learning techniques have been most attractive for large datasets, these same techniques can be useful in smaller datasets for the same reasons: to create simpler and more reliable predictive models, and to search for nonlinear and interactive effects. These techniques are also a natural follow-up to standard hypothesis-driven statistical analyses (e.g., multiple regression) to search for additional important patterns in the data.
The seminar begins by introducing machine learning and cross-validation, the approach for model selection in machine learning. Next, we will focus on variable selection algorithms for multiple regression, including lasso regression and multivariate adaptive regression splines. We will also cover machine learning techniques for categorical outcomes. Topics include logistic regression, decision theory, naïve Bayes, k-nearest neighbor, and support vector machines. Finally, we will focus on recursive partitioning (classification and regression trees) and ensemble models, such as bagging, random forests, and boosting. Throughout the course, you will gain experience with these methods through hands-on exercises.
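To give a flavor of how cross-validation guides model selection, here is a minimal sketch comparing an intercept-only model with a single-predictor regression by 5-fold cross-validated mean squared error. The seminar itself uses R; this sketch is written in plain Python purely for illustration and is not taken from the course materials. The data are simulated, and the function names (`fit_mean`, `fit_line`, `kfold_mse`) are hypothetical.

```python
import random

def fit_mean(xs, ys):
    """Intercept-only model: predict the training mean for every x."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Single-predictor least-squares regression: y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def kfold_mse(fit, xs, ys, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)          # random fold assignment
    folds = [idx[i::k] for i in range(k)]     # k roughly equal folds
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        mse = sum((ys[i] - model(xs[i])) ** 2 for i in held_out) / len(held_out)
        fold_errors.append(mse)
    return sum(fold_errors) / k

# Simulated data (an assumption for illustration): y is linear in x plus noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2.0 + 1.5 * x + rng.gauss(0, 1) for x in xs]

cv_mean = kfold_mse(fit_mean, xs, ys)
cv_line = kfold_mse(fit_line, xs, ys)
best = "line" if cv_line < cv_mean else "mean"
print(f"CV MSE, intercept only: {cv_mean:.2f}")
print(f"CV MSE, one predictor:  {cv_line:.2f}")
print(f"Selected model: {best}")
```

The model with the lower cross-validated error is the one selected. Because each fold is scored on data the model did not see during fitting, the comparison reflects predictive performance rather than fit to the training sample.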
Computing
This seminar will use R for the empirical examples and exercises. To participate in the hands-on exercises, you are strongly encouraged to use a computer with the most recent version of R and RStudio installed. RStudio is a front-end for R that makes it easier to work with. This software is free and available for Windows, Mac, and Linux platforms.
If you’d like to take this course but are concerned that you don’t know enough R, there are excellent online resources for learning the basics. Here are our recommendations.
Who should register?
If you have a desire to learn how to effectively explore your data and have a strong statistical background in regression, this course is for you. You should have a good working knowledge of the principles and practice of multiple regression. It is also helpful to have familiarity with the R programming language.
Seminar outline
Day 1: Introduction to Machine Learning & Cross-Validation
- Introduction to machine learning
- Introduction to R
- Single predictor regression models & cross-validation
Day 2: Advanced Variable Selection
- Multiple regression
- Best subsets regression
- Forward selection
- Lasso regression
- Multivariate adaptive regression splines
Day 3: Classification Techniques
- Review of logistic regression
- Decision theory
- Naïve Bayes classifier
- k-nearest neighbors
- Support vector machines
Day 4: Recursive Partitioning & Ensemble Models
- Classification & regression trees
- Weights in classification & regression trees
- Conditional inference trees
- Bootstrap aggregation (Bagging)
- Random forests
- Boosting
Payment information
The fee of $995 includes all course materials.
PayPal and all major credit cards are accepted.
Our Tax ID number is 26-4576270.