Machine Learning - Online Course
A 3-Day Livestream Seminar Taught by Bruce Desmarais
10:00am-12:30pm (convert to your local time)
1:30pm-3:30pm
The rapidly growing relevance of machine learning cuts across scientific disciplines in the humanities, social sciences, and natural sciences. It is increasingly used in research for predictive, explanatory, and exploratory purposes. A Google Scholar search for publications since 2022 returns approximately 557,000 scientific publications that include the phrase “machine learning.”
This course provides a comprehensive introduction to machine learning. Topics include: cross-validation, model evaluation, variable selection, classification, prediction, and regression.
Starting January 8, we are offering this seminar as a 3-day synchronous*, livestream workshop held via the free video-conferencing software Zoom. Each day will consist of two lecture sessions which include hands-on exercises, separated by a 1-hour break. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.
*We understand that finding time to participate in livestream courses can be difficult. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously.
Closed captioning is available for all live and recorded sessions. Captions can be translated to a variety of languages including Spanish, Korean, and Italian. For more information, click here.
More details about the course content
Scientific research is increasingly conducted using data sets that are larger and more complex than the data for which conventional statistical tools were designed. Examples of such data include population-scale information on individual-level consumer and political behavior, data streams collected from social media and other digital sources, and data streamed from physical and environmental sensors.
There are three fundamental ways in which fine-grained, voluminous, and high-dimensional data require a set of methods that are more flexible than the conventional statistical toolkit. First, the data are inherently more complex, making it difficult to specify an adequate statistical model from theory alone. Second, the data are high dimensional, meaning there are more variables than one can include in conventional statistical models. Third, the data contain enough information to make highly accurate and, importantly, actionable predictions about unseen data (e.g., forecasts). These three features require an analytical toolkit that is capable of learning model structure, selecting variables, and producing accurate predictions, which are all capabilities of foundational machine learning methods. In this seminar, we will cover foundational machine learning, with a focus on essential concepts and practical application.
The seminar begins by introducing the foundational concepts of predictive modeling and model evaluation with held-out data and cross-validation. Next, we will focus on algorithms for model building and variable selection within the machine learning framework. Finally, we will cover core classes of machine learning models, sometimes referred to as “learners”. These include k-nearest neighbor methods, support vector machines, regression trees, random forests, and XGBoost. Throughout the seminar, you will gain experience through hands-on exercises.
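To make the idea of evaluating a model with held-out data concrete, here is a minimal sketch of k-fold cross-validation. It is not taken from the seminar materials: the seminar itself uses R, and Python is used here only for illustration. The function names (`fit_line`, `k_fold_mse`), the synthetic data, and the choice of a one-variable least-squares model are all assumptions made for this example.

```python
import random
import statistics


def fit_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def k_fold_mse(xs, ys, k=5, seed=0):
    """Estimate out-of-sample mean squared error with k-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)           # random assignment to folds
    folds = [idx[i::k] for i in range(k)]      # k roughly equal-sized folds
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit on the training folds only ...
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # ... and score on the fold the model has not seen.
        errs = [(ys[i] - (a + b * xs[i])) ** 2 for i in held_out]
        fold_errors.append(statistics.fmean(errs))
    return statistics.fmean(fold_errors)


# Synthetic data (hypothetical): a linear signal plus unit-variance noise.
rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

cv_mse = k_fold_mse(xs, ys, k=5)
print(f"5-fold cross-validated MSE: {cv_mse:.3f}")
```

Because each observation is scored by a model that was fit without it, the averaged error approximates how the model would perform on new data, which is the quantity predictive modeling aims to optimize.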
In “Good to Go? When to Stop Developing a Machine Learning Pipeline and Start Applying It,” Professor Desmarais unpacks how to decide when your machine learning pipeline is ready to transition from development to real-world application, covering strategies like benchmarking, exploring different approaches, and utilizing pre-trained models.
In his latest blog post, “In Machine Learning, Can Good Predictive Models also be Interpretable?”, Professor Desmarais explains why the perceived conflict between accurate predictions and interpretability is misleading.
In “The Machine Learning Foundations of Artificial Intelligence,” Desmarais discusses the multifaceted and rapidly evolving intersection of machine learning and artificial intelligence.
Read about Professor Desmarais’s first foray into machine learning methods as a graduate student in his blog post, “Milk, Eggs, and Courts: My First Machine Learning Project.”
Computing
This seminar will use R for all the computing tasks. To participate in the hands-on exercises, you are encouraged to use a computer with the most recent version of R and RStudio installed. RStudio is an integrated development environment (IDE) for R that makes a powerful companion to the R programming language. Both R and RStudio are free and available for all major operating systems.
To follow the presentation and do the exercises, you should feel comfortable performing basic tasks in R, such as importing and coding data, making simple plots, and estimating regression models.
If you’d like to take this course but are concerned that you don’t know enough R, there are excellent online resources for learning the basics. Here are our recommendations.
Who should register?
If you are interested in taking a data-driven approach to model building, learning how to optimize models for predictive performance, learning best practices for capturing complex nonlinear relationships in data, and/or building a foundation for training in the growing field of deep learning, this seminar is for you. To get the most out of this seminar, you should have a background in regression.
Seminar outline
Introduction to machine learning and predictive modeling
- Introduction to R for machine learning
- Predictive vs. explanatory modeling
- Evaluating predictive performance with held-out data and cross-validation
Regression recap, variable selection, and regularization
- Linear and logistic regression recap
- Feature/variable importance
- Best subsets regression
- Regularization
Machine learning methods for prediction and exploration
- k-nearest neighbors
- Support vector machines
- Classification/regression trees, random forests, and XGBoost
- Neural networks
- Clustering and principal component analysis
Payment information
The fee of $995 includes all course materials.
PayPal and all major credit cards are accepted.
Our Tax ID number is 26-4576270.