Advanced Machine Learning - Online Course
A 4-Day Livestream Seminar Taught by
Ross Jacobucci
10:30am-12:30pm (convert to your local time)
1:30pm-3:00pm
Machine learning (including artificial intelligence, big data, supervised learning, and data science) has had an enormous impact in both academic research and industry. The development of innovative machine learning algorithms has been paired with the availability of large datasets and has facilitated the collection of even larger datasets, often containing novel data types (e.g., text).
While machine learning has become increasingly easy to apply in many programming languages, it also presents a number of challenges; specifically, how to interpret the relationships between variables, how to prevent overfitting, and how to deal with the inevitable issues that arise from collecting diverse data types.
This seminar builds on introductory material on machine learning, assuming a basic familiarity with the ideas behind regularization in regression, cross-validation, and decision trees.
Starting July 18, we are offering this seminar as a 4-day synchronous*, livestream workshop held via the free video-conferencing software Zoom. Each day will consist of two lecture sessions which include hands-on exercises, separated by a 1-hour break. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.
*We understand that finding time to participate in livestream courses can be difficult. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously.
Closed captioning is available for all live and recorded sessions. Live captions can be translated to a variety of languages including Spanish, Korean, and Italian. For more information, click here.
More details about the course content
The first two days of the course rely heavily on Chapters 4-6 of Machine Learning for Social and Behavioral Research by Ross Jacobucci, Kevin Grimm, & Zhiyong Zhang. The instructor will provide a variety of readings on the topics for days 3-4.
The first day covers the state-of-the-art algorithms for prediction problems with a single outcome. The second day focuses on putting everything together, namely, how to best run all of these algorithms and properly compare their results. Finally, the third and fourth days shift to applying deep learning models. The third day will provide background on commonly used deep learning architectures, while day four focuses on applying these (and other) models to text data. The final topic will be a case study in extracting and analyzing text from smartphone screenshots.
Understanding how each algorithm works will be paired with material on how to apply the method with minimal coding in R, and for days 3 and 4, some Python scripts. The instructor will demonstrate how he incorporates ChatGPT into his coding process to accelerate understanding and application of both R and Python scripts.
Computing
This seminar will use R and Python for the empirical examples and exercises. To participate in the hands-on exercises, you are strongly encouraged to have a computer with R and RStudio installed. RStudio is a freely available interface for R.
Some methods will also be demonstrated using Python, which offers greater deep learning facilities than R. Download Anaconda to run Python scripts. In addition, Jupyter Notebook can be installed and run through Anaconda.
This seminar presumes at least some exposure to the R computing environment. Participants should be familiar with how to perform basic tasks in R, such as importing data, coding data, estimating simple statistical models, installing and loading packages, examining the contents of objects, and using “for” loops.
If you’d like to take this course but are concerned that you don’t know enough R, there are excellent online resources for learning the basics. Here are our recommendations.
Who should register?
If you have an introductory knowledge of machine learning and want to learn the more advanced concepts, this course is for you. The material in this course builds on the topics taught in Machine Learning, requiring at least familiarity with logistic regression, decision trees, and regularized regression, along with the concepts of cross-validation and bootstrapping.
The course will briefly recap each topic and then take a more advanced look at each of those areas, while building toward a number of more complex methods. The seminar will integrate the methods and results from a number of the instructor’s research articles on suicide that use machine learning.
Seminar outline
Day 1: Advanced prediction
- Gradient boosting
- Random forest
- Ensembles broadly (SuperLearner)
Day 2: Assessing prediction
- Regression fit metrics
- Classification fit metrics
- Handling imbalanced data
- Advanced cross-validation for comparing algorithms
- Parallel and high-performance computing
Day 3: Deep learning
- Overview of neural networks
- Training a custom model
- Using pre-trained models (e.g., GPT)
Day 4: Models for text
- Interpretable models (dictionary approaches; topic models)
- Using deep learning
- Case study: screenshots
Payment information
The fee of $995 includes all course materials.
PayPal and all major credit cards are accepted.
Our Tax ID number is 26-4576270.