Introduction to Statistical Genetics - Online Course
A 3-Day Livestream Seminar Taught by
Daniel E. Adkins
10:00am-12:30pm (convert to your local time)
1:30pm-3:30pm
Genomic data has transformed biomedical research and is now becoming a ubiquitous feature of large social and behavioral datasets. Integrating genomic data into the analysis of health, behavioral, and social outcomes makes it possible to model a range of processes, including polygenic risk and gene-environment interplay, as well as control for genetic factors to more accurately estimate environmental effects. However, the start-up costs of learning the specialized tools and techniques of statistical genetics can be a deterrent.
The goal of this seminar is to impart a thorough conceptual understanding of statistical genetics, and to equip you with the technical skills to conduct a range of popular statistical genetic analyses. Topics covered include: managing high dimensional genomic data; linear and logistic genome-wide association studies (GWAS); population stratification; multiple testing; mixed model GWAS methods for nonindependent samples; basic and advanced polygenic scoring; genome-environment interaction models; and an overview of the key software tools.
Starting December 12, we are offering this seminar as a 3-day synchronous*, livestream workshop held via the free video-conferencing software Zoom. Each day will consist of two lecture sessions which include hands-on exercises, separated by a 1-hour break. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.
*We understand that finding time to participate in livestream courses can be difficult. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously.
Closed captioning is available for all live and recorded sessions. Captions can be translated to a variety of languages including Spanish, Korean, and Italian. For more information, click here.
More details about the course content
Statistical genetics provides the opportunity to map the genetic architecture of health outcomes, and to develop more accurate, comprehensive models of biosocial and behavioral processes. However, due to the unique aspects of genomic data, including its enormous size and complex correlational structure, getting started can be a challenge. To address this, we teach a full suite of methods for managing, cleaning, association testing, and modeling big genomic data. After completing the course, you will be able to manage and analyze your own GWAS data, including running a range of GWAS models and conducting diagnostics to minimize bias in results. Additionally, you will learn to calculate and apply polygenic scores and genome-environment interaction models.
This seminar will introduce the computing environment, R and RStudio, before discussing multiple regression, and handling and cleaning big genomic data matrices. It will offer a comprehensive survey of GWAS methods, beginning with basic linear and logistic GWAS, modeling population stratification, and common covariate specifications. We then discuss post-GWAS processing, including addressing multiple testing and plotting results for diagnostic and visualization purposes. Subsequently, we will discuss GWAS models for nonindependent observations, such as repeated assessments and family data. We then explore polygenic scoring, which integrates prior information from large sample GWAS summary statistics to allow researchers to generate genetic scores that capture genome-wide genetic propensity to specific traits, as well as modeling genome-environment interactions. Throughout the course, you will gain experience with these methods through hands-on exercises.
Computing
This seminar will use R as the base software and incorporate genomics software, such as Plink 2.0 and Regenie. All software and the datasets used for exercises will be distributed as an easy-to-install virtual machine. This will spare you the effort of manually installing the various software packages used in the course.
Basic familiarity with R is highly desirable, but even novice R coders should be able to follow the presentation and do the exercises.
If you’d like to take this course but are concerned that you don’t know enough R, there are excellent online resources for learning the basics. Here are our recommendations.
Who should register?
If you want to learn the fundamental principles of statistical genetics and apply them to enrich your biomedical, behavioral, or social research, this course is for you. It will impart the skills to incorporate analyses of large-scale biomedical and behavioral databases containing in-depth genomic information, such as UK Biobank and Add Health, into your research. You should have knowledge of linear or logistic regression.
Seminar outline
Day 1: Introduction to multivariate statistics, human genetics, and high dimensional genomic data
- Intro to computing environment, R and RStudio
- Multivariate statistics primer
- Genetics primer (chromosomes, genes, variants, SNPs, LD)
- Handling high dimensional genomic data matrices
- GWAS data quality control
Day 2: GWAS fundamentals
- Basic linear and logistic GWAS
- Population stratification (ancestry and PCA), Covariates in GWAS
- Multiple testing
- Plotting (e.g., QQ-plots, Manhattan plots)
Day 3: Advanced methods: GWAS of nonindependent samples, Polygenic scoring, Genome-environment interaction models
- Mixed model GWAS for repeated observations or related subjects
- Basic polygenic scoring, LD clumping
- Advanced polygenic scoring, Bayesian approaches
- Genome-environment interaction models
Payment information
The fee of $995 includes all course materials.
PayPal and all major credit cards are accepted.
Our Tax ID number is 26-4576270.