Analysis of Complex Survey Data - Online Course
A 3-Day Livestream Seminar Taught by Brady West
Thursday, May 8 – Saturday, May 10, 2025
10:00am-12:30pm (convert to your local time)
1:30pm-3:30pm
Standard courses on statistical analysis assume that survey data are collected from a simple random sample of the target population. Little attention is given to design features of the survey, including unequal probabilities of observation and stratified multistage sampling.
Most standard analysis procedures in statistical software packages such as SAS, SPSS, and Stata do not allow the analyst to take these properties of survey data into account. Failure to do so can have an important impact on estimation and population inference for all types of analyses, ranging from simple descriptive statistics to multivariable regression models.
This seminar provides a practical introduction to statistical methods for the analysis of complex sample survey data. Such data typically include weights that adjust for differences in probability of selection and differences in subgroup response rates, and they come from designs that involve stratification and clustering, often in multiple stages.
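To make the role of weights concrete, here is a minimal Python sketch (the seminar itself uses Stata and R) comparing an unweighted mean with a weighted one. The data and the helper name `weighted_mean` are hypothetical and chosen only for illustration.

```python
def weighted_mean(values, weights):
    """Weighted estimate of a population mean: sum(w * y) / sum(w)."""
    total_weight = sum(weights)
    return sum(w * y for y, w in zip(values, weights)) / total_weight


# Hypothetical toy sample: y is the outcome, w is the final survey weight.
# The last two respondents were oversampled, so they carry smaller weights.
y = [10.0, 12.0, 30.0, 34.0]
w = [100.0, 100.0, 20.0, 20.0]

unweighted = sum(y) / len(y)
weighted = weighted_mean(y, w)

print(f"unweighted mean: {unweighted:.2f}")  # 21.50
print(f"weighted mean:   {weighted:.2f}")    # 14.50
```

In this toy sample, ignoring the weights treats the oversampled group as if it made up half of the population, which pulls the estimate toward that group's values.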
Starting May 8, we are offering this seminar as a 3-day synchronous*, livestream workshop held via the free video-conferencing software Zoom. Each day will consist of two lecture sessions which include hands-on exercises, separated by a 1-hour break. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.
*We understand that finding time to participate in livestream courses can be difficult. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously.
Closed captioning is available for all live and recorded sessions. Captions can be translated to a variety of languages including Spanish, Korean, and Italian. For more information, click here.
More details about the course content
The course will introduce variance estimation techniques that take into account the weighting, stratification, and cluster sampling that are properties of the multistage sampling designs used by most major survey organizations. Initially, we’ll focus on the estimation of sampling variances for descriptive statistics (means, proportions and quantiles of distributions), and then we’ll turn to variance estimation for subpopulations and multivariable modeling.
There will be a strong practical focus on the use of available software procedures for commonly used analyses, including testing for between-group differences in means and proportions, linear regression analysis for continuous dependent variables, contingency table analysis for categorical data, logistic regression for categorical responses, and multilevel modeling. Numerous examples of these types of analyses will be presented “live” using statistical software. The course will not focus on mathematical derivation, and will only occasionally introduce known formulas for estimators or variance estimators to enhance your understanding of what the software is doing in the background.
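As a rough illustration of what design-based software does in the background, the following Python sketch computes a Taylor series linearization variance for a weighted mean under a stratified cluster design. It assumes the common "ultimate cluster" with-replacement approximation and no finite population correction. The function name, variable names, and data are hypothetical, and this is a sketch rather than the course's own code.

```python
from collections import defaultdict


def linearized_variance_of_mean(y, w, stratum, psu):
    """Weighted mean and its Taylor-linearized variance estimate.

    Assumes primary sampling units (PSUs) are treated as sampled with
    replacement within strata, with no finite population correction.
    PSU ids only need to be unique within a stratum.
    """
    total_weight = sum(w)
    mean = sum(wi * yi for yi, wi in zip(y, w)) / total_weight

    # Sum the linearized values z_i = w_i * (y_i - mean) / sum(w) per PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, c in zip(y, w, stratum, psu):
        psu_totals[h][c] += wi * (yi - mean) / total_weight

    variance = 0.0
    for h, totals in psu_totals.items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} needs at least two PSUs")
        z_bar = sum(totals.values()) / n_h
        # Between-PSU variability within the stratum.
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in totals.values())
    return mean, variance


# Hypothetical data: two strata, two PSUs per stratum, two respondents per PSU.
y = [10.0, 12.0, 30.0, 34.0, 20.0, 22.0, 40.0, 44.0]
w = [100.0, 100.0, 20.0, 20.0, 50.0, 50.0, 10.0, 10.0]
stratum = ["A", "A", "A", "A", "B", "B", "B", "B"]
psu = [1, 1, 2, 2, 1, 1, 2, 2]

mean, variance = linearized_variance_of_mean(y, w, stratum, psu)
print(f"mean = {mean:.3f}, standard error = {variance ** 0.5:.3f}")
```

With a single stratum, equal weights, and every respondent as its own PSU, this estimator reduces to the familiar sample variance divided by n.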
Computing
In all cases, methods will be illustrated using software, with Stata and R examples and syntax. Some familiarity with reading in data and performing basic statistical analyses in either Stata or R is recommended. Analogous syntax will be provided online for a variety of other packages, including SAS, SPSS, SUDAAN, and Mplus.
Seminar participants who are not yet ready to purchase Stata could take advantage of StataCorp’s 30-day software return policy.
If you’d like to familiarize yourself with Stata basics before the seminar begins, we recommend following along with a “getting started” video like the one here.
If you’d like to use R for this course but don’t yet have much experience with that package, here are some excellent on-line resources for building your R skills.
Who should register?
You should have a sound working knowledge of applied statistical analysis, including hypothesis testing, descriptive estimation, confidence interval construction and interpretation, and linear and generalized linear regression modeling. Background in survey sampling techniques is recommended but not required.
Importantly, this is a course on analysis of survey data and not on complex sample design, although brief introductions to common sample design techniques will be provided to motivate the material on analysis. No knowledge of specific software is required, and examples of syntax in a variety of software packages with procedures available for the analysis of complex sample survey data will be provided. The course will have a heavy practical focus and will not include extensive discussions of the mathematical theory behind these analytic approaches.
Seminar outline
DAY 1
- An introduction to complex sample designs
- Multi-stage designs
- Stratification
- Cluster sampling
- Finite population corrections
- Design effects
- Effective sample size
- Survey weighting and population inference
- Components of survey weights
- Adjustment of survey weights
- Using survey weights for estimation
- Loss of precision due to weighting
- Models and assumptions for inference from complex sample survey data
- Sampling distributions and confidence intervals
- Introduction to variance estimation
- Sampling error calculation models
- Ultimate clusters
- Taylor series linearization
- Introduction to specialized software for complex sample variance estimation
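The Day 1 topics of design effects, effective sample size, and loss of precision due to weighting can be illustrated with Kish's widely used approximation. The Python sketch below assumes that approximation, which captures only the effect of unequal weights and ignores clustering and stratification. The function names and weights are hypothetical.

```python
def kish_weighting_deff(weights):
    """Kish's approximate design effect from unequal weighting:
    n * sum(w^2) / (sum(w))^2, which equals 1 when all weights are equal."""
    n = len(weights)
    sum_w = sum(weights)
    sum_w_sq = sum(w * w for w in weights)
    return n * sum_w_sq / (sum_w ** 2)


def effective_sample_size(weights):
    """Nominal sample size divided by the weighting design effect."""
    return len(weights) / kish_weighting_deff(weights)


# Hypothetical final survey weights for four respondents.
w = [100.0, 100.0, 20.0, 20.0]

deff = kish_weighting_deff(w)
n_eff = effective_sample_size(w)
print(f"design effect due to weighting: {deff:.3f}")
print(f"effective sample size: {n_eff:.2f} of {len(w)} respondents")
```

A design effect above 1 means the weighted estimate is roughly as precise as one from a smaller simple random sample of size `n_eff`.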
DAY 2
- Replication methods for variance estimation
- Jackknife repeated replication
- Balanced repeated replication
- Bootstrapping
- Choosing between variance estimation methods
- Descriptive analysis
- Means
- Totals
- Proportions
- Percentiles
- Subpopulation analysis
- Examples using available software
- Categorical data analysis
- Univariate analyses
- Bivariate analyses and design-adjusted chi-square tests
- Odds ratios and relative risks
- Examples using available software
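To give a flavor of the replication methods listed for Day 2, here is a minimal Python sketch of a delete-one-PSU jackknife for a weighted mean in a stratified cluster design. It assumes one common variant: the weights of the remaining PSUs in the affected stratum are rescaled by n_h / (n_h - 1), and squared deviations are taken from the full-sample estimate. Names and data are hypothetical, and real software offers several variants.

```python
from collections import defaultdict


def weighted_mean(y, w):
    """Weighted estimate of a mean: sum(w * y) / sum(w)."""
    return sum(wi * yi for yi, wi in zip(y, w)) / sum(w)


def jackknife_variance_of_mean(y, w, stratum, psu):
    """Delete-one-PSU jackknife variance estimate for a weighted mean.

    PSU ids only need to be unique within a stratum.
    """
    full_estimate = weighted_mean(y, w)

    psus_by_stratum = defaultdict(set)
    for h, c in zip(stratum, psu):
        psus_by_stratum[h].add(c)

    variance = 0.0
    for h, psus in psus_by_stratum.items():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} needs at least two PSUs")
        scale = n_h / (n_h - 1)
        for dropped in psus:
            # Replicate weights: zero for the dropped PSU, inflated for the
            # other PSUs in the same stratum, unchanged elsewhere.
            replicate_w = []
            for wi, hi, ci in zip(w, stratum, psu):
                if hi == h and ci == dropped:
                    replicate_w.append(0.0)
                elif hi == h:
                    replicate_w.append(wi * scale)
                else:
                    replicate_w.append(wi)
            replicate = weighted_mean(y, replicate_w)
            variance += (n_h - 1) / n_h * (replicate - full_estimate) ** 2
    return full_estimate, variance


# Hypothetical data: two strata, two PSUs per stratum, two respondents per PSU.
y = [10.0, 12.0, 30.0, 34.0, 20.0, 22.0, 40.0, 44.0]
w = [100.0, 100.0, 20.0, 20.0, 50.0, 50.0, 10.0, 10.0]
stratum = ["A", "A", "A", "A", "B", "B", "B", "B"]
psu = [1, 1, 2, 2, 1, 1, 2, 2]

estimate, variance = jackknife_variance_of_mean(y, w, stratum, psu)
print(f"mean = {estimate:.3f}, jackknife standard error = {variance ** 0.5:.3f}")
```

Because each replicate only requires recomputing the estimate with a new set of weights, the same loop works for statistics other than the mean by swapping out `weighted_mean`.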
DAY 3
- Linear regression analysis
- Review of linear regression for simple random samples
- Fitting linear regression models to complex samples
- To weight or not to weight?
- Examples using available software
- Logistic regression analysis
- Review of logistic regression for simple random samples
- Fitting logistic regression models to complex samples
- Examples using available software
- Multilevel modeling (time permitting)
- Special considerations for fitting multilevel models to complex samples
- Examples using available software
Payment information
The fee of $995 includes all course materials.
PayPal and all major credit cards are accepted.
Our Tax ID number is 26-4576270.