Analysis of Complex Survey Data

A 3-Day Remote Seminar Taught by
Brady West, Ph.D.

Standard courses on statistical analysis assume that survey data are collected from a simple random sample of the target population. Little attention is given to design features of the survey, including unequal probabilities of observation and stratified multistage sampling.

Most of the standard procedures commonly used for data analysis in statistical software packages such as SAS, SPSS, and Stata do not allow the analyst to take these design features into account. Failure to do so can have an important impact on estimation and population inference for all types of analyses, ranging from simple descriptive statistics to multivariable regression models.

This seminar provides an introduction to statistical methods for the analysis of complex sample survey data. Such data sets typically include weights that adjust for differences in probabilities of selection and in subgroup response rates, and they reflect stratification and clustering of the sample, often in multiple stages.
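As a rough illustration of why weights matter, the following minimal Python sketch compares an unweighted mean with a weighted mean and computes Kish's approximate effective sample size. The data, the weights, and the function names are all hypothetical, and Python is used here only for illustration; the seminar itself uses Stata and R.

```python
# Hypothetical toy data: six respondents, invented for illustration only.
# The last three respondents each "represent" three population members.
values = [10.0, 12.0, 14.0, 20.0, 22.0, 24.0]
weights = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]


def unweighted_mean(y):
    """Simple average that ignores the survey weights."""
    return sum(y) / len(y)


def weighted_mean(y, w):
    """Weighted estimate of the population mean: sum(w*y) / sum(w)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def kish_effective_n(w):
    """Kish's approximation to the effective sample size under unequal
    weighting: (sum of weights)^2 / (sum of squared weights)."""
    return sum(w) ** 2 / sum(wi ** 2 for wi in w)


if __name__ == "__main__":
    print("unweighted mean:", unweighted_mean(values))
    print("weighted mean:  ", weighted_mean(values, weights))
    print("effective n:    ", kish_effective_n(weights))
```

In this toy example the weighted mean (19.5) differs from the unweighted mean (17.0), and the six observations carry roughly the information of 4.8 equally weighted observations. This sketch addresses weighting only; it does not account for stratification or clustering.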

Starting May 6, we are offering this seminar as a 3-day synchronous*, remote workshop for the first time. Each day will consist of a 4-hour live lecture held via the free video-conferencing software Zoom. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.

Each lecture session will conclude with a hands-on exercise reviewing the content covered, to be completed on your own. An additional session will be held Thursday and Friday afternoons as an “office hour”, where you can review the exercise results with the instructor and ask any questions.

*We understand that scheduling is difficult during this unpredictable time. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for two weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously.


The course will introduce variance estimation techniques that take into account the weighting, stratification, and cluster sampling that are properties of the multistage sampling designs used by most major survey organizations. Initially, we’ll focus on the estimation of sampling variances for descriptive statistics (means, proportions and quantiles of distributions), and then we’ll turn to variance estimation for subpopulations and multivariable modeling.
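To make the idea of design-based variance estimation concrete, here is a minimal Python sketch of a Taylor series linearization standard error for a weighted mean, assuming a stratified design in which primary sampling units (PSUs) are treated as sampled with replacement within strata (the "ultimate cluster" approach). The data and the function names are hypothetical, finite population corrections are ignored, and this is not the implementation used by any particular package.

```python
from collections import defaultdict
from math import sqrt


def linearized_mean_se(rows):
    """Weighted mean and its linearized standard error.

    rows: iterable of (stratum, psu, weight, y) tuples.
    Assumes with-replacement sampling of PSUs within strata and
    at least two PSUs per stratum.
    """
    rows = list(rows)
    wsum = sum(w for _, _, w, _ in rows)
    ybar = sum(w * y for _, _, w, y in rows) / wsum

    # Sum the linearized variable z = w * (y - ybar) / sum(w) within each PSU.
    totals = defaultdict(lambda: defaultdict(float))
    for h, c, w, y in rows:
        totals[h][c] += w * (y - ybar) / wsum

    # Between-PSU variability within each stratum, summed over strata.
    var = 0.0
    for h, psus in totals.items():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has fewer than two PSUs")
        zbar = sum(psus.values()) / n_h
        var += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in psus.values())
    return ybar, sqrt(var)


def naive_se(y):
    """Standard error of the mean under a simple random sampling assumption."""
    n = len(y)
    m = sum(y) / n
    s2 = sum((yi - m) ** 2 for yi in y) / (n - 1)
    return sqrt(s2 / n)


# Hypothetical toy data: 2 strata, 2 PSUs per stratum, 2 respondents per PSU.
rows = [
    ("A", 1, 1.0, 1.0), ("A", 1, 1.0, 3.0),
    ("A", 2, 1.0, 5.0), ("A", 2, 1.0, 7.0),
    ("B", 3, 1.0, 2.0), ("B", 3, 1.0, 4.0),
    ("B", 4, 1.0, 6.0), ("B", 4, 1.0, 8.0),
]

if __name__ == "__main__":
    ybar, se = linearized_mean_se(rows)
    print("mean:", ybar)
    print("design-based SE:", se)
    print("naive SRS SE:   ", naive_se([y for _, _, _, y in rows]))
```

In this toy data set, respondents within the same PSU resemble one another, so the design-based standard error (about 1.41) is larger than the naive simple-random-sample standard error (about 0.87). In practice, these calculations would be carried out with dedicated survey procedures in software such as Stata or R rather than by hand.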

There will be a strong practical focus on available software procedures for commonly used analyses, including testing for between-group differences in means and proportions, linear regression analysis for continuous dependent variables, contingency table analysis for categorical data, logistic regression for categorical responses, and multilevel modeling. Numerous examples of these types of analyses will be presented “live” using statistical software.


This remote seminar is held via Zoom, a free video-conferencing application. Instructions for joining a session via Zoom are available here. Before the seminar begins, you will receive an email with the meeting code and password that you must use to join.

In all cases, methods will be illustrated with examples and syntax in Stata and R. Some familiarity with reading in data and performing basic statistical analyses in either Stata or R is recommended. Analogous syntax will be provided online for a variety of other packages, including SAS, SPSS, SUDAAN, and Mplus.

Seminar participants who are not yet ready to purchase Stata can take advantage of StataCorp’s free 30-day evaluation offer or its 30-day software return policy.

If you’d like to take this course but are concerned that you don’t know enough R, there are excellent online resources for learning the basics. Here are our recommendations.

WHO SHOULD REGISTER?

Participants should have a sound working knowledge of applied statistical analysis, including hypothesis testing, descriptive estimation, confidence interval construction and interpretation, and linear and generalized linear regression modeling. Background in survey sampling techniques is recommended but not required.

Importantly, this is a course on the analysis of survey data and not on complex sample design, although brief introductions to common sample design techniques will be provided to motivate the material on analysis. No knowledge of specific software is required, and syntax examples will be provided for a variety of software packages that offer procedures for the analysis of complex sample survey data. The course will have a heavy practical focus and will not include extensive discussions of the mathematical theory behind these analytic approaches.



1. An Introduction to Complex Sample Designs
     – Multi-stage designs
     – Stratification
     – Cluster sampling
     – Finite population corrections
     – Design effects
     – Effective sample size

2. Survey Weighting and Population Inference
     – Components of survey weights
     – Adjustment of survey weights
     – Using survey weights for estimation
     – Loss of precision due to weighting
     – Models and assumptions for inference from complex sample survey data
     – Sampling distributions and confidence intervals

3. Introduction to Variance Estimation
     – Sampling error calculation models
     – Ultimate clusters
     – Taylor series linearization
     – Introduction to specialized software for complex sample variance estimation


4. Replication Methods for Variance Estimation
     – Jackknife repeated replication
     – Balanced repeated replication
     – Bootstrapping
     – Choosing between variance estimation methods

5. Descriptive Analysis
     – Means
     – Totals
     – Proportions
     – Percentiles
     – Subpopulation analysis
     – Examples using available software

6. Categorical Data Analysis
     – Univariate analyses
     – Bivariate analyses and design-adjusted chi-square tests
     – Odds ratios and relative risks
     – Examples using available software


7. Linear Regression Analysis
     – Review of linear regression for simple random samples
     – Fitting linear regression models to complex samples
     – To weight or not to weight?
     – Examples using available software

8. Logistic Regression Analysis
     – Review of logistic regression for simple random samples
     – Fitting logistic regression models to complex samples
     – Examples using available software

9. Multilevel Modeling (time permitting)
     – Special considerations for fitting multilevel models to complex samples
     – Examples using available software