Analysis of Complex Survey Data

A 3-Day Remote Seminar Taught by
Ann A. O’Connell, Ed.D.

National and international sample surveys often use probability-based designs and complex sampling strategies to collect data on nearly all kinds of human and social phenomena across virtually every discipline. Complex sampling methods are used to optimize data quality and enhance population coverage while balancing issues of cost and feasibility. Statistical approaches to analyzing data from complex samples must adjust for the distinct sampling features used in the survey so that statistics and models generated from the data provide appropriate population estimates and inferences.

While many software packages have the capacity to analyze data from complex sample surveys, there are often challenges for researchers and survey users in understanding and applying appropriate analysis methods. This course will equip researchers with the knowledge, tools, and techniques essential for understanding and analyzing data from complex sample surveys, thereby improving accessibility to and usefulness of these data sources for applied researchers.

Starting October 8, we are offering this seminar as a 3-day synchronous*, remote workshop for the first time. Each day will consist of a 4-hour live lecture held via the free video-conferencing software Zoom. Participants are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if they are unable to attend at the scheduled time.

Each lecture session will conclude with a hands-on exercise reviewing the content covered, to be completed on your own. Additional sessions will be held on Thursday and Friday afternoons as “office hours,” where participants can review the exercise results with the instructor and ask any questions.

*We understand that scheduling is difficult during this unpredictable time. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for one week after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously. 


The seminar will begin with a discussion of basic sampling designs and their use in complex samples for survey research, followed by the impact of sample characteristics on variance estimation and their relationship to design effects. Topics include differences between model-based and design-based estimation, concepts of sampling design and estimation for existing international and national surveys, and the use of weights. Some of the surveys we will use for demonstration and practice include the American Community Survey (ACS), the National Longitudinal Survey of Youth (NLSY), the National Health and Nutrition Examination Survey (NHANES), the Program for International Student Assessment (PISA), and the Early Childhood Longitudinal Study (ECLS-K:2011). Hands-on use of selected data sets for visualization and descriptive statistics using R, Stata, and/or SPSS will be included.

Next, we will dive more deeply into modeling with data from complex sample surveys, focusing on linear and logistic regression, models for counts and ordinal data, and a simple longitudinal analysis. Extensions for more complex analyses will be discussed. Adjustments or approaches needed for analyses of subsamples from large population-based surveys will be presented, along with issues of sample size and statistical power.


This remote seminar is held via Zoom, a free video conferencing application. Instructions for joining a session via Zoom are available here. Before the seminar begins, participants will receive an email with the meeting code and password needed to join.

Several software options will be demonstrated throughout this course, including R, SPSS and Stata. Because R is freely available, we will use R for most of our empirical demonstrations and exercises. Comparisons across packages will be made, and code will be provided for all examples.

Participants are encouraged to use a laptop computer with the most recent versions of R and RStudio installed. R and RStudio run on Windows, Mac, and Linux platforms.

During the workshop, “hands-on” time will be devoted to participants working with and analyzing a selection of complex sample data sets. Examples will be balanced across educational, health, and social science data.

If you’d like to take this course but are concerned that you don’t know enough R, there are excellent online resources for learning the basics. Here are our recommendations.

Who Should Register?

Participants with a good working knowledge of regression analysis and a desire to learn about approaches to analyzing complex survey data will benefit from this course. An applied approach is emphasized, but statistical concepts, theoretical background, and variance estimation methods will be covered to support the applications and analyses presented during the workshop. No prior experience with complex survey data is required.


  1. Theory and background of complex sampling methods
    1. Basic sampling designs
    2. Structure of complex samples
    3. Impact on variance estimation
    4. Review and description of existing national and international surveys
    5. Introduction to R
    6. Visualization and descriptive analysis
    7. Cross-walk of analyses in Stata and SPSS
  2. Analysis models for complex samples
    1. Linear and multiple linear regression models
    2. Logistic regression models
    3. Models for ordinal outcomes
    4. Models for counts
    5. Power and sample size
    6. Public-use versus restricted-access datasets
    7. Presenting results of analyses from complex samples