Analysis of Complex Survey Data

A 2-Day Seminar Taught by Ann A. O’Connell, Ed.D.

National and international sample surveys often use probability-based designs and complex sampling strategies to collect data on nearly every kind of human and social phenomenon, across virtually every discipline. Complex sampling methods are used to optimize data quality and enhance population coverage while balancing cost and feasibility. Statistical approaches to analyzing data from complex samples must adjust for the distinct sampling features used in the survey so that the statistics and models generated from the data provide appropriate population estimates and inferences.

While many software packages can analyze data from complex sample surveys, researchers and survey users often find it challenging to understand and apply appropriate analysis methods. This course will equip researchers with the knowledge, tools, and techniques essential for understanding and analyzing data from complex sample surveys, making these data sources more accessible and useful for applied researchers.

Our first day will cover the features of basic sampling designs, how they are used in complex samples for survey research, and how sample characteristics affect variance estimation and relate to design effects. Topics include the differences between model-based and design-based estimation, the sampling designs and estimation approaches used in existing national and international surveys, and the use of weights. Surveys we will use for demonstration and practice include the American Community Survey (ACS), the National Longitudinal Survey of Youth (NLSY), the National Health and Nutrition Examination Survey (NHANES), the Programme for International Student Assessment (PISA), and the Early Childhood Longitudinal Study (ECLS-K:2011). The first day will conclude with hands-on use of selected data sets for visualization and descriptive statistics in R, Stata, and/or SPSS.

Our second day will dive more deeply into modeling data from complex sample surveys, focusing on linear and logistic regression, models for counts and ordinal outcomes, and a simple longitudinal analysis. Extensions to more complex analyses will also be discussed. We will present the adjustments and approaches needed to analyze subsamples of large population-based surveys, along with issues of sample size and statistical power.


Several software options will be demonstrated throughout this course, including R, SPSS and Stata. Because R is freely available, we will use R for most of our empirical demonstrations and exercises. Comparisons across packages will be made, and code will be provided for all examples.

Participants are encouraged to bring a laptop computer with the most recent versions of R and RStudio installed. R and RStudio run on Windows, Mac, and Linux platforms.

On both days of the workshop, "hands-on" time will be devoted to participants' own use and analysis of selected complex sample data sets. Examples will be balanced across educational, health, and social science data.

If you’d like to take this course but are concerned that you don’t know enough R, there are excellent on-line resources for learning the basics. Here are our recommendations.

Who should attend? 

Participants with a good working knowledge of regression analysis and a desire to learn about approaches to analyzing complex survey data will benefit from this course. An applied approach is emphasized, but statistical concepts, theoretical background, and variance estimation methods will be covered to support the applications and analyses demonstrated during the workshop. No prior experience with complex survey data is required.

Location, Format, and Materials

The class will meet at Temple University Center City, 1515 Market Street, Philadelphia, PA 19103, from 9 am to 5 pm each day, with a 1-hour lunch break.

Participants receive a bound manual containing detailed lecture notes (with equations and graphics), examples of computer printout, and many other useful features. This book frees participants from the distracting task of note taking. 

Registration and lodging

The fee of $995 includes all course materials. The early registration fee of $895 is available until May 26.

Refund Policy

If you cancel your registration at least two weeks before the course is scheduled to begin, you are entitled to a full refund (minus a processing fee of $50). 

Lodging Reservation Instructions 

A block of guest rooms has been reserved at the Club Quarters Hotel, 1628 Chestnut Street, Philadelphia, PA, at a special rate of $164 per night. The hotel is about a 5-minute walk from the seminar location. To make a reservation, call 203-905-2100 during business hours and identify yourself using group code STH624. For guaranteed rate and availability, you must reserve your room no later than Monday, May 25, 2020.

If you need to make reservations after the cut-off date, you may call Club Quarters directly and ask for the “Statistical Horizons” rate (do not use the code or mention a room block) and they will try to accommodate your request.


  1. Theory and background of complex sampling methods
    1. Basic sampling designs
    2. Structure of complex samples
    3. Impact on variance estimation
    4. Review and description of existing national and international surveys
    5. Introduction to R
    6. Visualization and descriptive analysis
    7. Cross-walk of analyses in Stata and SPSS
  2. Analysis models for complex samples
    1. Linear and multiple linear regression models
    2. Logistic regression models
    3. Models for ordinal outcomes
    4. Models for counts
    5. Power and sample size
    6. Public-use versus restricted-access datasets
    7. Presenting your results of analyses from complex samples