Analysis of Cost Data

A 3-Day Remote Seminar Taught by
Henry Glick, Ph.D.

Data on costs typically have distributions that differ dramatically from the normal distribution. They are usually highly skewed to the right, with long heavy tails and high kurtosis, often with a preponderance of zeros. These characteristics often lead to violations of assumptions underlying typical univariate and multivariable tests of means such as t-tests and multiple regression analysis (OLS). Both appropriate and inappropriate methods have been proposed to overcome these violations.
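The Python sketch below makes these characteristics concrete. It simulates hypothetical cost data as a mass of zeros plus a log-normal tail, then reports the share of zeros, mean, median, skewness, and excess kurtosis. The zero share, the log-normal parameters, and the function names are illustrative assumptions, not course data. The seminar's own examples use Stata.

```python
import random
import statistics


def simulate_costs(n, p_zero=0.2, mu=7.0, sigma=1.2, seed=1):
    """Draw hypothetical per-patient costs: a point mass at zero plus a
    log-normal tail. All parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    return [0.0 if rng.random() < p_zero else rng.lognormvariate(mu, sigma)
            for _ in range(n)]


def skewness(xs):
    """Moment-based sample skewness (no small-sample correction); 0 for a normal."""
    m = statistics.fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m3 = sum((x - m) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5


def excess_kurtosis(xs):
    """Moment-based excess kurtosis; 0 for a normal distribution."""
    m = statistics.fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m4 = sum((x - m) ** 4 for x in xs) / len(xs)
    return m4 / m2 ** 2 - 3.0


costs = simulate_costs(5000)
print(f"share zero      : {sum(c == 0 for c in costs) / len(costs):.2f}")
print(f"mean            : {statistics.fmean(costs):,.0f}")
print(f"median          : {statistics.median(costs):,.0f}")
print(f"skewness        : {skewness(costs):.2f}")
print(f"excess kurtosis : {excess_kurtosis(costs):.2f}")
```

In draws like these, the mean lies well above the median. Both skewness and excess kurtosis are also far from the value of 0 expected under normality.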

This seminar assesses a number of these methods for analyzing costs and enables researchers to evaluate which methods may be more or less appropriate for the analysis of cost data. We will cover:

  • Univariate statistics
  • OLS/Log OLS
  • Generalized linear models
  • Generalized estimating equations and Extended estimating equations
  • Models and methods for addressing missing data
  • Constructing the cost outcome
  • Addressing cost over time
  • Sample size and power

The style of instruction is designed for participants from a variety of subject-matter backgrounds. Examples will be presented using the Stata software package.

Starting February 24, we are offering this seminar as a 3-day synchronous*, remote workshop. Each day will consist of a 4-hour live lecture held via the free video-conferencing software Zoom. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.

Each day will include a hands-on exercise to be completed on your own after the lecture session is over. Additional lab sessions will be held on Thursday and Friday afternoons, where you can review the exercise results with the instructor and ask questions.

*We understand that scheduling is difficult during this unpredictable time. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously.

Closed captioning is available for all live and recorded sessions.


Stata will be used for all worked examples, and Dr. Glick’s Stata programs will be distributed for a number of the topics discussed. Data used in the exercises will also be made available in other formats (e.g., SAS), but no support will be provided for programming in these languages. Even if you do not use Stata, you will still benefit greatly from the instruction, the comprehensive set of slides, and software syntax that you can apply later. If you wish to try the exercises, you should use a computer with the basic Stata package installed. For all but one or two methods (e.g., extended estimating equations), add-ons are not needed.

Who Should Register?

The course will benefit applied researchers, analysts, and students interested in enhancing their understanding of cost analysis and developing their application skills. Participants are assumed to have been exposed to introductory parametric statistics, such as that offered through an in-depth workshop or a typical university course.

Seminar Outline

Introduction to cost analysis

What cost statistic should we estimate?
     • Welfare economic principles
Basic principles and univariate analysis
     • Role of:
             ◦ Parametric tests of difference in cost
             ◦ Nonparametric tests of other characteristics of cost distribution
                     ▪ Trade-offs between bias and skewness
             ◦ Transformation of data so that parametric tests’ assumptions are met
                     ▪ Problems with analysis of log (and other) transformations
             ◦ Tests of sample mean that avoid parametric assumptions
Basics of cost data
     • Generating the cost outcome
     • Addressing costs incurred at different times
             ◦ Inflation
             ◦ Discounting
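The two adjustments for costs incurred at different times address different problems. Inflation adjustment restates costs from different calendar years in a common currency year. Discounting reflects time preference for costs incurred later in a study.

The minimal Python sketch below shows the arithmetic. It assumes a simple price-index ratio for inflation and a constant annual discount rate, with the first year left undiscounted. The function names, index values, and 3% rate are hypothetical choices made for illustration. They are not taken from Dr. Glick’s Stata programs.

```python
def inflate(cost, index_cost_year, index_base_year):
    """Restate a cost in base-year currency using a price-index ratio."""
    return cost * index_base_year / index_cost_year


def present_value(costs_by_year, rate):
    """Discount a cost stream. Assumes costs_by_year[t] is incurred t years
    after baseline, so the year-0 cost is not discounted."""
    return sum(cost / (1 + rate) ** t for t, cost in enumerate(costs_by_year))


# Hypothetical price-index values and an illustrative 3% annual discount rate.
print(inflate(1000.0, index_cost_year=250.0, index_base_year=275.0))  # 1100.0
print(round(present_value([1000.0, 1000.0, 1000.0], rate=0.03), 2))   # 2913.47
```

In practice, the choice of price index and discount rate follows the conventions of the analysis at hand. So does the timing convention, such as whether costs are treated as incurred at the start or the end of each year.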

Multivariable models for cost analysis

Generalized linear models
     • Role of link function
             ◦ Difference between log OLS and log link
     • Role of the family
     • Diagnosing appropriate links and families
             ◦ Pregibon link test, Pearson correlation test, Modified Hosmer and
               Lemeshow test, Modified Parks test, AIC, BIC
     • Observed vs predicted mean costs
     • Inconvenient truths
Other multivariable approaches
     • GEE and EEE
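The outline’s distinction between log OLS and a log link can be illustrated in the simplest, intercept-only case:

* Exponentiating the mean of log costs gives the geometric mean, which understates the arithmetic mean cost.
* Duan’s smearing factor rescales that estimate.
* A GLM with a log link models the log of the mean directly.

The Python sketch below uses hypothetical log-normal costs. The parameters and the function name are assumptions. With covariates, smearing also assumes homoscedastic errors on the log scale.

```python
import math
import random
import statistics


def compare_retransformations(n=20000, mu=8.0, sigma=1.0, seed=7):
    """Intercept-only comparison on hypothetical log-normal costs."""
    rng = random.Random(seed)
    costs = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    logs = [math.log(c) for c in costs]
    mean_log = statistics.fmean(logs)

    # Log OLS with only an intercept estimates E[log y]; exponentiating it
    # gives the geometric mean, not the arithmetic mean of cost.
    naive = math.exp(mean_log)

    # Duan's smearing factor: average of exponentiated log-scale residuals.
    smear = statistics.fmean([math.exp(x - mean_log) for x in logs])

    # A log-link GLM models log(E[y]) directly; with only an intercept,
    # its fitted mean equals the sample arithmetic mean.
    arithmetic = statistics.fmean(costs)
    return {"naive": naive, "smeared": naive * smear, "arithmetic": arithmetic}


print(compare_retransformations())
```

In this intercept-only case, the smeared estimate equals the sample mean up to rounding. The reason is that the smearing factor is the sample mean cost divided by the geometric mean. Once covariates enter the model, the log OLS and log-link approaches generally give different predictions.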

Analysis in the face of missing cost data

Missing data methods
     • Naïve methods
     • GLM with inverse probability weights
     • Lin ’97, Carides regression method, multiple imputation
     • Population average maximum likelihood longitudinal panel data analyses
Sample size and power for cost and cost-effectiveness analysis

Reviews of Analysis of Cost Data

“I highly recommend this in-depth course on how best to analyze cost data. A thorough series of seminars with excellent, comprehensible explanation of concepts, examples, and statistical analysis code.”
  Nikki McCaffrey, Deakin University

“The material was interesting and new for me. Dr. Glick was very accommodating and willing to spend time to reinforce concepts and provided additional lab time.”

“I appreciated the clear and thorough Stata codes provided. The whole course was very nice and clear, even without any previous knowledge about GLM (or cost analysis). Thank you!”