Analysis of Cost Data

A 4-Day Remote Seminar Taught by
Henry Glick, Ph.D.


Data on costs typically have distributions that differ dramatically from the normal distribution. They are usually highly skewed to the right, with long heavy tails and high kurtosis, often with a preponderance of zeros. These characteristics often lead to violations of assumptions underlying typical univariate and multivariable tests of means such as t-tests and multiple regression analysis (OLS). Both appropriate and inappropriate methods have been proposed to overcome these violations.

This seminar assesses a number of these methods for analyzing costs and enables researchers to evaluate which methods may be more or less appropriate for the analysis of cost data. We will cover:

  • Univariate statistics
  • OLS/Log OLS
  • Generalized linear models
  • Generalized estimating equations and extended estimating equations
  • Models and methods for addressing missing data
  • Constructing the cost outcome
  • Addressing cost over time
  • Sample size and power

The style of instruction is designed for participants from a variety of subject-matter backgrounds. Examples will be presented using the Stata software package.

Starting July 20, we are offering this seminar as a 4-day synchronous*, remote workshop. Each day will consist of a 3-hour live lecture held via the free video-conferencing software Zoom. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.

Each lecture session will conclude with a hands-on exercise reviewing the content covered, to be completed on your own. Additional lab sessions will be held on Tuesday and Thursday afternoons, where you can review the exercise results with the instructor and ask any questions.

*We understand that scheduling is difficult during this unpredictable time. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for two weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously.

Closed captioning is available for all live and recorded sessions.


COMPUTING

Stata will be used for all worked examples, and Dr. Glick’s Stata programs will be distributed for a number of the topics discussed. Data used in the exercises will be made available in other formats (e.g., SAS), but no support will be available for programming in other packages. You will still benefit greatly from the instruction, the comprehensive set of slides, and the software syntax, which you can apply later. If you wish to try the exercises, you should use a computer with the basic Stata package installed; add-ons are needed for only one or two methods (e.g., extended estimating equations).


WHO SHOULD REGISTER?

The course will benefit applied researchers, analysts, and students interested in enhancing their understanding of cost analysis and developing their application skills. Participants are assumed to have been exposed to introductory parametric statistics, such as that offered through an in-depth workshop or a typical university course.


SEMINAR OUTLINE

Day 1. Introduction to cost analysis

What cost statistic should we estimate?

  • Welfare economic principles

Basic principles and univariate analysis

  • Role of:
    • Parametric tests of difference in cost
    • Nonparametric tests of other characteristics of cost distribution
      • Trade-offs between bias and skewness
    • Transformation of data so that parametric tests’ assumptions are met
      • Problems with analysis of log (and other) transformations
    • Tests of sample mean that avoid parametric assumptions

Basics of cost data

  • Generating the cost outcome
  • Addressing costs incurred at different times
    • Inflation
    • Discounting
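
As a rough illustration of the two adjustments listed above, the following sketch rescales a cost to a common price year and discounts a stream of annual costs to present value. It is written in Python rather than Stata, and the function names, price-index values, costs, and 3% discount rate are all hypothetical and not taken from the seminar materials.

```python
def inflate(cost, index_from, index_to):
    """Express a cost in target-year prices using a price index.

    index_from and index_to are the index values for the year the cost
    was incurred and the year whose prices we want (hypothetical inputs).
    """
    return cost * index_to / index_from


def present_value(costs_by_year, rate):
    """Discount a stream of annual costs to present value.

    costs_by_year[0] is incurred now and is not discounted;
    costs_by_year[t] is divided by (1 + rate) ** t.
    """
    return sum(c / (1 + rate) ** t for t, c in enumerate(costs_by_year))


if __name__ == "__main__":
    # Hypothetical example: a cost of 500 when the index stood at 100,
    # restated in the prices of a year in which the index is 108.
    print(inflate(500.0, 100.0, 108.0))
    # Hypothetical example: 1,000 per year for three years, 3% discount rate.
    print(present_value([1000.0, 1000.0, 1000.0], 0.03))
```

Whether the first year's costs are discounted, and which price index and rate are appropriate, are conventions that vary by study; the sketch simply assumes costs in year 0 are undiscounted.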

Days 2 and 3. Multivariable models for cost analysis

OLS/log OLS

Generalized linear models

  • Role of link function
    • Difference between log OLS and log link
  • Role of the family
  • Diagnosing appropriate links and families
    • Pregibon link test, Pearson correlation test, Modified Hosmer and Lemeshow test, Modified Park test, AIC, BIC
  • Observed vs predicted mean costs
  • Inconvenient truths
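
One way to see the difference between log OLS and a log link is the intercept-only case. Log OLS models the mean of ln(cost), so exponentiating its prediction returns the geometric mean, whereas a log-link GLM models ln(mean cost), so exponentiating returns the arithmetic mean. The sketch below uses Python rather than Stata, with hypothetical cost values and function names chosen only for illustration.

```python
import math


def log_ols_intercept(costs):
    """Intercept of an intercept-only OLS regression of ln(cost): mean of the logs."""
    return sum(math.log(c) for c in costs) / len(costs)


def log_link_intercept(costs):
    """Intercept of an intercept-only log-link GLM: log of the arithmetic mean."""
    return math.log(sum(costs) / len(costs))


if __name__ == "__main__":
    # Hypothetical right-skewed costs with one large value in the tail.
    costs = [100.0, 200.0, 400.0, 800.0, 10000.0]

    # Naive retransformation of log OLS gives the geometric mean,
    # which understates the arithmetic mean for skewed data.
    print("exp(log OLS intercept): ", math.exp(log_ols_intercept(costs)))

    # Retransformation of the log-link intercept recovers the sample mean.
    print("exp(log-link intercept):", math.exp(log_link_intercept(costs)))
    print("arithmetic mean:        ", sum(costs) / len(costs))
```

With covariates the same distinction holds conditional on those covariates, and log OLS then needs a retransformation step (for example, a smearing factor) to recover predictions on the cost scale.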

Other multivariable approaches

  • GEE and EEE

Day 4. Analysis in the face of missing cost data

Missing data methods

  • Naïve methods
  • GLM with inverse probability weights
  • Lin ’97, Carides regression method, multiple imputation
  • Population average maximum likelihood longitudinal panel data analyses

Sample size and power for cost and cost-effectiveness analysis