October 18, 2014

When barcharts shouldn’t start at zero

Barcharts should almost always start at zero. Almost always.

Randal Olson has a very popular post on predictors of divorce, based on research by two economists at Emory University. The post has a lot of barcharts like this one:

[Figure: barchart of marriage stability by total wedding expenses, from Randal Olson's post]

The estimates in the research report are hazard ratios for dissolution of marriage. A hazard ratio of zero would mean a factor was completely protective, which is not a natural reference point. The natural reference point for hazard ratios is 1, meaning no difference between the two groups being compared, so 1 would be a more natural place to put the axis than zero.
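To see why 1 is the reference point, it can help to look at the simplest case. The sketch below assumes each group has a constant divorce rate, in which case a hazard ratio reduces to a ratio of rates (divorces per person-year). The function name `rate_ratio` and all the counts are hypothetical, for illustration only; they are not from the Emory study.

```python
def rate_ratio(events, person_years, ref_events, ref_person_years):
    """Ratio of two event rates (events per person-year).

    Assumes constant hazards, so this crude rate ratio stands in for a hazard ratio.
    Cross-multiplied so these small integer examples come out exact.
    """
    return (events * ref_person_years) / (person_years * ref_events)

# Hypothetical reference group: 20 divorces per 1000 person-years.
print(rate_ratio(20, 1000, 20, 1000))  # 1.0: same rate as the reference, no difference
print(rate_ratio(0, 1000, 20, 1000))   # 0.0: no divorces at all, completely protective
print(rate_ratio(30, 1000, 20, 1000))  # 1.5: rate 50% higher than the reference
```

A value of 0 is an extreme that no realistic factor reaches, while 1 marks "no difference", which is why bars measured from 1 are easier to read: they would extend one way for factors associated with more divorce and the other way for factors associated with less.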

A bar chart is also a poor way to show uncertainty. The green bar has no uncertainty, because the other groups are defined as comparisons to it, but the other bars do. The more usual way to show estimates like these from regression models is with a forest plot:

[Figure: forest plot of the hazard ratios for dissolution of marriage]

The area of each coloured box is proportional to the number of people in that group in the sample, and the line through it is a 95% confidence interval. The horizontal scale is logarithmic, so that 0.5 and 2 are the same distance from 1; otherwise the shape of the graph would depend on which box was taken as the comparison group.
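The arithmetic behind the logarithmic scale is short enough to sketch. Assuming we have a hazard ratio for each group relative to some reference group, and a standard error for each log hazard ratio (the usual form of output from hazard regression models), the code below checks three things: 0.5 and 2 sit at equal distances either side of 1 on a log axis; switching to a different comparison group slides every estimate by the same amount on that axis; and a 95% interval is computed on the log scale and then transformed back. The group labels, hazard ratios and standard error are made up for illustration.

```python
import math

def log_position(hr):
    """Horizontal position of a hazard ratio on a logarithmic axis (1 maps to 0)."""
    return math.log(hr)

def rebase(hazard_ratios, new_reference):
    """Re-express every hazard ratio relative to a different comparison group."""
    ref = hazard_ratios[new_reference]
    return {group: hr / ref for group, hr in hazard_ratios.items()}

def ci95(hr, se_log_hr):
    """Approximate 95% interval: symmetric on the log scale, asymmetric on the HR scale."""
    half_width = 1.96 * se_log_hr
    return math.exp(math.log(hr) - half_width), math.exp(math.log(hr) + half_width)

# 0.5 and 2 are equally far from 1 (position 0) on the log axis.
print(log_position(0.5), log_position(2.0))

# Hypothetical estimates relative to group "A", then re-expressed relative to "B".
hrs = {"A": 1.0, "B": 0.5, "C": 2.0}
rebased = rebase(hrs, "B")
shifts = [log_position(rebased[g]) - log_position(hrs[g]) for g in hrs]
print(shifts)  # each group moves by log(1 / 0.5) = log(2), up to rounding

# Hypothetical hazard ratio of 1.5 with a standard error of 0.2 for log(HR).
print(ci95(1.5, 0.2))
```

Because changing the reference group is a pure shift on the log axis, the relative positions of the boxes and the lengths of their intervals stay the same whichever group is chosen.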

Two more minor notes. First, the hazard ratio measures the relative rate of divorce over time, not the relative probability of divorce, so a hazard ratio of 1.46 doesn't actually mean being 1.46 times as likely to get divorced. Second, the category of people with total wedding expenses over $20,000 was only 11% of the sample: the sample is non-representative in a different way from the samples that lead to bogus estimates of $30,000 as the average cost of a wedding.
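To make the first note concrete: assuming proportional hazards (a hazard ratio that is constant over time), the probability of still being married at any time t in a comparison group is the reference group's probability raised to the power of the hazard ratio, S1(t) = S0(t)^HR. The sketch below uses that identity with a hazard ratio of 1.46; the function name and the baseline probabilities are hypothetical, not figures from the study.

```python
def probability_with_hr(baseline_probability, hazard_ratio):
    """Probability of divorce by some time t in a comparison group.

    Assumes proportional hazards, so survival (still married) satisfies
    S1(t) = S0(t) ** hazard_ratio, where S0(t) = 1 - baseline_probability.
    """
    baseline_survival = 1.0 - baseline_probability
    return 1.0 - baseline_survival ** hazard_ratio

for baseline in (0.01, 0.2, 0.5):  # hypothetical reference-group probabilities
    p = probability_with_hr(baseline, 1.46)
    print(f"baseline {baseline:.2f} -> {p:.3f}, ratio {p / baseline:.2f}")
```

With a 50% baseline probability, the comparison group's probability comes out at about 0.64, a ratio of about 1.27 rather than 1.46. Only when divorce is rare does the ratio of probabilities come close to the hazard ratio.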


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.