February 18, 2025

Surprises in data

When you get access to some data, a first step is to see if you understand it: do the variables measure what you expect, are there surprising values, and so on. Often, you will be surprised by some of the results. Almost always this is because the data mean something a bit different from what you expected. Sometimes there are errors. Occasionally there is outright fraud.

Elon Musk and DOG-E have been looking at US Social Security data. They created a table of Social Security-eligible people not recorded as dead and noticed that (a) some of them were surprisingly old, and (b) the total added up to more than the US population.
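Both checks are easy to write down. Here is a minimal sketch in Python, assuming a made-up handful of records: the data, the age cutoff, and the benchmark total are all hypothetical, chosen only to show the shape of the check, and none of it is real Social Security data.

```python
# Hypothetical toy records: (birth_year, death_year or None). Not real SSA data.
records = [
    (1950, None),
    (1985, None),
    (1890, None),  # no death recorded, so the implied age is implausible
    (1920, 2001),
    (1875, None),
]

REFERENCE_YEAR = 2025      # year of this post
MAX_PLAUSIBLE_AGE = 115    # assumed cutoff; pick one that suits your data
BENCHMARK_POPULATION = 3   # hypothetical external total to compare against


def implied_age(birth_year, reference_year=REFERENCE_YEAR):
    """Age the record implies if the person were still alive."""
    return reference_year - birth_year


# Check (a): people not recorded as dead whose implied age is surprising.
not_recorded_dead = [r for r in records if r[1] is None]
implausibly_old = [
    r for r in not_recorded_dead if implied_age(r[0]) > MAX_PLAUSIBLE_AGE
]

# Check (b): does the total exceed an independent population figure?
excess = len(not_recorded_dead) - BENCHMARK_POPULATION

print("not recorded as dead:", len(not_recorded_dead))
print("implausibly old:", implausibly_old)
print("excess over benchmark:", excess)
```

A check like this only flags records that need an explanation; it says nothing about what the explanation is.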

That’s a good first step, as I said. The next step is to think about possible explanations (as Dan Davies says: “if you don’t make predictions, you won’t know when to be surprised”). The first two I thought of were people leaving the US after working long enough to be eligible for Social Security (like, for example, me) and missing death records for old people (the vital statistics records weren’t as good in the 19th century as they are now).

After that, the proper procedure is to ask someone or look for some documentation, rather than just to go with your first guess. It’s quite likely that someone else has already observed the existence of records with unreasonable ages and looked for an explanation.

In this case, one would find (e.g., by following economist Justin Wolfers) a 2023 report by the Office of the Inspector General, “Numberholders Age 100 or Older Who Did Not Have Death Information on the Numident” (PDF). It said that the very elderly ‘vampires collecting Social Security’ were neither vampires nor collecting Social Security, but were real people whose deaths hadn’t been recorded. This was presumably a follow-up to a 2015 story where identity fraud was involved, but again, the government wasn’t losing money, because it wasn’t paying money out to dead people.

The excess population at younger ages isn’t explained by this report, but again, the next step is to see what is already known by the people who spend their whole careers working with the data, rather than to decide that the explanation is the first thing that comes to mind.

Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

Comments

  • Steve Curtis

    More ‘surprising’ to me is that in another group there are 9.46 million males under 5 compared with 9.05 million females.
    Among the over-80s it’s the other way around, which is expected.

    3 days ago

  • Thomas Lumley

    That might be surprising, but it’s normal; the sex ratio at birth is just over 51% male and that’s very widely observed both internationally and across time.

    2 days ago

    • Steve Curtis

      Thanks. I knew there was a difference, but my calculation of the sex ratio for the 18.51 million infants gives 2.2% in favour of males. That seems surprisingly high.

      As for the seniors getting pensions: the Social Security data give almost 90,000 centenarians actually receiving payments, not millions.

      15 hours ago

  • Megan Pledger

    The story I heard was that the Social Security system is written in COBOL, and that the instruction is to write 0 if the birthdate is unavailable, which in turn defaults to 1885 as the birth year (like Excel defaults to 1900). It doesn’t explain how people are older than 140, though.

    2 days ago

    • Thomas Lumley

      That’s a different story, which also happened in the past week.

      2 days ago

    • Steve Curtis

      The US Social Security system began in 1935, initially as an age benefit/pension for those 65 and older. So those just turning 65 in 1935 or so would have been born in 1870, and their just-issued unique Social Security number would now show them as being 155 years old if no date of death was entered.

      15 hours ago
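
The arithmetic in the comments above is easy to check. A minimal sketch, using only the figures quoted in the thread; the variable names are illustrative, and 2025 is taken from the date of the post. It reads the "2.2% in favour of males" as the gap between the male and female shares, which is an assumption about how that number was calculated.

```python
# Figures quoted in the comments, in millions of children under 5.
males_millions = 9.46
females_millions = 9.05

total_millions = males_millions + females_millions
male_share = males_millions / total_millions
# Assumed reading of "2.2% in favour of males": male share minus female share.
share_gap = (males_millions - females_millions) / total_millions

print(f"total: {total_millions:.2f} million")
print(f"male share: {male_share:.1%}")
print(f"gap between shares: {share_gap:.1%}")

# Someone turning 65 when Social Security began in 1935.
birth_year = 1935 - 65
age_in_2025 = 2025 - birth_year
print(f"born {birth_year}, implied age in 2025: {age_in_2025}")
```

On these figures the male share comes out at about 51.1%, in line with the "just over 51%" quoted in the reply, so the 2.2-point gap and the 51% share describe the same ratio.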
