3 The Problem With Statistics
Causal inference attempts to put science before statistics, or at least to make statistics work to give us useful scientific outputs. The reason we need something like causal inference is this: the reasons for carrying out a statistical analysis are not present in the data.
What do we mean by this statement?
As an example, consider a statistical question where you have two sets of numbers: {10, 11, 12} and {14, 15, 16}, and you ask: are the averages of these two sets different from each other?
As you will recall from your introductory statistics courses, we can always calculate the means and standard deviations of the two sets of numbers, and perform a t-test to calculate a proability that we should reject a null hypothesis of the means being equal. So we can answer the question: “Are the means of these two sets of values statistically different from each other?”.
But we’re usually not asking that question in isolation. In an experiment, those numbers represent something in the real world, and in which we’re interested. We want to know what is the scientific insight that is gained? Maybe it’s something like:
- the number of people who vote in an election before and after a bribe is offered
- the number of mice that survive a condition, after some drug intervention
- reaction times in an individual before and after drinking alcohol
The point is, we can’t tell what the numbers represent, or what any differences mean, from the data alone! We cannot look at the numbers alone and extract the causes of the data.
“No causes in, no causes out.” - Nancy Cartwright, Cartwright (2011)
Statistical models are only devices that process data to provide estimates. To understand what the data are telling us, and for statistical analysis to give us scientific insight, we need additional information: scientific (causal) models.