11  Underpowered Studies

We sometimes refer to studies as being “low-powered” or “underpowered.” By this we mean that the expected effect size is small relative to the variation (e.g. the standard error) of the measurement. Low power can arise for a number of reasons: the experiment may involve too few experimental units (e.g. individual subjects), or the measurements may be highly variable because of noisy nuisance variables.
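As a rough illustration (all numbers hypothetical), the sketch below uses a two-sided z-test to show that power depends essentially on the ratio of the true effect size to the standard error of its estimate. The helper `power_two_sided_z` is our own name for this sketch, not a function from any particular library.

```python
# Minimal sketch (hypothetical numbers): for a two-sided z-test, power
# depends on the ratio of the true effect size to the standard error.
from scipy.stats import norm

def power_two_sided_z(effect, se, alpha=0.05):
    """Approximate power of a two-sided z-test given the true effect and its SE."""
    z_crit = norm.ppf(1 - alpha / 2)           # e.g. 1.96 for alpha = 0.05
    # Probability that the estimate lands beyond either critical boundary
    return (norm.sf(z_crit - effect / se)      # significant, correct direction
            + norm.cdf(-z_crit - effect / se)) # significant, wrong direction

print(power_two_sided_z(effect=2, se=10))   # effect much smaller than SE: power ~ 0.05
print(power_two_sided_z(effect=2, se=0.7))  # same effect, much smaller SE: power ~ 0.8
```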

Underpowered studies may fail to give a statistically significant result even when a relatively large effect is observed in the experiment. Worse still, because a very large estimate may be required to reach statistical significance, any “positive” result (one that rejects the null hypothesis) is likely to be a statistical outlier and not to reflect the true effect size.

Important

Even experienced scientists with many publications and long track records of research funding may never have received any formal training in experimental design or statistics. You will sometimes hear claims being made along the lines of:

“if the effects are large enough to be seen in a small study they must be real large effects”

Unfortunately, this is not true. “Statistically significant” results from underpowered studies are highly likely to be exaggerated, and may even suggest that the effect acts in the wrong direction! (Figure 11.1)

Figure 11.1: When effect size is small compared to the standard error of the measurement, statistical power is low, and the study is underpowered. In this figure, the null hypothesis is that there is no effect (orange dashed line). The real effect size of the intervention under study is assumed to be 2% of the size of the control (the purple solid line), and estimates are assumed to have the same standard error as the control experiment. The black curve represents the distribution of possible estimates of effect size as a Normal distribution with \(\mu=2\%, \sigma=10\%\). The shape of the curve shows that statistical significance is unlikely to be achieved (we would need a 19% effect size, with this level of variation) but, if it is achieved, it is misleading, as there is a reasonable chance (39%) that the significant result will be in the wrong direction and, in any case, the magnitude of the effect size will be greatly overestimated. (Adapted from Figure 16.1 in Gelman et al. “Regression and Other Stories”)
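The simulation sketch below mirrors the scenario in the caption (true effect 2%, standard error 10%). It assumes a two-sided test at the conventional 5% level, so the probabilities it prints are illustrative and need not match the caption’s exact figures, which depend on the construction used in the source.

```python
# Simulation of the Figure 11.1 scenario: estimates ~ Normal(2, 10),
# declared "significant" when they exceed a two-sided 5% critical value
# (an assumption of this sketch, not stated in the caption).
import numpy as np

rng = np.random.default_rng(1)
true_effect, se = 2.0, 10.0
z_crit = 1.96                                 # two-sided 5% critical value

estimates = rng.normal(true_effect, se, size=1_000_000)
significant = np.abs(estimates) > z_crit * se

print("fraction reaching significance (power):", significant.mean())
print("mean |estimate| among significant results:", np.abs(estimates[significant]).mean())
print("fraction of significant results with the wrong sign:", (estimates[significant] < 0).mean())
```

Whatever the exact numbers, the qualitative picture matches the caption: significance is rarely reached, and when it is, the estimate greatly exaggerates, and sometimes reverses, the true 2% effect.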
Strategies to increase statistical power

There are three broad strategies available to us, as researchers, to increase the statistical power of an experiment (the sketch after the list illustrates how each one affects power):

  1. Reduce the variability in the experiment
  • this typically means controlling for nuisance effects that cause variation in experimental units, or improving the way we measure outcome variables
  2. Increase the number of experimental units
  • when we increase the number of experimental units, we reduce the standard error of the mean, i.e. the uncertainty in our estimate of the mean
  3. Increase the effect size
  • this might not always be under researcher control, but might be modulated by, say, changing the concentration of an administered drug
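As a hedged sketch of how each strategy plays out, the example below uses a two-sample t-test power calculation from statsmodels; the group sizes, standard deviations and effect sizes are hypothetical, chosen only to show each strategy raising power above the baseline.

```python
# Illustrative power calculations for a two-sample t-test (hypothetical numbers).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

def power(raw_effect, sd, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample t-test; effect_size is Cohen's d = raw_effect / sd."""
    return analysis.power(effect_size=raw_effect / sd, nobs1=n_per_group,
                          alpha=alpha, ratio=1.0, alternative='two-sided')

print(power(raw_effect=2, sd=10, n_per_group=10))   # baseline: underpowered
print(power(raw_effect=2, sd=5,  n_per_group=10))   # 1. reduce variability
print(power(raw_effect=2, sd=10, n_per_group=100))  # 2. increase number of units
print(power(raw_effect=4, sd=10, n_per_group=10))   # 3. increase the effect size
```

Each change increases power relative to the baseline row, though by different amounts; which options are actually available depends on what is under the researcher's control.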