Power Calculations and Sample Size
In this section, we will consider the concept of statistical power, and learn how to calculate sample sizes with a widely-used software tool called G*Power
Statistical power is the probability, before a study is performed, that a particular comparison will achieve “statistical significance” at some predetermined level.
More precisely, statistical power is the probability that the study will not return a false negative result, known in statistics jargon as a “Type II error”.
There are two decisions we need to make as researchers, if we want to design an experiment to have a certain statistical power:
- What threshold of statistical significance (p-value, or \(\alpha\)) is appropriate?
- What rate of false negative results (“type II errors”) is acceptable to us at that significance level? (i.e. what statistical power do we want?)
These decisions are in your hands as the researcher although, as we will see, there may be constraints from outside sources, such as funding bodies.
Sample size is the number of experimental units (often, but not always, individuals in this context) used per group in the experiment.
The sample size for an experiment involving animals should be informed by a statistical power calculation and chosen such that the experiment is neither underpowered nor overpowered.
There is no universally applicable sample size. An appropriate sample size should be calculated specifically for each experiment.
We can use power calculations to determine appropriate sample sizes. To do this we also need to make additional assumptions (or, if you prefer, “propose hypotheses”) about the future behaviour of data that has not yet been collected:
- The desired effect size (e.g. a change in a value that is biologically meaningful, or a minimum level at which a change of value is recognised to be a change in the system)
- The variation we expect to see in the measured outcomes (e.g. as a standard deviation)