23 Estimate Sample Size

Our final design diagram is complete, with no critique errors, and an analysis suggestion of one-way ANOVA with blocking factor(s). We can now use the EDA’s Sample Size Calculation tool to estimate the required number of individuals per group, to achive a certain level of statistical power¹.

Important

In order to be able to estimate a suitable sample size, the tool needs the researcher to provide realistic values for:

The effect size that the researcher wishes to detect in the comparison. This might be the size of effect considered “clinically relevant”, or assumed to be comparable to some previous treatment.
The variability of the outcome measure being compared. Depending on how the analysis is being performed, this might be the average standard deviation across all measurements, or the expected variation (standard deviation) in the measured difference. These values typically come from previous experiments.
The significance level of the hypothesis test, representing the threshold at which the null hypothesis would be discarded ².
The desired power of the experiment, i.e. the probability with which, if there is an effect of at least the stated effect size to be found, it will reach the stated significance level ³.
The sidedness of the test. Generally, if the experiment aims to test for a difference but the direction of the difference is not known (the treatment may increase or decrease the size of a tumour, say), a two-sided test would be used. But if the direction of the desired effect is known (e.g. we wish to detect whether a treatment decreases the size of a tumour) a one-sided test should be used ⁴.

Tip

The EDA site has an extensive introduction to the concepts supporting power calculations.

Something else no-one tells you

As noted in Chapter 20 and Chapter 21, even experienced scientists with many publications and long track records of research funding may never have received any formal training in experimental design or statistics. As a consequence they can propagate poor or suboptimal advice that may not be correct. As noted by Simon Bate on the NC3Rs site:

I often hear “we always use n=6” or “the other researchers use n=6, so shall we”. Needless to say, this is not a recommended approach. You cannot simply copy someone else’s sample sizes; you need to assess the variability of data generated in your lab, under your experimental conditions and using your protocols. Even if your supervisor or manager says “use n=6”, make sure you question them and check it is suitable!

Ultimately, as a researcher you are responsible for the rigour and ethics of your own actions. It is always advisable to consult with a statistician, rather than trust the word of senior scientists alone.

Don’t be this guy (Figure 23.1).

Figure 23.1: How not to collaborate with a biostatistician, by JavaMama926.

23.1 Run the Sample Size Calculation Tool

Click on Tools -> Sample Size Calculation in the menu bar (Figure 23.2) to bring up the Power Calculation dialogue box (Figure 23.3).

Figure 23.2: The EDA design tool menu, showing the location of the Sample Size Calculation tool.

Figure 23.3: The EDA Sample Size Calculation tool dialogue box. By default fields for an unpaired t-test are shown

Use the NC3Rs EDA decision tree to choose the appropriate power calculation (Figure 23.4).

Figure 23.4: The EDA decision tree for choosing the appropriate power calculation.

Using the decision tree

Starting at the top of the tree, we are asked a series of questions, as we progress through the flowchart:

Are you planning on analysing your data with a parametric analysis?

Yes we are, since ANOVA is a parametric analysis.

Does each animal receive multiple treatments, acting as its own control?

No. Each animal receives a single treatment and there are separate control and treatment groups.

Does your experiment only involve two groups.

Technically, this is true. We are interested in a single main comparison involving two groups (control and treatment). We have a further two groups male and female as a blocking factor but, because this is only a blocking factor and not a main comparison, we ignore sex for the purposes of the power calculation.

The decision tree leads us to the box containing the advice to Use power calculation for an unpaired t-test, with variability as the average standard deviation of the outcome measurement being made.

Ensure the power calculation dialogue for the unpaired t-test is selected (as in Figure 23.3).
Decide on an expected effect size that is biologically relevant. For the purpose of this workshop, let’s assume that we need to see a decrease in blood glucose of at least 2mM/L. Enter the value 2 into the Effect size field.
Decide on an expected variability in the outcome measure. This is usually determined from previous studies, but here we will assume that variability is 1mM/L, for the sake of the workshop. Enter the value 1 into the Variability field.
Decide on a significance level for your statistical hypothesis test. This is under researcher control, and we will keep it at 0.05 for the purpose of the workshop.
Decide on a suitable power (probability of avoiding a false negative error) for the statistical test. This is under researcher control, and power values of 0.8-0.9 are usually acceptable to funders. We will keep this value at 0.9 for the purpose of the workshop.
Decide whether this will be a one- or two-sided test. As we have specified that we are expecting to see a decrease in blood glucose for a positive outcome, a one-sided t-test is appropriate and will give increased power for the same number of experimental subjects. Set the One or two-sided test field to 1 ⁵ (Figure 23.5).

Figure 23.5: The EDA Sample Size Calculation tool dialogue box, populated with values for our power calculation.

Click on the Calculate N per group button to have the Sample Size Calculation tool return an estimated sample size for the desired power (?fig-sample-size-5).

The EDA Sample Size Calculation tool returns an estimated sample size of six individuals per group, given our input choices. 10. Populate the “Control” Group node with the recommended number of experimental units, using the Node properties dialogue box. Enter the number of experimental units, the number of animals, and add a justification for the choice of sample size (Figure 23.6). Click on Close to dismiss the dialogue.

Figure 23.6: The Node properties settings for the “Control” Group node, including the estimated sample size, and enough information to enable reproduction of the power calculation.

Populate the “Treatment” Group node with the recommended number of experimental units, using the Node properties dialogue box. Enter the number of experimental units, the number of animals, and add a justification for the choice of sample size.Click on Close to dismiss the dialogue.

23.2 Rename and Save the Diagram

Change the name of the diagram by clicking on the text Untitled Diagram at the top, to bring up a dialogue box (Figure 23.7). Rename the diagram as “MP968_Workshop”, and click the OK button.

Figure 23.7: Rename the diagram using the tool’s dialog box.

Save the experiment by clicking on the Save (floppy disk) icon at the top left of the tool (Figure 23.8).

Figure 23.8: Click on the floppy disk icon to save the design to your experiments at EDA.

The sample size calculator in NC3Rs EDA is adequate for many experiments, but you may encounter more complex designs that are beyond its capabilities. In those cases you might use a tool like G*Power (see Chapter 13).↩︎
Note that discarding the null hypothesis is not typically confirmatory of any specific alternative hypothesis.↩︎
Suppose the effect size is 0.5, and the significance level is 0.05. An experiment with power of 0.8 would detect a true effect size of 0.5, at \(P \leq 0.05\) in approximately 80 out of every 100 times the experiment was run, assuming all statistical test assumptions are met.↩︎
One-sided tests offer greater statistical power, for the same sample size, if the expected direction of change is known.↩︎
The choice depends on our null hypothesis. If our null hypothesis is “drug A has no effect” we should use a two-sided t-test. If our null hypothesis is “drug A does not reduce blood glucose” we should use a one-sided t-test. The choice of null hypothesis is under researcher control.↩︎