Better Tools Than Bar Charts – BM432 Data Visualisation Workshop

Case Study

In a small study, you attempted to evaluate the effect of a drug on four matched cohorts of 11 individuals each. For each individual, you specified a dosage and measured the drug effect.

The interactive examples below let you explore alternative visual representations of this data.

Note

In this dataset, cohort is a categorical variable, but dosage and effect are continuous variables.


1 Visualising numerical datasets

Challenge

In the interactive examples below, you can select alternative visualisations of the case study dataset, and decide for yourself which presents the most appropriate story about the data.

Figure 1: Dosage for each cohort (bar chart)

Figure 2: Effect for each cohort (bar chart)

Figure 3: Dosage for each cohort (bar chart with error bars representing standard deviation)

Figure 4: Effect for each cohort (bar chart with error bars representing standard deviation)

Figure 5: Dosage for each cohort (1D scatterplot)

Figure 6: Effect for each cohort (1D scatterplot)

Figure 7: Dosage for each cohort (boxplot)

Figure 8: Effect for each cohort (boxplot)

Figure 9: Dosage for each cohort (violin plot)

Figure 10: Effect for each cohort (violin plot)

Figure 11: Dosage for each cohort (bar chart, standard deviation error bars, overlaid 1D scatterplot)

Figure 12: Effect for each cohort (bar chart, standard deviation error bars, overlaid 1D scatterplot)

Figure 13: Dosage for each cohort (boxplot, overlaid 1D scatterplot)

Figure 14: Effect for each cohort (boxplot, overlaid 1D scatterplot)

Figure 15: Dosage for each cohort (violin plot, overlaid 1D scatterplot)

Figure 16: Effect for each cohort (violin plot, overlaid 1D scatterplot)

Questions

Did any of the visualisations give a good summary account of the data, and why did you think so?
Did any of the visualisations give a poor summary account of the data, and why did you think so?
If someone presented a bar chart as a summary of measurements in a dataset, would you think that was a reliable visual representation?
What kinds of problems in these datasets were disguised by each visualisation approach?

Hints

Does the visualisation describe the data, or only a summary of the data?
Does the visualisation let you easily tell the difference between two datasets with similar summaries, but different data?
Does the visualisation introduce elements that imply the presence of data which is not in the dataset?
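
Worked example

The hints above can be made concrete with a minimal sketch. The two lists below are made-up values chosen for illustration (they are not the case study data), and the helper names `summarise` and `strip_plot` are hypothetical. Both lists have the same mean and standard deviation, so a bar chart with error bars would draw them identically, while a 1D scatterplot shows they are quite different.

```python
from collections import Counter
from statistics import mean, pstdev

# Made-up illustrative values, NOT the workshop dataset.
unimodal = [2, 4, 4, 4, 5, 5, 7, 9]
bimodal = [3, 3, 3, 3, 7, 7, 7, 7]


def summarise(values):
    """Return (mean, population standard deviation): all that a bar chart
    with standard deviation error bars would display."""
    return mean(values), pstdev(values)


def strip_plot(values, low=0, high=10):
    """Return a one-line text version of a 1D scatterplot.

    Each character is one integer position from low to high: a digit is the
    number of observations at that value, '.' means no observation there.
    """
    counts = Counter(values)
    return "".join(
        str(counts[v]) if counts[v] else "." for v in range(low, high + 1)
    )


for name, values in [("unimodal", unimodal), ("bimodal", bimodal)]:
    m, sd = summarise(values)
    print(f"{name:9s} mean={m:.1f} sd={sd:.1f}  {strip_plot(values)}")
```

Both rows report a mean of 5.0 and a standard deviation of 2.0, but the text plots differ: the second list has no observation anywhere near its own mean. A bar drawn up to that mean would imply data where none exists.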

2 Visualising correlations

Data Analysis

In the interactive window below, you can select alternative visualisations of the case study dataset, and decide for yourself which presents the most appropriate story about the data.

Figure 17: Linear regression of effect against dosage, for each cohort.

Figure 18: Regression plot of effect against dosage, for each cohort, with ribbon indicating 95% confidence interval for the fitted line.

Figure 19: Scatterplot of effect against dosage for each cohort.

Figure 20: Regression plot of effect against dosage for each cohort, with overlaid scatterplot.

Figure 21: Regression plot of effect against dosage for each cohort, with 95% confidence interval ribbon and overlaid scatterplot.

Questions

Do the parameters or correlation coefficients of the fitted linear regressions differ by cohort?
Do the linear regressions give a good account of the relationship between dosage and effect, in each case?
Does the plotted uncertainty in the linear regression capture the variation between the real data and the fitted line?
Are there any signs of systematic problems in the data?
What kind of visualisation is most helpful to understand these datasets?

Hints

Does the regression fully describe the data, or is it only a summary of the data?
Does the uncertainty in the regression adequately describe the variation of the data with respect to the fitted line?
Which visualisations best allow you to see differences between the datasets?
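
Worked example

One way to check whether a fitted line is only a summary is to look at the residuals (observed value minus fitted value). The sketch below uses made-up dosage and effect values with a deliberately curved relationship; they are not the case study data, and the function names `fit_line`, `residuals` and `sign_pattern` are hypothetical. It fits an ordinary least-squares line by hand and prints the sign of each residual in dosage order.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys on xs.

    Returns (slope, intercept, Pearson correlation coefficient).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy / sqrt(sxx * syy)


def residuals(xs, ys, slope, intercept):
    """Observed value minus fitted value at each x."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


def sign_pattern(values, tol=1e-9):
    """Encode each value as '+', '-' or '0' (within tol of zero)."""
    return "".join(
        "+" if v > tol else "-" if v < -tol else "0" for v in values
    )


# Made-up values for illustration only, NOT the workshop dataset.
dosage = [1, 2, 3, 4, 5, 6, 7]
effect = [d ** 2 for d in dosage]  # a deliberately curved relationship

slope, intercept, r = fit_line(dosage, effect)
res = residuals(dosage, effect, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.3f}")
print("residual signs in dosage order:", sign_pattern(res))
```

For these values the correlation coefficient is above 0.97, yet the residual signs read `+0---0+`: the line underestimates at both ends and overestimates in the middle. A run of same-signed residuals like this is a sign of a systematic problem that the regression line alone hides and that an overlaid scatterplot makes visible.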