Univariate (1D) Scatterplots

Representing continuous data with univariate scatterplots

In a univariate scatterplot, datapoints are plotted against a single numerical axis, with the other axis showing the identity of the dataset. This allows all values in a dataset to be shown, which is the most transparent way of presenting the data to the reader.

Where datapoints may overlap, their locations can be “jittered” (moved slightly in a horizontal and/or vertical direction) to make the number of datapoints at each value clearer. Even so, the density of datapoints can be hard to judge, so this representation is best combined with a summary of the distribution that conveys this information.
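As a minimal sketch of jittering, assuming Python and uniform random offsets, each point can keep its measured value while its position on the dataset axis is perturbed slightly. The function name `jitter`, its `width` and `seed` parameters, and the sample values are illustrative assumptions, not part of the original analysis.

```python
import random

def jitter(positions, width=0.2, seed=0):
    """Return positions shifted by uniform noise in [-width/2, +width/2]."""
    rng = random.Random(seed)  # seeded so the layout is reproducible
    return [p + rng.uniform(-width / 2, width / 2) for p in positions]

# Hypothetical measurements for one dataset; several values coincide.
values = [5.0, 5.0, 5.0, 5.1, 5.1, 4.9]
group_position = 1  # where this dataset sits on the dataset (category) axis

x = jitter([group_position] * len(values))  # jittered dataset-axis coordinate
y = values                                  # measured values are left unchanged

for xi, yi in zip(x, y):
    print(f"{xi:.3f}\t{yi}")
```

This sketch jitters only along the dataset axis, so the plotted values stay exact. Jittering along the numerical axis as well would separate points further, at the cost of displaying slightly altered values.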

Figure 1: Univariate (1D) scatterplot of sepal length for each species from the iris dataset, with jitter.

A univariate scatterplot for the iris sepal length data in Figure @ref(fig:1d-scatterplot) shows that the distributions of this variable overlap between species, and that there is a possible outlier in the I. virginica data. The nature of the data is also evident: the datapoints fall at regularly spaced values, as expected for measurements recorded to a fixed precision, and we can see where there are relatively few or many datapoints.
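The regular spacing and the uneven pile-up of points can also be checked numerically. The following is a rough sketch in Python, assuming values recorded to one decimal place. The sample values are hypothetical and are not the real iris measurements.

```python
from collections import Counter

# Hypothetical values recorded to one decimal place (not the real iris data).
values = [4.9, 5.0, 5.0, 5.1, 5.1, 5.1, 5.4]

# Work in integer tenths to avoid floating-point comparison problems.
tenths = [round(v * 10) for v in values]

# Tally how many datapoints share each recorded value.
counts = Counter(tenths)
for t in sorted(counts):
    print(f"{t / 10:.1f}: {'*' * counts[t]}")

# Gaps between neighbouring distinct values, in units of 0.1.
distinct = sorted(counts)
gaps = [b - a for a, b in zip(distinct, distinct[1:])]
print("gaps (in tenths):", gaps)
```

Whole-number gaps indicate that the values sit on a regular grid, and the tallies show which values would overlap most heavily without jitter.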