Creating Publication-Quality Graphics in R

Leighton Pritchard

University of Strathclyde

2025-11-04

1. Summary and Setup

Software and files

Figure 1: Link to download gapminder data

2. Packages

Packages

R packages

  • a collection (or library) of reusable code
  • many useful and specialist tools are distributed as packages
  • over 10,000 packages are available at CRAN
  • you can distribute your own code as a package

INTERACTIVE DEMO

installed.packages()               # see installed packages
install.packages("packagename")    # install a new package
update.packages()                  # update installed packages
library(packagename)               # import a package for use in your code

Challenge (2min)

Check if the following packages are installed on your system, and install them if necessary.

dplyr
ggplot2
knitr
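
One possible way to approach this challenge, as a sketch rather than the official solution: compare the wanted package names against the row names of installed.packages(). The variable names (wanted, to_install) are illustrative, and the install step is left commented out because it needs network access.

```r
# Packages named in the challenge
wanted <- c("dplyr", "ggplot2", "knitr")

# installed.packages() returns a matrix with one row per installed package;
# the row names are the package names
installed <- rownames(installed.packages())

# Which of the wanted packages are not yet installed?
to_install <- setdiff(wanted, installed)
print(to_install)

# Install only what is missing (uncomment to run; needs network access)
# if (length(to_install) > 0) install.packages(to_install)
```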

3. Creating publication-quality graphics in R

Visualisation is Critical!

Figure 2: How scientists see the same data may vary by field

The Grammar of Graphics

What is ggplot2?

  • part of the Tidyverse, a collection of packages for data science
    • ggplot2 is the graphics package
  • Implements the “Grammar of Graphics”
    • Separates data from its representation
    • Helps iteratively update/refine plots
    • Helps build complex, effective visualisations from simple elements

Four pillars of ggplot2

  • data
  • aesthetics
  • geoms
  • layers

A Basic Scatterplot

You can use ggplot2 like R’s base graphics

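qplot() is the ggplot2 counterpart of base R's plot()
(note: qplot() is deprecated as of ggplot2 3.4.0; ggplot() is preferred for new code)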

INTERACTIVE DEMO

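library(ggplot2)
# Base graphics: col needs a factor (or colour values), so convert explicitly
plot(gapminder$lifeExp, gapminder$gdpPercap, col=as.factor(gapminder$continent))
# The same scatterplot using ggplot2's qplot()
qplot(lifeExp, gdpPercap, data=gapminder, colour=continent)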
Figure 3: Scatterplot using base graphics
Figure 4: Scatterplot using ggplot2

What is a Plot? 1

Plot aesthetics

  • Each observation in the data is a point
  • A point’s aesthetics determine how it is rendered
    • co-ordinates on the image; size; shape; colour
  • aesthetics can be constant or mapped to variables
  • Many different plots can be generated from the same data by changing aesthetics
Figure 5: Scatterplot generated with qplot()
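
The difference between a constant and a mapped aesthetic can be sketched as below. This assumes ggplot2 is installed. The toy data frame uses invented values with gapminder-style column names so the block runs on its own; swap in gapminder once it is loaded.

```r
library(ggplot2)

# Toy stand-in for gapminder: INVENTED values, same column names
toy <- data.frame(
  country   = rep(c("A", "B", "C", "D"), each = 3),
  continent = rep(c("Africa", "Europe"), each = 6),
  year      = rep(c(1952, 1977, 2002), times = 4),
  lifeExp   = c(40, 50, 55, 42, 48, 60, 65, 72, 78, 68, 74, 80),
  gdpPercap = c(500, 800, 1200, 600, 700, 1500,
                5000, 12000, 25000, 7000, 15000, 30000)
)

# Constant aesthetic: set outside aes(), so every point gets the same colour
p_const <- ggplot(toy, aes(x = lifeExp, y = gdpPercap)) +
  geom_point(colour = "steelblue", size = 3)

# Mapped aesthetic: set inside aes(), so colour varies with a data column
p_mapped <- ggplot(toy, aes(x = lifeExp, y = gdpPercap, colour = continent)) +
  geom_point(size = 3)

print(p_const)
print(p_mapped)
```

Same data, two different plots: only the aesthetics changed.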

What is a Plot? 2

Plot geoms

  • geom (short for geometry) defines the “type” of representation
    • If data are drawn as points: scatterplot
    • If data are drawn as lines: line plot
    • If data are drawn as bars: bar chart

Figure 6: Some ggplot2 geom types

What is a Plot? 3

INTERACTIVE DEMO

ggplot2 offers several different geom types

# Generate plot of GDP per capita against life expectancy
p <- ggplot(data=gapminder, aes(x=lifeExp, y=gdpPercap, color=continent))
p + geom_point()
p + geom_line()

Challenge (2min)

Create a scatterplot showing how life expectancy changes as a function of time

p <- ggplot(data=gapminder, aes(y=lifeExp, x=year, color=continent))
# The base object alone draws an empty panel; add a geom layer to show points
p + geom_point()

What is a Plot? 4

We’ve just used another Grammar of Graphics concept: layers

  • ggplot2 plots are built as layers
  • All layers have two components
    1. data and aesthetics
    2. a geom

What is a Plot? 5

Data and aesthetics can be defined in a base ggplot object

  • values from the base are inherited by the other layers
  • the base defaults can be overridden in other layers
p <- ggplot(data=gapminder, aes(x=lifeExp, y=gdpPercap, colour=continent))
p + geom_point()

Figure 7: Defining aesthetics in a ggplot2 base layer

What is a Plot? 6

Data and aesthetics can be defined in a base ggplot object

  • values from the base are inherited by the other layers
  • the base defaults can be overridden in other layers

Figure 8: Overriding aesthetics of a ggplot2 base layer
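
A minimal sketch of overriding a base aesthetic in one layer. It is not necessarily the code behind the figure. It assumes ggplot2 is installed, and uses a toy data frame of invented values with gapminder-style column names.

```r
library(ggplot2)

# Toy stand-in for gapminder: INVENTED values, same column names
toy <- data.frame(
  country   = rep(c("A", "B", "C", "D"), each = 3),
  continent = rep(c("Africa", "Europe"), each = 6),
  lifeExp   = c(40, 50, 55, 42, 48, 60, 65, 72, 78, 68, 74, 80),
  gdpPercap = c(500, 800, 1200, 600, 700, 1500,
                5000, 12000, 25000, 7000, 15000, 30000)
)

# Base object: colour is mapped to continent for every layer by default
base <- ggplot(toy, aes(x = lifeExp, y = gdpPercap, colour = continent))

# The line layer inherits colour = continent and adds a grouping;
# the point layer overrides the inherited colour with a constant
p <- base +
  geom_line(aes(group = country)) +
  geom_point(colour = "black")

print(p)
```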

INTERACTIVE DEMO

What is a Plot? 7

We use several layers of geoms to build up a plot

  • alpha controls opacity for a layer
p <- ggplot(data=gapminder, aes(x=lifeExp, y=gdpPercap, color=continent))
p + geom_line(aes(group=country)) + geom_point(alpha=0.4)

Figure 9: ggplot2 figures are built by adding layers

INTERACTIVE DEMO

Challenge (5min)

Create a figure showing how life expectancy changes as a function of time

  • Colour datapoints by continent, and use two layers:
    • a line plot, grouping points by country
    • a scatterplot showing each data point, with 35% opacity
p <- ggplot(data=gapminder, aes(x=year, y=lifeExp, color=continent))
p + geom_line(aes(group=country)) + geom_point(alpha=0.35)

Transformations and scales

Data transformations are handled with scale layers

  • axis scaling (log scales)
  • colour scaling (changing palettes)

Figure 10: Axis scales, palette scales, etc. are added as layers
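
A sketch of adding scale layers, assuming ggplot2 is installed: a log axis via scale_y_log10() and a hand-picked palette via scale_colour_manual(). The colours and the toy data (invented values, gapminder-style column names) are illustrative choices only.

```r
library(ggplot2)

# Toy stand-in for gapminder: INVENTED values, same column names
toy <- data.frame(
  continent = rep(c("Africa", "Europe"), each = 6),
  lifeExp   = c(40, 50, 55, 42, 48, 60, 65, 72, 78, 68, 74, 80),
  gdpPercap = c(500, 800, 1200, 600, 700, 1500,
                5000, 12000, 25000, 7000, 15000, 30000)
)

p <- ggplot(toy, aes(x = lifeExp, y = gdpPercap, colour = continent)) +
  geom_point() +
  # axis scaling: plot GDP per capita on a log10 axis
  scale_y_log10() +
  # colour scaling: replace the default palette with named colours
  scale_colour_manual(values = c(Africa = "darkorange", Europe = "purple"))

print(p)
```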

INTERACTIVE DEMO

Statistics layers

Some geom layers transform the dataset

  • Usually this is a data summary (e.g. smoothing or binning, or fitting a model)

Figure 11: Statistical transformations are also added as layers
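
A sketch of a statistics layer, assuming ggplot2 is installed: geom_smooth() fits a model to the data and draws the fit rather than the raw observations. The linear model here is one possible choice, and the toy data uses invented values with gapminder-style column names.

```r
library(ggplot2)

# Toy stand-in for gapminder: INVENTED values, same column names
toy <- data.frame(
  lifeExp   = c(40, 50, 55, 42, 48, 60, 65, 72, 78, 68, 74, 80),
  gdpPercap = c(500, 800, 1200, 600, 700, 1500,
                5000, 12000, 25000, 7000, 15000, 30000)
)

p <- ggplot(toy, aes(x = lifeExp, y = gdpPercap)) +
  geom_point() +
  scale_y_log10() +
  # statistics layer: fit a straight line (on the log scale) and draw it
  # with its confidence band
  geom_smooth(method = "lm", formula = y ~ x)

print(p)
```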

INTERACTIVE DEMO

Multi-panel figures

Comparisons can be clearer with multiple panels

  • facets
  • “small multiples plots”

Use the facet_wrap() layer to generate grids of plots

INTERACTIVE DEMO

# Compare life expectancy over time by continent
p <- ggplot(data=gapminder, aes(x=year, y=lifeExp, colour=continent,
                                group=country))
p <- p + geom_line() + scale_y_log10()
p + facet_wrap(~continent)

Challenge (5min)

Create a scatterplot and contour densities of GDP per capita against population size

  • Fill datapoint colour by continent?

ADVANCED CHALLENGE

Transform the x axis to better visualise data spread, and use facets to panel density plots by year.

p <- ggplot(data=gapminder, aes(x=pop, y=gdpPercap))
p <- p + geom_point(alpha=0.8, aes(color=continent))
p <- p + scale_y_log10() + scale_x_log10()
p + geom_density_2d(alpha=0.5) + facet_wrap(~year)

4. Dynamic Reports

Literate Programming

What is literate programming?

  • A programming paradigm introduced by Donald Knuth
  • The program (or analysis) is explained in natural language
    • The source code is interspersed
  • The whole document is executable

R Markdown

R Markdown files embody Literate Programming

  • File → New File → R Markdown
  • Enter a title
  • Save the file (gets the extension .Rmd)
Figure 12: R Markdown dialogue box
Figure 13: Starting an R Markdown file

Components of R Markdown 1

Header information

  • File metadata
  • Header is fenced by ---
---
title: "Literate Programming"
author: "Leighton Pritchard"
date: "04/11/2025"
output: html_document
---

Natural language

  • Natural language is written as plain text
This is an R Markdown document. Markdown is a simple formatting syntax

Components of R Markdown 2

Executable R code

  • Executable R code is fenced by triple backticks, with {r} after the opening fence (```{r} ... ```)

Click on Knit to render the document
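
To see the three components together (header, natural language, code chunk), the sketch below writes a tiny R Markdown file from R. The file name example.Rmd and the chunk contents are illustrative. The render call is commented out because it assumes the rmarkdown package and pandoc are available; it is roughly what the Knit button does.

```r
# Three backticks, built with strrep() so this example stays readable
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",                                # header, fenced by ---
  'title: "Literate Programming"',
  "output: html_document",
  "---",
  "",
  "This is an R Markdown document.",    # natural language as plain text
  "",
  paste0(fence, "{r}"),                 # opening fence of an R code chunk
  "summary(cars)",                      # executable R code (built-in dataset)
  fence                                 # closing fence
)

# Write the document to a temporary location
rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_path)
cat(readLines(rmd_path), sep = "\n")

# Render to HTML (uncomment to run; needs rmarkdown and pandoc)
# rmarkdown::render(rmd_path)
```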

Creating a Report

INTERACTIVE DEMO

  • We’re going to create an R Markdown report on the gapminder data

Figure 14: R Markdown output

12. Conclusion

You have learned:

  • About R, RStudio and how to set up a project
  • How to load data into R and produce summary statistics and plots with base tools
  • The basic data types in R, and the most important data structures
  • How to install and use packages
  • How to use the Tidyverse to manipulate and plot data
  • How to create dynamic reports in R

WELL DONE!!

The End Is The Beginning

Figure 15: The journey is long, but you’re making progress
Figure 16: Experience is no way to avoid failure

Where Next?

Figure 17: R is a rollercoaster