Data Visualization with ggplot2

From Points and Bars to Boxplots and Maps in ggplot2

Bogdan G. Popescu

John Cabot University

Learning Outcomes

By the end of this lecture, you will be able to:

  • Understand the logic of the Grammar of Graphics.
  • Create plots in R using ggplot2.
  • Choose the right geom for your data:
    • points
    • bars
    • lines
    • boxplots
    • density plots
    • maps

Purpose of Visuals

Why Visuals?

Numbers vs Visuals

Would you rather read this?

Country      LifeExpectancy
Monaco       77.25278
Andorra      77.04861
San Marino   76.88056
Guernsey     76.71944
Israel       75.49722

…or see this?

Why Visuals Matter

Visuals are powerful because they:

  • Reveal patterns we miss in tables
  • Help us explain results clearly
  • Persuade others more effectively

Intro to ggplot2

The ggplot2 library in R

In R, the ggplot2 package lets us build high-quality visualizations.

The Essentials

Grammar of Graphics

  • At the most basic level we have:

    • data
    • geometries
    • aesthetics
  • Optional layers include:

    • facets
    • coordinates
    • annotations
    • themes, etc.
  • We can add them together in ggplot() with +
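
As a minimal sketch of how these layers combine (assuming ggplot2 is installed; the data frame `toy` and its columns are made up for illustration and are not part of the lecture data), each component is added with `+`:

```r
library(ggplot2)

# Hypothetical toy data (not from the lecture)
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 4, 3, 6, 5, 8),
  group = rep(c("A", "B"), each = 3)
)

p <- ggplot(data = toy, aes(x = x, y = y)) +  # data + aesthetics
  geom_point() +                               # geometry
  facet_wrap(~ group) +                        # optional: facets
  labs(title = "A layered plot") +             # optional: annotations
  theme_minimal()                              # optional: theme

p
```

Removing any of the optional lines still gives a valid plot; only the data, the aesthetics, and at least one geom are needed.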

Data Cleaning

Preparing the Data

Before we use ggplot, our data has to be tidy. This means:

  • Each variable is a column
  • Each observation has its own row
  • Each value has its own cell
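
A small sketch of what tidying can look like, assuming the tidyr package is available. The table `wide` is hypothetical: it stores one column per year, so the variable "year" is hidden in the column names. `pivot_longer()` moves it into its own column.

```r
library(tidyr)

# Hypothetical wide table: one column per year (not tidy)
wide <- data.frame(
  country = c("A", "B"),
  `2000` = c(55, 60),
  `2004` = c(58, 63),
  check.names = FALSE
)

# Tidy version: one row per country-year, one column per variable
tidy <- pivot_longer(
  wide,
  cols = c(`2000`, `2004`),
  names_to = "year",
  values_to = "turnout"
)
tidy$year <- as.integer(tidy$year)

tidy
```

In the tidy version, `year` and `turnout` can be mapped directly to aesthetics, e.g. `aes(x = year, y = turnout)`.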

Simple Geoms

Simple Geoms

Geometries

  • Geoms are the “shapes” your data takes on a graph.

  • Examples:

    • points (geom_point) make a scatterplot
    • bars (geom_bar) make a bar chart
    • lines (geom_line) make a line graph

Simple Geoms

Reference

geom          Use for
geom_point()  Relationships between two variables; at least 10 obs.
geom_col()    Totals/percent per category; ordered bars. Uses pre-computed values for bar height. You must supply both x and y.
geom_bar()    Counts the observations in each category. You must supply x, not y.
geom_text()   Direct labels for small N; annotate outliers.
geom_line()   Change in one variable over an ordered variable such as time.

Simple Geoms

geom_point

Imagine we have data about 9 countries that records their level of democracy from -10 to +10 (x-axis) and their GDP per capita in $1,000s (y-axis).

eg1 <- data.frame(
  democracy = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),   # democracy score
  gdp = c(2, 9, 4, 7, 8, 20, 15, 25, 27)  # GDP per capita in $1000s
)

democracy gdp
-8 2
-7 9
-5 4
-3 7
0 8

# Scatterplot with geom_point
library(ggplot2)

ggplot(data = eg1,
       aes(x = democracy, y = gdp)) +
  geom_point()

Simple Geoms

geom_point

Interpretation: Each dot shows one country’s democracy score (left to right) and its GDP per person (up and down). The dots suggest that more democratic countries usually have higher GDP per person.


Simple Geoms

geom_col

Suppose we collected a small survey and just recorded the education level of each respondent. We want to visualize how many respondents fall into each category.

# Raw rows: one row per respondent (not yet aggregated)
eg2 <- data.frame(
  education = c(
    "Primary", "Primary", "High School", "High School", "High School",
    "College", "College", "College", "College"
  ))
education
Primary
Primary
High School
High School

library(dplyr)
# Count occurrences of each education level
eg2_transformed <- eg2 %>%
  count(education)
education n
College 4
High School 3
Primary 2

# Drawing the column
ggplot(eg2_transformed, 
       aes(x = education, y = n)) +
  geom_col()

Simple Geoms

geom_bar

With geom_bar we can skip the counting step: it counts the respondents in each category for us, so we pass the raw data (eg2) and supply only x.
# Drawing the bar
ggplot(eg2, 
       aes(x = education)) +
  geom_bar()

What is the difference?

geom_col vs. geom_bar

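geom_col takes pre-computed bar heights (here n from eg2_transformed), while geom_bar counts the rows in each category itself (here from the raw eg2). Both give bars of the same height: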

# Drawing the column
ggplot(eg2_transformed, 
       aes(x = education, y = n)) +
  geom_col()
# Drawing the bar
ggplot(eg2, 
       aes(x = education)) +
  geom_bar()
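
One way to check this claim is to compare the bar heights ggplot2 actually computes. The sketch below assumes ggplot2 is installed; it rebuilds the counts with base R `table()` instead of `dplyr::count()` so that it runs on its own, and uses `layer_data()` to extract the values drawn by each plot.

```r
library(ggplot2)

# Raw rows, as in eg2: one row per respondent
eg2 <- data.frame(
  education = c(
    "Primary", "Primary", "High School", "High School", "High School",
    "College", "College", "College", "College"
  ))

# Pre-computed counts, as in eg2_transformed (base R instead of dplyr::count)
eg2_transformed <- as.data.frame(table(education = eg2$education),
                                 responseName = "n")

p_bar <- ggplot(eg2, aes(x = education)) + geom_bar()
p_col <- ggplot(eg2_transformed, aes(x = education, y = n)) + geom_col()

# layer_data() returns the values ggplot2 actually draws
heights_bar <- layer_data(p_bar)$y
heights_col <- layer_data(p_col)$y
all.equal(as.numeric(heights_bar), as.numeric(heights_col))
```

The only visible difference between the two plots is the default y-axis label (count for geom_bar, n for geom_col).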

Simple Geoms

geom_text

We can add a text label inside each bar with geom_text. Note that y must be set inside aes(), because n is a column of the data.

# Columns with the category name placed at half the bar height
ggplot(eg2_transformed, aes(x = education, y = n)) +
  geom_col() +
  geom_text(
    aes(label = education, y = n / 2))

Simple Geoms

geom_text

We can also use geom_text with the first example:

Imagine we have data about 9 countries that records their level of democracy from -10 to +10 (x-axis) and their GDP per capita in $1,000s (y-axis).

eg1 <- data.frame(
  democracy = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),   # democracy score
  gdp = c(2, 9, 4, 7, 8, 20, 15, 25, 27),  # GDP per capita in $1000s
  country = c(
    "North Korea",   # very autocratic, very poor
    "Saudi Arabia",  # autocratic, but richer due to oil
    "Zimbabwe",      # authoritarian, low GDP
    "Russia",        # hybrid regime, middle income
    "Nigeria",       # similar position
    "India",         # low–mid democracy, growing GDP
    "Brazil",        # democracy, mid GDP
    "Poland",        # consolidated democracy, higher GDP
    "South Korea"   # rich democracy
  ))

democracy gdp country
-8 2 North Korea
-7 9 Saudi Arabia
-5 4 Zimbabwe
-3 7 Russia
0 8 Nigeria

# Scatterplot with geom_point, labelled with geom_text
ggplot(eg1, aes(x = democracy, y = gdp)) +
  geom_point() +
  geom_text(aes(label = country), vjust = -1)

Simple Geoms

geom_line

Suppose we have data on average voter turnout (%) in national elections over several years. We want to see the trend in participation.

# Toy dataset
eg3 <- data.frame(
  year = c(2000, 2004, 2008, 2012, 2016, 2020),
  turnout = c(55, 58, 62, 60, 59, 65))

# Drawing the time line
ggplot(eg3, aes(x = year, y = turnout)) +
  geom_line()

Complex Geoms

Complex Geoms

Reference

geom              Use for
geom_boxplot()    Compare distributions across groups
geom_histogram()  Distribution of a single continuous variable (frequency bins)
geom_density()    Smoothed distribution of a single continuous variable
geom_violin()     Visualizing distributions across categories (shape of data, not just box)
geom_smooth()     Adding fitted trend/smoothed lines (LOESS, GAM, linear, etc.)
geom_errorbar()   Showing uncertainty or variability (e.g., CI ranges around a mean)
geom_sf()         Plotting spatial (map) data stored as sf objects

Complex Geoms

geom_boxplot

Suppose we surveyed people about their trust in government on a 1–10 scale (1 = no trust, 10 = complete trust). We want to compare typical values and how spread out the answers are for men and women.

set.seed(123)

# Toy survey dataset
eg4 <- data.frame(
  gender = rep(c("Men", "Women"), each = 20),
  trust = c(
    rnorm(20, mean = 5, sd = 2),
    rnorm(20, mean = 8, sd = 1)))

Complex Geoms

geom_boxplot

Suppose we surveyed people about their trust in government on a 1–10 scale (1 = no trust, 10 = complete trust). We want to see the typical values and how spread out the answers are.

gender trust
Men 3.879049
Men 4.539645
Men 8.117417
Men 5.141017
Men 5.258576
Men 8.430130

# Drawing the boxplot
ggplot(eg4, aes(y = trust)) +
  geom_boxplot()

Complex Geoms

geom_boxplot

Suppose we surveyed people about their trust in government on a 1–10 scale (1 = no trust, 10 = complete trust). We want to compare typical values and how spread out the answers are for men and women.

set.seed(123)

# Toy survey dataset
eg5 <- data.frame(
  gender = rep(c("Men", "Women"), each = 20),
  trust = c(
    rnorm(20, mean = 5, sd = 2),
    rnorm(20, mean = 8, sd = 1)))

gender trust
Men 3.879049
Men 4.539645
Men 8.117417
Men 5.141017
Men 5.258576
Men 8.430130
Men 5.921832
Men 2.469877

ggplot(eg5, aes(x = gender, y = trust)) +
  geom_boxplot()
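
To connect the boxes to numbers, the sketch below computes the quartiles per group with base R. It re-creates eg5 with the same seed so that it runs on its own. As far as I know, geom_boxplot draws the box from the 25th to the 75th percentile with the median inside, so these values should correspond to the box edges and the middle line.

```r
set.seed(123)

# Same toy survey data as eg5
eg5 <- data.frame(
  gender = rep(c("Men", "Women"), each = 20),
  trust = c(
    rnorm(20, mean = 5, sd = 2),
    rnorm(20, mean = 8, sd = 1)))

# Quartiles per group: the box spans the 25th to 75th percentile,
# and the line inside the box is the median (50th percentile)
box_stats <- tapply(eg5$trust, eg5$gender, quantile,
                    probs = c(0.25, 0.5, 0.75))
box_stats
```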

Complex Geoms

geom_histogram: What is a Histogram?

Histograms allow us to better understand how frequently or infrequently certain values occur in our dataset.

Imagine a set of values that are spaced out along a number line.

Complex Geoms

geom_histogram: What is a Histogram?

To construct a histogram, a section of the number line is divided into equal chunks, called bins.

Next, count how many data points sit inside each bin, and draw one bar for each bin.

The height of each bar corresponds to the number of data points in that bin.
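
The steps above can be sketched by hand in base R. The scores and bin edges below are made up for illustration (they are not the data behind the slide figure): `cut()` assigns each value to a bin and `table()` counts the values per bin.

```r
# Hypothetical SAT-style scores (illustrative only)
scores <- c(410, 450, 480, 520, 530, 560, 590, 610, 640, 700)

# Step 1: divide the number line into equal bins
breaks <- seq(400, 700, by = 100)

# Step 2: assign each score to a bin ([400,500), [500,600), [600,700])
bins <- cut(scores, breaks = breaks, include.lowest = TRUE, right = FALSE)

# Step 3: count the scores in each bin; these counts are the bar heights
counts <- table(bins)
counts
```

geom_histogram performs the same binning and counting internally before drawing the bars.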

Complex Geoms

geom_histogram: What is a Histogram?

Label the data (in the example below each data point is an SAT score)

Draw in a y-axis which counts the number of data points in each bin

Finally label your bins.

Complex Geoms

geom_histogram: Example

Suppose we surveyed people about their trust in government on a 1–10 scale (1 = no trust, 10 = complete trust). We want to compare typical values and how spread out the answers are.

# Drawing the Histogram
ggplot(eg5, aes(x = trust)) +
  geom_histogram(bins = 10, color = "white")+
  coord_cartesian(xlim = c(2, 8))

Complex Geoms

geom_histogram: Example

Suppose we surveyed people about their trust in government on a 1–10 scale (1 = no trust, 10 = complete trust). We want to compare typical values and how spread out the answers are.

There is a direct connection between histograms and boxplots:

Complex Geoms

geom_histogram: Example

One specific aspect pertaining to histograms is the number of bins:

  • bins that are too wide hide structure;
  • bins that are too narrow are noisy.
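
A quick way to see this trade-off is to draw the same data with very few and with many bins. The sketch below assumes ggplot2 is installed and re-creates eg5 so that it runs on its own; the values 3 and 30 are arbitrary choices for illustration.

```r
library(ggplot2)
set.seed(123)

# Same toy survey data as eg5
eg5 <- data.frame(
  gender = rep(c("Men", "Women"), each = 20),
  trust = c(
    rnorm(20, mean = 5, sd = 2),
    rnorm(20, mean = 8, sd = 1)))

# Few, wide bins: the shape is over-simplified
p_wide <- ggplot(eg5, aes(x = trust)) +
  geom_histogram(bins = 3, color = "white")

# Many, narrow bins: most bars hold only zero to a few observations
p_narrow <- ggplot(eg5, aes(x = trust)) +
  geom_histogram(bins = 30, color = "white")

p_wide
p_narrow
```

Both plots contain the same 40 observations; only the grouping into bars changes.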

Complex Geoms

geom_density: What is a Density Curve?

A histogram counts how many respondents fall in each trust “bucket.”

A density curve smooths those buckets into a single curve so we see the overall shape.

The area under the entire curve = 1 (i.e., 100% of people).

The area under the curve over a range of values is the probability that a randomly chosen person’s trust score falls in that range:

  • between two values (e.g., between 5 and 7)
  • above a value (e.g., above 5)
  • below a value (e.g., below 4)

Complex Geoms

geom_density: What is a Density Curve?

The height of the curve at a single point is not a probability; areas tell you the share.

When we widen the range, the area grows, so the probability grows.

The curve is a smoothed version of the histogram; the bandwidth controls how smooth it is.

Here too there is a connection between density, histograms, and boxplots.
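
The idea that areas are probabilities can be checked numerically. The sketch below uses base R `density()` (a kernel density estimate, which is also what geom_density computes by default, as far as I know) on the same toy data, and approximates areas by summing curve heights times the grid spacing. The range 5 to 7 is just an example.

```r
set.seed(123)

# Same trust values as in eg5
trust <- c(rnorm(20, mean = 5, sd = 2), rnorm(20, mean = 8, sd = 1))

# Kernel density estimate on an evenly spaced grid of x values
d <- density(trust)
dx <- diff(d$x)[1]  # spacing between grid points

# Area under the whole curve (should be close to 1)
total_area <- sum(d$y) * dx

# Area between 5 and 7: estimated share of people with trust in that range
in_range <- d$x >= 5 & d$x <= 7
area_5_7 <- sum(d$y[in_range]) * dx

total_area
area_5_7
```

Widening the range (for example, 4 to 8) adds more of the curve to the sum, so the estimated probability can only grow.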

Complex Geoms

geom_density: Example

Suppose we surveyed people about their trust in government on a 1–10 scale (1 = no trust, 10 = complete trust). We want to compare typical values and how spread out the answers are.

# Rescale the density so it is on the same scale as histogram counts
N <- nrow(eg5)
binwidth <- diff(range(eg5$trust)) / 10  # approximate bin width for bins = 10

# Draw the density curve on the count scale
ggplot(eg5, aes(x = trust)) +
  geom_density(aes(y = after_stat(density) * N * binwidth), linewidth = 1) +
  labs(
    x = "trust",
    y = "count")

It is more informative to include both geom_density and geom_histogram.

# Rescale the density so it is on the same scale as histogram counts
N <- nrow(eg5)
binwidth <- diff(range(eg5$trust)) / 10  # approximate bin width for bins = 10

# Draw the histogram with the rescaled density curve on top
ggplot(eg5, aes(x = trust)) +
  geom_histogram(bins = 10, color = "white") +
  geom_density(aes(y = after_stat(density) * N * binwidth), linewidth = 1) +
  labs(
    x = "trust",
    y = "count")

Complex Geoms

geom_density: Example

Most people in the survey gave middle-of-the-road trust scores—think around 5 to 7.

As you move toward the extremes (very low 1–3 or very high 8–10), the curve drops, showing fewer people chose those scores.


Complex Geoms

geom_violin: Example

Violin plots are symmetric representations of geom_density. Starting from the density curve we drew before:

  • flip the density on its side;
  • add its mirror image.

This is what geom_violin looks like.

ggplot(eg5, aes(x = "", y = trust)) +
  geom_violin()

Complex Geoms

geom_violin: Example

Suppose we surveyed people about their trust in government on a 1–10 scale (1 = no trust, 10 = complete trust). We want to compare typical values and how spread out the answers are for men and women.

ggplot(eg5, aes(x = gender, y = trust)) +
  geom_violin()

Complex Geoms

geom_violin: Example

Interpretation: The data suggests that men’s responses are more spread out (from low to high trust), while women’s are more concentrated in the middle-to-high range.


Complex Geoms

geom_smooth

This is what we had previously:

# Example data
eg1 <- data.frame(
  democracy = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),   # democracy score
  gdp = c(2, 9, 4, 7, 8, 20, 15, 25, 27),  # GDP per capita in $1000s
  country = c(
    "North Korea",   # very autocratic, very poor
    "Saudi Arabia",  # autocratic, but richer due to oil
    "Zimbabwe",      # authoritarian, low GDP
    "Russia",        # hybrid regime, middle income
    "Nigeria",       # similar position
    "India",         # low–mid democracy, growing GDP
    "Brazil",        # democracy, mid GDP
    "Poland",        # consolidated democracy, higher GDP
    "South Korea"   # rich democracy
  ))

democracy gdp country
-8 2 North Korea
-7 9 Saudi Arabia
-5 4 Zimbabwe
-3 7 Russia
0 8 Nigeria
2 20 India
5 15 Brazil
8 25 Poland
9 27 South Korea

# Scatterplot with geom_point
ggplot(eg1, aes(x = democracy, y = gdp)) +
  geom_point()

Complex Geoms

geom_smooth

This is what happens when we try to fit a line:

# Scatterplot with a fitted linear trend line
ggplot(eg1, aes(x = democracy, y = gdp)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "black")

Complex Geoms

geom_smooth

Interpretation: As one moves right (more democratic), the dots usually sit higher—meaning those countries tend to be richer. The black line going up summarizes that big-picture pattern, even though a few dots don’t fit.
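
With method = "lm", geom_smooth fits a straight line of y on x by least squares. As a sketch, the same fit can be obtained with base R `lm()`, which lets us read off the intercept and slope behind the plotted line (eg1 is re-created here without the country column so the block runs on its own).

```r
# Same democracy and GDP values as eg1
eg1 <- data.frame(
  democracy = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp = c(2, 9, 4, 7, 8, 20, 15, 25, 27)
)

# Least-squares line: gdp = intercept + slope * democracy
fit <- lm(gdp ~ democracy, data = eg1)
coef(fit)
```

A positive slope means that, in this toy data, a higher democracy score goes together with a higher GDP per capita on average.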


Complex Geoms

geom_errorbar

Error bars are like “antennae” around a mean.

They don’t show the full spread like a boxplot, but they give us a quick picture of how precise the average is.

  • Short bars → the mean is estimated precisely (data points close together and/or many observations).
  • Long bars → the mean is estimated imprecisely (data points spread out and/or few observations).

This is useful when we only want to compare averages.

Complex Geoms

geom_errorbar

A natural question is how error bars compare to boxplots.

Boxplot

  • Shows the distribution of the data: Median, quartiles, and possible outliers.
  • Good for: seeing the shape and spread

Error Bar

  • Summarizes just the mean + uncertainty (often mean ± standard error or confidence interval).
  • Doesn’t show the full shape of the data.

Complex Geoms

geom_errorbar

Suppose we surveyed people about their trust in government on a 1–10 scale (1 = no trust, 10 = complete trust). We want to compare typical values and how spread out the answers are for men and women.

# Set a random seed so results are reproducible.
# (Without this, R will generate different random numbers each time.)
set.seed(123)

# -------------------------------------------------------------
# Step 1: Create a toy dataset: "trust in government" by gender
# --------------------------------------------------------------
eg5 <- data.frame(
  # Repeat the labels "Men" and "Women" 20 times each
  gender = rep(c("Men", "Women"), each = 20),
  
  # Generate 20 trust values for Men ~ Normal(mean=5, sd=2)
  # Generate 20 trust values for Women ~ Normal(mean=8, sd=1)
  trust = c(
    rnorm(20, mean = 5, sd = 2),
    rnorm(20, mean = 8, sd = 1)
  )
)
#Print the first 3 entries
head(eg5, n=3)
gender trust
Men 3.879049
Men 4.539645
Men 8.117417

# ----------------------------------------------------------------------
# Step 2: Calculate mean trust and error bars (95% confidence intervals)
# ----------------------------------------------------------------------
df_err <- eg5 %>%
  group_by(gender) %>%       # Group data by gender
  summarise(
    mean = mean(trust),      # Average trust per gender
    se = sd(trust) / sqrt(n()), # Standard error = sd / sqrt(sample size)
  ) %>%
  mutate(
    # 95% confidence interval = mean ± 1.96 * standard error
    ymin = mean - 1.96 * se,   # Lower bound
    ymax = mean + 1.96 * se    # Upper bound
  )
head(df_err, n=3)
gender mean se ymin ymax
Men 5.283248 0.4349892 4.430669 6.135826
Women 7.948743 0.1855799 7.585006 8.312480
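
The multiplier 1.96 comes from the normal distribution and works well for large samples. With only 20 observations per group, a common alternative is the t-distribution quantile from `qt()`. The sketch below is a variation on the calculation above, not the lecture's method; `df_err_t` and `t_crit` are names introduced here for illustration, and dplyr is assumed to be installed.

```r
library(dplyr)
set.seed(123)

# Same toy survey data as eg5
eg5 <- data.frame(
  gender = rep(c("Men", "Women"), each = 20),
  trust = c(
    rnorm(20, mean = 5, sd = 2),
    rnorm(20, mean = 8, sd = 1)))

df_err_t <- eg5 %>%
  group_by(gender) %>%
  summarise(
    mean = mean(trust),
    se = sd(trust) / sqrt(n()),
    t_crit = qt(0.975, df = n() - 1)  # t quantile instead of 1.96
  ) %>%
  mutate(
    ymin = mean - t_crit * se,  # lower bound
    ymax = mean + t_crit * se   # upper bound
  )

df_err_t
```

Because the t quantile is slightly larger than 1.96 for small samples, these intervals are a little wider than the ones computed above.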

# --------------------------------------------------------
# Step 3: Plotting the means with error bars using ggplot2
# --------------------------------------------------------
ggplot(df_err, aes(x = gender, y = mean)) +
  geom_point(size = 4) +  # Plot mean values as large points
  geom_errorbar(
    aes(ymin = ymin, ymax = ymax), # Add vertical error bars
    width = 0.15                   # Small horizontal "cap" on error bars
  ) +
  labs(
    x = NULL,                       # Remove x-axis label
    y = "trust",      # Label y-axis
    title = "Error Bars"            # Add plot title
  ) +
  coord_cartesian(ylim = c(0, 10))  # Set y-axis limits from 0 to 10

Complex Geoms

geom_errorbar

Interpretation: Women in this sample show higher average trust in government than men. The bars show our uncertainty about each mean. Since the two intervals do not overlap (men: 4.43 to 6.14; women: 7.59 to 8.31), the difference is unlikely to be due to chance alone.


Complex Geoms

geom_sf

geom_sf draws maps from sf objects (“simple features” = data + geometry).

It works like other geoms. Depending on the geometry type, you can map:

  • points (cities, places, settlements)
  • lines (rivers, roads, straight lines)
  • polygons (countries, parks, seas, etc.)

Typical workflow:

  • Get a shape (countries, regions, cities, rivers, roads, etc.)
  • Plot them with geom_sf
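
As a small sketch of the three geometry types, the block below builds one point, one line, and one polygon by hand with the sf package (assumed to be installed). The coordinates are arbitrary illustrations; in practice shapes usually come from a file or a data package.

```r
library(sf)

# A point (longitude, latitude)
pt <- st_sfc(st_point(c(12.5, 41.9)), crs = 4326)

# A line through two coordinates
ln <- st_sfc(
  st_linestring(rbind(c(12.5, 41.9), c(13.4, 52.5))),
  crs = 4326)

# A polygon: a closed ring (first and last coordinates are identical)
pg <- st_sfc(
  st_polygon(list(rbind(c(0, 0), c(10, 0), c(10, 10), c(0, 10), c(0, 0)))),
  crs = 4326)

# geom_sf picks the drawing style from the geometry type
types <- c(
  as.character(st_geometry_type(pt)),
  as.character(st_geometry_type(ln)),
  as.character(st_geometry_type(pg)))
types
```

Each of these objects can be passed to geom_sf(data = ...) in its own layer.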

Complex Geoms

geom_sf Example

# Install once (uncomment if needed):
# install.packages(c("sf", "ggplot2", "rnaturalearth", "rnaturalearthdata"))
library(ggplot2)
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)

# Step1: Get a country polygons dataset as an sf object
world <- rnaturalearth::ne_countries(scale = "medium",
                                     returnclass = "sf")
head(world, n=3)

Selected columns of the output (the full table has 169 columns):

name      iso_a3  continent  pop_est   geometry
Zimbabwe  ZWE     Africa     14645468  MULTIPOLYGON (((31.28789 -2...
Zambia    ZMB     Africa     17861030  MULTIPOLYGON (((30.39609 -1...
Yemen     YEM     Asia       29161922  MULTIPOLYGON (((53.08564 16...

So, the world data frame contains 169 variables (e.g., featurecla, scalerank, labelrank) and 242 observations.

Complex Geoms

geom_sf Example

# Step2: Draw it
ggplot() +
  geom_sf(data=world) +
  labs(title = "The World (sf polygons)")

Complex Geoms

geom_sf Example

Use coord_sf(xlim=..., ylim=...) to zoom

You can layer multiple sf objects:

  • Countries (polygons)
  • Cities (points)

Complex Geoms

geom_sf Example

europe_bounds <- list(x = c(-10, 40),
                      y = c(35, 70))

ggplot() +
  geom_sf(data = world) +
  coord_sf(xlim = europe_bounds$x, 
           ylim = europe_bounds$y) +
  labs(title = "European Countries")

Complex Geoms

geom_sf Example

europe_bounds <- list(x = c(-10, 40),
                      y = c(35, 70))

# Create a few points
cities <- data.frame(
  name = c("Rome", "Berlin", "Paris"),
  lon = c(12.5, 13.4, 2.35),
  lat = c(41.9, 52.5, 48.85)
)

# Convert to sf POINTs
cities_sf <- st_as_sf(cities, coords = c("lon", "lat"), crs = 4326)

# Mapping Them
ggplot() +
  geom_sf(data = world) +
  geom_sf(data = cities_sf) +
  coord_sf(xlim = europe_bounds$x, 
           ylim = europe_bounds$y) +
  labs(title = "European Countries")
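City points are easier to read with names next to them. The sketch below is one possible way to do this with geom_sf_text(), which is part of ggplot2. The world layer is left out so the block runs without rnaturalearth, and the nudge_y offset and the plot title are illustrative choices, not part of the lecture.

```r
library(ggplot2)
library(sf)

# Same three cities as in the slide
cities <- data.frame(
  name = c("Rome", "Berlin", "Paris"),
  lon = c(12.5, 13.4, 2.35),
  lat = c(41.9, 52.5, 48.85)
)
cities_sf <- st_as_sf(cities, coords = c("lon", "lat"), crs = 4326)

# Points plus text labels taken from the name column
p_labels <- ggplot() +
  geom_sf(data = cities_sf) +
  geom_sf_text(data = cities_sf, aes(label = name), nudge_y = 1) +  # shift labels north by 1 degree
  coord_sf(xlim = c(-10, 40), ylim = c(35, 70)) +
  labs(title = "Cities with labels")

p_labels
```

To put the labels on the full map, add geom_sf(data = world) as the first layer, as in the slide.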

Complex Geoms

geom_sf Example

europe_bounds <- list(x = c(-10, 40),
                      y = c(35, 70))

# Create a few points
cities <- data.frame(
  name = c("Rome", "Berlin", "Paris"),
  lon = c(12.5, 13.4, 2.35),
  lat = c(41.9, 52.5, 48.85)
)

# Convert to sf POINTs
cities_sf <- st_as_sf(cities, coords = c("lon", "lat"), crs = 4326)

# Create a line connecting Rome -> Berlin -> Paris
line_sf <- st_sfc(st_linestring(as.matrix(cities[, c("lon", "lat")])), crs = 4326)

# Mapping Them
ggplot() +
  geom_sf(data = world) +
  geom_sf(data = cities_sf) +
  geom_sf(data = line_sf) +
  coord_sf(xlim = europe_bounds$x, 
           ylim = europe_bounds$y) +
  labs(title = "European Countries")
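Since the line is a real spatial object, it can be measured as well as drawn. This is a minimal sketch using sf::st_length(). The variable route_km is an illustrative name, and the conversion assumes that st_length() reports metres for longitude/latitude data, which is its usual behaviour.

```r
library(sf)

cities <- data.frame(
  name = c("Rome", "Berlin", "Paris"),
  lon = c(12.5, 13.4, 2.35),
  lat = c(41.9, 52.5, 48.85)
)

# Line connecting the cities in row order: Rome -> Berlin -> Paris
line_sf <- st_sfc(st_linestring(as.matrix(cities[, c("lon", "lat")])), crs = 4326)

# st_length() returns a units object; drop the units and convert to km
route_km <- as.numeric(st_length(line_sf)) / 1000
print(round(route_km))
```

The order of the rows in cities determines the route, so reordering the data frame changes both the drawn line and its length.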

Conclusion

Key Takeaways

  • Grammar of Graphics: Data + Aesthetics + Geoms = Visualization.
  • Simple geoms (points, bars, lines, text) help explore relationships and categories.
  • Complex geoms (histograms, density, boxplots, violins, maps) reveal deeper patterns.
  • Always tidy your data before plotting.
  • Good visualization = clarity + accuracy
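The first takeaway can be read straight off a short piece of code. In the sketch below, the toy data frame is hypothetical and only serves to mark which part of the call is the data, which the aesthetics, and which the geom.

```r
library(ggplot2)

# Hypothetical toy data, not from the lecture
toy <- data.frame(x = c(1, 2, 3, 4),
                  y = c(2, 4, 3, 5))

p <- ggplot(data = toy,                    # data
            mapping = aes(x = x, y = y)) + # aesthetics
  geom_point() +                           # geometry
  theme_minimal()                          # optional layer: theme

p
```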

Exercises

Exercises

Points vs. Lines

Use the dataset below:

df <- data.frame(
  year = c(2000, 2004, 2008, 2012, 2016, 2020),
  turnout = c(55, 58, 62, 60, 59, 65))
  1. Make a scatterplot of turnout by year with geom_point().
  2. Now use geom_line().
  3. Which geom makes more sense for these data? Why?

Exercises

Bar vs. Column

Suppose you have survey data on respondents’ education:

df <- data.frame(
  education = c("Primary", "Primary", "High School", "High School", "College", "College", "College")
)
  1. Plot the distribution of education levels with geom_bar()
  2. Plot the distribution of education levels with geom_col()
  3. What is the difference between the two approaches?

Exercises

Boxplot vs. Histogram

We surveyed trust in government on a 1–10 scale:

set.seed(123)
df <- data.frame(
  trust = c(rnorm(30, mean = 5, sd = 2), rnorm(30, mean = 7, sd = 1))
)
  1. Plot the distribution with a histogram.
  2. Plot the same distribution with a boxplot.
  3. What information is easier to see in the histogram? In the boxplot?

Exercises Answers

Exercises

Points vs. Lines

Use the dataset below:

df <- data.frame(
  year = c(2000, 2004, 2008, 2012, 2016, 2020),
  turnout = c(55, 58, 62, 60, 59, 65))
  1. Make a scatterplot of turnout by year with geom_point().
library(ggplot2)
ggplot(df, aes(x = year, y=turnout)) +
  geom_point()

Exercises

Points vs. Lines

Use the dataset below:

df <- data.frame(
  year = c(2000, 2004, 2008, 2012, 2016, 2020),
  turnout = c(55, 58, 62, 60, 59, 65))
  2. Now use geom_line().
library(ggplot2)
ggplot(df, aes(x = year, y=turnout)) +
  geom_line()

Exercises

Points vs. Lines

  3. Which geom makes more sense for these data? Why?

The line plot is usually more informative because it shows the trend in turnout over time.
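The two geoms are not mutually exclusive. As a sketch of one reasonable compromise, the line can carry the trend while the points mark the six elections that were actually observed. The scale_x_continuous() call is an optional addition that puts axis ticks on the election years.

```r
library(ggplot2)

df <- data.frame(
  year = c(2000, 2004, 2008, 2012, 2016, 2020),
  turnout = c(55, 58, 62, 60, 59, 65))

# Line for the trend, points for the individual observations
p_both <- ggplot(df, aes(x = year, y = turnout)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = df$year)  # one tick per election year

p_both
```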

Exercises

Bar vs. Column

Suppose you have survey data on respondents’ education:

df <- data.frame(
  education = c("Primary", "Primary", "High School", "High School", "College", "College", "College")
)
  1. Plot the distribution of education levels with geom_bar()
# Bar plot
ggplot(df, aes(x = education)) +
  geom_bar()

Exercises

Bar vs. Column

Suppose you have survey data on respondents’ education:

df <- data.frame(
  education = c("Primary", "Primary", "High School", "High School", "College", "College", "College")
)
  2. Pre-count the values with dplyr::count() and plot them with geom_col():
# Step 1: Pre-count the values
library(dplyr)  # provides %>% and count()
df_counts <- df %>%
  count(education)
print(df_counts)
education n
College 3
High School 2
Primary 2

Exercises

Bar vs. Column

Suppose you have survey data on respondents’ education:

df <- data.frame(
  education = c("Primary", "Primary", "High School", "High School", "College", "College", "College")
)
  2. Pre-count the values with dplyr::count() and plot them with geom_col():
# Step 1: Pre-count the values
library(dplyr)  # provides %>% and count()
df_counts <- df %>%
  count(education)

# Step 2: Plot with geom_col()
ggplot(df_counts, aes(x = education, y = n)) +
  geom_col()

Exercises

Bar vs. Column

Suppose you have survey data on respondents’ education:

df <- data.frame(
  education = c("Primary", "Primary", "High School", "High School", "College", "College", "College")
)
  3. What is the difference between the two approaches?

The key distinction:

  • geom_bar() = ggplot does the counting for you.
  • geom_col() = you must provide the counts yourself.
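One way to confirm that the two approaches draw the same bars is to compare the heights ggplot2 computes for each plot. The sketch below uses layer_data() from ggplot2 to extract them. The names heights_bar and heights_col are illustrative.

```r
library(ggplot2)
library(dplyr)

df <- data.frame(
  education = c("Primary", "Primary", "High School", "High School",
                "College", "College", "College")
)

# geom_bar(): ggplot2 counts the rows in each category
p_bar <- ggplot(df, aes(x = education)) +
  geom_bar()

# geom_col(): we count first, then map the count to y
df_counts <- count(df, education)
p_col <- ggplot(df_counts, aes(x = education, y = n)) +
  geom_col()

# Bar heights as computed by ggplot2 for each plot
heights_bar <- layer_data(p_bar)$y
heights_col <- layer_data(p_col)$y

print(all(heights_bar == heights_col))
```

If the data arrive already aggregated (one row per category), geom_col() is the natural choice. With one row per respondent, geom_bar() saves the counting step.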

Exercises

Boxplot vs. Histogram

We surveyed trust in government on a 1–10 scale:

set.seed(123)
df <- data.frame(
  trust = c(rnorm(30, mean = 5, sd = 2), rnorm(30, mean = 7, sd = 1))
)
  1. Plot the distribution with a histogram.
# Histogram of trust
ggplot(df, aes(x = trust)) +
  geom_histogram(binwidth = 1, color = "white")

Exercises

Boxplot vs. Histogram

We surveyed trust in government on a 1–10 scale:

set.seed(123)
df <- data.frame(
  trust = c(rnorm(30, mean = 5, sd = 2), rnorm(30, mean = 7, sd = 1))
)
  2. Plot the same distribution with a boxplot.
# Boxplot of trust
ggplot(df, aes(y = trust)) +
  geom_boxplot()

Exercises

Boxplot vs. Histogram

  3. What information is easier to see in the histogram? In the boxplot?

Histogram:

  • Easier to see the shape of the distribution (is it unimodal, bimodal, skewed?).
  • You can spot clusters (e.g., many respondents around 5 and many around 7).
  • You get a sense of frequency across different ranges.

Boxplot:

  • Easier to see the median and spread at a glance.
  • Shows the quartiles (the box spans the middle 50% of the data).
  • Highlights outliers explicitly.
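The numbers behind a boxplot can also be computed by hand. The sketch below assumes the common 1.5 × IQR rule for flagging outliers. The names quartiles, iqr and outliers are illustrative.

```r
set.seed(123)
df <- data.frame(
  trust = c(rnorm(30, mean = 5, sd = 2), rnorm(30, mean = 7, sd = 1))
)

# First quartile, median, third quartile: the three lines of the box
quartiles <- quantile(df$trust, probs = c(0.25, 0.5, 0.75))

# Interquartile range: the height of the box
iqr <- quartiles[[3]] - quartiles[[1]]

# Values more than 1.5 * IQR beyond the box are flagged as outliers
lower <- quartiles[[1]] - 1.5 * iqr
upper <- quartiles[[3]] + 1.5 * iqr
outliers <- df$trust[df$trust < lower | df$trust > upper]

print(quartiles)
print(outliers)
```

These summaries say nothing about whether the data have one peak or two, which is exactly the information the histogram adds.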