Color, Size, Shape, Transparency, and Linetype
By the end of this lecture, you will be able to:
Aesthetics bring plots to life: color, size, shape, transparency, linetype
aes() maps variables to these aesthetics, linking data to visual features.
Aesthetics highlight comparisons and guide attention.
These are the simple geoms that we covered previously.
This is what they are used for:
| geom | Use for |
|---|---|
| geom_point() | Relationships between two variables; at least 10 obs. |
| geom_col() | Totals/percent per category; ordered bars. Uses pre-computed values for bar height. You must supply both x and y |
| geom_bar() | Counts the observations in each category. You must supply x, not y |
| geom_text() | Direct labels for small N; annotate outliers |
| geom_line() | Relationships between two variables; at least 10 obs. |
Let’s explore how aesthetics change meaning and clarity.
This is what they are used for:
| Aesthetic | Use for |
|---|---|
| color (discrete) | Differentiate categories with distinct hues |
| color (continuous) | Show gradual change or intensity across a numeric scale |
| size (discrete + continuous) | Represent magnitude, frequency, or importance; best with moderate differences |
| fill (discrete + continuous) | Similar to color but applies to filled shapes (bars, areas, points with pch=21–25) |
| shape (discrete) | Distinguish categories when colors alone aren’t enough; limited shapes available |
| alpha (discrete + continuous) | Control transparency to show overlap/density; reduce clutter in crowded plots |
Imagine we have data about 9 countries that record their level of democracy from -10 to +10 (x-axis) and their GDP per capita in $1,000s (y-axis).
| democracy | gdp | 
|---|---|
| -8 | 2 | 
| -7 | 9 | 
| -5 | 4 | 
| -3 | 7 | 
| 0 | 8 | 
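A minimal sketch of the plain scatterplot for this example, before any extra aesthetics are added. It assumes the nine democracy and GDP values used in the fuller dataset in this lecture; the object names `eg0` and `p_base` are hypothetical.

```r
library(ggplot2)

# Nine countries: democracy score (-10 to +10) and GDP per capita ($1,000s)
# `eg0` is a hypothetical name for this two-column version of the data
eg0 <- data.frame(
  democracy = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp       = c(2, 9, 4, 7, 8, 20, 15, 25, 27)
)

# Only position is mapped, so every point looks the same
p_base <- ggplot(eg0, aes(x = democracy, y = gdp)) +
  geom_point()
p_base
```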
Let’s imagine that we have a new dataset with more variables:
# Reusable dataset
eg1 <- data.frame(
  country = c(
    "North Korea",   # very autocratic, very poor
    "Saudi Arabia",  # autocratic, but richer due to oil
    "Zimbabwe",      # authoritarian, low GDP
    "Russia",        # hybrid regime, middle income
    "Nigeria",       # similar position
    "India",         # low–mid democracy, growing GDP
    "Brazil",        # democracy, mid GDP
    "Poland",        # consolidated democracy, higher GDP
    "South Korea"   # rich democracy
  ), 
  democracy = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),          # democracy score
  gdp = c(2, 9, 4, 7, 8, 20, 15, 25, 27),          # GDP per capita ($1,000s)
  region = c("Asia", "Asia", "Africa",
             "Europe", "Africa", "Asia",
             "Americas", "Europe", "Asia"),        # categorical
  population = c(5, 50, 30, 12, 80, 60, 40, 100, 70), # continuous (millions)
  income_group = factor(
    c("Low", "Low", "Low",
      "Middle", "Middle", "Middle",
      "High", "High", "High"),
    levels = c("Low", "Middle", "High")),
  corruption = c(80, 65, 50, 40, 35, 30, 25, 20, 15))
| country | democracy | gdp | region | population | income_group | corruption | 
|---|---|---|---|---|---|---|
| North Korea | -8 | 2 | Asia | 5 | Low | 80 | 
| Saudi Arabia | -7 | 9 | Asia | 50 | Low | 65 | 
| Zimbabwe | -5 | 4 | Africa | 30 | Low | 50 | 
| Russia | -3 | 7 | Europe | 12 | Middle | 40 | 
| Nigeria | 0 | 8 | Africa | 80 | Middle | 35 | 
This is how we can emphasize the region (discrete):
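One way this could look, as a sketch rather than the exact slide code: map `region` to `color` inside `aes()`. Only the columns needed for the plot are copied from `eg1`; the plot name `p_region` is hypothetical.

```r
library(ggplot2)

# Only the columns this plot needs, copied from eg1
eg1 <- data.frame(
  democracy = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp       = c(2, 9, 4, 7, 8, 20, 15, 25, 27),
  region    = c("Asia", "Asia", "Africa", "Europe", "Africa",
                "Asia", "Americas", "Europe", "Asia")
)

# color (discrete): one hue per region, with a legend added automatically
p_region <- ggplot(eg1, aes(x = democracy, y = gdp, color = region)) +
  geom_point(size = 3)  # fixed point size, set outside aes()
p_region
```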
This is how we can emphasize the corruption levels (continuous):
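A possible version, assuming the slide mapped `corruption` to `color`: with a numeric variable, ggplot2 switches from distinct hues to a continuous gradient. The plot name `p_corruption` is hypothetical.

```r
library(ggplot2)

# Only the columns this plot needs, copied from eg1
eg1 <- data.frame(
  democracy  = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp        = c(2, 9, 4, 7, 8, 20, 15, 25, 27),
  corruption = c(80, 65, 50, 40, 35, 30, 25, 20, 15)
)

# color (continuous): a gradient from low to high corruption
p_corruption <- ggplot(eg1, aes(x = democracy, y = gdp, color = corruption)) +
  geom_point(size = 3)
p_corruption
```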
This is how we can emphasize the income groups (discrete):
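A sketch under the assumption that this slide mapped `income_group` to `size` (the original plot code is not shown, so the choice of aesthetic is a guess). ggplot2 warns that size is not advised for a discrete variable, which is a useful reminder that size reads most naturally as magnitude.

```r
library(ggplot2)

# Only the columns this plot needs, copied from eg1
eg1 <- data.frame(
  democracy    = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp          = c(2, 9, 4, 7, 8, 20, 15, 25, 27),
  income_group = factor(
    rep(c("Low", "Middle", "High"), each = 3),
    levels = c("Low", "Middle", "High")
  )
)

# size (discrete): one point size per income group, in factor-level order
p_income_size <- ggplot(eg1, aes(x = democracy, y = gdp, size = income_group)) +
  geom_point()
p_income_size
```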
This is how we can emphasize population size (continuous)
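As a rough sketch, mapping `population` to `size` scales each point by the country's population in millions. The plot name `p_pop` is hypothetical.

```r
library(ggplot2)

# Only the columns this plot needs, copied from eg1
eg1 <- data.frame(
  democracy  = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp        = c(2, 9, 4, 7, 8, 20, 15, 25, 27),
  population = c(5, 50, 30, 12, 80, 60, 40, 100, 70)
)

# size (continuous): larger points for more populous countries
p_pop <- ggplot(eg1, aes(x = democracy, y = gdp, size = population)) +
  geom_point()
p_pop
```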
This is how we can emphasize income groups (discrete).
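A sketch assuming this slide used `shape` for the income groups (the original code is not shown). Shape works for a handful of categories and stays readable in black-and-white print.

```r
library(ggplot2)

# Only the columns this plot needs, copied from eg1
eg1 <- data.frame(
  democracy    = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp          = c(2, 9, 4, 7, 8, 20, 15, 25, 27),
  income_group = factor(
    rep(c("Low", "Middle", "High"), each = 3),
    levels = c("Low", "Middle", "High")
  )
)

# shape (discrete): one point symbol per income group
p_income_shape <- ggplot(eg1, aes(x = democracy, y = gdp, shape = income_group)) +
  geom_point(size = 3)
p_income_shape
```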
This is how we can emphasize the level of corruption (continuous).
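A sketch assuming this slide used `alpha` (transparency) for corruption; it may equally have used `fill`. With `alpha` mapped to a numeric variable, higher values are drawn more opaque.

```r
library(ggplot2)

# Only the columns this plot needs, copied from eg1
eg1 <- data.frame(
  democracy  = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp        = c(2, 9, 4, 7, 8, 20, 15, 25, 27),
  corruption = c(80, 65, 50, 40, 35, 30, 25, 20, 15)
)

# alpha (continuous): more corrupt countries appear more opaque
p_alpha <- ggplot(eg1, aes(x = democracy, y = gdp, alpha = corruption)) +
  geom_point(size = 4)
p_alpha
```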
Inside aes() versus outside aes()
This is what the data looks like:
| x | y | group | 
|---|---|---|
| 1 | 2 | A | 
| 2 | 4 | B | 
| 3 | 6 | A | 
| 4 | 8 | B | 
| 5 | 10 | A | 
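The toy data in this table can be rebuilt directly, and the contrast between mapping and setting a color can be sketched as follows. The names `toy`, `p_mapped`, `p_set`, and `p_group` are hypothetical.

```r
library(ggplot2)

# Toy data matching the table (the name `toy` is an assumption)
toy <- data.frame(
  x     = 1:5,
  y     = c(2, 4, 6, 8, 10),
  group = c("A", "B", "A", "B", "A")
)

# Mapping: "red" inside aes() is read as a one-level category,
# so ggplot2 adds a legend and picks the color from its default palette
p_mapped <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(aes(color = "red"))

# Setting: "red" outside aes() is a fixed style, so no legend is drawn
p_set <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(color = "red")

# Mapping a real variable is what aes() is meant for
p_group <- ggplot(toy, aes(x = x, y = y, color = group)) +
  geom_point()
```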
This is the difference:
Why it’s wrong: inside aes(), "red" is treated as a category, so ggplot makes a useless legend.
Outside aes(), "red" is just a style, so there is no legend clutter.
| Aesthetic | Use for |
|---|---|
| color (discrete) | Differentiate categories with distinct outline hues (rare for bars) |
| color (continuous) | Show gradual change or intensity in outline scale (rare for bars) |
| fill (discrete) | Fill bars by category with distinct hues (common) |
| fill + color (outline) | Combine interior fill with a contrasting border for clarity |
| fill (continuous) | Show gradual change or intensity across a numeric fill scale (rare for bars) |
| alpha (discrete + continuous) | Control transparency to show emphasis or reduce clutter (rare for bars) |
Suppose we collected a small survey and just recorded the education level of each respondent. We want to visualize how many respondents fall into each category.
| education | 
|---|
| Primary | 
| Primary | 
| High School | 
| High School | 
| education | n | 
|---|---|
| College | 4 | 
| High School | 3 | 
| Primary | 2 | 
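A minimal sketch of both routes to this bar chart, assuming the raw responses match the counts in the table (2 Primary, 3 High School, 4 College). The object names `survey`, `counts`, `p_bar`, and `p_col` are hypothetical.

```r
library(ggplot2)

# Raw survey responses rebuilt from the counts in the table
survey <- data.frame(
  education = rep(c("Primary", "High School", "College"), times = c(2, 3, 4))
)

# geom_bar() counts the rows in each category: supply x only
# fill colors the bar interiors; color = "black" sets a fixed outline
p_bar <- ggplot(survey, aes(x = education, fill = education)) +
  geom_bar(color = "black")

# geom_col() uses pre-computed heights: supply both x and y
counts <- as.data.frame(table(education = survey$education), responseName = "n")
p_col <- ggplot(counts, aes(x = education, y = n, fill = education)) +
  geom_col(color = "black")
```

Both plots show the same bars; the difference is only whether ggplot2 or the analyst does the counting.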
| Aesthetic | Use for |
|---|---|
| color (discrete) | Distinguish groups with different line colors |
| color (continuous) | Show gradient along x or y values; possible but unusual (rare for lines) |
| size | Deprecated; replaced by linewidth |
| linewidth (discrete + continuous) | Vary line thickness to emphasize magnitude (sometimes used) or weight (rare for lines) |
| linetype (discrete) | Differentiate groups with solid/dashed/dotted styles |
| alpha (discrete + continuous) | Control transparency to reduce clutter when many lines overlap (rare for lines) |
Suppose we have data on average voter turnout (%) in national elections over several years. We want to see the trend in participation.
| year | turnout | 
|---|---|
| 2000 | 55 | 
| 2004 | 58 | 
| 2008 | 62 | 
| 2012 | 60 | 
| 2016 | 59 | 
| 2020 | 65 | 
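A short sketch of the single-series line chart for this table. The object names `turnout_df` and `p_turnout` are hypothetical, and `linewidth` assumes ggplot2 3.4.0 or later (older versions used `size` for lines).

```r
library(ggplot2)

# Turnout data from the table
turnout_df <- data.frame(
  year    = c(2000, 2004, 2008, 2012, 2016, 2020),
  turnout = c(55, 58, 62, 60, 59, 65)
)

# One line, styled with fixed values set outside aes()
p_turnout <- ggplot(turnout_df, aes(x = year, y = turnout)) +
  geom_line(color = "steelblue", linewidth = 1) +
  geom_point(color = "steelblue")  # mark the actual election years
p_turnout
```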
Suppose we have data on average voter turnout (%) in national elections over several years for the US and the UK. We want to see the trend in participation.
# Toy dataset with US and UK
eg5 <- data.frame(
  year = rep(c(2000, 2004, 2008, 2012, 2016, 2020), times = 2),
  turnout = c(
    # US presidential elections
    54, 60, 62, 58, 56, 65,
    # UK general elections (closest years aligned to US election years for teaching)
    59, 61, 65, 66, 68, 67
  ),
  country = rep(c("United States", "United Kingdom"), each = 6)
)
| year | turnout | country | 
|---|---|---|
| 2000 | 54 | United States | 
| 2004 | 60 | United States | 
| 2008 | 62 | United States | 
| 2012 | 58 | United States | 
| 2016 | 56 | United States | 
| 2020 | 65 | United States | 
| 2000 | 59 | United Kingdom | 
| 2004 | 61 | United Kingdom | 
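With two countries, the line needs a grouping aesthetic. A minimal sketch, assuming `color` and `linetype` are both mapped to `country` so the groups stay distinguishable in grayscale; the plot name `p_countries` is hypothetical and `linewidth` assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Same toy values as eg5
eg5 <- data.frame(
  year    = rep(c(2000, 2004, 2008, 2012, 2016, 2020), times = 2),
  turnout = c(54, 60, 62, 58, 56, 65,   # United States
              59, 61, 65, 66, 68, 67),  # United Kingdom
  country = rep(c("United States", "United Kingdom"), each = 6)
)

# color + linetype (discrete): one line per country, doubly encoded
p_countries <- ggplot(eg5, aes(x = year, y = turnout,
                               color = country, linetype = country)) +
  geom_line(linewidth = 1)
p_countries
```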
| Geom | Useful aesthetics | Avoid | 
|---|---|---|
| Boxplot | fill, color | size, shape | 
| Histogram | fill, alpha | shape | 
| Density | color, linetype | size | 
| Violin | fill, color | shape | 
| Smooth | color, linetype | size | 
| SF (maps) | fill, color | shape | 
geom_boxplot
Suppose we surveyed people about their trust in government on a 1–10 scale (1 = no trust, 10 = complete trust). We want to compare typical values and how spread out the answers are for men and women.
| gender | trust | 
|---|---|
| Men | 3.879049 | 
| Men | 4.539645 | 
| Men | 8.117417 | 
| Men | 5.141017 | 
| Men | 5.258576 | 
| Men | 8.430130 | 
| Men | 5.921832 | 
| Men | 2.469877 | 
geom_boxplot: fill (color)
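A sketch of a filled boxplot with simulated trust scores. The first values in the table appear consistent with `set.seed(123)` and `rnorm(mean = 5, sd = 2)`, but the group sizes (50 each) and the distribution for women are assumptions made only for illustration, as are the names `trust_df` and `p_box`.

```r
library(ggplot2)

# Simulated survey data: group sizes and the women's mean are assumptions
set.seed(123)
trust_df <- data.frame(
  gender = rep(c("Men", "Women"), each = 50),
  trust  = c(rnorm(50, mean = 5, sd = 2),
             rnorm(50, mean = 6, sd = 2))
)

# fill (discrete): color the box interiors by gender
p_box <- ggplot(trust_df, aes(x = gender, y = trust, fill = gender)) +
  geom_boxplot()
p_box
```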
geom_histogram: fill + alpha
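For overlapping histograms, `fill` separates the groups and `alpha` keeps both visible where they overlap. This sketch reuses the same simulated data; the group sizes, the women's distribution, the bin count, and the names `trust_df` and `p_hist` are all assumptions.

```r
library(ggplot2)

# Simulated survey data: group sizes and the women's mean are assumptions
set.seed(123)
trust_df <- data.frame(
  gender = rep(c("Men", "Women"), each = 50),
  trust  = c(rnorm(50, mean = 5, sd = 2),
             rnorm(50, mean = 6, sd = 2))
)

# fill is mapped to gender; alpha is set to a fixed 0.5 outside aes()
# position = "identity" overlays the two histograms instead of stacking them
p_hist <- ggplot(trust_df, aes(x = trust, fill = gender)) +
  geom_histogram(alpha = 0.5, position = "identity", bins = 15)
p_hist
```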
geom_sf: fill (color)
library(ggplot2)
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)
world <- ne_countries(scale = "medium", 
                      returnclass = "sf")
europe_bounds <- list(x = c(-10, 40),
                      y = c(35, 70))
# Mapping it
ggplot() +
  geom_sf(data = world, aes(fill=log(pop_est))) +
  coord_sf(xlim = europe_bounds$x, 
           ylim = europe_bounds$y)
On the most fundamental level, we need to use the right colors for our visualizations
This is relevant for:
8% of men and 0.5% of women have some form of color blindness
Thus, colors should be distinguishable by people with different forms of color blindness
The Viridis palette in R allows us to create color-blind-friendly graphs
These are predefined palettes that are widely used.
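As a small sketch of a viridis palette on a categorical variable, `scale_color_viridis_d()` from ggplot2 can be added to the region plot. Only the needed columns of `eg1` are copied here, and the plot name `p_viridis` is hypothetical.

```r
library(ggplot2)

# Only the columns this plot needs, copied from eg1
eg1 <- data.frame(
  democracy = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp       = c(2, 9, 4, 7, 8, 20, 15, 25, 27),
  region    = c("Asia", "Asia", "Africa", "Europe", "Africa",
                "Asia", "Americas", "Europe", "Asia")
)

# _d is the discrete variant; use scale_color_viridis_c() for numeric variables
p_viridis <- ggplot(eg1, aes(x = democracy, y = gdp, color = region)) +
  geom_point(size = 3) +
  scale_color_viridis_d(option = "viridis")
p_viridis
```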
geom_sf fill (color): no Viridis
library(ggplot2)
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)
world <- ne_countries(scale = "medium", 
                      returnclass = "sf")
europe_bounds <- list(x = c(-10, 40),
                      y = c(35, 70))
# Mapping it
ggplot() +
  geom_sf(data = world, aes(fill=log(pop_est))) +
  coord_sf(xlim = europe_bounds$x, 
           ylim = europe_bounds$y) +
  labs(title = "European Countries")
geom_sf fill (color): Viridis
library(ggplot2)
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)
world <- ne_countries(scale = "medium", 
                      returnclass = "sf")
europe_bounds <- list(x = c(-10, 40),
                      y = c(35, 70))
# Mapping it
ggplot() +
  geom_sf(data = world, aes(fill=log(pop_est))) +
  coord_sf(xlim = europe_bounds$x, 
           ylim = europe_bounds$y) +
  labs(title = "European Countries")+
  scale_fill_viridis_c(option = "viridis")  # you can also try "magma", "inferno", "cividis", etc.

# color (continuous): encode a numeric gradient (low → high)
# size (continuous): encode magnitude with point size
ggplot(eg1, aes(x = democracy, y = gdp, color = corruption, size = population)) +
  geom_point() +
  scale_color_viridis_c(option = "viridis")  # you can also try "magma", "inferno", "cividis", etc.
What do you see?
democracy ↔︎ gdp
How does the story change?
Corruption and Institutions matter
What does adding size tell us?
Population weights the story: more populous countries also have higher GDP and are more democratic.
Popescu (JCU): Data Visualization 2