Color, Size, Shape, Transparency, and Linetype
By the end of this lecture, you will be able to:
Aesthetics bring plots to life: color, size, shape, transparency, and linetype.
aes() maps variables to these aesthetics, linking data to visual features.
Aesthetics highlight comparisons and guide attention.
These are the simple geoms that we covered previously.
This is what they are used for:
| geom | Use for |
|---|---|
| geom_point() | Relationships between two variables; at least 10 obs. |
| geom_col() | Totals or percentages per category; ordered bars. Uses pre-computed values for bar height, so you must supply both x and y. |
| geom_bar() | Counts the observations in each category, so you supply x but not y. |
| geom_text() | Direct labels for small N; annotating outliers |
| geom_line() | Trends over an ordered variable such as time |
Let’s explore how aesthetics change meaning and clarity.
This is what they are used for:
| Aesthetic | Use for |
|---|---|
| color (discrete) | Differentiate categories with distinct hues |
| color (continuous) | Show gradual change or intensity across a numeric scale |
| size (discrete + continuous) | Represent magnitude, frequency, or importance; works best with moderate differences |
| fill (discrete + continuous) | Similar to color, but applies to filled shapes (bars, areas, points with shape = 21–25) |
| shape (discrete) | Distinguish categories when colors alone aren’t enough; the default palette handles at most 6 shapes |
| alpha (discrete + continuous) | Control transparency to show overlap/density and reduce clutter in crowded plots |
Imagine we have data about 9 countries that record their level of democracy from -10 to +10 (x-axis) and their GDP per capita in $1,000s (y-axis).
| democracy | gdp |
|---|---|
| -8 | 2 |
| -7 | 9 |
| -5 | 4 |
| -3 | 7 |
| 0 | 8 |
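A minimal ggplot2 sketch of this scatter plot. The table shows only five of the nine rows, so we assume the full set of pairs matches the democracy and gdp columns of the eg1 dataset used in this lecture; the name eg0 is hypothetical.

```r
library(ggplot2)

# Hypothetical name eg0; values assumed to match the democracy and gdp columns of eg1
eg0 <- data.frame(
  democracy = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),  # democracy score, -10 to +10
  gdp       = c(2, 9, 4, 7, 8, 20, 15, 25, 27)   # GDP per capita ($1,000s)
)

# One point per country: democracy on x, GDP per capita on y
p <- ggplot(eg0, aes(x = democracy, y = gdp)) +
  geom_point() +
  labs(x = "Democracy score (-10 to +10)", y = "GDP per capita ($1,000s)")
p
```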
Let’s imagine that we have a new dataset with more variables:
# Reusable dataset
eg1 <- data.frame(
country = c(
"North Korea", # very autocratic, very poor
"Saudi Arabia", # autocratic, but richer due to oil
"Zimbabwe", # authoritarian, low GDP
"Russia", # hybrid regime, middle income
"Nigeria", # similar position to Russia: hybrid regime, middle income
"India", # low–mid democracy, growing GDP
"Brazil", # democracy, mid GDP
"Poland", # consolidated democracy, higher GDP
"South Korea" # rich democracy
),
democracy = c(-8, -7, -5, -3, 0, 2, 5, 8, 9), # democracy score
gdp = c(2, 9, 4, 7, 8, 20, 15, 25, 27), # GDP per capita ($1,000s)
region = c("Asia", "Asia", "Africa",
"Europe", "Africa", "Asia",
"Americas", "Europe", "Asia"), # categorical
population = c(5, 50, 30, 12, 80, 60, 40, 100, 70), # continuous (millions)
income_group = factor(
c("Low", "Low", "Low",
"Middle", "Middle", "Middle",
"High", "High", "High"),
levels = c("Low", "Middle", "High")),
corruption = c(80, 65, 50, 40, 35, 30, 25, 20, 15)) # corruption score (higher = more corrupt)
| country | democracy | gdp | region | population | income_group | corruption |
|---|---|---|---|---|---|---|
| North Korea | -8 | 2 | Asia | 5 | Low | 80 |
| Saudi Arabia | -7 | 9 | Asia | 50 | Low | 65 |
| Zimbabwe | -5 | 4 | Africa | 30 | Low | 50 |
| Russia | -3 | 7 | Europe | 12 | Middle | 40 |
| Nigeria | 0 | 8 | Africa | 80 | Middle | 35 |
This is how we can emphasize the region (discrete):
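A minimal sketch, assuming the slide maps region to color. To stay self-contained, the block rebuilds a trimmed copy of eg1 with only the columns the plot needs:

```r
library(ggplot2)

# Trimmed copy of eg1: only the columns this plot needs
eg1 <- data.frame(
  democracy = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp       = c(2, 9, 4, 7, 8, 20, 15, 25, 27),
  region    = c("Asia", "Asia", "Africa", "Europe", "Africa",
                "Asia", "Americas", "Europe", "Asia")
)

# color (discrete): one distinct hue per region, plus a legend
p <- ggplot(eg1, aes(x = democracy, y = gdp, color = region)) +
  geom_point(size = 3)
p
```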
This is how we can emphasize the corruption levels (continuous):
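A minimal sketch, assuming the slide maps corruption to color; with a numeric variable, ggplot2 switches to a continuous gradient and a colorbar legend. The trimmed eg1 copy keeps only the needed columns:

```r
library(ggplot2)

# Trimmed copy of eg1: only the columns this plot needs
eg1 <- data.frame(
  democracy  = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp        = c(2, 9, 4, 7, 8, 20, 15, 25, 27),
  corruption = c(80, 65, 50, 40, 35, 30, 25, 20, 15)
)

# color (continuous): a gradient from low to high corruption, with a colorbar legend
p <- ggplot(eg1, aes(x = democracy, y = gdp, color = corruption)) +
  geom_point(size = 3)
p
```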
This is how we can emphasize the income groups (discrete):
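The aesthetic used on this slide is not named, so this sketch assumes size, following the order of the aesthetics table. Because income_group is an unordered factor, ggplot2 warns when it is mapped to size with the default scale; the sketch sets the three sizes explicitly with scale_size_manual(), and the values 2, 4, and 6 are our choice:

```r
library(ggplot2)

# Trimmed copy of eg1: only the columns this plot needs
eg1 <- data.frame(
  democracy    = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp          = c(2, 9, 4, 7, 8, 20, 15, 25, 27),
  income_group = factor(rep(c("Low", "Middle", "High"), each = 3),
                        levels = c("Low", "Middle", "High"))
)

# size (discrete): one point size per income group; the sizes 2/4/6 are our choice
p <- ggplot(eg1, aes(x = democracy, y = gdp, size = income_group)) +
  geom_point() +
  scale_size_manual(values = c(Low = 2, Middle = 4, High = 6))
p
```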
This is how we can emphasize population size (continuous):
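A minimal sketch, assuming population is mapped to size. The default continuous scale, scale_size(), works on point area rather than radius (scale_size_area() is the alternative when a value of zero should mean zero area):

```r
library(ggplot2)

# Trimmed copy of eg1: only the columns this plot needs
eg1 <- data.frame(
  democracy  = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp        = c(2, 9, 4, 7, 8, 20, 15, 25, 27),
  population = c(5, 50, 30, 12, 80, 60, 40, 100, 70)  # millions
)

# size (continuous): larger points for more populous countries;
# a little transparency keeps overlapping large points readable
p <- ggplot(eg1, aes(x = democracy, y = gdp, size = population)) +
  geom_point(alpha = 0.7)
p
```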
This is how we can emphasize income groups (discrete):
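A minimal sketch, assuming this slide maps income_group to fill. Plain points have no interior, so fill only shows on the outlined shapes 21 to 25; shape 21 is a circle whose outline is set by color and whose interior is set by fill:

```r
library(ggplot2)

# Trimmed copy of eg1: only the columns this plot needs
eg1 <- data.frame(
  democracy    = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp          = c(2, 9, 4, 7, 8, 20, 15, 25, 27),
  income_group = factor(rep(c("Low", "Middle", "High"), each = 3),
                        levels = c("Low", "Middle", "High"))
)

# fill (discrete): interior color by income group, fixed black outline
p <- ggplot(eg1, aes(x = democracy, y = gdp, fill = income_group)) +
  geom_point(shape = 21, size = 4, color = "black")
p
```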
This is how we can emphasize the level of corruption (continuous):
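A minimal sketch, assuming corruption is mapped to fill on outlined points (shape 21); as with color, a numeric variable gives a continuous gradient:

```r
library(ggplot2)

# Trimmed copy of eg1: only the columns this plot needs
eg1 <- data.frame(
  democracy  = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp        = c(2, 9, 4, 7, 8, 20, 15, 25, 27),
  corruption = c(80, 65, 50, 40, 35, 30, 25, 20, 15)
)

# fill (continuous): interior shaded by corruption, fixed black outline
p <- ggplot(eg1, aes(x = democracy, y = gdp, fill = corruption)) +
  geom_point(shape = 21, size = 4, color = "black")
p
```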
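The remaining two aesthetics in the table, shape and alpha, can be sketched the same way. Mapping region to shape and fixing alpha at 0.7 are our choices for illustration:

```r
library(ggplot2)

# Trimmed copy of eg1: only the columns this plot needs
eg1 <- data.frame(
  democracy = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp       = c(2, 9, 4, 7, 8, 20, 15, 25, 27),
  region    = c("Asia", "Asia", "Africa", "Europe", "Africa",
                "Asia", "Americas", "Europe", "Asia")
)

# shape (discrete): one symbol per region (4 regions, within the default limit of 6)
# alpha: a fixed transparency of 0.7 so overlapping points stay visible
p <- ggplot(eg1, aes(x = democracy, y = gdp, shape = region)) +
  geom_point(size = 4, alpha = 0.7)
p
```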
aes()
This is what the data looks like:
| x | y | group |
|---|---|---|
| 1 | 2 | A |
| 2 | 4 | B |
| 3 | 6 | A |
| 4 | 8 | B |
| 5 | 10 | A |
This is the difference between mapping a color inside aes() and setting it outside aes():
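A minimal sketch of the three cases, rebuilding the data shown above (the name df is hypothetical). The first plot puts the constant inside aes(), the second sets it outside, and the third maps the real group variable:

```r
library(ggplot2)

# The data shown above (df is our name for it)
df <- data.frame(
  x     = 1:5,
  y     = c(2, 4, 6, 8, 10),
  group = c("A", "B", "A", "B", "A")
)

# Wrong: inside aes(), "red" is mapped as if it were a one-level category,
# so the points get ggplot's default first hue and a one-entry "red" legend appears
p_wrong <- ggplot(df, aes(x = x, y = y)) +
  geom_point(aes(color = "red"), size = 3)

# Right: outside aes(), "red" is a fixed style; the points are red, no legend
p_right <- ggplot(df, aes(x = x, y = y)) +
  geom_point(color = "red", size = 3)

# Mapping a real variable inside aes() is what produces a useful legend
p_group <- ggplot(df, aes(x = x, y = y, color = group)) +
  geom_point(size = 3)
```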
Why it’s wrong:
Inside aes(), "red" is treated as a category, so ggplot makes a useless legend. Outside aes(), "red" is just a style, so there is no legend clutter.
For bars, this is what the aesthetics are used for:
| Aesthetic | Use for |
|---|---|
| color (discrete) | Differentiate categories with distinct outline hues (rare for bars) |
| color (continuous) | Show gradual change or intensity on an outline scale (rare for bars) |
| fill (discrete) | Fill bars by category with distinct hues (common) |
| fill + color (black outline) | Combine interior fill with a contrasting border for clarity |
| fill (continuous) | Show gradual change or intensity across a numeric fill scale (rare for bars) |
| alpha (discrete + continuous) | Control transparency to show emphasis or reduce clutter (rare for bars) |
Suppose we collected a small survey and just recorded the education level of each respondent. We want to visualize how many respondents fall into each category.
| education |
|---|
| Primary |
| Primary |
| High School |
| High School |
These are the same responses, counted by category:
| education | n |
|---|---|
| College | 4 |
| High School | 3 |
| Primary | 2 |
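A sketch contrasting geom_bar() and geom_col(). The raw rows are rebuilt from the counts table (2 Primary, 3 High School, 4 College); their order does not matter for counting, and the names edu and edu_counts are hypothetical:

```r
library(ggplot2)

# Raw responses rebuilt from the counts (2 Primary, 3 High School, 4 College)
edu <- data.frame(
  education = rep(c("Primary", "High School", "College"), times = c(2, 3, 4))
)

# Pre-computed counts, like the second table
edu_counts <- as.data.frame(table(edu$education))
names(edu_counts) <- c("education", "n")

# geom_bar(): supply x only; ggplot counts the rows in each category
p_bar <- ggplot(edu, aes(x = education, fill = education)) +
  geom_bar()

# geom_col(): supply x and y; bar heights come from the n column
p_col <- ggplot(edu_counts, aes(x = education, y = n, fill = education)) +
  geom_col(color = "black")  # fill by category plus a black outline
```

Both plots show the same bars; the difference is only whether ggplot does the counting (geom_bar) or you do (geom_col).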
For lines, this is what the aesthetics are used for:
| Aesthetic | Use for |
|---|---|
| color (discrete) | Distinguish groups with different line colors |
| color (continuous) | Show a gradient along x or y values; possible but unusual (rare for lines) |
| size | Deprecated for lines since ggplot2 3.4.0; replaced by linewidth |
| linewidth (discrete + continuous) | Vary line thickness to emphasize magnitude (sometimes used) or weight (rare for lines) |
| linetype (discrete) | Differentiate groups with solid/dashed/dotted styles |
| alpha (discrete + continuous) | Control transparency to reduce clutter when many lines overlap (rare for lines) |
Suppose we have data on average voter turnout (%) in national elections over several years. We want to see the trend in participation.
| year | turnout |
|---|---|
| 2000 | 55 |
| 2004 | 58 |
| 2008 | 62 |
| 2012 | 60 |
| 2016 | 59 |
| 2020 | 65 |
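A minimal sketch using the six rows above (turnout_df is a hypothetical name). geom_line() connects the observations in order of x, which is why it suits an ordered variable like year:

```r
library(ggplot2)

# Turnout data from the table above (turnout_df is our name)
turnout_df <- data.frame(
  year    = c(2000, 2004, 2008, 2012, 2016, 2020),
  turnout = c(55, 58, 62, 60, 59, 65)  # average turnout, %
)

# geom_line() draws the trend; geom_point() marks each election
p <- ggplot(turnout_df, aes(x = year, y = turnout)) +
  geom_line(linewidth = 1, color = "steelblue") +
  geom_point(color = "steelblue") +
  labs(x = "Election year", y = "Turnout (%)")
p
```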
Suppose we have data on average voter turnout (%) in national elections over several years for the US and the UK. We want to see the trend in participation.
# Toy dataset with US and UK
eg5 <- data.frame(
year = rep(c(2000, 2004, 2008, 2012, 2016, 2020), times = 2),
turnout = c(
# US presidential elections
54, 60, 62, 58, 56, 65,
# UK general elections (closest years aligned to US election years for teaching)
59, 61, 65, 66, 68, 67
),
country = rep(c("United States", "United Kingdom"), each = 6)
)
| year | turnout | country |
|---|---|---|
| 2000 | 54 | United States |
| 2004 | 60 | United States |
| 2008 | 62 | United States |
| 2012 | 58 | United States |
| 2016 | 56 | United States |
| 2020 | 65 | United States |
| 2000 | 59 | United Kingdom |
| 2004 | 61 | United Kingdom |
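A sketch that maps country to both color and linetype, reusing the eg5 values above. Double-encoding the group keeps the two lines distinguishable even in grayscale or for color-blind readers:

```r
library(ggplot2)

# Toy dataset with US and UK (same values as eg5 above)
eg5 <- data.frame(
  year = rep(c(2000, 2004, 2008, 2012, 2016, 2020), times = 2),
  turnout = c(
    54, 60, 62, 58, 56, 65,  # US presidential elections
    59, 61, 65, 66, 68, 67   # UK general elections (years aligned to US for teaching)
  ),
  country = rep(c("United States", "United Kingdom"), each = 6)
)

# color + linetype (discrete): each country gets its own hue and dash pattern
p <- ggplot(eg5, aes(x = year, y = turnout, color = country, linetype = country)) +
  geom_line(linewidth = 1) +
  geom_point() +
  labs(x = "Election year", y = "Turnout (%)")
p
```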
| Geom | Useful aesthetics | Avoid |
|---|---|---|
| Boxplot | fill, color | size, shape |
| Histogram | fill, alpha | shape |
| Density | color, linetype | size |
| Violin | fill, color | shape |
| Smooth | color, linetype | size |
| SF (maps) | fill, color | shape |
geom_boxplot
Suppose we surveyed people about their trust in government on a 1–10 scale (1 = no trust, 10 = complete trust). We want to compare typical values and how spread out the answers are for men and women.
| gender | trust |
|---|---|
| Men | 3.879049 |
| Men | 4.539645 |
| Men | 8.117417 |
| Men | 5.141017 |
| Men | 5.258576 |
| Men | 8.430130 |
| Men | 5.921832 |
| Men | 2.469877 |
geom_boxplot: fill(color)
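A minimal sketch. The men's trust values shown in the table appear to be reproduced by set.seed(123) followed by rnorm(mean = 5, sd = 2); the group size of 50, the women's mean of 6, and the clipping to the 1-10 scale are assumptions, and trust_df is a hypothetical name:

```r
library(ggplot2)

# Simulated survey data. Assumptions: 50 respondents per group,
# men ~ N(5, 2) (matches the values shown), women ~ N(6, 2) (hypothetical)
set.seed(123)
n_per_group <- 50
trust_df <- data.frame(
  gender = rep(c("Men", "Women"), each = n_per_group),
  trust  = c(rnorm(n_per_group, mean = 5, sd = 2),
             rnorm(n_per_group, mean = 6, sd = 2))
)
# Keep simulated answers on the 1-10 scale
trust_df$trust <- pmin(pmax(trust_df$trust, 1), 10)

# fill (discrete): one interior color per gender; black outlines keep boxes distinct
p <- ggplot(trust_df, aes(x = gender, y = trust, fill = gender)) +
  geom_boxplot(color = "black") +
  labs(x = NULL, y = "Trust in government (1-10)")
p
```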
geom_histogram: fill + alpha
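A sketch of this histogram on simulated data, assuming set.seed(123), 50 respondents per group, rnorm(mean = 5, sd = 2) for men and mean = 6 for women, clipped to the 1-10 scale (trust_df is a hypothetical name). position = "identity" overlays the two histograms instead of stacking them, and alpha = 0.5 lets the overlap show through:

```r
library(ggplot2)

# Simulated survey data (assumptions as stated in the lead-in)
set.seed(123)
n_per_group <- 50
trust_df <- data.frame(
  gender = rep(c("Men", "Women"), each = n_per_group),
  trust  = c(rnorm(n_per_group, mean = 5, sd = 2),
             rnorm(n_per_group, mean = 6, sd = 2))
)
trust_df$trust <- pmin(pmax(trust_df$trust, 1), 10)

# fill (discrete) + alpha: overlaid, semi-transparent histograms per gender
p <- ggplot(trust_df, aes(x = trust, fill = gender)) +
  geom_histogram(bins = 10, alpha = 0.5, position = "identity", color = "white") +
  labs(x = "Trust in government (1-10)", y = "Count")
p
```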
geom_sf fill(color)
library(ggplot2)
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)
world <- ne_countries(scale = "medium",
returnclass = "sf")
europe_bounds <- list(x = c(-10, 40),
y = c(35, 70))
# Mapping it
ggplot() +
geom_sf(data = world, aes(fill=log(pop_est))) +
coord_sf(xlim = europe_bounds$x,
ylim = europe_bounds$y)
On the most fundamental level, we need to use the right colors for our visualizations.
This is relevant for:
About 8% of men and 0.5% of women have some form of color blindness (figures commonly cited for people of Northern European ancestry).
Thus, colors should be distinguishable by people with different forms of color blindness
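The viridis palettes in R let us create color-blind-friendly graphs; ggplot2 provides them through scale_color_viridis_d() and scale_color_viridis_c(), plus the matching scale_fill_viridis_d() and scale_fill_viridis_c().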
These are predefined palettes that are widely used.
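The map examples use the continuous version, scale_fill_viridis_c(). For a discrete variable, a minimal sketch with a trimmed copy of eg1 looks like this (mapping region is our choice for illustration):

```r
library(ggplot2)

# Trimmed copy of eg1: only the columns this plot needs
eg1 <- data.frame(
  democracy = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp       = c(2, 9, 4, 7, 8, 20, 15, 25, 27),
  region    = c("Asia", "Asia", "Africa", "Europe", "Africa",
                "Asia", "Americas", "Europe", "Asia")
)

# scale_color_viridis_d(): color-blind-friendly palette for a discrete variable;
# option "D" is viridis, other options include "magma", "inferno", "cividis"
p <- ggplot(eg1, aes(x = democracy, y = gdp, color = region)) +
  geom_point(size = 3) +
  scale_color_viridis_d(option = "D")
p
```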
geom_sf fill(color) - No Viridis
library(ggplot2)
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)
world <- ne_countries(scale = "medium",
returnclass = "sf")
europe_bounds <- list(x = c(-10, 40),
y = c(35, 70))
# Mapping it
ggplot() +
geom_sf(data = world, aes(fill=log(pop_est))) +
coord_sf(xlim = europe_bounds$x,
ylim = europe_bounds$y) +
labs(title = "European Countries")
geom_sf fill(color) - Viridis
library(ggplot2)
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)
world <- ne_countries(scale = "medium",
returnclass = "sf")
europe_bounds <- list(x = c(-10, 40),
y = c(35, 70))
# Mapping it
ggplot() +
geom_sf(data = world, aes(fill=log(pop_est))) +
coord_sf(xlim = europe_bounds$x,
ylim = europe_bounds$y) +
labs(title = "European Countries") +
scale_fill_viridis_c(option = "viridis") # you can also try "magma", "inferno", "cividis", etc.
# color (continuous): encode a numeric gradient (low → high)
# size (continuous): encode magnitude with point size
ggplot(eg1, aes(x = democracy, y = gdp, color=corruption, size=population)) +
geom_point() +
scale_color_viridis_c(option = "viridis") # you can also try "magma", "inferno", "cividis", etc.
What do you see?
Democracy ↔ GDP: more democratic countries tend to have higher GDP per capita.
How does the story change?
Corruption and institutions matter: countries with lower corruption tend to be more democratic and richer.
What does adding size tell us?
Population weights the story: in this toy data, the more populous countries also tend to have higher GDP and to be more democratic.
Popescu (JCU): Data Visualization 2