From Points and Bars to Boxplots and Maps in ggplot2
By the end of this lecture, you will be able to:
Would you rather read this?
| Country | LifeExpectancy | 
|---|---|
| Monaco | 77.25278 | 
| Andorra | 77.04861 | 
| San Marino | 76.88056 | 
| Guernsey | 76.71944 | 
| Israel | 75.49722 | 
…or see this? 
Visuals are powerful because they:
R can help us achieve great visualizations with ggplot2.
At the most basic level we have:
Optional layers include:
ggplot() with +
Before we use ggplot, our data has to be tidy. This means:
Geoms are the “shapes” your data takes on a graph.
Example:
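To make the idea of layers and geoms concrete, here is a minimal sketch in R. The data frame `toy` and its columns `year` and `value` are hypothetical, invented only for illustration; the same data is drawn once as points and once as a line plus points.

```r
library(ggplot2)

# Hypothetical tidy data: one row per observation, one column per variable
toy <- data.frame(
  year  = c(2000, 2004, 2008, 2012, 2016),
  value = c(3, 5, 4, 7, 8)
)

# Start from the data and the aesthetic mapping...
base <- ggplot(toy, aes(x = year, y = value))

# ...then add layers with `+`
p_points <- base + geom_point()                # the data drawn as dots
p_both   <- base + geom_line() + geom_point() +  # two geoms stacked on one plot
  labs(x = "Year", y = "Value")

p_points
p_both
```

Swapping the geom changes the shape the data takes, while the data and the mapping stay the same.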
geom_point, geom_col, geom_bar, geom_text, geom_line
| geom | Use for | 
|---|---|
| geom_point() | Relationships between two variables; at least 10 obs. | 
| geom_col() | Totals/percent per category; ordered bars. Uses pre-computed values for bar height. You must supply both x and y. | 
| geom_bar() | Counts the observations in each category. You must supply x, not y. | 
| geom_text() | Direct labels for small N; annotate outliers. | 
| geom_line() | Trends in one variable across an ordered second variable (e.g., time); at least 10 obs. | 
geom_point
Imagine we have data about 9 countries that records their level of democracy from -10 to +10 (x-axis) and their GDP per capita in $1,000s (y-axis).
| democracy | gdp | 
|---|---|
| -8 | 2 | 
| -7 | 9 | 
| -5 | 4 | 
| -3 | 7 | 
| 0 | 8 | 
Interpretation: Each dot shows one country’s democracy score (left to right) and its GDP per person (up and down). The dots suggest that more democratic countries usually have higher GDP per person.
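A minimal sketch of how this scatterplot could be drawn. The nine democracy and GDP values are the ones used for this example in the lecture; the axis labels and point size are illustrative choices, and the correlation at the end is only an added numeric check of the upward pattern.

```r
library(ggplot2)

# Democracy scores and GDP per capita ($1,000s) for the 9 example countries
eg1 <- data.frame(
  democracy = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp       = c(2, 9, 4, 7, 8, 20, 15, 25, 27)
)

# One dot per country: democracy on the x-axis, GDP on the y-axis
p <- ggplot(eg1, aes(x = democracy, y = gdp)) +
  geom_point(size = 3) +
  labs(x = "Democracy score (-10 to +10)",
       y = "GDP per capita ($1,000s)")
p

# A positive correlation matches the visual impression of an upward pattern
r <- cor(eg1$democracy, eg1$gdp)
r
```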
geom_col
Suppose we collected a small survey and just recorded the education level of each respondent. We want to visualize how many respondents fall into each category.
| education | 
|---|
| Primary | 
| Primary | 
| High School | 
| High School | 
For geom_col, we first count the respondents in each category:
| education | n | 
|---|---|
| College | 4 | 
| High School | 3 | 
| Primary | 2 | 
geom_bar
Suppose we collected a small survey and just recorded the education level of each respondent. We want to visualize how many respondents fall into each category.
geom_col vs. geom_bar
This is the difference:
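The contrast can be sketched with the survey example. The raw responses below are reconstructed from the counts shown earlier (College 4, High School 3, Primary 2), so the order of the rows is an assumption. Base R `table()` stands in for the counting step here; `dplyr::count(survey, education)` should give the same counts.

```r
library(ggplot2)

# Raw survey: one row per respondent (row order is assumed)
survey <- data.frame(
  education = c("Primary", "Primary",
                "High School", "High School", "High School",
                "College", "College", "College", "College")
)

# geom_bar(): supply only x; ggplot2 counts the rows in each category
p_bar <- ggplot(survey, aes(x = education)) +
  geom_bar()

# geom_col(): count first, then supply both x and y
counts <- as.data.frame(table(education = survey$education),
                        responseName = "n")
p_col <- ggplot(counts, aes(x = education, y = n)) +
  geom_col()

# Both plots end up with the same bar heights
all(layer_data(p_bar)$count == layer_data(p_col)$y)
```

The two plots look the same; what differs is who does the counting.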
geom_text
Suppose we collected a small survey and just recorded the education level of each respondent. We want to visualize how many respondents fall into each category.
We can also use geom_text with the first example:
Imagine we have data about 9 countries that records their level of democracy from -10 to +10 (x-axis) and their GDP per capita in $1,000s (y-axis).
eg1 <- data.frame(
  democracy = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),   # democracy score
  gdp = c(2, 9, 4, 7, 8, 20, 15, 25, 27),  # GDP per capita in $1000s
  country = c(
    "North Korea",   # very autocratic, very poor
    "Saudi Arabia",  # autocratic, but richer due to oil
    "Zimbabwe",      # authoritarian, low GDP
    "Russia",        # hybrid regime, middle income
    "Nigeria",       # similar position
    "India",         # low–mid democracy, growing GDP
    "Brazil",        # democracy, mid GDP
    "Poland",        # consolidated democracy, higher GDP
    "South Korea"   # rich democracy
  ))
| democracy | gdp | country | 
|---|---|---|
| -8 | 2 | North Korea | 
| -7 | 9 | Saudi Arabia | 
| -5 | 4 | Zimbabwe | 
| -3 | 7 | Russia | 
| 0 | 8 | Nigeria | 
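One way the labelled scatterplot might be produced, as a sketch. It reuses the `eg1` values and country names from the lecture; the `vjust` and `size` settings are illustrative assumptions for nudging the labels above the points.

```r
library(ggplot2)

eg1 <- data.frame(
  democracy = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp       = c(2, 9, 4, 7, 8, 20, 15, 25, 27),
  country   = c("North Korea", "Saudi Arabia", "Zimbabwe", "Russia", "Nigeria",
                "India", "Brazil", "Poland", "South Korea")
)

# The `label` aesthetic tells geom_text() which column to print
p <- ggplot(eg1, aes(x = democracy, y = gdp, label = country)) +
  geom_point() +
  geom_text(vjust = -0.8, size = 3) +   # place each name just above its dot
  labs(x = "Democracy score", y = "GDP per capita ($1,000s)")
p
```

With only 9 points, direct labels stay readable; with many more points they would overlap.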
geom_line
Suppose we have data on average voter turnout (%) in national elections over several years. We want to see the trend in participation.
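A small sketch of such a trend plot. The turnout figures and election years below are hypothetical, invented for illustration; they are not the lecture's data.

```r
library(ggplot2)

# Hypothetical turnout data: one row per election year
turnout <- data.frame(
  year    = seq(2000, 2020, by = 4),
  turnout = c(62, 60, 58, 59, 55, 57)   # percent; made-up values
)

# geom_line() connects the observations in order of x, which shows the trend;
# geom_point() on top marks the individual elections
p <- ggplot(turnout, aes(x = year, y = turnout)) +
  geom_line() +
  geom_point() +
  labs(x = "Election year", y = "Voter turnout (%)")
p
```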
geom_boxplot, geom_histogram, geom_density, geom_violin, geom_smooth, geom_errorbar, geom_sf
| geom | Use for | 
|---|---|
| geom_boxplot() | Compare distributions across groups | 
| geom_histogram() | Distribution of a single continuous variable (frequency bins) | 
| geom_density() | Smoothed distribution of a single continuous variable | 
| geom_violin() | Visualizing distributions across categories (shape of data, not just box) | 
| geom_smooth() | Adding fitted trend/smoothed lines (LOESS, GAM, linear, etc.) | 
| geom_errorbar() | Showing uncertainty or variability (e.g., CI ranges around a mean) | 
| geom_sf() | Plotting spatial (map) data stored as sf objects | 
geom_boxplot
Suppose we surveyed people about their trust in government on a 1–10 scale (1 = no trust, 10 = complete trust). We want to see the typical values and how spread out the answers are.
| gender | trust | 
|---|---|
| Men | 3.879049 | 
| Men | 4.539645 | 
| Men | 8.117417 | 
| Men | 5.141017 | 
| Men | 5.258576 | 
| Men | 8.430130 | 
Suppose we surveyed people about their trust in government on a 1–10 scale (1 = no trust, 10 = complete trust). We want to compare typical values and how spread out the answers are for men and women.
| gender | trust | 
|---|---|
| Men | 3.879049 | 
| Men | 4.539645 | 
| Men | 8.117417 | 
| Men | 5.141017 | 
| Men | 5.258576 | 
| Men | 8.430130 | 
| Men | 5.921832 | 
| Men | 2.469877 | 
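Both boxplots can be sketched as follows. The toy dataset `eg5` is built the same way as in the geom_errorbar part of this lecture (same seed and parameters); the `tapply()` call at the end is an added check showing the quartiles that the boxes are drawn from.

```r
library(ggplot2)

# Toy data: trust in government for 20 men and 20 women
set.seed(123)
eg5 <- data.frame(
  gender = rep(c("Men", "Women"), each = 20),
  trust  = c(rnorm(20, mean = 5, sd = 2),
             rnorm(20, mean = 8, sd = 1))
)

# One box for everyone: typical value and spread of all answers
p_all <- ggplot(eg5, aes(y = trust)) +
  geom_boxplot()

# One box per group: mapping gender to x splits the data
p_gender <- ggplot(eg5, aes(x = gender, y = trust)) +
  geom_boxplot()

p_all
p_gender

# The box edges are the 25th and 75th percentiles; the middle line is the median
quartiles <- tapply(eg5$trust, eg5$gender, quantile, probs = c(0.25, 0.5, 0.75))
quartiles
```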
geom_histogram: What is a Histogram?
Histograms allow us to better understand how frequently or infrequently certain values occur in our dataset.
Imagine a set of values that are spaced out along a number line.
To construct a histogram, a section of the number line is divided into equal chunks, called bins.
Next, count how many data points sit inside each bin, and draw one bar for each bin.
The height of each bar corresponds to the number of data points in its bin.
Label the data (in this example, each data point is an SAT score).
Draw in a y-axis, which counts the number of data points in each bin.
Finally, label your bins.
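The construction steps can be carried out by hand in base R. This is only a sketch: the SAT scores and the bin edges are hypothetical values chosen for illustration.

```r
# Hypothetical SAT scores (made-up values)
scores <- c(1010, 1080, 1120, 1150, 1190, 1210,
            1230, 1260, 1300, 1340, 1390, 1450)

# Step 1: divide a section of the number line into equal chunks (bins)
breaks <- seq(1000, 1500, by = 100)

# Step 2: assign each score to its bin; right = FALSE makes bins [a, b)
bins <- cut(scores, breaks = breaks, right = FALSE)

# Step 3: count the data points in each bin; these are the bar heights
counts <- table(bins)
counts   # counts per bin: 2, 3, 3, 3, 1

# Step 4: draw one bar per bin, with labelled bins and a count axis
barplot(counts, xlab = "SAT score bin", ylab = "Number of data points")
```

geom_histogram() performs the binning and counting steps automatically.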
geom_histogram: Example
Suppose we surveyed people about their trust in government on a 1–10 scale (1 = no trust, 10 = complete trust). We want to compare typical values and how spread out the answers are.
There is a direct connection between histograms and boxplots:
One choice specific to histograms is the number of bins:
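A sketch of how the number of bins changes the picture, using the toy trust data `eg5` (built as in the geom_errorbar part of the lecture). The two bin counts, 5 and 30, are arbitrary choices for illustration.

```r
library(ggplot2)

set.seed(123)
eg5 <- data.frame(
  gender = rep(c("Men", "Women"), each = 20),
  trust  = c(rnorm(20, mean = 5, sd = 2),
             rnorm(20, mean = 8, sd = 1))
)

# Few wide bins: a smooth but coarse picture
p_few <- ggplot(eg5, aes(x = trust)) +
  geom_histogram(bins = 5, color = "white")

# Many narrow bins: more detail, but also more noise
p_many <- ggplot(eg5, aes(x = trust)) +
  geom_histogram(bins = 30, color = "white")

p_few
p_many

# Whatever the number of bins, every observation is counted exactly once
sum(layer_data(p_few)$count)
sum(layer_data(p_many)$count)
```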
geom_density: What is a Density Curve?
A histogram counts how many people fall in each trust “bucket.”
A density curve smooths those buckets into a single curve so we see the overall shape.
The area under the entire curve = 1 (i.e., 100% of people).
The area under the curve over a range of values is the probability that a randomly chosen person’s trust score falls in that range: for example, between 5 and 7, above 5, or below 4.
Heights tell you nothing by themselves; areas tell you the share.
When I widen the range, the area grows, so the probability grows.
This curve is a smoothed version of the histogram; the bandwidth controls how smooth it is.
Here too there is a connection between density, histograms, and boxplots.
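The "areas are probabilities" idea can be checked numerically. The sketch below uses base R's `density()` on the toy trust scores and a hand-written trapezoid rule; `area_between()` is a hypothetical helper written for this illustration, not a ggplot2 function.

```r
# Toy trust scores, generated as in the lecture's eg5 dataset
set.seed(123)
trust <- c(rnorm(20, mean = 5, sd = 2),
           rnorm(20, mean = 8, sd = 1))

# Kernel density estimate; its bandwidth (d$bw) controls the smoothness
d <- density(trust)

# Hypothetical helper: trapezoid-rule area under the curve between lo and hi
area_between <- function(d, lo, hi) {
  keep <- d$x >= lo & d$x <= hi
  x <- d$x[keep]
  y <- d$y[keep]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

total <- area_between(d, -Inf, Inf)  # whole curve: approximately 1
p_5_7 <- area_between(d, 5, 7)       # share of scores between 5 and 7
p_4_8 <- area_between(d, 4, 8)       # wider range, so a larger share

c(total = total, p_5_7 = p_5_7, p_4_8 = p_4_8)
```

Calling `density(trust, adjust = 2)` would double the bandwidth and give a smoother curve, while the total area stays close to 1.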
geom_density: Example
Suppose we surveyed people about their trust in government on a 1–10 scale (1 = no trust, 10 = complete trust). We want to compare typical values and how spread out the answers are.
It is more informative to include both geom_density and geom_histogram.
# Rescale the density curve so that it matches the histogram counts
# (eg5 is the toy trust dataset; ggplot2 must be loaded)
N <- nrow(eg5)
binwidth <- diff(range(eg5$trust)) / 10  # aim for roughly 10 bins
# Draw the histogram and overlay the rescaled density.
# Passing binwidth (rather than bins = 10) guarantees that the histogram
# and the rescaled density use exactly the same bin width.
ggplot(eg5, aes(x = trust)) +
  geom_histogram(binwidth = binwidth, color = "white") +
  geom_density(aes(y = after_stat(density) * N * binwidth), linewidth = 1) +
  labs(
    x = "trust",
    y = "count")
Most people in the survey gave middle-of-the-road trust scores, around 5 to 7.
As you move toward the extremes (very low 1–3 or very high 8–10), the curve drops, showing fewer people chose those scores.
geom_violin: Example
Violin plots are symmetric representations of geom_density. Starting from our density curve, we shade it in white, flip it, and then mirror it to obtain the violin shape.
Suppose we surveyed people about their trust in government on a 1–10 scale (1 = no trust, 10 = complete trust). We want to compare typical values and how spread out the answers are for men and women.
Interpretation: The data suggests that men’s responses are more spread out (from low to high trust), while women’s are more concentrated in the middle-to-high range.
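A sketch of the violin plot for the two groups, using the toy dataset `eg5` built as elsewhere in the lecture. The standard deviations at the end are an added numeric check on the "more spread out" reading.

```r
library(ggplot2)

set.seed(123)
eg5 <- data.frame(
  gender = rep(c("Men", "Women"), each = 20),
  trust  = c(rnorm(20, mean = 5, sd = 2),
             rnorm(20, mean = 8, sd = 1))
)

# One mirrored density per group; a wider violin means more answers at that value
p <- ggplot(eg5, aes(x = gender, y = trust)) +
  geom_violin() +
  labs(x = NULL, y = "trust")
p

# A larger standard deviation means answers that are more spread out
spread <- tapply(eg5$trust, eg5$gender, sd)
spread
```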
geom_smooth
This is what we had previously:
# Example data
eg1 <- data.frame(
  democracy = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),   # democracy score
  gdp = c(2, 9, 4, 7, 8, 20, 15, 25, 27),  # GDP per capita in $1000s
  country = c(
    "North Korea",   # very autocratic, very poor
    "Saudi Arabia",  # autocratic, but richer due to oil
    "Zimbabwe",      # authoritarian, low GDP
    "Russia",        # hybrid regime, middle income
    "Nigeria",       # similar position
    "India",         # low–mid democracy, growing GDP
    "Brazil",        # democracy, mid GDP
    "Poland",        # consolidated democracy, higher GDP
    "South Korea"   # rich democracy
  ))
| democracy | gdp | country | 
|---|---|---|
| -8 | 2 | North Korea | 
| -7 | 9 | Saudi Arabia | 
| -5 | 4 | Zimbabwe | 
| -3 | 7 | Russia | 
| 0 | 8 | Nigeria | 
| 2 | 20 | India | 
| 5 | 15 | Brazil | 
| 8 | 25 | Poland | 
| 9 | 27 | South Korea | 
This is what happens when we try to fit a line:
Interpretation: As one moves right (more democratic), the dots usually sit higher, meaning those countries tend to be richer. The black line going up summarizes that big-picture pattern, even though a few dots don’t fit.
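A sketch of one way to add such a line. The lecture does not show which smoothing method was used, so the straight-line fit (`method = "lm"`) and the hidden confidence band (`se = FALSE`) are assumptions made here; the `lm()` call is an added check of the slope.

```r
library(ggplot2)

eg1 <- data.frame(
  democracy = c(-8, -7, -5, -3, 0, 2, 5, 8, 9),
  gdp       = c(2, 9, 4, 7, 8, 20, 15, 25, 27)
)

# Scatterplot plus a fitted straight line
p <- ggplot(eg1, aes(x = democracy, y = gdp)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x,
              se = FALSE, color = "black")
p

# The same line fitted directly: a positive slope means GDP rises with democracy
fit   <- lm(gdp ~ democracy, data = eg1)
slope <- coef(fit)[["democracy"]]
slope
```

Leaving out `method` lets geom_smooth() choose a flexible smoother (LOESS for small datasets), which may bend to follow the points more closely.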
geom_errorbar
Error bars are like “antennae” on top of a mean.
They don’t show the full spread like a boxplot, but they give us a quick picture of how precise the average is.
This is useful when we only want to compare averages.
A natural question is how error bars compare to boxplots.
[Figure: two panels, Boxplot and Error Bar]
Suppose we surveyed people about their trust in government on a 1–10 scale (1 = no trust, 10 = complete trust). We want to compare typical values and how spread out the answers are for men and women.
# Set a random seed so results are reproducible.
# (Without this, R will generate different random numbers each time.)
set.seed(123)
# --------------------------------------------------------------
# Step 1: Create a toy dataset: "trust in government" by gender
# --------------------------------------------------------------
eg5 <- data.frame(
  # Repeat each of the labels "Men" and "Women" 20 times
  gender = rep(c("Men", "Women"), each = 20),

  # Generate 20 trust values for Men ~ Normal(mean = 5, sd = 2)
  # Generate 20 trust values for Women ~ Normal(mean = 8, sd = 1)
  trust = c(
    rnorm(20, mean = 5, sd = 2),
    rnorm(20, mean = 8, sd = 1)
  )
)
# Print the first 3 entries
head(eg5, n = 3)
| gender | trust | 
|---|---|
| Men | 3.879049 | 
| Men | 4.539645 | 
| Men | 8.117417 | 
# ----------------------------------------------------------------------
# Step 2: Calculate mean trust and error bars (95% confidence intervals)
# ----------------------------------------------------------------------
library(dplyr)  # provides %>%, group_by(), summarise(), mutate(), n()
df_err <- eg5 %>%
  group_by(gender) %>%            # Group data by gender
  summarise(
    mean = mean(trust),           # Average trust per gender
    se = sd(trust) / sqrt(n())    # Standard error = sd / sqrt(sample size)
  ) %>%
  mutate(
    # Approximate 95% confidence interval = mean ± 1.96 * standard error
    ymin = mean - 1.96 * se,      # Lower bound
    ymax = mean + 1.96 * se       # Upper bound
  )
head(df_err, n = 3)
| gender | mean | se | ymin | ymax | 
|---|---|---|---|---|
| Men | 5.283248 | 0.4349892 | 4.430669 | 6.135826 | 
| Women | 7.948743 | 0.1855799 | 7.585006 | 8.312480 | 
# --------------------------------------------------------
# Step 3: Plot the means with error bars using ggplot2
# --------------------------------------------------------
library(ggplot2)
ggplot(df_err, aes(x = gender, y = mean)) +
  geom_point(size = 4) +             # Plot mean values as large points
  geom_errorbar(
    aes(ymin = ymin, ymax = ymax),   # Add vertical error bars
    width = 0.15                     # Small horizontal "cap" on error bars
  ) +
  labs(
    x = NULL,                        # Remove x-axis label
    y = "trust",                     # Label y-axis
    title = "Error Bars"             # Add plot title
  ) +
  coord_cartesian(ylim = c(0, 10))   # Set y-axis limits from 0 to 10
Interpretation: Women in this sample show higher average trust in government than men. The bars show our uncertainty, and since they do not overlap, the difference is unlikely to be due to chance alone.
geom_sf
geom_sf draws maps from sf objects (“simple features” = data + geometry).
It works like other geoms. With geom_sf, you can map points (cities, places, settlements), lines (rivers, roads, straight lines), and polygons (countries, parks, seas, etc.).
Typical workflow:
geom_sf Example
# Install once (uncomment if needed):
# install.packages(c("sf", "ggplot2", "rnaturalearth", "rnaturalearthdata"))
library(ggplot2)
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)
# Step 1: Get a country polygons dataset as an sf object
world <- rnaturalearth::ne_countries(scale = "medium",
                                     returnclass = "sf")
head(world, n = 3)
| featurecla | scalerank | labelrank | sovereignt | sov_a3 | adm0_dif | level | type | tlc | admin | adm0_a3 | geou_dif | geounit | gu_a3 | su_dif | subunit | su_a3 | brk_diff | name | name_long | brk_a3 | brk_name | brk_group | abbrev | postal | formal_en | formal_fr | name_ciawf | note_adm0 | note_brk | name_sort | name_alt | mapcolor7 | mapcolor8 | mapcolor9 | mapcolor13 | pop_est | pop_rank | pop_year | gdp_md | gdp_year | economy | income_grp | fips_10 | iso_a2 | iso_a2_eh | iso_a3 | iso_a3_eh | iso_n3 | iso_n3_eh | un_a3 | wb_a2 | wb_a3 | woe_id | woe_id_eh | woe_note | adm0_iso | adm0_diff | adm0_tlc | adm0_a3_us | adm0_a3_fr | adm0_a3_ru | adm0_a3_es | adm0_a3_cn | adm0_a3_tw | adm0_a3_in | adm0_a3_np | adm0_a3_pk | adm0_a3_de | adm0_a3_gb | adm0_a3_br | adm0_a3_il | adm0_a3_ps | adm0_a3_sa | adm0_a3_eg | adm0_a3_ma | adm0_a3_pt | adm0_a3_ar | adm0_a3_jp | adm0_a3_ko | adm0_a3_vn | adm0_a3_tr | adm0_a3_id | adm0_a3_pl | adm0_a3_gr | adm0_a3_it | adm0_a3_nl | adm0_a3_se | adm0_a3_bd | adm0_a3_ua | adm0_a3_un | adm0_a3_wb | continent | region_un | subregion | region_wb | name_len | long_len | abbrev_len | tiny | homepart | min_zoom | min_label | max_label | label_x | label_y | ne_id | wikidataid | name_ar | name_bn | name_de | name_en | name_es | name_fa | name_fr | name_el | name_he | name_hi | name_hu | name_id | name_it | name_ja | name_ko | name_nl | name_pl | name_pt | name_ru | name_sv | name_tr | name_uk | name_ur | name_vi | name_zh | name_zht | fclass_iso | tlc_diff | fclass_tlc | fclass_us | fclass_fr | fclass_ru | fclass_es | fclass_cn | fclass_tw | fclass_in | fclass_np | fclass_pk | fclass_de | fclass_gb | fclass_br | fclass_il | fclass_ps | fclass_sa | fclass_eg | fclass_ma | fclass_pt | fclass_ar | fclass_jp | fclass_ko | fclass_vn | fclass_tr | fclass_id | fclass_pl | fclass_gr | fclass_it | fclass_nl | fclass_se | fclass_bd | fclass_ua | geometry | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Admin-0 country | 1 | 3 | Zimbabwe | ZWE | 0 | 2 | Sovereign country | 1 | Zimbabwe | ZWE | 0 | Zimbabwe | ZWE | 0 | Zimbabwe | ZWE | 0 | Zimbabwe | Zimbabwe | ZWE | Zimbabwe | NA | Zimb. | ZW | Republic of Zimbabwe | NA | Zimbabwe | NA | NA | Zimbabwe | NA | 1 | 5 | 3 | 9 | 14645468 | 14 | 2019 | 21440 | 2019 | 5. Emerging region: G20 | 5. Low income | ZI | ZW | ZW | ZWE | ZWE | 716 | 716 | 716 | ZW | ZWE | 23425004 | 23425004 | Exact WOE match as country | ZWE | NA | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | ZWE | -99 | -99 | Africa | Africa | Eastern Africa | Sub-Saharan Africa | 8 | 8 | 5 | -99 | 1 | 0 | 2.5 | 8 | 29.92544 | -18.91164 | 1159321441 | Q954 | زيمبابوي | জিম্বাবুয়ে | Simbabwe | Zimbabwe | Zimbabue | زیمبابوه | Zimbabwe | Ζιμπάμπουε | זימבבואה | ज़िम्बाब्वे | Zimbabwe | Zimbabwe | Zimbabwe | ジンバブエ | 짐바브웨 | Zimbabwe | Zimbabwe | Zimbábue | Зимбабве | Zimbabwe | Zimbabve | Зімбабве | زمبابوے | Zimbabwe | 津巴布韦 | 辛巴威 | Admin-0 country | NA | Admin-0 country | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | MULTIPOLYGON (((31.28789 -2... | 
| Admin-0 country | 1 | 3 | Zambia | ZMB | 0 | 2 | Sovereign country | 1 | Zambia | ZMB | 0 | Zambia | ZMB | 0 | Zambia | ZMB | 0 | Zambia | Zambia | ZMB | Zambia | NA | Zambia | ZM | Republic of Zambia | NA | Zambia | NA | NA | Zambia | NA | 5 | 8 | 5 | 13 | 17861030 | 14 | 2019 | 23309 | 2019 | 7. Least developed region | 4. Lower middle income | ZA | ZM | ZM | ZMB | ZMB | 894 | 894 | 894 | ZM | ZMB | 23425003 | 23425003 | Exact WOE match as country | ZMB | NA | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | ZMB | -99 | -99 | Africa | Africa | Eastern Africa | Sub-Saharan Africa | 6 | 6 | 6 | -99 | 1 | 0 | 3.0 | 8 | 26.39530 | -14.66080 | 1159321439 | Q953 | زامبيا | জাম্বিয়া | Sambia | Zambia | Zambia | زامبیا | Zambie | Ζάμπια | זמביה | ज़ाम्बिया | Zambia | Zambia | Zambia | ザンビア | 잠비아 | Zambia | Zambia | Zâmbia | Замбия | Zambia | Zambiya | Замбія | زیمبیا | Zambia | 赞比亚 | 尚比亞 | Admin-0 country | NA | Admin-0 country | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | MULTIPOLYGON (((30.39609 -1... | 
| Admin-0 country | 1 | 3 | Yemen | YEM | 0 | 2 | Sovereign country | 1 | Yemen | YEM | 0 | Yemen | YEM | 0 | Yemen | YEM | 0 | Yemen | Yemen | YEM | Yemen | NA | Yem. | YE | Republic of Yemen | NA | Yemen | NA | NA | Yemen, Rep. | NA | 5 | 3 | 3 | 11 | 29161922 | 15 | 2019 | 22581 | 2019 | 7. Least developed region | 4. Lower middle income | YM | YE | YE | YEM | YEM | 887 | 887 | 887 | RY | YEM | 23425002 | 23425002 | Exact WOE match as country | YEM | NA | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | YEM | -99 | -99 | Asia | Asia | Western Asia | Middle East & North Africa | 5 | 5 | 4 | -99 | 1 | 0 | 3.0 | 8 | 45.87438 | 15.32823 | 1159321425 | Q805 | اليمن | ইয়েমেন | Jemen | Yemen | Yemen | یمن | Yémen | Υεμένη | תימן | यमन | Jemen | Yaman | Yemen | イエメン | 예멘 | Jemen | Jemen | Iémen | Йемен | Jemen | Yemen | Ємен | یمن | Yemen | 也门 | 葉門 | Admin-0 country | NA | Admin-0 country | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | MULTIPOLYGON (((53.08564 16... | 
So, the world data frame contains 169 variables (e.g. featurecla, scalerank, labelrank, etc.) and 242 observations.
Use coord_sf(xlim=..., ylim=...) to zoom
You can layer multiple sf objects:
geom_sf Example
europe_bounds <- list(x = c(-10, 40),
                      y = c(35, 70))
# Create a few points
cities <- data.frame(
  name = c("Rome", "Berlin", "Paris"),
  lon = c(12.5, 13.4, 2.35),
  lat = c(41.9, 52.5, 48.85)
)
# Convert to sf POINTs
cities_sf <- st_as_sf(cities, coords = c("lon", "lat"), crs = 4326)
# Map the countries and the cities, zoomed in on Europe
ggplot() +
  geom_sf(data = world) +
  geom_sf(data = cities_sf) +
  coord_sf(xlim = europe_bounds$x,
           ylim = europe_bounds$y) +
  labs(title = "European Countries")
# Create a line connecting Rome -> Berlin -> Paris
line_sf <- st_sfc(st_linestring(as.matrix(cities[, c("lon", "lat")])), crs = 4326)
# Map the countries, the cities, and the connecting line
ggplot() +
  geom_sf(data = world) +
  geom_sf(data = cities_sf) +
  geom_sf(data = line_sf) +
  coord_sf(xlim = europe_bounds$x,
           ylim = europe_bounds$y) +
  labs(title = "European Countries")
Use the dataset below:
Plot it with geom_point().
Suppose you have survey data on respondents’ education:
Plot it with geom_bar() and with geom_col().
We surveyed trust in government on a 1–10 scale:
Use the dataset below:
Plot it with geom_point().
Plot it with geom_line().
The line plot is usually more informative because it shows the trend in turnout over time.
Suppose you have survey data on respondents’ education:
Plot the responses with geom_bar().
Compute the counts with dplyr::count() and plot them with geom_col():
| education | n | 
|---|---|
| College | 3 | 
| High School | 2 | 
| Primary | 2 | 
The key distinction:
geom_bar() = ggplot does the counting for you.
geom_col() = you must provide the counts yourself.
We surveyed trust in government on a 1–10 scale:
Popescu (JCU): Data Visualization 1