1 Intro

Open a new Quarto document. Select all text by pressing Ctrl + A (Windows) or Cmd + A (Mac), then press Delete. Next, copy and paste the following preamble.

---
title: "Lab 7"
author: "Your Name"
date: "September 13, 2024"
format:
  html:
    toc: true
    number-sections: true
    colorlinks: true
    smooth-scroll: true
    embed-resources: true
---

Render the document and save it under “week7” in your “stats” folder.

2 Background

#Removing previous datasets in memory
rm(list = ls())
#Loading the relevant libraries
library(ggplot2)
library(gridExtra)
library(dplyr)
library(purrr)
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)
library(ggrepel)

In this lab session, we delve into the practical application of T-tests, correlations, and mapping techniques.

From our lecture, we gained insights into Z-tests, which assess differences between two groups when the standard deviation or variance is known. Typically, a sample size larger than 30 is recommended to perform a z-test. However, in many instances, the population standard deviation is unknown, leading us to rely more frequently on t-tests. To elucidate the mechanics of t-tests, let’s revisit our sample of Latin American countries and compare them with global data.

3 Loading the Data

Let us go back to the old Life Expectancy dataset. If you don’t have it anymore, you can download Life Expectancy and urbanization from the following links:

Let us re-examine the distribution of our life expectancy dataset by making a histogram. We need to first load the data:

#Setting path
setwd("/Users/bgpopescu/Dropbox/john_cabot/teaching/stats/week7/lab/")
#Step1: Loading the data
life_expectancy_df <- read.csv(file = './data/life-expectancy.csv')
urbanization_df <- read.csv(file = './data/share-of-population-urban.csv')

4 Cleaning the Data

In the next few lines we focus on the period starting with 1901, we average life expectancy over country (the original dataset is a panel - with countries and years), we eliminate entities which do not have country codes or have strange ones. By taking the average by country, we are getting rid of the time component.

#Step1: Selecting after 1900
life_expectancy_df<-subset(life_expectancy_df, Year>1900)
#Step2: Calculating the mean
life_expectancy_df2<-life_expectancy_df%>%
  dplyr::group_by(Entity, Code)%>%
  dplyr::summarize(life_exp_mean=mean(Life.expectancy.at.birth..historical.))
#Step3: Cleaning the Data
weird_labels <- c("OWID_KOS", "OWID_WRL", "")
clean_life_expectancy_df<-subset(life_expectancy_df2, !(Code %in% weird_labels))

Let us look at our data. We have learned so far to use head.

head(clean_life_expectancy_df, n=5)

We can also use potentially a more powerful function to look at our data. This is called glimpse.

glimpse(clean_life_expectancy_df)

Rows: 235
Columns: 3
Groups: Entity [235]
$ Entity        <chr> "Afghanistan", "Albania", "Algeria", "American Samoa", "…
$ Code          <chr> "AFG", "ALB", "DZA", "ASM", "AND", "AGO", "AIA", "ATG", …
$ life_exp_mean <dbl> 45.38333, 68.28611, 57.53013, 68.63750, 77.04861, 45.084…

As you can see, this tells us exactly how many observations (235), how many columns/variables (3) we have, the first values that these variables take, and the type of variables that we are dealing with: in this case, character and integers. We should consider using glimpse from now on to understand our data better.

5 Mapping the Data

We will now try to map the data. The package rnaturalearth provides the geographic data for all countries in the world. Use ne_countries to pull country data and choose the scale for how detailed your want your maps to be. Larger scales take more space and time to process, while smaller scales take less space and less time to process. (rnaturalearthhires is necessary for scale = “large”).

world <- ne_countries(scale = "medium", returnclass = "sf")

We are now working with a different object. The data that we just cleaned - clean_life_expectancy_df, falls under three different categories of objects: data frame, grouped data frame, a tibble, and a dataframe.

class(clean_life_expectancy_df)

[1] "grouped_df" "tbl_df"     "tbl"        "data.frame"

Tibble is the central data structure for the set of packages known as the tidyverse, which include dplyr, ggplot2, tidyr, and readr. The general ethos is that tibbles are lazy and surly: they do less and complain more than base data.frames. This forces problems to be tackled earlier and more explicitly, typically leading to code that is more expressive and robust.

If we look at the class for our new object world, it is a different class.

class(world)

[1] "sf"         "data.frame"

glimpse(world)

Rows: 242
Columns: 169
$ featurecla <chr> "Admin-0 country", "Admin-0 country", "Admin-0 country", "A…
$ scalerank  <int> 1, 1, 1, 3, 5, 6, 1, 1, 1, 3, 5, 3, 3, 3, 3, 1, 5, 3, 3, 3,…
$ labelrank  <int> 3, 3, 3, 2, 3, 6, 4, 3, 4, 6, 6, 6, 6, 6, 4, 5, 2, 4, 5, 6,…
$ sovereignt <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "Vat…
$ sov_a3     <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_dif   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ level      <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
$ type       <chr> "Sovereign country", "Sovereign country", "Sovereign countr…
$ tlc        <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",…
$ admin      <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "Vat…
$ adm0_a3    <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ geou_dif   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ geounit    <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "Vat…
$ gu_a3      <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ su_dif     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ subunit    <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "Vat…
$ su_a3      <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ brk_diff   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,…
$ name       <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "Vat…
$ name_long  <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "Vat…
$ brk_a3     <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ brk_name   <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "Vat…
$ brk_group  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ abbrev     <chr> "Zimb.", "Zambia", "Yem.", "Viet.", "Ven.", "Vat.", "Van.",…
$ postal     <chr> "ZW", "ZM", "YE", "VN", "VE", "V", "VU", "UZ", "UY", "FSM",…
$ formal_en  <chr> "Republic of Zimbabwe", "Republic of Zambia", "Republic of …
$ formal_fr  <chr> NA, NA, NA, NA, "República Bolivariana de Venezuela", NA, N…
$ name_ciawf <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "Hol…
$ note_adm0  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "U.S.A.", "U.S.…
$ note_brk   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ name_sort  <chr> "Zimbabwe", "Zambia", "Yemen, Rep.", "Vietnam", "Venezuela,…
$ name_alt   <chr> NA, NA, NA, NA, NA, "Holy See", NA, NA, NA, NA, NA, NA, NA,…
$ mapcolor7  <int> 1, 5, 5, 5, 1, 1, 6, 2, 1, 5, 2, 4, 4, 4, 4, 4, 4, 6, 6, 6,…
$ mapcolor8  <int> 5, 8, 3, 6, 3, 3, 3, 3, 2, 2, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6,…
$ mapcolor9  <int> 3, 5, 3, 5, 1, 4, 7, 5, 2, 4, 5, 1, 1, 1, 1, 1, 1, 6, 6, 6,…
$ mapcolor13 <int> 9, 13, 11, 4, 4, 2, 3, 4, 10, 13, 3, 1, 1, 1, 1, 1, 1, 3, 3…
$ pop_est    <dbl> 14645468, 17861030, 29161922, 96462106, 28515829, 825, 2998…
$ pop_rank   <int> 14, 14, 15, 16, 15, 2, 10, 15, 12, 9, 8, 8, 9, 9, 8, 12, 17…
$ pop_year   <int> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019,…
$ gdp_md     <int> 21440, 23309, 22581, 261921, 482359, -99, 934, 57921, 56045…
$ gdp_year   <int> 2019, 2019, 2019, 2019, 2014, 2019, 2019, 2019, 2019, 2018,…
$ economy    <chr> "5. Emerging region: G20", "7. Least developed region", "7.…
$ income_grp <chr> "5. Low income", "4. Lower middle income", "4. Lower middle…
$ fips_10    <chr> "ZI", "ZA", "YM", "VM", "VE", "VT", "NH", "UZ", "UY", "FM",…
$ iso_a2     <chr> "ZW", "ZM", "YE", "VN", "VE", "VA", "VU", "UZ", "UY", "FM",…
$ iso_a2_eh  <chr> "ZW", "ZM", "YE", "VN", "VE", "VA", "VU", "UZ", "UY", "FM",…
$ iso_a3     <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ iso_a3_eh  <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ iso_n3     <chr> "716", "894", "887", "704", "862", "336", "548", "860", "85…
$ iso_n3_eh  <chr> "716", "894", "887", "704", "862", "336", "548", "860", "85…
$ un_a3      <chr> "716", "894", "887", "704", "862", "336", "548", "860", "85…
$ wb_a2      <chr> "ZW", "ZM", "RY", "VN", "VE", "-99", "VU", "UZ", "UY", "FM"…
$ wb_a3      <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "-99", "VUT", "UZB", "UR…
$ woe_id     <int> 23425004, 23425003, 23425002, 23424984, 23424982, 23424986,…
$ woe_id_eh  <int> 23425004, 23425003, 23425002, 23424984, 23424982, 23424986,…
$ woe_note   <chr> "Exact WOE match as country", "Exact WOE match as country",…
$ adm0_iso   <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_diff  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ adm0_tlc   <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_us <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_fr <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_ru <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_es <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_cn <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_tw <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_in <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_np <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_pk <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_de <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_gb <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_br <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_il <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_ps <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_sa <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_eg <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_ma <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_pt <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_ar <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_jp <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_ko <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_vn <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_tr <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_id <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_pl <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_gr <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_it <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_nl <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_se <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_bd <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_ua <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_un <int> -99, -99, -99, -99, -99, -99, -99, -99, -99, -99, -99, -99,…
$ adm0_a3_wb <int> -99, -99, -99, -99, -99, -99, -99, -99, -99, -99, -99, -99,…
$ continent  <chr> "Africa", "Africa", "Asia", "Asia", "South America", "Europ…
$ region_un  <chr> "Africa", "Africa", "Asia", "Asia", "Americas", "Europe", "…
$ subregion  <chr> "Eastern Africa", "Eastern Africa", "Western Asia", "South-…
$ region_wb  <chr> "Sub-Saharan Africa", "Sub-Saharan Africa", "Middle East & …
$ name_len   <int> 8, 6, 5, 7, 9, 7, 7, 10, 7, 10, 12, 14, 15, 4, 14, 11, 24, …
$ long_len   <int> 8, 6, 5, 7, 9, 7, 7, 10, 7, 30, 16, 24, 28, 4, 14, 11, 13, …
$ abbrev_len <int> 5, 6, 4, 5, 4, 4, 4, 4, 4, 6, 6, 6, 11, 4, 9, 4, 6, 10, 6, …
$ tiny       <int> -99, -99, -99, 2, -99, 4, 2, 5, -99, -99, 2, 3, 3, 2, 3, -9…
$ homepart   <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -99, -99, -99, -99, -99, 1…
$ min_zoom   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ min_label  <dbl> 2.5, 3.0, 3.0, 2.0, 2.5, 5.0, 4.0, 3.0, 3.0, 5.0, 5.0, 5.0,…
$ max_label  <dbl> 8.0, 8.0, 8.0, 7.0, 7.5, 10.0, 9.0, 8.0, 8.0, 10.0, 10.0, 1…
$ label_x    <dbl> 29.92544, 26.39530, 45.87438, 105.38729, -64.59938, 12.4534…
$ label_y    <dbl> -18.911640, -14.660804, 15.328226, 21.715416, 7.182476, 41.…
$ ne_id      <dbl> 1159321441, 1159321439, 1159321425, 1159321417, 1159321411,…
$ wikidataid <chr> "Q954", "Q953", "Q805", "Q881", "Q717", "Q237", "Q686", "Q2…
$ name_ar    <chr> "زيمبابوي", "زامبيا", "اليمن", "فيتنام", "فنزويلا", "الفاتي…
$ name_bn    <chr> "জিম্বাবুয়ে", "জাম্বিয়া", "ইয়েমেন", "ভিয়েতনাম", "ভেনেজুয়েলা", "ভ্যাটিকান সিটি", "ভানুয়াতু",…
$ name_de    <chr> "Simbabwe", "Sambia", "Jemen", "Vietnam", "Venezuela", "Vat…
$ name_en    <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "Vat…
$ name_es    <chr> "Zimbabue", "Zambia", "Yemen", "Vietnam", "Venezuela", "Ciu…
$ name_fa    <chr> "زیمبابوه", "زامبیا", "یمن", "ویتنام", "ونزوئلا", "واتیکان"…
$ name_fr    <chr> "Zimbabwe", "Zambie", "Yémen", "Viêt Nam", "Venezuela", "Ci…
$ name_el    <chr> "Ζιμπάμπουε", "Ζάμπια", "Υεμένη", "Βιετνάμ", "Βενεζουέλα", …
$ name_he    <chr> "זימבבואה", "זמביה", "תימן", "וייטנאם", "ונצואלה", "קריית ה…
$ name_hi    <chr> "ज़िम्बाब्वे", "ज़ाम्बिया", "यमन", "वियतनाम", "वेनेज़ुएला", "वैटिकन नगर", "वानूआटू", …
$ name_hu    <chr> "Zimbabwe", "Zambia", "Jemen", "Vietnám", "Venezuela", "Vat…
$ name_id    <chr> "Zimbabwe", "Zambia", "Yaman", "Vietnam", "Venezuela", "Vat…
$ name_it    <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "Cit…
$ name_ja    <chr> "ジンバブエ", "ザンビア", "イエメン", "ベトナム", "ベネズエラ", "バチカン", "バヌアツ", "…
$ name_ko    <chr> "짐바브웨", "잠비아", "예멘", "베트남", "베네수엘라", "바티칸 시국", "바누아투", "우즈베…
$ name_nl    <chr> "Zimbabwe", "Zambia", "Jemen", "Vietnam", "Venezuela", "Vat…
$ name_pl    <chr> "Zimbabwe", "Zambia", "Jemen", "Wietnam", "Wenezuela", "Wat…
$ name_pt    <chr> "Zimbábue", "Zâmbia", "Iémen", "Vietname", "Venezuela", "Va…
$ name_ru    <chr> "Зимбабве", "Замбия", "Йемен", "Вьетнам", "Венесуэла", "Ват…
$ name_sv    <chr> "Zimbabwe", "Zambia", "Jemen", "Vietnam", "Venezuela", "Vat…
$ name_tr    <chr> "Zimbabve", "Zambiya", "Yemen", "Vietnam", "Venezuela", "Va…
$ name_uk    <chr> "Зімбабве", "Замбія", "Ємен", "В'єтнам", "Венесуела", "Вати…
$ name_ur    <chr> "زمبابوے", "زیمبیا", "یمن", "ویتنام", "وینیزویلا", "ویٹیکن …
$ name_vi    <chr> "Zimbabwe", "Zambia", "Yemen", "Việt Nam", "Venezuela", "Th…
$ name_zh    <chr> "津巴布韦", "赞比亚", "也门", "越南", "委内瑞拉", "梵蒂冈", "瓦努阿图", "乌兹别克斯坦",…
$ name_zht   <chr> "辛巴威", "尚比亞", "葉門", "越南", "委內瑞拉", "梵蒂岡", "萬那杜", "烏茲別克", "烏拉…
$ fclass_iso <chr> "Admin-0 country", "Admin-0 country", "Admin-0 country", "A…
$ tlc_diff   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_tlc <chr> "Admin-0 country", "Admin-0 country", "Admin-0 country", "A…
$ fclass_us  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_fr  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_ru  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_es  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_cn  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_tw  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_in  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_np  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_pk  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_de  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_gb  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_br  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_il  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_ps  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_sa  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_eg  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_ma  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_pt  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_ar  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_jp  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_ko  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_vn  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_tr  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_id  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_pl  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_gr  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_it  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_nl  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_se  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_bd  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_ua  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ geometry   <MULTIPOLYGON [°]> MULTIPOLYGON (((31.28789 -2..., MULTIPOLYGON (…

At its most basic, an sf object is a collection of simple features that includes attributes and geometries in the form of a data frame. In other words, it is a data frame (or tibble) with rows of features, columns of attributes, and a special geometry column that contains the spatial aspects of the features.

We can use this format to make maps.

ggplot() +
    geom_sf(data = world)

But let us also look more closely at the attribute table of this spatial dataframe.

glimpse(world)

Rows: 242
Columns: 169
$ featurecla <chr> "Admin-0 country", "Admin-0 country", "Admin-0 country", "A…
$ scalerank  <int> 1, 1, 1, 3, 5, 6, 1, 1, 1, 3, 5, 3, 3, 3, 3, 1, 5, 3, 3, 3,…
$ labelrank  <int> 3, 3, 3, 2, 3, 6, 4, 3, 4, 6, 6, 6, 6, 6, 4, 5, 2, 4, 5, 6,…
$ sovereignt <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "Vat…
$ sov_a3     <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_dif   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ level      <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
$ type       <chr> "Sovereign country", "Sovereign country", "Sovereign countr…
$ tlc        <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",…
$ admin      <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "Vat…
$ adm0_a3    <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ geou_dif   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ geounit    <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "Vat…
$ gu_a3      <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ su_dif     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ subunit    <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "Vat…
$ su_a3      <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ brk_diff   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,…
$ name       <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "Vat…
$ name_long  <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "Vat…
$ brk_a3     <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ brk_name   <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "Vat…
$ brk_group  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ abbrev     <chr> "Zimb.", "Zambia", "Yem.", "Viet.", "Ven.", "Vat.", "Van.",…
$ postal     <chr> "ZW", "ZM", "YE", "VN", "VE", "V", "VU", "UZ", "UY", "FSM",…
$ formal_en  <chr> "Republic of Zimbabwe", "Republic of Zambia", "Republic of …
$ formal_fr  <chr> NA, NA, NA, NA, "República Bolivariana de Venezuela", NA, N…
$ name_ciawf <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "Hol…
$ note_adm0  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "U.S.A.", "U.S.…
$ note_brk   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ name_sort  <chr> "Zimbabwe", "Zambia", "Yemen, Rep.", "Vietnam", "Venezuela,…
$ name_alt   <chr> NA, NA, NA, NA, NA, "Holy See", NA, NA, NA, NA, NA, NA, NA,…
$ mapcolor7  <int> 1, 5, 5, 5, 1, 1, 6, 2, 1, 5, 2, 4, 4, 4, 4, 4, 4, 6, 6, 6,…
$ mapcolor8  <int> 5, 8, 3, 6, 3, 3, 3, 3, 2, 2, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6,…
$ mapcolor9  <int> 3, 5, 3, 5, 1, 4, 7, 5, 2, 4, 5, 1, 1, 1, 1, 1, 1, 6, 6, 6,…
$ mapcolor13 <int> 9, 13, 11, 4, 4, 2, 3, 4, 10, 13, 3, 1, 1, 1, 1, 1, 1, 3, 3…
$ pop_est    <dbl> 14645468, 17861030, 29161922, 96462106, 28515829, 825, 2998…
$ pop_rank   <int> 14, 14, 15, 16, 15, 2, 10, 15, 12, 9, 8, 8, 9, 9, 8, 12, 17…
$ pop_year   <int> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019,…
$ gdp_md     <int> 21440, 23309, 22581, 261921, 482359, -99, 934, 57921, 56045…
$ gdp_year   <int> 2019, 2019, 2019, 2019, 2014, 2019, 2019, 2019, 2019, 2018,…
$ economy    <chr> "5. Emerging region: G20", "7. Least developed region", "7.…
$ income_grp <chr> "5. Low income", "4. Lower middle income", "4. Lower middle…
$ fips_10    <chr> "ZI", "ZA", "YM", "VM", "VE", "VT", "NH", "UZ", "UY", "FM",…
$ iso_a2     <chr> "ZW", "ZM", "YE", "VN", "VE", "VA", "VU", "UZ", "UY", "FM",…
$ iso_a2_eh  <chr> "ZW", "ZM", "YE", "VN", "VE", "VA", "VU", "UZ", "UY", "FM",…
$ iso_a3     <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ iso_a3_eh  <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ iso_n3     <chr> "716", "894", "887", "704", "862", "336", "548", "860", "85…
$ iso_n3_eh  <chr> "716", "894", "887", "704", "862", "336", "548", "860", "85…
$ un_a3      <chr> "716", "894", "887", "704", "862", "336", "548", "860", "85…
$ wb_a2      <chr> "ZW", "ZM", "RY", "VN", "VE", "-99", "VU", "UZ", "UY", "FM"…
$ wb_a3      <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "-99", "VUT", "UZB", "UR…
$ woe_id     <int> 23425004, 23425003, 23425002, 23424984, 23424982, 23424986,…
$ woe_id_eh  <int> 23425004, 23425003, 23425002, 23424984, 23424982, 23424986,…
$ woe_note   <chr> "Exact WOE match as country", "Exact WOE match as country",…
$ adm0_iso   <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_diff  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ adm0_tlc   <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_us <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_fr <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_ru <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_es <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_cn <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_tw <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_in <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_np <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_pk <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_de <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_gb <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_br <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_il <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_ps <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_sa <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_eg <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_ma <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_pt <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_ar <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_jp <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_ko <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_vn <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_tr <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_id <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_pl <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_gr <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_it <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_nl <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_se <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_bd <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_ua <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ adm0_a3_un <int> -99, -99, -99, -99, -99, -99, -99, -99, -99, -99, -99, -99,…
$ adm0_a3_wb <int> -99, -99, -99, -99, -99, -99, -99, -99, -99, -99, -99, -99,…
$ continent  <chr> "Africa", "Africa", "Asia", "Asia", "South America", "Europ…
$ region_un  <chr> "Africa", "Africa", "Asia", "Asia", "Americas", "Europe", "…
$ subregion  <chr> "Eastern Africa", "Eastern Africa", "Western Asia", "South-…
$ region_wb  <chr> "Sub-Saharan Africa", "Sub-Saharan Africa", "Middle East & …
$ name_len   <int> 8, 6, 5, 7, 9, 7, 7, 10, 7, 10, 12, 14, 15, 4, 14, 11, 24, …
$ long_len   <int> 8, 6, 5, 7, 9, 7, 7, 10, 7, 30, 16, 24, 28, 4, 14, 11, 13, …
$ abbrev_len <int> 5, 6, 4, 5, 4, 4, 4, 4, 4, 6, 6, 6, 11, 4, 9, 4, 6, 10, 6, …
$ tiny       <int> -99, -99, -99, 2, -99, 4, 2, 5, -99, -99, 2, 3, 3, 2, 3, -9…
$ homepart   <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -99, -99, -99, -99, -99, 1…
$ min_zoom   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ min_label  <dbl> 2.5, 3.0, 3.0, 2.0, 2.5, 5.0, 4.0, 3.0, 3.0, 5.0, 5.0, 5.0,…
$ max_label  <dbl> 8.0, 8.0, 8.0, 7.0, 7.5, 10.0, 9.0, 8.0, 8.0, 10.0, 10.0, 1…
$ label_x    <dbl> 29.92544, 26.39530, 45.87438, 105.38729, -64.59938, 12.4534…
$ label_y    <dbl> -18.911640, -14.660804, 15.328226, 21.715416, 7.182476, 41.…
$ ne_id      <dbl> 1159321441, 1159321439, 1159321425, 1159321417, 1159321411,…
$ wikidataid <chr> "Q954", "Q953", "Q805", "Q881", "Q717", "Q237", "Q686", "Q2…
$ name_ar    <chr> "زيمبابوي", "زامبيا", "اليمن", "فيتنام", "فنزويلا", "الفاتي…
$ name_bn    <chr> "জিম্বাবুয়ে", "জাম্বিয়া", "ইয়েমেন", "ভিয়েতনাম", "ভেনেজুয়েলা", "ভ্যাটিকান সিটি", "ভানুয়াতু",…
$ name_de    <chr> "Simbabwe", "Sambia", "Jemen", "Vietnam", "Venezuela", "Vat…
$ name_en    <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "Vat…
$ name_es    <chr> "Zimbabue", "Zambia", "Yemen", "Vietnam", "Venezuela", "Ciu…
$ name_fa    <chr> "زیمبابوه", "زامبیا", "یمن", "ویتنام", "ونزوئلا", "واتیکان"…
$ name_fr    <chr> "Zimbabwe", "Zambie", "Yémen", "Viêt Nam", "Venezuela", "Ci…
$ name_el    <chr> "Ζιμπάμπουε", "Ζάμπια", "Υεμένη", "Βιετνάμ", "Βενεζουέλα", …
$ name_he    <chr> "זימבבואה", "זמביה", "תימן", "וייטנאם", "ונצואלה", "קריית ה…
$ name_hi    <chr> "ज़िम्बाब्वे", "ज़ाम्बिया", "यमन", "वियतनाम", "वेनेज़ुएला", "वैटिकन नगर", "वानूआटू", …
$ name_hu    <chr> "Zimbabwe", "Zambia", "Jemen", "Vietnám", "Venezuela", "Vat…
$ name_id    <chr> "Zimbabwe", "Zambia", "Yaman", "Vietnam", "Venezuela", "Vat…
$ name_it    <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "Cit…
$ name_ja    <chr> "ジンバブエ", "ザンビア", "イエメン", "ベトナム", "ベネズエラ", "バチカン", "バヌアツ", "…
$ name_ko    <chr> "짐바브웨", "잠비아", "예멘", "베트남", "베네수엘라", "바티칸 시국", "바누아투", "우즈베…
$ name_nl    <chr> "Zimbabwe", "Zambia", "Jemen", "Vietnam", "Venezuela", "Vat…
$ name_pl    <chr> "Zimbabwe", "Zambia", "Jemen", "Wietnam", "Wenezuela", "Wat…
$ name_pt    <chr> "Zimbábue", "Zâmbia", "Iémen", "Vietname", "Venezuela", "Va…
$ name_ru    <chr> "Зимбабве", "Замбия", "Йемен", "Вьетнам", "Венесуэла", "Ват…
$ name_sv    <chr> "Zimbabwe", "Zambia", "Jemen", "Vietnam", "Venezuela", "Vat…
$ name_tr    <chr> "Zimbabve", "Zambiya", "Yemen", "Vietnam", "Venezuela", "Va…
$ name_uk    <chr> "Зімбабве", "Замбія", "Ємен", "В'єтнам", "Венесуела", "Вати…
$ name_ur    <chr> "زمبابوے", "زیمبیا", "یمن", "ویتنام", "وینیزویلا", "ویٹیکن …
$ name_vi    <chr> "Zimbabwe", "Zambia", "Yemen", "Việt Nam", "Venezuela", "Th…
$ name_zh    <chr> "津巴布韦", "赞比亚", "也门", "越南", "委内瑞拉", "梵蒂冈", "瓦努阿图", "乌兹别克斯坦",…
$ name_zht   <chr> "辛巴威", "尚比亞", "葉門", "越南", "委內瑞拉", "梵蒂岡", "萬那杜", "烏茲別克", "烏拉…
$ fclass_iso <chr> "Admin-0 country", "Admin-0 country", "Admin-0 country", "A…
$ tlc_diff   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_tlc <chr> "Admin-0 country", "Admin-0 country", "Admin-0 country", "A…
$ fclass_us  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_fr  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_ru  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_es  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_cn  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_tw  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_in  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_np  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_pk  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_de  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_gb  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_br  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_il  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_ps  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_sa  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_eg  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_ma  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_pt  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_ar  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_jp  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_ko  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_vn  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_tr  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_id  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_pl  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_gr  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_it  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_nl  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_se  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_bd  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fclass_ua  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ geometry   <MULTIPOLYGON [°]> MULTIPOLYGON (((31.28789 -2..., MULTIPOLYGON (…

Our data has a lot of variables, which we don’t need. It seems that the variable adm0_a3 might be a good variable to keep: we will use this variable to merge our clean data frame. It might also be helpful to keep admin, continent, sovereignt. These will help us identify some smaller geographic units (islands) which fall under larger jurisdictions.

world2<-subset(world, select = c(admin, adm0_a3, sovereignt, continent))
glimpse(world2)

Rows: 242
Columns: 5
$ admin      <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "Vat…
$ adm0_a3    <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", "UR…
$ sovereignt <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "Vat…
$ continent  <chr> "Africa", "Africa", "Asia", "Asia", "South America", "Europ…
$ geometry   <MULTIPOLYGON [°]> MULTIPOLYGON (((31.28789 -2..., MULTIPOLYGON (…

We now have a spatial dataframe with fewer variables. As mentioned, it seems that a good variable to merge on is adm0_a3. Let us look at both:

ctries_sp<-world2$adm0_a3
ctries_sp

  [1] "ZWE" "ZMB" "YEM" "VNM" "VEN" "VAT" "VUT" "UZB" "URY" "FSM" "MHL" "MNP"
 [13] "VIR" "GUM" "ASM" "PRI" "USA" "SGS" "IOT" "SHN" "PCN" "AIA" "FLK" "CYM"
 [25] "BMU" "VGB" "TCA" "MSR" "JEY" "GGY" "IMN" "GBR" "ARE" "UKR" "UGA" "TKM"
 [37] "TUR" "TUN" "TTO" "TON" "TGO" "TLS" "THA" "TZA" "TJK" "TWN" "SYR" "CHE"
 [49] "SWE" "SWZ" "SUR" "SDS" "SDN" "LKA" "ESP" "KOR" "ZAF" "SOM" "SOL" "SLB"
 [61] "SVK" "SVN" "SGP" "SLE" "SYC" "SRB" "SEN" "SAU" "STP" "SMR" "WSM" "VCT"
 [73] "LCA" "KNA" "RWA" "RUS" "ROU" "QAT" "PRT" "POL" "PHL" "PER" "PRY" "PNG"
 [85] "PAN" "PLW" "PAK" "OMN" "NOR" "PRK" "NGA" "NER" "NIC" "NZL" "NIU" "COK"
 [97] "NLD" "ABW" "CUW" "NPL" "NRU" "NAM" "MOZ" "MAR" "SAH" "MNE" "MNG" "MDA"
[109] "MCO" "MEX" "MUS" "MRT" "MLT" "MLI" "MDV" "MYS" "MWI" "MDG" "MKD" "LUX"
[121] "LTU" "LIE" "LBY" "LBR" "LSO" "LBN" "LVA" "LAO" "KGZ" "KWT" "KOS" "KIR"
[133] "KEN" "KAZ" "JOR" "JPN" "JAM" "ITA" "ISR" "PSX" "IRL" "IRQ" "IRN" "IDN"
[145] "IND" "ISL" "HUN" "HND" "HTI" "GUY" "GNB" "GIN" "GTM" "GRD" "GRC" "GHA"
[157] "DEU" "GEO" "GMB" "GAB" "FRA" "SPM" "WLF" "MAF" "BLM" "PYF" "NCL" "ATF"
[169] "ALD" "FIN" "FJI" "ETH" "EST" "ERI" "GNQ" "SLV" "EGY" "ECU" "DOM" "DMA"
[181] "DJI" "GRL" "FRO" "DNK" "CZE" "CYN" "CYP" "CUB" "HRV" "CIV" "CRI" "COD"
[193] "COG" "COM" "COL" "CHN" "MAC" "HKG" "CHL" "TCD" "CAF" "CPV" "CAN" "CMR"
[205] "KHM" "MMR" "BDI" "BFA" "BGR" "BRN" "BRA" "BWA" "BIH" "BOL" "BTN" "BEN"
[217] "BLZ" "BEL" "BLR" "BRB" "BGD" "BHR" "BHS" "AZE" "AUT" "AUS" "IOA" "HMD"
[229] "NFK" "ATC" "ARM" "ARG" "ATG" "AGO" "AND" "DZA" "ALB" "AFG" "KAS" "ATA"
[241] "SXM" "TUV"

ctries_sp<-world2$adm0_a3
glimpse(ctries_sp)

 chr [1:242] "ZWE" "ZMB" "YEM" "VNM" "VEN" "VAT" "VUT" "UZB" "URY" "FSM" ...

ctries_df<-clean_life_expectancy_df$Code
glimpse(ctries_df)

 chr [1:235] "AFG" "ALB" "DZA" "ASM" "AND" "AGO" "AIA" "ATG" "ARG" "ARM" ...

Let us now examine whether there are differences when it comes to what is included/excluded in the two lists. We can do this with the help of the setdiff function, which indicates which elements of a vector or data frame X are not existent in a vector or data frame Y.

#which elements in ctries_sp are not existent ctries_df?
countries_dif<-setdiff(ctries_sp, ctries_df)
#printing them
countries_dif

 [1] "SGS" "IOT" "PCN" "SDS" "SOL" "SAH" "KOS" "PSX" "BLM" "ATF" "ALD" "CYN"
[13] "IOA" "HMD" "NFK" "ATC" "KAS" "ATA"

These are countries that are in the spatial dataframe, but not in the our dataframe. What are these countries?

#Within the dataframe world2, we select the adm0_a3 and check 
#if it matches elements in countries_dif
world3<-subset(world2, adm0_a3 %in% countries_dif)
#We investigate the first 10 entries
head(world3, n = 10)

#We make a list for better viewing of those
list(world3$admin)[[1]]

 [1] "South Georgia and the Islands"       "British Indian Ocean Territory"     
 [3] "Pitcairn Islands"                    "South Sudan"                        
 [5] "Somaliland"                          "Western Sahara"                     
 [7] "Kosovo"                              "Palestine"                          
 [9] "Saint Barthelemy"                    "French Southern and Antarctic Lands"
[11] "Aland"                               "Northern Cyprus"                    
[13] "Indian Ocean Territories"            "Heard Island and McDonald Islands"  
[15] "Norfolk Island"                      "Ashmore and Cartier Islands"        
[17] "Siachen Glacier"                     "Antarctica"

Let us now perform a left merge so that we can have the data that we cleaned onto the spatial dataframe.

#Performing a left join
merged<-left_join(world2, clean_life_expectancy_df, by = c("adm0_a3" = "Code"))
#Investigating the first 4 entries.
glimpse(merged)

Rows: 242
Columns: 7
$ admin         <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "…
$ adm0_a3       <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", …
$ sovereignt    <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "…
$ continent     <chr> "Africa", "Africa", "Asia", "Asia", "South America", "Eu…
$ Entity        <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "…
$ life_exp_mean <dbl> 54.24722, 51.56389, 51.64861, 65.73194, 65.33421, 75.331…
$ geometry      <MULTIPOLYGON [°]> MULTIPOLYGON (((31.28789 -2..., MULTIPOLYGO…

This looks good. Let us now make our first map.

ggplot() +
  geom_sf(data = merged, aes(fill = life_exp_mean))+
  ggtitle("Avg. Life Expectancy 1901-2021")

We have a just made a choropleth map, which indicates intensity of a variable. Let us now add a few embellishments to make our map nicer.

ggplot() +
    geom_sf(data = merged, aes(fill = life_exp_mean))+
    theme_bw()+
    xlab("X - Longitude") + ylab("Y - Latitude")+
    ggtitle("Avg. Life Expectancy 1901-2021")

A map like this has exactly the same structure of a regular ggplot graph.

6 Performing a t-test

Let us now go back to our original task, that of calculating a t-test. Because we have this new variable - continent, we no longer need to manually ascribe a continent to our countries. Let us quickly see our labels for the continents.

unique(merged$continent)

[1] "Africa"                  "Asia"                   
[3] "South America"           "Europe"                 
[5] "Oceania"                 "North America"          
[7] "Seven seas (open ocean)" "Antarctica"

We can use the “South America” label. Note that this does not capture all the countries which we previously referred to as “Latin America.” For example, Mexico is not included here, as it is part of North America. This is to say that Latin America can be referred to as both a geographic and a linguistic space.

sample_latam<-subset(merged, continent=="South America")

Let us do the same for Europe

sample_eu<-subset(merged, continent=="Europe")

Let us quickly map South America.

ggplot() +
    geom_sf(data = sample_latam, aes(fill = life_exp_mean))+
    theme_bw()+
    xlab("X - Longitude") + ylab("Y - Latitude")+
    ggtitle("Avg. Life Expectancy 1901-2021")

Let us do the same for Europe.

ggplot() +
    geom_sf(data = sample_eu, aes(fill = life_exp_mean))+
    theme_bw()+
    xlab("X - Longitude") + ylab("Y - Latitude")+
    ggtitle("Avg. Life Expectancy 1901-2021")+
    coord_sf(
    xlim = c(-10, 40),   # longitude limits (west to east)
    ylim = c(35, 70)     # latitude limits (south to north)
  )

We are now ready to perform a t-test to see if the life expectancy in Latin America is different from life expectancy in Europe. Note that we have 50 observations in Europe and 13 in Latin America. Let us now perform a t-test.

test_greater<-t.test(sample_latam$life_exp_mean,
                     sample_eu$life_exp_mean,
        alternative = c("greater"))
test_greater


    Welch Two Sample t-test

data:  sample_latam$life_exp_mean and sample_eu$life_exp_mean
t = -5.5367, df = 17.108, p-value = 1
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 -9.228658       Inf
sample estimates:
mean of x mean of y 
 63.25522  70.27813

What is the Alternative Hypothesis for a one-tailed t-test - ?

The Alternative Hypothesis: True difference in means is greater than 0 (mean of x − mean of y).
The average life expectancy in Latin America is greater than the average life expectancy in Europe.

What is the Null Hypothesis for this one-tailed t-test?

The Null Hypothesis: the average life expectancy in Latin America is less than or equal to the average life expectancy in Europe.

Do we accept or reject the Null Hypothesis?

We do not reject the null hypothesis (H0) because p ≥ 0.05 and conclude that there is no statistical evidence that life expectancy in Latin America is higher than in Europe.

A p-value of 1 means your data are completely inconsistent with your chosen direction of the test. There is no evidence at all that Latin America’s life expectancy is higher than that of Europe. So, we do not reject the H0 because p is much larger than 0.05.

But is it less?

test_less<-t.test(sample_latam$life_exp_mean,
                  sample_eu$life_exp_mean,
        alternative = c("less"))
test_less


    Welch Two Sample t-test

data:  sample_latam$life_exp_mean and sample_eu$life_exp_mean
t = -5.5367, df = 17.108, p-value = 1.771e-05
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf -4.817149
sample estimates:
mean of x mean of y 
 63.25522  70.27813

What is the Alternative Hypothesis for a one-tailed t-test - ?

The Alternative Hypothesis: True difference in means is less than 0 (mean of x − mean of y).
The average life expectancy in Latin America is less than the average life expectancy in Europe.

What is the Null Hypothesis for this one-tailed t-test?

The Null Hypothesis: the average life expectancy in Latin America is greater than or equal to the average life expectancy in Europe.

Do we accept or reject the Null Hypothesis?

We reject the null hypothesis (H0) because p is less 0.05 and conclude that the average life expectancy in Latin America is significantly lower than in Europe.

Do Europe and Latin America have similar life expectancies?

test_twosided<-t.test(sample_latam$life_exp_mean,
                  sample_eu$life_exp_mean,
        alternative = c("two.sided"))
test_twosided


    Welch Two Sample t-test

data:  sample_latam$life_exp_mean and sample_eu$life_exp_mean
t = -5.5367, df = 17.108, p-value = 3.541e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -9.697752 -4.348055
sample estimates:
mean of x mean of y 
 63.25522  70.27813

What is the Alternative Hypothesis for a two-tailed t-test - ?

The Alternative Hypothesis: The true difference in means is not equal to 0 (mean of Latin America − mean of Europe ≠ 0).
In other words, the average life expectancy in Latin America is different from the average life expectancy in Europe.

What is the Null Hypothesis for this two-tailed t-test?

The Null Hypothesis: The average life expectancy in Latin America is equal to the average life expectancy in Europe.

Do we accept or reject the Null Hypothesis?

Because the p-value (0.0000354) is less than 0.05, we reject the null hypothesis. There is strong statistical evidence that the average life expectancy in Latin America and Europe are significantly different.

Another good way to visualize the similarity between the two groups is by using a boxplot.

#Binding the Latin American and European Sample
box_dat<-rbind(sample_latam, sample_eu)

ggplot(box_dat, aes(x = continent, y = life_exp_mean, color = continent)) +
  geom_boxplot() +
  theme_bw()

And this is the interpretation of a box plot.

We can also add more of the observations to the boxplot by using geom_jitter, as shown below:

ggplot(box_dat, aes(x = continent, y = life_exp_mean, color = continent)) +
  geom_boxplot() +
  geom_jitter() +
  theme_bw()

7 Mapping Life Expectancy in Europe

What if we wanted to examine life expectancy in Europe over time? The first step is to fix the coordinates and add limits to our graph. This time around we need to choose a maximum Y (North Point), minimum Y (South Point), a maximum X (West Point), and minimum X (East Point). One good way would be to take these coordinates from the existing countries. Let us choose four countries that would allow us to decide on the coordinates for Europe: Norway, Portugal, Ukraine, and Greece.

#Step1: Selecting the country
norway<-subset(box_dat, Entity == "Norway")
#Step2: Extracting its coordinates
st_bbox(norway)

     xmin      ymin      xmax      ymax 
-9.098877 58.020947 33.629297 80.477832

For our map of Europe we are interested in the Northern most point in Norway. This will be our maximum Y.

max_lat_y<-st_bbox(norway)["ymax"]
max_lat_y

    ymax 
80.47783

Let us now get another country that is in the south: Greece.

#Step1: Selecting the country
greece<-subset(box_dat, Entity == "Greece")
#Step2: Extracting its coordinates
st_bbox(greece)

    xmin     ymin     xmax     ymax 
19.64648 34.93447 28.23184 41.74380

For our map of Europe we are interested in the Southern most point in Greece. This will be the minimum Y.

min_lat_y<-st_bbox(greece)["ymin"]
min_lat_y

    ymin 
34.93447

Let us now get a country in the East: Ukraine.

#Step1: Selecting the country
ukraine<-subset(box_dat, Entity == "Ukraine")
#Step2: Extracting its coordinates
st_bbox(ukraine)

    xmin     ymin     xmax     ymax 
22.13184 45.23413 40.12832 52.35356

For our map of Europe we are interested in the Eastern most point in Ukraine. This will be the maximum X.

max_lon_x<-st_bbox(ukraine)["xmax"]
max_lon_x

    xmax 
40.12832

Let’s choose now a country in the West: Portugal

#Step1: Selecting the country
portugal<-subset(box_dat, Entity == "Portugal")
#Step2: Extracting its coordinates
st_bbox(portugal)

     xmin      ymin      xmax      ymax 
-31.28296  32.64829  -6.21250  42.13740

For our map of Europe we are interested in the Western most point in Portugal. This will be the minimum X.

min_lon_x<-st_bbox(portugal)["xmin"]
min_lon_x

     xmin 
-31.28296

We are now ready:

figure1<-ggplot() +
    geom_sf() +
    geom_sf(data = box_dat, aes(fill = life_exp_mean))+
    theme_bw()+
    xlab("X - Longitude") + ylab("Y - Latitude")+
    coord_sf(xlim = c(min_lon_x, max_lon_x), ylim = c(min_lat_y, max_lat_y))+
    ggtitle("Avg. Life Expectancy 1901-2021")

ggsave(figure1, file = "./graphs/figure1.jpg", 
       height = 20, width = 20, 
       units = "cm", dpi = 300)
figure1

Let us also visualize the points that we calculated. But to make things easier, we should create a dataframe.

euro_extreme<-data.frame(x_lon=c(min_lon_x, max_lon_x, min_lon_x, max_lon_x),
                         y_lat=c(min_lat_y, max_lat_y, max_lat_y, min_lat_y),
                         name = (c("South-West", "North-East", "North-West", "South-East")))
euro_extreme

Let us visualize the four points that we calculated.

figure1<-ggplot() +
    geom_sf() +
    geom_sf(data = box_dat, aes(fill = life_exp_mean))+
    geom_point(data= euro_extreme, aes(x=x_lon, 
                 y=y_lat), 
             fill="blue", color="darkred", size=3)+
  geom_label_repel(data= euro_extreme, aes(x=x_lon, y=y_lat,
    label = name), fill = alpha(c("white"),0.8))+
    theme_bw()+
    xlab("X - Longitude") + ylab("Y - Latitude")+
    coord_sf(xlim = c(min_lon_x-3, max_lon_x+3), ylim = c(min_lat_y-3, max_lat_y+3))+
    ggtitle("Avg. Life Expectancy 1901-2021")


figure1

This looks pretty good. Next step is to redo the merge in our data. Remember that we calculated average life expectancy for the entire period: historical years up to the present day. Thus, we need to do a new merge in which we merge the spatial dataframe to the specific year on life expectancy.

#Making a more friendly label for life expectancy
names(life_expectancy_df)[4]<-"life_exp_mean"
#Selecting only 1950
life_expectancy_1950<-subset(life_expectancy_df, Year==1950)
#Merging countries (life_expectancy_1950) to the shapefile (world2)
world_1950<-left_join(world2, life_expectancy_1950, by = c("adm0_a3"="Code"))
#Examining the result
glimpse(world_1950)

Rows: 242
Columns: 8
$ admin         <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "…
$ adm0_a3       <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", …
$ sovereignt    <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "…
$ continent     <chr> "Africa", "Africa", "Asia", "Asia", "South America", "Eu…
$ Entity        <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "…
$ Year          <int> 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1950, 19…
$ life_exp_mean <dbl> 49.6, 45.1, 31.7, 49.2, 51.1, 65.9, 44.5, 54.8, 65.6, 49…
$ geometry      <MULTIPOLYGON [°]> MULTIPOLYGON (((31.28789 -2..., MULTIPOLYGO…

What if we were interested in life expectancy in Europe in 1950?

Let us map life expectancy in 1950.

figure2<-ggplot() +
    geom_sf(data = world_1950, aes(fill = life_exp_mean))+
    theme_bw()+
    xlab("X - Longitude") + ylab("Y - Latitude")+
    coord_sf(xlim = c(min_lon_x, max_lon_x), ylim = c(min_lat_y, max_lat_y))+
    ggtitle("Life Expectancy in 1950")

figure2

Let us map life expectancy in 2020.

#Selecting only 2020
life_expectancy_2020<-subset(life_expectancy_df, Year==2020)
#Merging countries (life_expectancy_2020) to the shapefile (world2)
world_2020<-left_join(world2, life_expectancy_2020, by = c("adm0_a3"="Code"))
#Examining the result
glimpse(world_2020)

Rows: 242
Columns: 8
$ admin         <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "…
$ adm0_a3       <chr> "ZWE", "ZMB", "YEM", "VNM", "VEN", "VAT", "VUT", "UZB", …
$ sovereignt    <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "…
$ continent     <chr> "Africa", "Africa", "Asia", "Asia", "South America", "Eu…
$ Entity        <chr> "Zimbabwe", "Zambia", "Yemen", "Vietnam", "Venezuela", "…
$ Year          <int> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 20…
$ life_exp_mean <dbl> 61.1, 62.4, 64.7, 75.4, 71.1, 83.0, 70.3, 70.3, 78.4, 70…
$ geometry      <MULTIPOLYGON [°]> MULTIPOLYGON (((31.28789 -2..., MULTIPOLYGO…

figure3<-ggplot() +
    geom_sf() +
    geom_sf(data = world_2020, aes(fill = life_exp_mean))+
    theme_bw()+
    xlab("X - Longitude") + ylab("Y - Latitude")+
    coord_sf(xlim = c(min_lon_x, max_lon_x), 
             ylim = c(min_lat_y, max_lat_y))+
      ggtitle("Life Expectancy in 2020")
figure3

Let us put them side by side.

grid.arrange(figure2, figure3, ncol=2)

We don’t see much of a difference because we also need to fix the color scheme in a way that has the same minimum and maximum. This is how we do this. We first identify the minimum and maximum value in the variable that we want to plot.

vmax <- max(life_expectancy_df$life_exp_mean, na.rm=T)
vmax

[1] 86.5

vmin <- min(life_expectancy_df$life_exp_mean, na.rm=T)
vmin

[1] 12

Let us now replot the maps

library(ggpubr)

figure2<-ggplot() +
    geom_sf() +
    geom_sf(data = world_1950, aes(fill = life_exp_mean))+
    theme_bw()+
    xlab("X - Longitude") + ylab("Y - Latitude")+
    coord_sf(xlim = c(min_lon_x, max_lon_x), ylim = c(min_lat_y, max_lat_y))+
    scale_fill_gradient(limits = c(vmin, vmax), name = "Life Expectancy")+
    ggtitle("Life Expectancy in 1950")


figure3<-ggplot() +
    geom_sf() +
    geom_sf(data = world_2020, aes(fill = life_exp_mean))+
    theme_bw()+
    xlab("X - Longitude") + ylab("Y - Latitude")+
    coord_sf(xlim = c(min_lon_x, max_lon_x),
             ylim = c(min_lat_y, max_lat_y))+
    scale_fill_gradient(limits = c(vmin, vmax), name = "Life Expectancy")+
    ggtitle("Life Expectancy in 2020")


ggarrange(figure2, figure3, ncol=2, common.legend = TRUE)

Now, the differences are more visible. What if we were interested in plotting the change in life expectancy between 1950 and 2020?

#Step1: Selecting only 1950 and 2020
life_expectancy_1950_2020<-subset(life_expectancy_df, Year %in% c(1950, 2020))
#Step2: Calculating the difference betweem the 2020 (life_expectancy_values) and 1950 (lag(life_expectancy_values)) by country
life_expectancy_1950_2020b<-life_expectancy_1950_2020%>%
  dplyr::arrange(Code, Year) %>%
  dplyr::group_by(Code) %>%
  dplyr::summarise(differ=life_exp_mean - first(life_exp_mean))
#Step3: Removing countries with NA code and where is the difference calculated is NA
life_expectancy_1950_2020c<-subset(life_expectancy_1950_2020b, Code!="" & !is.na(differ))

#Step4: Examining the result
glimpse(life_expectancy_1950_2020c)

Rows: 474
Columns: 2
Groups: Code [237]
$ Code   <chr> "ABW", "ABW", "AFG", "AFG", "AGO", "AGO", "AIA", "AIA", "ALB", …
$ differ <dbl> 0.0, 18.5, 0.0, 34.9, 0.0, 26.0, 0.0, 21.6, 0.0, 32.3, 0.0, 14.…

Let us now plot the change.

#Performing a left join
world_change<-left_join(world2, life_expectancy_1950_2020c, by = c("adm0_a3"="Code"))
#Investigating the result
glimpse(world_change)

Rows: 466
Columns: 6
$ admin      <chr> "Zimbabwe", "Zimbabwe", "Zambia", "Zambia", "Yemen", "Yemen…
$ adm0_a3    <chr> "ZWE", "ZWE", "ZMB", "ZMB", "YEM", "YEM", "VNM", "VNM", "VE…
$ sovereignt <chr> "Zimbabwe", "Zimbabwe", "Zambia", "Zambia", "Yemen", "Yemen…
$ continent  <chr> "Africa", "Africa", "Africa", "Africa", "Asia", "Asia", "As…
$ differ     <dbl> 0.0, 11.5, 0.0, 17.3, 0.0, 33.0, 0.0, 26.2, 0.0, 20.0, 0.0,…
$ geometry   <MULTIPOLYGON [°]> MULTIPOLYGON (((31.28789 -2..., MULTIPOLYGON (…

figure4<-ggplot() +
    geom_sf() +
    geom_sf(data = world_change, aes(fill = differ))+
    theme_bw()+
    xlab("X - Longitude") + ylab("Y - Latitude")+
    coord_sf(xlim = c(min_lon_x, max_lon_x), ylim = c(min_lat_y, max_lat_y))+
  ggtitle("Life Expectancy Difference: 2020-1950")

figure4

What is the country in Europe with the highest change in life expectancy between 1950 and 2020?

#Step1: Selecting Europe
world_change_res<-subset(world_change, continent=="Europe")
#Step2: Selecting the maximum value for differ (difference between 2020 and 1950)
world_change_resx<-subset(world_change_res, differ==max(world_change_res$differ, na.rm=T))
#Step3: Identifying the country
world_change_resx$admin

[1] "Albania"

What is the country in Europe with the lowest change in life expectancy between 1950 and 2020?

#Step2: Selecting the minimum value for differ (difference between 2020 and 1950)
world_change_resx<-subset(world_change_res, differ==min(world_change_res$differ, na.rm=T))
#Step3: Identifying the country
world_change_resx$admin

 [1] "Vatican"                "Jersey"                 "Guernsey"              
 [4] "Isle of Man"            "United Kingdom"         "Ukraine"               
 [7] "Switzerland"            "Sweden"                 "Spain"                 
[10] "Slovakia"               "Slovenia"               "Republic of Serbia"    
[13] "San Marino"             "Russia"                 "Romania"               
[16] "Portugal"               "Poland"                 "Norway"                
[19] "Netherlands"            "Montenegro"             "Moldova"               
[22] "Monaco"                 "Malta"                  "North Macedonia"       
[25] "Luxembourg"             "Lithuania"              "Liechtenstein"         
[28] "Latvia"                 "Italy"                  "Ireland"               
[31] "Iceland"                "Hungary"                "Greece"                
[34] "Germany"                "France"                 "Finland"               
[37] "Estonia"                "Faroe Islands"          "Denmark"               
[40] "Czechia"                "Croatia"                "Bulgaria"              
[43] "Bosnia and Herzegovina" "Belgium"                "Belarus"               
[46] "Austria"                "Andorra"                "Albania"

8 Correlation between two Variables

Let us now calculate the correlation between urbanization and life expectancy.

#Providing more friendly names
names(urbanization_df)[4]<-"urb_mean"

#Selecting only relevant variables
urbanization_df2<-subset(urbanization_df, select=c(Code, urb_mean, Year))

#Performing a left join
merged_df<-left_join(life_expectancy_df, urbanization_df2, by = c("Code"="Code", "Year"="Year"))
#Removing NA values
merged_df2<-na.omit(merged_df)

What is the correlation in life expectancy and urbanization for the entire dataset?

#Calculating the correlation
cor(merged_df2$life_exp_mean, merged_df2$urb_mean)

[1] 0.4579985