Introduction

Life expectancy, usually reported at birth, is the average number of years a newborn would live if current mortality rates persisted. It is influenced by a variety of factors, including genetics, lifestyle choices, and environmental conditions. In this memo, we explore some of the key drivers of life expectancy and their impact on overall health and longevity.

One of the major factors that influence life expectancy is access to quality healthcare. This includes access to preventive care, such as regular check-ups and screenings, as well as access to advanced medical treatments and technologies. For example, countries with well-developed healthcare systems and high levels of investment in medical research tend to have higher life expectancies, in part because such systems can deliver early diagnosis, effective treatment, and a better quality of life for people living with health conditions.

Two other important drivers of life expectancy are education and income. Education is associated with better health outcomes: people who are well informed about their health are more likely to make healthy lifestyle choices. Higher income, in turn, is correlated with better access to healthcare, nutritious food, and safe living conditions, all of which contribute to overall health and longevity.

# Load the relevant libraries
library(ggplot2)
library(dplyr)

Data Cleaning and Analysis

Data reading

We first read the relevant datasets from the hard drive.

# Machine-specific path: point this at the folder that holds the CSV files
setwd("/Users/bgpopescu/Dropbox/john_cabot/teaching/stats/memo")
life_expectancy<-read.csv("./life-expectancy.csv")
literacy_rates<-read.csv("./cross-country-literacy-rates.csv")
electoral_democracy<-read.csv("./electoral-democracy.csv")
gdp<-read.csv("./gdp-per-capita-maddison-2020.csv")
hospital_beds<-read.csv("./number-of-hospital-beds.csv")
urban<-read.csv("./share-of-population-urban.csv")
pop<-read.csv("./UN-population-projection-medium-variant.csv")
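Hard-coding `setwd()` ties the script to one machine. A minimal sketch, assuming all CSV files sit in a single folder, reads them in one pass relative to a data directory and stops early if a file is missing. The helper `read_csv_dir()` and its arguments are hypothetical names introduced for illustration.

```r
# Hypothetical helper: read several CSV files from one folder into a named list.
# Stops with a clear message if any file is missing.
read_csv_dir <- function(data_dir, files) {
  paths <- file.path(data_dir, files)
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing file(s): ", paste(missing, collapse = ", "))
  }
  # List element names are taken from the names of `files`
  lapply(setNames(paths, names(files)), read.csv)
}

# Usage sketch with a throwaway folder and two toy files
tmp <- tempfile("memo_data")
dir.create(tmp)
writeLines(c("Entity,Code,Year,Value", "France,FRA,2000,79"), file.path(tmp, "a.csv"))
writeLines(c("Entity,Code,Year,Value", "Spain,ESP,2000,79"), file.path(tmp, "b.csv"))
tables <- read_csv_dir(tmp, c(life = "a.csv", gdp = "b.csv"))
```

Applied to the memo, the `files` vector would list the seven CSV names above, and each table would then be available as, for example, `tables$life`.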

Renaming variables

We now give the fourth column of each dataset, which holds its main variable, a short name.

names(life_expectancy)[4]<-"life_expectancy"
names(literacy_rates)[4]<-"literacy"
names(gdp)[4]<-"gdp"
names(hospital_beds)[4]<-"beds"
names(urban)[4]<-"urbanization"
names(pop)[4]<-"population"
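Renaming by position assumes that the value column is always the fourth one, after `Entity`, `Code` and `Year` (the layout visible in the `names()` output for the democracy file below). If a download ever changed that layout, the rename would fail silently. A small guard, sketched with the hypothetical helper `rename_col4()`, checks the first three columns before renaming.

```r
# Hypothetical guard: rename column 4 only if columns 1-3 look as expected.
rename_col4 <- function(df, new_name, id_cols = c("Entity", "Code", "Year")) {
  if (ncol(df) < 4 || !identical(names(df)[1:3], id_cols)) {
    stop("Unexpected layout: ", paste(names(df), collapse = ", "))
  }
  names(df)[4] <- new_name
  df
}

# Usage sketch on a toy table
toy <- data.frame(Entity = "France", Code = "FRA", Year = 2000, raw_value = 79)
toy <- rename_col4(toy, "life_expectancy")
```

On the memo's data this would read, for example, `gdp <- rename_col4(gdp, "gdp")`.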

Selecting literacy

literacy_rates2<-subset(literacy_rates, select = c("Code", "Year", "literacy"))
literacy_rates2<-subset(literacy_rates2, !is.na(literacy_rates2$Code))
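One caveat about the `!is.na(Code)` filter: `read.csv()` treats blank fields as missing only in logical and numeric columns, so a blank entry in a character column such as `Code` is read as the empty string `""`, not `NA`. Assuming, as is typical of Our World in Data exports, that regional aggregates such as "Africa" carry a blank code, those rows pass this filter. The toy example below shows the difference and a stricter filter.

```r
# Toy CSV: the regional aggregate "Africa" has a blank Code
csv <- "Entity,Code,Year,literacy
Africa,,2000,60
France,FRA,2000,99"
toy <- read.csv(text = csv)

is.na(toy$Code)   # FALSE FALSE: the blank code is "", not NA
toy$Code == ""    # TRUE FALSE

# Stricter filter: drop both NA and empty codes
toy_clean <- subset(toy, !is.na(Code) & Code != "")
```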

Selecting gdp

gdp2<-subset(gdp, select = c("Code", "Year", "gdp"))
gdp2<-subset(gdp2, !is.na(gdp2$Code))

Selecting hospital beds

hospital_beds2<-subset(hospital_beds, select = c("Code", "Year", "beds"))
hospital_beds2<-subset(hospital_beds2, !is.na(hospital_beds2$Code))

Selecting democracy

The democracy file has several score columns, so we list them first and keep the main index, `electdem_vdem_owid`, dropping its `_high` and `_low` variants.

names(electoral_democracy)
## [1] "Entity"                  "Code"                   
## [3] "Year"                    "electdem_vdem_owid"     
## [5] "electdem_vdem_high_owid" "electdem_vdem_low_owid"
electoral_democracy2<-subset(electoral_democracy, select = c("Code", "Year", "electdem_vdem_owid"))
electoral_democracy2<-subset(electoral_democracy2, !is.na(electoral_democracy2$Code))

Selecting pop

pop2<-subset(pop, select = c("Code", "Year", "population"))
pop2<-subset(pop2, !is.na(pop2$Code))
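The same two steps (keep `Code`, `Year` and one value column, then drop rows without a country code) are repeated for every dataset. A hypothetical helper, `select_country_rows()`, sketches how they could be written once; it also drops empty codes, on the assumption that blank codes mark aggregates rather than countries.

```r
# Hypothetical helper: keep the join keys plus one value column,
# and drop rows whose Code is missing or empty.
select_country_rows <- function(df, value_col) {
  out <- df[, c("Code", "Year", value_col)]
  out[!is.na(out$Code) & out$Code != "", , drop = FALSE]
}

# Usage sketch on a toy table
toy <- data.frame(
  Entity = c("Africa", "France", "Spain"),
  Code = c("", "FRA", NA),
  Year = c(2000, 2000, 2000),
  population = c(800, 59, 40)
)
pop_toy <- select_country_rows(toy, "population")
```

Applied to the memo's data, this would read, for example, `gdp2 <- select_country_rows(gdp, "gdp")`.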

Selecting urbanization

urban2<-subset(urban, select = c("Code", "Year", "urbanization"))
urban2<-subset(urban2, !is.na(urban2$Code))

Performing the left joins

new2<-left_join(life_expectancy, electoral_democracy2, by=c('Code'='Code', 'Year'='Year'))
## Warning in left_join(life_expectancy, electoral_democracy2, by = c(Code = "Code", : Each row in `x` is expected to match at most 1 row in `y`.
## ℹ Row 73 of `x` matches multiple rows.
## ℹ If multiple matches are expected, set `multiple = "all"` to silence this
##   warning.
new3<-left_join(new2, gdp2, by=c('Code'='Code', 'Year'='Year'))
## Warning in left_join(new2, gdp2, by = c(Code = "Code", Year = "Year")): Each row in `x` is expected to match at most 1 row in `y`.
## ℹ Row 79 of `x` matches multiple rows.
## ℹ If multiple matches are expected, set `multiple = "all"` to silence this
##   warning.
new4<-left_join(new3, urban2, by=c('Code'='Code', 'Year'='Year'))
## Warning in left_join(new3, urban2, by = c(Code = "Code", Year = "Year")): Each row in `x` is expected to match at most 1 row in `y`.
## ℹ Row 205 of `x` matches multiple rows.
## ℹ If multiple matches are expected, set `multiple = "all"` to silence this
##   warning.
new5<-left_join(new4, pop2, by=c('Code'='Code', 'Year'='Year'))
## Warning in left_join(new4, pop2, by = c(Code = "Code", Year = "Year")): Each row in `x` is expected to match at most 1 row in `y`.
## ℹ Row 79 of `x` matches multiple rows.
## ℹ If multiple matches are expected, set `multiple = "all"` to silence this
##   warning.
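Each warning says that some row of the left table matched more than one row of the right table on (`Code`, `Year`), so the join duplicated that row. The memo does not show which keys are repeated; a minimal base-R check, sketched with the hypothetical helper `duplicated_keys()`, lists them before joining. In the toy data the culprit is a blank code shared by two aggregates, which is one plausible source of the warnings here, not a confirmed one. `merge(..., all.x = TRUE)` is used as the base-R counterpart of `left_join()`.

```r
# Hypothetical helper: return the key combinations that occur more than once.
duplicated_keys <- function(df, keys = c("Code", "Year")) {
  k <- df[, keys, drop = FALSE]
  unique(k[duplicated(k), , drop = FALSE])
}

# Toy right-hand table: two aggregates share the blank code "" in 2000
right <- data.frame(
  Code = c("FRA", "", ""),
  Year = c(2000, 2000, 2000),
  gdp = c(30, 5, 40)
)
dups <- duplicated_keys(right)

# A left join duplicates any left row whose key appears in `dups`
left <- data.frame(Code = c("FRA", ""), Year = c(2000, 2000), life = c(79, 55))
joined <- merge(left, right, by = c("Code", "Year"), all.x = TRUE)
```

Here `joined` has three rows although `left` has two. On the memo's data the check would be, for example, `duplicated_keys(electoral_democracy2)`.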

Excluding aggregates and other irrelevant entities

# Regional and income-group aggregates, plus territories and countries left out of the analysis
exclusion_list<-c("Western Sahara", "Wallis and Futuna", "Upper-middle-income countries",
  "United States Virgin Islands", "Turks and Caicos Islands", "Tokelau",
  "Small Island Developing States (SIDS)", "Sint Maarten (Dutch part)",
  "Sao Tome and Principe", "Saint Martin (French part)", "Saint Barthlemy",
  "More developed regions", "Montserrat", "Mayotte",
  "Lower-middle-income countries", "Low-income countries",
  "Less developed regions, excluding least developed countries",
  "Less developed regions, excluding China", "Less developed regions",
  "Least developed countries", "Latin America and the Caribbean",
  "Land-locked Developing Countries (LLDC)", "High-income countries",
  "Guernsey", "Bonaire Sint Eustatius and Saba",
  "Africa", "Europe", "Americas", "Asia", "Northern America", "Oceania")
new6<-subset(new5, !(new5$Entity %in% exclusion_list))
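`%in%` matches names exactly, so an entry in `exclusion_list` that is spelled or encoded differently from the data is silently ignored; for instance, the entry "Saint Barthlemy" only takes effect if the `Entity` column spells the territory exactly that way. A one-line check, sketched with the hypothetical helper `unmatched()`, lists the entries that match nothing.

```r
# Hypothetical helper: entries of a name list that never occur in the data
unmatched <- function(name_list, entities) setdiff(name_list, unique(entities))

# Toy example: the data spell the territory differently from the list entry
toy_entities <- c("France", "Saint Barthelemy", "Africa")
unmatched(c("Africa", "Saint Barthlemy"), toy_entities)   # "Saint Barthlemy"
```

On the memo's data the call would be `unmatched(exclusion_list, new5$Entity)`; the same check works for the EU country list.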

Focusing on relevant years

# Keep years after 1905 (strict inequality, so the first year kept is 1906)
new7<-subset(new6, Year > 1905)
names(new7)
## [1] "Entity"             "Code"               "Year"              
## [4] "life_expectancy"    "electdem_vdem_owid" "gdp"               
## [7] "urbanization"       "population"
# EU-27 member states
list_ctries<-c("Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus", "Czechia", "Denmark", "Estonia",
  "Finland", "France", "Germany", "Greece", "Hungary", "Ireland", "Italy", "Latvia",
  "Lithuania", "Luxembourg", "Malta", "Netherlands", "Poland", "Portugal",
  "Romania", "Slovakia", "Slovenia", "Spain", "Sweden")
# Flag EU members and keep only their rows
new7$continent<-NA
new7$continent[new7$Entity %in% list_ctries]<-"EU"
sample<-subset(new7, continent=="EU")
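Because `Year > 1905` is a strict inequality and countries may enter the data at different dates, it is worth checking each country's first and last year with data before reading trends off the plot. A sketch with the hypothetical helper `year_coverage()`:

```r
# Hypothetical helper: first and last year with a non-missing value, per entity
year_coverage <- function(df, value_col = "life_expectancy") {
  df <- df[!is.na(df[[value_col]]), ]
  first <- tapply(df$Year, df$Entity, min)
  last <- tapply(df$Year, df$Entity, max)
  data.frame(Entity = names(first), first_year = as.vector(first),
             last_year = as.vector(last), row.names = NULL,
             stringsAsFactors = FALSE)
}

# Toy data: Spain's 1950 value is missing, so its coverage starts in 2020
toy <- data.frame(
  Entity = c("France", "France", "Spain", "Spain"),
  Year = c(1906, 2020, 1950, 2020),
  life_expectancy = c(48, 82, NA, 83)
)
cover <- year_coverage(toy)
```

On the memo's data the call would be `year_coverage(sample)`.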

Interpretation

Focusing on the EU member states, life expectancy has risen substantially. Before 1950, average life expectancy in the sample was 55.17 years; after 1950, it was 73.37 years. The following graph shows how life expectancy evolved in each EU member state from 1906, the first year kept by the `Year > 1905` filter, to the present day.
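The memo reports the two averages without showing the computation. One plausible reading, sketched below as an assumption, is an unweighted mean of `life_expectancy` over all country-year rows in `sample`, split at 1950; a population-weighted mean or a mean of country averages would give different numbers. The helper name `period_means()` is hypothetical.

```r
# Assumed computation: unweighted mean over country-year rows, split at a cutoff year
period_means <- function(df, cutoff = 1950) {
  period <- ifelse(df$Year < cutoff, "before", "from_cutoff")
  tapply(df$life_expectancy, period, mean, na.rm = TRUE)
}

toy <- data.frame(
  Entity = c("France", "France", "Spain", "Spain"),
  Year = c(1920, 1960, 1920, 1960),
  life_expectancy = c(50, 70, 40, NA)
)
m <- period_means(toy)   # before 45, from_cutoff 70
```

On the memo's data the call would be `period_means(sample)`.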

ggplot(data = sample, aes(x = Year, y = life_expectancy, color = Entity)) +
  geom_line() +
  # Reference line at zero; it also extends the y-axis down to 0
  geom_hline(yintercept = 0) +
  labs(x = "Year", y = "Life Expectancy") +
  theme_bw() +
  theme(axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 12),
        axis.title = element_text(size = 12),
        plot.title = element_text(hjust = 0.5))
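With 27 overlapping lines, the overall trend can be hard to read. A complementary summary, sketched here as an option rather than part of the original analysis, is the unweighted mean across countries for each year, computed with base R's `aggregate()`. The helper name `yearly_mean()` is hypothetical.

```r
# Unweighted cross-country mean of life expectancy for each year.
# The formula interface of aggregate() drops rows with NA by default (na.omit).
yearly_mean <- function(df) {
  aggregate(life_expectancy ~ Year, data = df, FUN = mean)
}

toy <- data.frame(
  Entity = c("France", "Spain", "France", "Spain"),
  Year = c(2000, 2000, 2001, 2001),
  life_expectancy = c(79, 79, 80, NA)
)
ym <- yearly_mean(toy)
```

The result of `yearly_mean(sample)` could then be drawn with an extra `geom_line()` layer on top of, or instead of, the country lines.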