Previously, we covered quite some ground:
Before we use ggplot, our data has to be tidy
This means:
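In practice, a common tidying step is turning a wide table (one column per year) into a long one (one row per country-year). The life expectancy and urbanization files used below already come in this long shape (`Entity`, `Code`, `Year`, value). The sketch below uses `tidyr::pivot_longer()`. The helper name `make_tidy()` is hypothetical, and it assumes that every column except `Code` is a year.

```r
library(tidyr)

# Hypothetical helper: turns a wide table with one column per year
# (e.g. columns "Code", "2000", "2010") into a tidy table with one
# row per country-year and the columns Code, Year, life_exp.
make_tidy <- function(wide) {
  long <- pivot_longer(
    wide,
    cols = -Code,          # assumption: every column except Code is a year
    names_to = "Year",
    values_to = "life_exp"
  )
  long$Year <- as.integer(long$Year)  # column names arrive as character
  long
}
```

After this step, each variable (`Code`, `Year`, `life_exp`) is a single column. That is the layout `ggplot()` expects when mapping columns to aesthetics.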
People are better at seeing height differences than angle and area differences
This is how to obtain the same plots.
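Below is a minimal ggplot2 sketch that draws the same values twice: once as bars, where height encodes the value, and once as a pie, where angle encodes the value. The helper names `bar_version()` and `pie_version()` are hypothetical. They assume a data frame with a character column `group` and a numeric column `value`.

```r
library(ggplot2)

# Bars: the value is read off the height of each bar.
bar_version <- function(df) {
  ggplot(df, aes(x = group, y = value)) +
    geom_col()
}

# Pie: one stacked bar bent into a circle, so the value is read off an angle.
pie_version <- function(df) {
  ggplot(df, aes(x = "", y = value, fill = group)) +
    geom_col(width = 1) +        # a single stacked bar filled by group
    coord_polar(theta = "y") +   # map the y axis (the stack) to the angle
    theme_void()                 # drop axes, which carry no meaning on a pie
}
```

The only difference between the two is the coordinate system. This makes the comparison fair: same data, different visual encoding.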
For example, let’s say we want to compare life expectancy in Latin America with the EU
#Loading packages
library("dplyr")
#Setting path
setwd("/Users/bgpopescu/Dropbox/john_cabot/teaching/big_data/week4/data/")
#Step1: Loading the data
life_expectancy <- read.csv(file = './life-expectancy.csv')
urbanization <- read.csv(file = './share-of-population-urban.csv')
#Step2: Removing countries with no 3-letter code
life_expectancy2<-subset(life_expectancy, Code!="")
#Step3: Changing variable name
names(life_expectancy2)[names(life_expectancy2)=="Life.expectancy.at.birth..historical."]<-"life_exp"
#Step4: Selecting only vars of interest
life_expectancy3<-subset(life_expectancy2, select=c("Entity", "Code", "Year", "life_exp"))
#Step5: Removing countries with no 3-letter code
urbanization2<-subset(urbanization, Code!="")
#Step6: Changing variable name
names(urbanization2)[names(urbanization2)=="Urban.population....of.total.population."]<-"urb"
#Step7: Selecting only vars of interest
urbanization3<-subset(urbanization2, select=c("Code", "Year", "urb"))
#Step8: Performing a merge
final<-left_join(life_expectancy3, urbanization3, by = c("Code"="Code",
"Year"="Year"))
#Step9: Removing NAs
final2<-final[complete.cases(final), ]
#Step10: Defining continents
#EU Countries
eu_countries<-c("Austria",
"Belgium",
"Bulgaria",
"Croatia",
"Cyprus",
"Czechia",
"Denmark",
"Estonia",
"Finland",
"France",
"Germany",
"Greece",
"Hungary",
"Ireland",
"Italy",
"Latvia",
"Lithuania",
"Luxembourg",
"Malta",
"Netherlands",
"Poland",
"Portugal",
"Romania",
"Slovakia",
"Slovenia",
"Spain",
"Sweden")
#Latin American countries
latam_countries<-c("Belize",
"Costa Rica",
"El Salvador",
"Guatemala",
"Honduras",
"Mexico",
"Nicaragua",
"Panama",
"Argentina",
"Bolivia",
"Brazil",
"Chile",
"Colombia",
"Ecuador",
"Guyana",
"Paraguay",
"Peru",
"Suriname",
"Uruguay",
"Venezuela",
"Cuba",
"Dominican Republic",
"Haiti")
#Step11: Labeling continents
#Default label first, then overwrite the EU and Latin American countries
final2$continent<-"Everything Else"
final2$continent[final2$Entity %in% eu_countries]<-"EU"
final2$continent[final2$Entity %in% latam_countries]<-"Latin America"
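Steps 8 to 11 can be condensed into a single base-R sketch. This makes explicit why Step 9 is needed: a left join keeps every country-year from the life expectancy data, and those without an urbanization value get `NA`. The helper `join_and_label()` is hypothetical. It swaps in base R's `merge(all.x = TRUE)` for `dplyr::left_join()`. Unlike `left_join()`, `merge()` puts the key columns first and sorts rows by the keys.

```r
# Hypothetical base-R helper mirroring Steps 8-11.
join_and_label <- function(life, urb, eu, latam) {
  # all.x = TRUE keeps every row of `life`, like dplyr::left_join();
  # country-years with no urbanization value get NA in `urb`
  merged <- merge(life, urb, by = c("Code", "Year"), all.x = TRUE)
  # drop any row that still contains an NA (Step 9)
  merged <- merged[complete.cases(merged), ]
  # start every country in the default group, then overwrite the regions
  merged$continent <- "Everything Else"
  merged$continent[merged$Entity %in% eu] <- "EU"
  merged$continent[merged$Entity %in% latam] <- "Latin America"
  merged
}
```

Called as `join_and_label(life_expectancy3, urbanization3, eu_countries, latam_countries)`, it should keep the same country-years as `final2`, although the row and column order will differ.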
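Before plotting the Latin America versus EU comparison, it helps to look at the numbers behind it. The dplyr sketch below computes mean life expectancy and the number of country-year observations per region. The helper `region_means()` and its optional `year` argument are hypothetical. Without a `year`, the mean pools all years together, which may not be the comparison you want.

```r
library(dplyr)

# Hypothetical helper: mean life expectancy per region for the regions
# of interest. Expects the columns built in Steps 1-11 (continent,
# life_exp, Year).
region_means <- function(df, regions = c("EU", "Latin America"), year = NULL) {
  # optionally keep a single year so countries are compared like for like
  if (!is.null(year)) df <- df[df$Year == year, ]
  df %>%
    filter(continent %in% regions) %>%
    group_by(continent) %>%
    summarise(mean_life_exp = mean(life_exp), n = n())
}
```

Usage: `region_means(final2)` for all years pooled, or `region_means(final2, year = 2015)` for a single year.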
For example, let’s say we want to compare life expectancy in Latin America, the EU, and the rest of the world
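One way to draw the three-region comparison is a bar per region whose height is the mean life expectancy. The ggplot2 sketch below assumes that chart type; the helper name `plot_region_bars()` is hypothetical. It lets `stat_summary()` compute the means, so no separate summary table is needed. Bars of means do hide the spread within each region.

```r
library(ggplot2)

# Hypothetical helper: one bar per value of `continent`
# (EU, Latin America, Everything Else), height = mean life expectancy.
plot_region_bars <- function(df) {
  ggplot(df, aes(x = continent, y = life_exp)) +
    stat_summary(fun = mean, geom = "col") +   # bar height = group mean
    labs(x = NULL, y = "Mean life expectancy (years)")
}
```

Usage: `plot_region_bars(final2)`.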
A more compelling way: boxplots with points
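A minimal ggplot2 sketch of a boxplot with the individual observations drawn on top. The helper name `plot_region_boxes()` is hypothetical, and the jitter width and transparency are arbitrary choices. Outliers are hidden in the boxplot layer because the jittered points already show every observation, so they would otherwise be drawn twice.

```r
library(ggplot2)

# Hypothetical helper: one box per region, plus every country-year
# observation as a semi-transparent point.
plot_region_boxes <- function(df) {
  ggplot(df, aes(x = continent, y = life_exp)) +
    geom_boxplot(outlier.shape = NA) +        # points layer already shows outliers
    geom_jitter(width = 0.2, alpha = 0.3) +   # small horizontal jitter against overplotting
    labs(x = NULL, y = "Life expectancy (years)")
}
```

Usage: `plot_region_boxes(final2)`. The boxes summarize the median and spread; the points show how many observations sit behind each box.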