Upon successful completion of this course the students will be able to:
Data Scientist/Data Analyst
GIS Analyst/GIS Specialist
Environmental Scientist
Market Research Analyst
Remote Sensing Specialist
Transportation Planner
You will be graded on four problem sets during the semester (each 12.5% of your grade) and a final report and presentation (each 25% of your grade).
You will undertake a GIS project that emphasizes practical application of spatial analysis techniques using the R programming language
The project entails a few steps:
In the first part of the course, we will learn about the R programming language and its capabilities with respect to spatial data
The first lectures will be dedicated to acquiring the basic knowledge needed to work with spatial data and with R
We will then move on to working with spatial data in R, including how to process vectors and rasters, and how to combine the two
In the final part, we will also learn how to deal with spatio-temporal data and point pattern analysis
R is a programming language originally designed for statistical computing
It is an open-source ecosystem (i.e. everyone can contribute and it’s free)
It is compatible with Windows, Mac, and Linux
A variety of libraries already exist which allow you to do easy things like:
This is how R compares to other programming languages
R is used in a variety of fields:
Examples of companies which use R include
GIS stands for Geographic Information Systems
GIS is a system that creates, manages, analyzes, and maps all types of data
It helps us understand patterns, relationships, and geographic context
It can be used to:
Mapping focuses on the visual representation of data
Spatial analysis focuses on a variety of aspects:
GIS comprises both mapping (visualization) and geographic data manipulations and analysis
GIS Software
#Step1: Data Cleaning
#weird_labels is assumed to be defined earlier as a vector of codes to exclude
library(dplyr)
clean_countries<-subset(life_expectancy2, !(Code %in% weird_labels))
clean_countries_urbanization<-subset(urbanization2, !(Code %in% weird_labels))
#Step2: Left Join (keeps every row of clean_countries, matched on Code)
new_data<-left_join(clean_countries, clean_countries_urbanization, by = "Code")
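The cleaning and joining steps can be tried on a small toy example. This is only a sketch: the two data frames, their values, and the contents of weird_labels are invented for illustration and are not the course data.

```r
library(dplyr)

# Hypothetical toy data (made-up codes and values)
life_expectancy2 <- data.frame(
  Code = c("AAA", "BBB", "ZZZ_AGG"),
  life_expectancy = c(80, 75, 72)
)
urbanization2 <- data.frame(
  Code = c("AAA", "ZZZ_AGG"),
  urban_share = c(70, 55)
)

# Hypothetical codes that do not correspond to countries
weird_labels <- c("ZZZ_AGG")

# Step 1: drop the rows whose Code is in weird_labels
clean_countries <- subset(life_expectancy2, !(Code %in% weird_labels))
clean_countries_urbanization <- subset(urbanization2, !(Code %in% weird_labels))

# Step 2: left join keeps every row of clean_countries;
# codes with no match in the second table get NA
new_data <- left_join(clean_countries, clean_countries_urbanization, by = "Code")
print(new_data)
```

Because "BBB" has no match in the urbanization table, its urban_share is NA in the result. That is the defining behavior of a left join.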
R is good for:
Reading and writing spatial data in R is done through external libraries
sf
sf will be the main library that we will work with
It will help us deal with:
sf: buffer
stars
We can perform geometric operations on rasters (pictures) with the stars package
stars: Temperature in 1901
stars: Temperature in 2022
stars: Temperature difference between 2022 and 1901 > 4
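The temperature comparison above can be sketched with two tiny rasters. This is a minimal illustration, not the code behind the slides: the 2 x 2 grids and their values are invented, and the object names are hypothetical.

```r
library(stars)

# Two hypothetical 2 x 2 temperature grids (made-up values)
t1901 <- matrix(c(10, 12, 14, 16), nrow = 2)
t2022 <- matrix(c(13, 17, 15, 22), nrow = 2)

# Name the dimensions so that stars treats them as x and y
dim(t1901) <- c(x = 2, y = 2)
dim(t2022) <- c(x = 2, y = 2)

r1901 <- st_as_stars(t1901)
r2022 <- st_as_stars(t2022)

# Cell-by-cell difference between the two years
change <- r2022 - r1901

# Logical raster: TRUE where the difference exceeds 4
warm <- change > 4

as.vector(change[[1]])
as.vector(warm[[1]])
```

Arithmetic and comparison operators are applied cell by cell, so the result has the same dimensions as the inputs.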
ggplot2
ggplot2 is the library that will allow us to visualize data analysis results and also to make maps
leaflet
leaflet is a library that allows us to make interactive maps
mapview
mapview is a wrapper around leaflet, automating the addition of labels, popups, color scales, and common basemaps
A programming language is a machine-readable artificial language designed to express computations that can be performed by a computer.
Programming allows us to edit code, re-use it later, and obtain the same results each time
In object-oriented programming, the interaction with the computer takes place through objects
Each object belongs to a class: an abstract structure that has specific properties
Example:
All cars in the parking lot are instances of the “car” class
The “car” class has specific properties (make, color, year) and methods (start, drive, stop)
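The car example can be written in R using its S3 class system. This is a minimal sketch: the names new_car and drive and the property values are hypothetical, chosen only to mirror the example above.

```r
# Constructor: a "car" object is a list of properties tagged with a class
new_car <- function(make, color, year) {
  structure(list(make = make, color = color, year = year), class = "car")
}

# A generic function, plus the method that runs for objects of class "car"
drive <- function(x, ...) UseMethod("drive")
drive.car <- function(x, ...) {
  paste("The", x$color, x$make, "is driving")
}

# One instance of the "car" class
my_car <- new_car(make = "Fiat", color = "red", year = 2015)

class(my_car)   # "car"
my_car$year     # 2015
drive(my_car)   # "The red Fiat is driving"
```

Properties are reached with `$`, while a method is chosen by R according to the class of the object passed to the generic function.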
We will see that everything that we work with in R is an object
For example, we can load a geojson file in R.
#Step1: Loading the geojson file
library(sf)
restaurants <- read_sf("/Users/bgpopescu/Dropbox/john_cabot/teaching/big_data/week7/data/restaurant.geojson")
#Step2: Selecting only the relevant variables (the geometry column is kept automatically)
restaurants<-subset(restaurants, select = c(name, `addr:street`))
#Step3: Removing the restaurants that have neither a name nor an address
restaurants2<-subset(restaurants, !is.na(name) | !is.na(`addr:street`))
R transforms the geojson file into an object of a class named sf data.frame
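Since the geojson file above lives on a local drive, the same kind of object can be built in memory to see what class R assigns to it. This is a sketch under assumptions: the three points, their names, and the object name restaurants_demo are invented for illustration.

```r
library(sf)

# Hypothetical restaurants with made-up coordinates (longitude, latitude)
pts <- data.frame(
  name = c("Restaurant A", "Restaurant B", "Restaurant C"),
  lon = c(12.47, 12.49, 12.51),
  lat = c(41.89, 41.90, 41.88)
)

# Turn the plain data.frame into an sf object; EPSG 4326 is WGS 84
restaurants_demo <- st_as_sf(pts, coords = c("lon", "lat"), crs = 4326)

# The object is both an sf object and a data.frame
class(restaurants_demo)

# Some of its spatial properties can be queried directly
st_geometry_type(restaurants_demo, by_geometry = FALSE)
st_crs(restaurants_demo)$epsg
st_bbox(restaurants_demo)
```

The lon and lat columns are consumed to build the geometry column, so the resulting object has one attribute field (name) plus the geometry.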
This type of object has numerous properties such as:
Once imported, the sf data.frame is saved in the computer memory
Printing the object will display some of its properties and its first rows
Simple feature collection with 2811 features and 2 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 12.21167 ymin: 41.70574 xmax: 12.77428 ymax: 42.06974
Geodetic CRS: WGS 84
# A tibble: 2,811 × 3
   name                          `addr:street`         geometry
   <chr>                         <chr>                 <POINT [°]>
 1 Pizzeria ai Marmi             Viale di Trastevere   (12.47379 41.88826)
 2 Sichuan Haozi                 Via di San Martino a… (12.49948 41.8958)
 3 Dar filettaro a Santa Barbara Largo dei Librari     (12.4737 41.89467)
 4 Al Peperoncino                Via Ostiense          (12.47698 41.85343)
 5 Ai Tre Scalini                Via Panisperna        (12.49044 41.89628)
 6 Trattoria Ada e Mario         Circonvallazione App… (12.51433 41.87532)
 7 Gustosando                    <NA>                  (12.42743 41.89954)
 8 Sa Posada                     Via Elvia Recina      (12.5079 41.87995)
 9 Pizzeria Formula 1            Via degli Equi        (12.51268 41.89702)
10 Da Francesco                  Piazza del Fico       (12.4704 41.89932)
# ℹ 2,801 more rows
By printing the object, we can see some of its properties including:
One of the characteristics of object-oriented programming is inheritance
Inheritance makes it possible for one class to extend another class by adding new properties
Example:
A “taxi” is an extension of a “car” class, inheriting all of its properties and methods.
A taxi could have new properties like taxi company name
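The taxi example can also be sketched with S3 classes, where inheritance is expressed by listing more than one class name. The names new_car, new_taxi, drive, and the company name are hypothetical and serve only as an illustration.

```r
# Parent class: a "car" is a list of properties tagged with a class
new_car <- function(make, color, year) {
  structure(list(make = make, color = color, year = year), class = "car")
}

# Child class: a "taxi" is a car with one extra property
new_taxi <- function(make, color, year, company) {
  obj <- new_car(make, color, year)
  obj$company <- company
  class(obj) <- c("taxi", class(obj))  # "taxi" first, then "car"
  obj
}

# A method defined only for the parent class
drive <- function(x, ...) UseMethod("drive")
drive.car <- function(x, ...) {
  paste("The", x$color, x$make, "is driving")
}

my_taxi <- new_taxi("Fiat", "white", 2020, company = "Hypothetical Cabs")

class(my_taxi)            # "taxi" "car"
inherits(my_taxi, "car")  # TRUE
drive(my_taxi)            # no drive.taxi exists, so drive.car is used
```

The printed sf object works the same way: its class vector sf/tbl_df/tbl/data.frame means that it can still be handled by functions written for a data.frame.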
In R, every complex object is a collection of smaller components such as properties
We can use str to examine the properties of an object
sf [2,811 × 3] (S3: sf/tbl_df/tbl/data.frame)
$ name : chr [1:2811] "Pizzeria ai Marmi" "Sichuan Haozi" "Dar filettaro a Santa Barbara" "Al Peperoncino" ...
$ addr:street: chr [1:2811] "Viale di Trastevere" "Via di San Martino ai Monti" "Largo dei Librari" "Via Ostiense" ...
$ geometry :sfc_POINT of length 2811; first list element: 'XY' num [1:2] 12.5 41.9
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA
..- attr(*, "names")= chr [1:2] "name" "addr:street"
For example, the names of the restaurants are stored as a string variable called name (second line of output)
The addresses of the restaurants are stored as a string variable called addr:street (third line of output)
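The same inspection can be tried on a small data.frame without any spatial data. This is only a sketch: the two rows and the object name restaurants_demo are invented, and check.names = FALSE is used so that the column name addr:street is kept as written.

```r
# Hypothetical two-row table that mimics the structure of the restaurant data
restaurants_demo <- data.frame(
  name = c("Restaurant A", "Restaurant B"),
  `addr:street` = c("Street 1", NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# str lists each component with its type and first values
str(restaurants_demo)

# Components can be extracted with $; names containing ":" need backticks
restaurants_demo$name
restaurants_demo$`addr:street`

# Both components are character (string) vectors
class(restaurants_demo$name)
```

Backticks are needed for addr:street because a colon is not allowed in a regular R variable name.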
We will now familiarize ourselves with the R environment
We first need to install R: R-project
We will then install an R interface that allows us to interact with R in a more user-friendly manner: R-studio
Popescu (JCU): Lecture 1