Spatial Joins and Centroids
The main goals for the include:
We will again make use to the sf
package:
We will work with wildfires data to demonstrate the principles of join by location
Spain is one of the countries worst affected by wildfires in recent times.
We try to match provinces with fires.
The latest year for which data is available is 2021.
To obtain this, we need to collect some data
First, go to https://gadm.org and download the shapefiles for Spain
Second, obtain wildfire data from the NASA website: https://firms.modaps.eosdis.nasa.gov
irms.modaps.eosdis.nasa.gov
irms.modaps.eosdis.nasa.gov
irms.modaps.eosdis.nasa.gov
irms.modaps.eosdis.nasa.gov
irms.modaps.eosdis.nasa.gov
irms.modaps.eosdis.nasa.gov
irms.modaps.eosdis.nasa.gov
irms.modaps.eosdis.nasa.gov
irms.modaps.eosdis.nasa.gov
irms.modaps.eosdis.nasa.gov
irms.modaps.eosdis.nasa.gov
After you submit your request, you will get an automatic email.
It will tell you that the request is being processed.
At some point, you will get a notification that the your request has been processed.
You should go though all the above mentioned steps if you’re interested in this topic for your final project.
irms.modaps.eosdis.nasa.gov
To save time, you can download the wildfires for Spain in 2021
Also, refer to the the model research project for this class
Reminder: we want to obtain the following figure
We will first process the shapefile for Spain.
Notice that Spain’s territory is larger than the common territory that we associate with Spain.
Here is how we select the largest polygon associated with Spain
#Step1: Loading libraries
library(sf)
library(ggplot2)
#Step2: Turn "s2" off to enable polygon simplification
sf_use_s2(FALSE)
#Step3: Reading Spain borders
esp0 <- st_read(dsn="./data/gadm41_ESP_shp/gadm41_ESP_0.shp", quiet = TRUE)
esp0<-st_simplify(esp0, dTolerance=0.01)
#Note: Spain is made out of the polygons
#We will focus on the largest polygons
#that are traditionally association with Spain
#Step4: Turning multipolygons into polygons
esp0b = st_cast(esp0,"POLYGON")
#Step5: Calculating polygon area
esp0b$area<-st_area(esp0b)
#Step6: Turning the units into square km
esp0b$area2<-units::set_units(esp0b$area, km^2)
#Step7: Only choosing the largest polygon
esp0c<-subset(esp0b, area2==max(esp0b$area2))
#Step8: Making a box object around the polygon
bbox <- st_bbox(esp0c)
#Step9: Extracting min and max latitudes and longitudes
min_lat_y <- bbox["ymin"]
max_lat_y <- bbox["ymax"]
min_lon_x <- bbox["xmin"]
max_lon_x <- bbox["xmax"]
Here is how esp0
differs from esp0c
We will now process the fires data
#Step1: Reading wildfires
fires <- read.csv('./data/fires_2021/viirs-snpp_2021_Spain.csv')
#Step2: Turning the relevant time variable into a date variable recognized by R
fires$acq_date<-as.Date(fires$acq_date)
#Step3: Turning the CSV file into an SF object
fires_sf<-st_as_sf(fires, coords = c("longitude", "latitude"))
#Step4: Associating the crs information for the Spain shapefile to points
fires_sf <- st_set_crs(fires_sf, st_crs(esp0)) # 4326 is the EPSG code for WGS84
#Step5: Only selecting fires of type 0
#See the variable definitions:
#0 = presumed vegetation fire
#1 = active volcano
#2 = other static land source
#3 = offshore detection (includes all detections over water)
fires_sf<-subset(fires_sf, type==0)
#Step6: Extract month information
fires_sf$month <- format(fires_sf$acq_date, "%m")
#Step7: Reading only August
fires_aug<-subset(fires_sf, acq_date>"2021-07-31" & acq_date<"2021-09-01")
To learn about what the different variables associated with the fires, visit: https://www.earthdata.nasa.gov/learn/find-data/near-real-time/firms/vnp14imgtdlnrt#ed-viirs-375m-attributes
We can finally make the map:
#Defining an error for better map extent
error<-0.05
ggplot() +
geom_sf(data = esp0, color = "black", fill="black") +
geom_sf(data = fires_aug, aes(color = bright_ti4, size = bright_ti4), alpha = 0.7) +
scale_color_viridis_c(option = "A", name = "bright_ti4")+
scale_size_continuous(range = c(0.5, 2),
name = "bright_ti4") +
#We still need to define the extent: fires also happen outside the large polygon
#associated with Spain
coord_sf(xlim = c(min_lon_x - error, max_lon_x + error),
ylim = c(min_lat_y - error, max_lat_y + error)) +
ggtitle("Wild Fire Intensity in Spain, August '21") +
guides(color = guide_legend(title.position = "top"),
size = guide_legend(title.position = "top"))
We can finally make the map:
We can also produce the following map
This is done using facet_wrap
ggplot() +
geom_sf(data = esp0, color = "black", fill="black") +
geom_sf(data = fires_sf, aes(color = bright_ti4, size = bright_ti4), alpha = 0.7) +
scale_color_viridis_c(option = "A", name = "bright_ti4")+
scale_size_continuous(range = c(0.5, 2),
name = "bright_ti4") +
coord_sf(xlim = c(min_lon_x - error, max_lon_x + error),
ylim = c(min_lat_y - error, max_lat_y + error)) +
ggtitle("Wild Fire Intensity in Spain, August '21") +
guides(color = guide_legend(title.position = "top"),
size = guide_legend(title.position = "top"))+
facet_wrap(~month, ncol = 4) # Adjust the number of columns as needed
This is done using facet_wrap
ggplot() +
geom_sf(data = esp0, color = "black", fill="black") +
geom_sf(data = fires_sf, aes(color = bright_ti4, size = bright_ti4), alpha = 0.7) +
scale_color_viridis_c(option = "A", name = "bright_ti4")+
scale_size_continuous(range = c(0.5, 2),
name = "bright_ti4") +
coord_sf(xlim = c(min_lon_x - error, max_lon_x + error),
ylim = c(min_lat_y - error, max_lat_y + error)) +
ggtitle("Wild Fire Intensity in Spain, August '21") +
guides(color = guide_legend(title.position = "top"),
size = guide_legend(title.position = "top"))+
facet_wrap(~month, ncol = 4) # Adjust the number of columns as needed
Or we can produce an animated map:
Or we can produce an animated map:
library("gganimate")
#Extracting coordinates
fires_sf$lat_x<-st_coordinates(fires_sf)[,"X"]
fires_sf$lon_y<-st_coordinates(fires_sf)[,"Y"]
#Dropping the geometry
fires<-st_drop_geometry(fires_sf)
#Mapping
fig<-ggplot() +
geom_sf(data = esp0, color = "black", fill="black")+
#We need to use geom_point NOT geom_sf because of library compatibility problems
geom_point(data = fires, aes(x= lat_x, y = lon_y,
color = bright_ti4, size = bright_ti4), alpha = 0.9)+
#Note that animated geom_sf with points does not work because of some library incompatibility
#geom_sf(data = fires_sf, aes(color = bright_ti4, size = bright_ti4), alpha = 0.7) +
scale_color_viridis_c(option = "A", name = "bright_ti4")+
scale_size_continuous(range = c(0.1, 4),
name = "bright_ti4") +
guides(color = guide_legend(title = "bright_ti4"),
size = guide_legend(title = "bright_ti4"))+
coord_sf(xlim = c(min_lon_x - error, max_lon_x + error),
ylim = c(min_lat_y - error, max_lat_y + error))+
#animate arguments
transition_time(acq_date) +
ggtitle('Date: {frame_time}')
#defining animation options: speed of transition and resolution
animate(fig, fps=2, res = 60)
spatial join
is one of the most common operations in spatial analysis.Place the folder with the shape files in the relevant week folder
Do the same for wildfires
This is how we read the shapefiles for Spain
sf_use_s2(FALSE)
library(sf)
library(ggplot2)
#Reading Spain borders
esp0 <- st_read(dsn="./data/gadm41_ESP_shp/gadm41_ESP_0.shp", quiet = TRUE)
esp0<-st_simplify(esp0, dTolerance=0.5)
esp2 <- st_read(dsn="./data/gadm41_ESP_shp/gadm41_ESP_2.shp", quiet = TRUE)
esp2<-st_simplify(esp2, dTolerance=0.01)
esp3 <- st_read(dsn="./data/gadm41_ESP_shp/gadm41_ESP_3.shp", quiet = TRUE)
esp3<-st_simplify(esp3, dTolerance=0.5)
And this is how we read the wildfires:
#Step1: Reading wildfires
fires <- read.csv('./data/fires_2021/viirs-snpp_2021_Spain.csv')
#Step2: Turning the acq_date variable into a date variable
fires$acq_date<-as.Date(fires$acq_date)
#Step3: Make sf object from points
fires<-st_as_sf(fires, coords = c("longitude", "latitude"))
fires <- st_set_crs(fires, st_crs(esp0)) # 4326 is the EPSG code for WGS84
#Step4: Extract month and year information
fires$month <- format(fires$acq_date, "%m")
fires$year <- format(fires$acq_date, "%Y")
#Step5: Simplifying our data
fires<-subset(fires, type==0)
#Step6: Reading only August
fires_aug<-subset(fires, acq_date>"2021-07-31" & acq_date<"2021-09-01")
We will try to count the number of fires by province
And this is how it differs from the original data: note the many overlapping points
Jittering the points allows us to see better the ones that are clustered
error<-0.05
ggplot() +
geom_sf(data = esp2b) +
#low jitter
geom_sf(data = st_jitter(fires_augb, 0.05),
size=0.05, color = "red", alpha = 0.7) +
coord_sf(xlim = c(min_lon_x - error, max_lon_x + error),
ylim = c(min_lat_y - error, max_lat_y + error)) +
ggtitle("Wild Fire Intensity in Spain, August '21") +
guides(
color = guide_legend(title.position = "top"),
size = guide_legend(title.position = "top")
)
error<-0.05
ggplot() +
geom_sf(data = esp2b) +
#high jitter
geom_sf(data = st_jitter(fires_augb, 0.5),
size=0.05, color = "red", alpha = 0.7) +
coord_sf(xlim = c(min_lon_x - error, max_lon_x + error),
ylim = c(min_lat_y - error, max_lat_y + error)) +
ggtitle("Wild Fire Intensity in Spain, August '21") +
guides(
color = guide_legend(title.position = "top"),
size = guide_legend(title.position = "top")
)
And this is how the original data compares to the counts
And this is how we produce this graph:
Step 1:
library(dplyr)
library(gridExtra)
#Step1: Selecting only one variable
esp2b<-subset(esp2, select = c("NAME_2"))
#Step2: Selecting only bright_ti5.
#This represents I-5 Channel brightness temperature of the fire pixel measured in Kelvin.
fires_augb<-subset(fires_aug, select = c("bright_ti5"))
#Step3: Adding a variable fire_no = 1 to help count fires
fires_augb$fire_no<-1
#Step4: Performing the spatial join
xjoin<-st_join(fires_augb, esp2b)
Let us have a look at what xtjoin
looks like
xtjoin
- Step 1xtjoin
- Step 1Question
What does every row in xtjoin
represent?
In other words, what is the unit of analysis?
Answer
The unit is fire (i.e. the invidual event). The unit is NOT the district.
This is why Avila appears over 1,000 times
Thus, we need to count how many fires we have per district.
#Step4: Dropping the geometry from the object (i.e. turning it into a regular dataframe)
#We need to drop the geometry to aggregate by group
xjoin<-st_drop_geometry(xjoin)
#Step5: Summing by polygons
sum_per_province <- xjoin %>%
group_by(NAME_2) %>%
summarize(sum_fires = sum(fire_no, na.rm = TRUE))
#Step6: Performing a left join to the shapefiles
esp1c<-left_join(esp2b, sum_per_province, by = c("NAME_2"="NAME_2"))
#Step7: Replacing NAs with 0
esp1c$sum_fires[is.na(esp1c$sum_fires)]<-0
#Step8: Calculating logs to eliminate extreme values
esp1c$log_fires<-log(esp1c$sum_fires+1)
And this is how we produce this graph:
Step 2:
#Step9: Calculating SD and Mean
fires_sd <- sd(esp1c$log_fires, na.rm=T)
fires_mean <- mean(esp1c$log_fires, na.rm=T)
#Step10: Creating different steps
step1<-min(esp1c$log_fires, na.rm=T)
step2<-fires_mean - fires_sd
step3<-fires_mean - 0.5*fires_sd
step4<-fires_mean
step5<-fires_mean + 0.5*fires_sd
step6<-fires_mean + fires_sd
step7<-max(esp1c$log_fires, na.rm=T)
#Step11: Putting all the steps in a list
clint <- c(step1,
step2,
step3,
step4,
step5,
step6,
step7)
clint
[1] 0.0000000 0.8281742 1.5397413 2.2513085 2.9628756 3.6744428 7.2063773
And this is how we produce this graph:
Step 3:
#Step1: Reading Spain borders
sf_use_s2(FALSE)
esp0 <- st_read(dsn="./data/gadm41_ESP_shp/gadm41_ESP_0.shp", quiet = TRUE)
esp0<-st_simplify(esp0, dTolerance=0.05)
#Step2: Simplifying country polygon
esp0b = st_cast(esp0,"POLYGON")
#Step3: Calculating polygon area
esp0b$area<-st_area(esp0b)
#Step4: Turning the units into square km
esp0b$area2<-units::set_units(esp0b$area, km^2)
#Step5: Only choosing the largest polygon
esp0c<-subset(esp0b, area2==max(esp0b$area2))
#Step6: Making a box object around the polygon
bbox <- st_bbox(esp0c)
#Step7: Extracting min and max latitudes and longitudes
min_lat_y <- bbox["ymin"]
max_lat_y <- bbox["ymax"]
min_lon_x <- bbox["xmin"]
max_lon_x <- bbox["xmax"]
And this is how we produce this graph:
Step 4:
#The Map
p1<-ggplot(esp1c, aes(fill=log_fires)) +
geom_sf() +
theme_void() +
scale_fill_viridis_c(option = "viridis",
breaks = clint,
guide = guide_coloursteps(even.steps = FALSE,
title = NULL,
barheight = unit(5.3, "cm"),
barwidth = unit(0.64, "cm"),
label.position = "left"))+
geom_sf_text(aes(label = NAME_2), colour = "white", size=1.5)+
coord_sf(xlim = c(min_lon_x - error, max_lon_x + error),
ylim = c(min_lat_y - error, max_lat_y + error))
p1
And this is how we produce this graph:
Step 5:
#The Histogram
p2<- ggplot(esp1c, aes(log_fires)) +
geom_histogram(aes(y = after_stat(density)), bins = 10,
fill = "grey70",
col = "white")+
coord_flip() +
geom_density(colour = "blue")+
scale_x_continuous(breaks = clint,
labels = c("Min", "-1SD" , "-0.5*SD", "Mean", "0.5*SD", "1SD", "Max"),
position = "top", limits = range(clint)) +
scale_y_continuous(labels = NULL, breaks = NULL) +
ylab(NULL) +xlab(NULL) +
theme(plot.margin = margin(0.1,0,0,0.1, "in"),
axis.text = element_text(colour = "grey"),
panel.background = element_blank())
And this is how we produce this graph:
Step 6:
```{r , fig.width = 6.5, fig.height=2.4}
library(gridExtra)
#Arranging the Map and Histogram together
#in grid.arrange there are 4 slots in a row. you need to indicate which slot will be taken
#by fig1 vs. fig2. In our case, fig1 takes the first 3 slots
#fig2 takes the 4th slot.
pic<-grid.arrange(p1,p2, layout_matrix=rbind(c(1,1,1,2)))
```
And this is how we produce this graph:
Step 6:
To save the map that you made still look good, you should use the fillowing dimensions:
Thus, with the help of spatial join, we can count the number of fires by district.
We can further use a barplot to rank provinces.
ggplot(esp1c,
aes(x = reorder(NAME_2, -log_fires), y = log_fires)) +
labs(x = "Province")+
geom_bar(stat = "identity")+
theme(
panel.background = element_rect(fill = "white", color = "white"),
#axis.line = element_line(color = "black"),
panel.grid.major = element_line(color = "gray95", linetype = "solid"),
panel.grid.minor = element_blank(),
panel.border = element_rect(color = "black", fill = NA),
axis.text = element_text(color = "black"),
axis.title = element_text(color = "black"),
strip.text = element_text(color = "black"),
legend.text = element_text(color = "black"),
legend.title = element_text(color = "black"),
plot.title = element_text(color = "black"),
axis.text.x = element_text(angle = 45, hjust = 1))
Thus, with the help of spatial join, we can count the number of fires by district.
We can further use a barplot to rank provinces.
Notice also the substantial difference that logging the fire makes
We have thus far used st_join
to count points per polygon
Another great use of spatial joins is to get one attribute table onto another one
Let’s say we have all the settlements in Spain.
We want to figure out what county they are part of
We have thus far used st_join
to count points per polygon
Another great use of spatial joins is to get one attribute table onto another one
Let’s say we have all the settlements in Spain.
We want to figure out what county they are part of
Simple feature collection with 5 features and 1 field
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: -7.522678 ymin: 35.99958 xmax: -1.630416 ymax: 38.72911
Geodetic CRS: WGS 84
NAME_2 geometry
1 Almería POLYGON ((-2.525973 36.8237...
2 Cádiz POLYGON ((-6.229321 36.5273...
3 Córdoba MULTIPOLYGON (((-4.248001 3...
4 Granada POLYGON ((-3.351177 36.7284...
5 Huelva MULTIPOLYGON (((-7.121905 3...
Simple feature collection with 5 features and 1 field
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -2.667413 ymin: 37.22897 xmax: -2.030814 ymax: 37.54468
Geodetic CRS: WGS 84
NAME_4 geometry
1 Albánchez POLYGON ((-2.202082 37.3122...
2 Albox POLYGON ((-2.056954 37.4417...
3 Alcóntar POLYGON ((-2.60811 37.41111...
4 Arboleas POLYGON ((-2.131044 37.2996...
5 Armuña de Almanzora POLYGON ((-2.419804 37.3342...
This is how we can think of a spatial join:
This is how we can go about it:
#Step1: Reading Spain level-2 borders
esp2 <- st_read(dsn="./data/gadm41_ESP_shp/gadm41_ESP_2.shp", quiet = TRUE)
#Step2: Simplifying polygons
esp2<-st_simplify(esp2, dTolerance=0.01)
#Step3: Selecting only one variable of interest
esp2<-subset(esp2, select=NAME_2)
#Step4: Reading Spain level-4 borders - settlements
esp4 <- st_read(dsn="./data/gadm41_ESP_shp/gadm41_ESP_4.shp", quiet = TRUE)
#Step5: Simplifying polygons
esp4<-st_simplify(esp4, dTolerance=0.01)
#Step6: Selecting only one variable of interest
esp4b<-subset(esp4, select=NAME_4)
#Step7: Creating a unique ID (e.g. some settlements have the same name in multiple counties)
esp4b$unique_id<-as.numeric(rownames(esp4b))
#Step8: Calculating centroid
esp4_pt<-st_centroid(esp4b)
This is how polygons translate into centroids
We then perform the spatial join
We get the points esp4_pt
(settlement centroids) to acquire the attributes of esp2
(county polygons)
And this is the result:
This can obviously be mapped
And we can merge the properties of the points to the polygons
#Step1: Merging point attributes to the polygon attributes
set_poly<-left_join(esp4b, pt_joined_nogeom, by = c("unique_id"="unique_id"))
#Step2: Mapping
ggplot()+
geom_sf(data=set_poly, aes(color=NAME_2, fill=NAME_2), size=0.2)+
theme_void() +
coord_sf(xlim = c(min_lon_x - error, max_lon_x + error),
ylim = c(min_lat_y - error, max_lat_y + error))
Notice that some settlements have NA values are NA
In this case it it because come centroids can fall outside of the polygon
Notice that some settlements have NA values are NA
In this case it it because come centroids can fall outside of the polygon
A potential solution is: st_point_on_surface
:
We practiced more mapping of points and polygons
We learned how to animate points on a map.
We covered some important sf
functions:
st_join
: aggregate points by polygon usingst_join
: sharing attributes based on locationPopescu (JCU): Lecture 12