Getis-Ord global G statistic
data: tes_subset$burn
weights: tes_w_binary
standard deviate = -2.1995, p-value = 0.9861
alternative hypothesis: greater
sample estimates:
Global G statistic Expectation Variance
9.031613e-04 9.054901e-04 1.121059e-12
Understanding Wildfires in 2023
Spatial Clustering using Satellite Imagery
1 Intro
In 2023, the world experienced a significant increases in the frequency of wildfires, posing unprecedented challenges to ecosystems, communities, and firefighting efforts. This memo aims to provide a comprehensive view of where the wildfires happened. The primary driver of of the surge in wildfires is the exacerbation of climate change. This memo aims to provide a comprehensive overview of wildfires that happened in 2023 and how scattered they are around the world.
2 Data
In order to map out the areas that have been affected by wildfires, I use data from The Terra and Aqua combined MCD64A1 Version 6.1 Burned Area data product, a monthly, global gridded 500m product containing per-pixel burned-area and quality information that I downloaded from Google Earth. The MCD64A1 burned-area mapping approach employs 500m MODIS Surface Reflectance imagery coupled with 1km MODIS active fire observations. The algorithm uses a burn sensitive vegetation index (VI) to create dynamic thresholds that are applied to the composite data. The VI is derived from MODIS shortwave infrared atmospherically corrected surface reflectance bands 5 and 7 with a measure of temporal texture. The algorithm identifies the date of burn for the 500m grid cells within each individual MODIS tile. The date is encoded in a single data layer as the ordinal day of the calendar year on which the burn occurred, with values assigned to unburned land pixels and additional special values reserved for missing data and water grid cells.
2.1 Maps for the Whole World
For brevity, I only include maps for January, July, August, and September 2023. Maps for every single month can be created upon request.
- January, 2023
- July, 2023
- August, 2023
- September, 2023
What is interesting is that the wildfires are pheonomena that happen on earth throughout the year.
2.2 Maps for Europe
If we focus on Europe, here is what the maps look like
- June 2023
- July 2023
- August 2023
- September 2023
3 Analyzing Spatial Clusters
The next step is to try to understand spatial clustering of wildfires. We will do that by using Getis-Ord Gi statistics. This is one type of spatial statistics alongside local Moran’s I. The Getis-Ord Gi statistics produces z-scores, and p-values telling us where features with either high or low values cluster spatially. This tool works by looking at each feature within the context of neighboring features. A feature with a high value is interesting but may not be a statistically significant hot spot. To be a statistically significant hot spot, a feature will have a high value and be surrounded by other features with high values as well. The local sum for a feature and its neighbors is compared proportionally to the sum of all features; when the local sum is very different from the expected local sum, and when that difference is too large to be the result of random chance, a statistically significant z-score results.
\[ G_i^* = \frac{\sum_{j=1}^{n} w_{ij}x_j - \bar{x} \sum_{j=1}^{n} w_{ij}}{\sqrt{\left(\sum_{j=1}^{n} w_{ij}^2\right) \left(\frac{\sum_{j=1}^{n} x_j^2}{n} - \bar{x}^2\right)}} \]
where:
- \(G_i^x\) - the Gi* statistic for location i
- n is the total number of locations
- \(x_j\) is the attribute value at location j
- \(\bar{x}\) is the mean of the attribute values
- \(w_{ij}\) is the spatial weight between locations i and j
globalG.test
computes a global test for spatial autocorrelation using a Monte Carlo simulation approach (simulated spatial datasets that have the same spatial structure as the original data but are randomly permuted). It tests the null hypothesis of no spatial autocorrelation against the alternative hypothesis of positive spatial autocorrelation. A simple test with the data for July reveals that following output
The output shows a standard deviate of -2.199, which indicates that the observed clustering of wildfires is -2.199 standard deviations away from what would be expected under the null hypothesis of no clustering. This value is associated with a p-value of 0.9861, so observed clustering is not statistically significant at the 0.05 level. The alternative hypothesis is “greater,” which means that the analysis is looking for clusters of high burned areas.
The sample estimates of the Global G statistic, Expectation, and Variance are also reported. The Global G statistic is 9.031613e-04, which is the observed value of the clustering statistic for the entire dataset. The Expectation is 9.054901e-04, which is the expected value of the clustering statistic under the null hypothesis of no clustering. Finally, the Variance is 1.121059e-12, which is an estimate of the variance of the clustering statistic under the null hypothesis.
Overall, the output for July suggests that there is no statistically significant clustering of burned areas. What if we repeat the procedure for another month? Let’s say August
Getis-Ord global G statistic
data: tes_subset$burn
weights: tes_w_binary
standard deviate = 4.1566, p-value = 1.615e-05
alternative hypothesis: greater
sample estimates:
Global G statistic Expectation Variance
8.942119e-04 8.907281e-04 7.024831e-13
The output shows a standard deviate of 4.1566, which indicates that the observed clustering of wildfires is 4.1566 standard deviations away from what would be expected under the null hypothesis of no clustering. This value is associated with a p-value of 1.615e-05, so observed clustering is statistically significant at the 0.05 level. The alternative hypothesis is “greater,” which means that the analysis is looking for clusters of high burned areas.
Overall, the output suggests that there is area statistically significant clustering of burned areas in the month of August. We can also calculate the Local GI to get a sense of the local hotspots of wildfires.
4 Conclusion
Spatial statistics can help identify areas where wildfires are clustered (hotspots) or areas where there are fewer occurrences than expected (coldspots). This information is crucial for policymakers to prioritize resources and interventions. By understanding the spatial distribution of wildfires using Gi* statistics, policymakers can allocate resources more effectively. High-risk areas identified as hotspots may require more attention in terms of firefighting resources, prevention measures, and public awareness campaigns. Analyzing the spatial patterns of wildfires can inform land use planning decisions. If certain areas consistently exhibit high wildfire risk, policymakers may need to reconsider zoning regulations, building codes, or infrastructure development in those regions.
5 Appendix
This is a technical appendix for the operations performed to create this memo.
5.1 Part 1: Reading the Data
library(sf)
library(stars)
library(dplyr)
library(sfdep)
library(tidyr)
library(ggplot2)
#Step1: Turning spherical geomtry off
::sf_use_s2(FALSE)
sf#Step2: Reading world borders
<- st_read(dsn="/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/week9/data/boundaries/all-country-boundaries.shp", quiet = TRUE)
borders #Step3: Simplifying borders
<-st_simplify(borders, dTolerance = 0.1)
borders2#Step4: Reading rasters with wildfires
<-read_stars("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/example/data/july/july-0000000000-0000000000.tif")
left_hem<-read_stars("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/example/data/july/july-0000000000-0000023296.tif")
right_hem#Step5: Mosaicking the wildfire rasters
<-st_mosaic(left_hem, right_hem)
july
#Step4: Reading rasters with wildfires
<-read_stars("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/example/data/august/august-0000000000-0000000000.tif")
left_hem<-read_stars("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/example/data/august/august-0000000000-0000023296.tif")
right_hem#Step5: Mosaicking the wildfire rasters
<-st_mosaic(left_hem, right_hem)
august
#Step6: Reducing resolution. Higher dx/day values values mean lower resolution
<-st_warp(july, st_as_stars(st_bbox(borders), dx = 0.1, dy = 0.1))
burn2_july#Step7: Turning the rasters into points
<-st_as_sf(burn2_july, as_points = FALSE, merge = FALSE)
burn3_julynames(burn3_july)[1]<-"burn"
#Step6: Reducing resolution. Higher dx/day values values mean lower resolution
<-st_warp(august, st_as_stars(st_bbox(borders), dx = 0.1, dy = 0.1))
burn2_august#Step7: Turning the rasters into points
<-st_as_sf(burn2_august, as_points = FALSE, merge = FALSE)
burn3_augustnames(burn3_august)[1]<-"burn"
5.2 Part 2: Doing the neighbor analysis
library(spdep)
#Step1: Create a neighbor list based on queen contiguity
<- spdep::poly2nb(burn3_july, queen = TRUE)
list_nb #Step2: Checking for empty neighbor sets (polygons without neighbors).
<- which(card(list_nb) == 0)
empty_nb #Step3: Remove polygons with empty neighbor sets from the data
<- burn3_july[-empty_nb, ]
tes_subset #Step4: Identify neighbors with queen contiguity (edge/vertex touching)
<- poly2nb(tes_subset, queen = TRUE)
tes_nb #Step5: Binary weighting assigns a weight of 1 to all neighboring features
# and a weight of 0 to all other features
<- nb2listw(tes_nb, style="B")
tes_w_binary #Step6: Calculate spatial lag for burn
<- lag.listw(tes_w_binary, tes_subset$burn)
tes_lag #Step7: Test for global G statistic of burned areas
globalG.test(tes_subset$burn, tes_w_binary)
Getis-Ord global G statistic
data: tes_subset$burn
weights: tes_w_binary
standard deviate = -2.1995, p-value = 0.9861
alternative hypothesis: greater
sample estimates:
Global G statistic Expectation Variance
9.031613e-04 9.054901e-04 1.121059e-12
6 Global G Test
globalG.test
computes a global test for spatial autocorrelation using a Monte Carlo simulation approach (simulated spatial datasets that have the same spatial structure as the original data but are randomly permuted). It tests the null hypothesis of no spatial autocorrelation against the alternative hypothesis of positive spatial autocorrelation.
This is how we perform the test:
# Test for global G statistic of burned areas
globalG.test(tes_subset$burn, tes_w_binary)
Getis-Ord global G statistic
data: tes_subset$burn
weights: tes_w_binary
standard deviate = -2.1995, p-value = 0.9861
alternative hypothesis: greater
sample estimates:
Global G statistic Expectation Variance
9.031613e-04 9.054901e-04 1.121059e-12
The output shows a standard deviate of -2.199, which indicates that the observed clustering of wildfires is -2.199 standard deviations away from what would be expected under the null hypothesis of no clustering. This value is associated with a p-value of 0.9861, so observed clustering is not statistically significant at the 0.05 level. The alternative hypothesis is “greater,” which means that the analysis is looking for clusters of high burned areas.
The sample estimates of the Global G statistic, Expectation, and Variance are also reported. The Global G statistic is 9.031613e-04, which is the observed value of the clustering statistic for the entire dataset. The Expectation is 9.054901e-04, which is the expected value of the clustering statistic under the null hypothesis of no clustering. Finally, the Variance is 1.121059e-12, which is an estimate of the variance of the clustering statistic under the null hypothesis.
Overall, the output suggests that there is not statistically significant clustering of burned areas. What if we repeat the procedure for another month? Let’s say August
library(spdep)
#Step1: Create a neighbor list based on queen contiguity
<- spdep::poly2nb(burn3_august, queen = TRUE)
list_nb #Step2: Checking for empty neighbor sets (polygons without neighbors).
<- which(card(list_nb) == 0)
empty_nb #Step3: Remove polygons with empty neighbor sets from the data
<- burn3_august[-empty_nb, ]
tes_subset #Step4: Identify neighbors with queen contiguity (edge/vertex touching)
<- poly2nb(tes_subset, queen = TRUE)
tes_nb #Step5: Binary weighting assigns a weight of 1 to all neighboring features
# and a weight of 0 to all other features
<- nb2listw(tes_nb, style="B")
tes_w_binary #Step6: Calculate spatial lag of TreEqty
<- lag.listw(tes_w_binary, tes_subset$burn)
tes_lag #Step7: Test for global G statistic of burned areas
globalG.test(tes_subset$burn, tes_w_binary)
Getis-Ord global G statistic
data: tes_subset$burn
weights: tes_w_binary
standard deviate = 4.1566, p-value = 1.615e-05
alternative hypothesis: greater
sample estimates:
Global G statistic Expectation Variance
8.942119e-04 8.907281e-04 7.024831e-13
The output shows a standard deviate of 4.1566, which indicates that the observed clustering of wildfires is 4.1566 standard deviations away from what would be expected under the null hypothesis of no clustering. This value is associated with a p-value of 1.615e-05, so observed clustering is statistically significant at the 0.05 level. The alternative hypothesis is “greater,” which means that the analysis is looking for clusters of high burned areas.
Overall, the output suggests that there is area statistically significant clustering of burned areas.
7 Calculating the Local GI
We can calculate the Local GI to get a sense of the local hotspots in wildfires
# Identify neighbors, create weights, calculate spatial lag
<- tes_subset%>%
tes_nbs mutate(
nb = st_contiguity(geometry), # neighbors share border/vertex
wt = st_weights(nb), # row-standardized weights
tes_lag = st_lag(burn, nb, wt) # calculate spatial lag of TreEqty
)
We can calculate the Gi statistic using local_g_perm()
. The Gi is the ratio of the spatial lag of a feature to the sum of the feature’s values for its neighbors. A positive Gi value indicates that a feature and its neighbors have high values, while a negative Gi value indicates that they have low values. The magnitude of the Gi value indicates the strength of the clustering.
# Calculate the Gi using local_g_perm
<- tes_nbs %>%
tes_hot_spots mutate(
Gi = local_g_perm(burn, nb, wt, nsim = 999)
# nsim = number of Monte Carlo simulations (999 is default)
|>
) # The new 'Gi' column itself contains a dataframe
# We can't work with that, so we need to 'unnest' it
unnest(Gi)
Let’s do a cursory visualization of Gi values
We can zoom in even further
We’re getting an idea of spatial patterns of burned values in Africa. But are they statistically significant? We will consider p-values in the next step.
The hotspot analysis map assessing burned areas across Africa reveals a clear pattern of spatial clustering, with hotspots in the East and coldspots in the West.