Understanding Wildfires in 2023

Spatial Clustering using Satellite Imagery

Author

Bogdan G. Popescu

1 Intro

In 2023, the world experienced a significant increase in the frequency of wildfires, posing unprecedented challenges to ecosystems, communities, and firefighting efforts. A primary driver of this surge is the exacerbation of climate change. This memo provides a comprehensive overview of where wildfires occurred in 2023 and how scattered they are around the world.

2 Data

In order to map out the areas affected by wildfires, I use the Terra and Aqua combined MCD64A1 Version 6.1 Burned Area data product, a monthly, global, gridded 500m product containing per-pixel burned-area and quality information, which I downloaded from Google Earth Engine. The MCD64A1 burned-area mapping approach employs 500m MODIS Surface Reflectance imagery coupled with 1km MODIS active fire observations. The algorithm uses a burn-sensitive vegetation index (VI) to create dynamic thresholds that are applied to the composite data. The VI is derived from MODIS shortwave infrared atmospherically corrected surface reflectance bands 5 and 7 with a measure of temporal texture. The algorithm identifies the date of burn for the 500m grid cells within each individual MODIS tile. The date is encoded in a single data layer as the ordinal day of the calendar year on which the burn occurred, with a separate value assigned to unburned land pixels and additional special values reserved for missing data and water grid cells.
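The encoding described above can be sketched as a small recoding step. This is a minimal illustration, assuming the layer follows the scheme just described (1-366 for the ordinal burn day, 0 for unburned land, and negative codes for special values such as water or missing data; the exact special codes are listed in the product's user guide):

```r
# Minimal sketch of the MCD64A1 burn-date encoding described above.
# The exact special codes are an assumption; consult the user guide.
classify_burn <- function(v) {
  ifelse(v >= 1 & v <= 366, "burned",
         ifelse(v == 0, "unburned", "special"))
}
classify_burn(c(0, 213, -1))
# "unburned" "burned" "special"
```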

2.1 Maps for the Whole World

For brevity, I only include maps for January, July, August, and September 2023. Maps for every single month can be created upon request.

  • January, 2023

  • July, 2023

  • August, 2023

  • September, 2023
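A world map for a given month can be drawn along these lines. This is a sketch, assuming the `burn2_july` raster and the simplified `borders2` borders constructed in the appendix:

```r
# Sketch: world burned-area map for July, assuming the objects
# built in the appendix (burn2_july, borders2)
library(stars)
library(ggplot2)
ggplot() +
  geom_stars(data = burn2_july) +
  geom_sf(data = borders2, fill = NA, linewidth = 0.1) +
  scale_fill_viridis_c(na.value = "transparent", name = "Burn date") +
  theme_void()
```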

Interestingly, wildfires are phenomena that occur somewhere on Earth throughout the year.

2.2 Maps for Europe

If we focus on Europe, the maps look as follows:

  • June 2023

  • July 2023

  • August 2023

  • September 2023
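The European maps can be produced by cropping the raster before plotting. A minimal sketch, assuming the `burn2_july` raster from the appendix and an approximate bounding box for Europe:

```r
# Sketch: crop the July raster to an approximate European extent
library(sf)
library(stars)
europe_bb <- st_bbox(c(xmin = -11, ymin = 35, xmax = 40, ymax = 70),
                     crs = st_crs(burn2_july))
burn2_europe <- st_crop(burn2_july, europe_bb)
```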

3 Analyzing Spatial Clusters

The next step is to understand the spatial clustering of wildfires. We will do that using the Getis-Ord Gi* statistic, one type of local spatial statistic alongside local Moran’s I. The Getis-Ord Gi* statistic produces z-scores and p-values telling us where features with either high or low values cluster spatially. The tool works by looking at each feature within the context of its neighboring features. A feature with a high value is interesting but may not be a statistically significant hot spot. To be a statistically significant hot spot, a feature must have a high value and be surrounded by other features with high values as well. The local sum for a feature and its neighbors is compared proportionally to the sum of all features; when the local sum is very different from the expected local sum, and that difference is too large to be the result of random chance, a statistically significant z-score results.

\[ G_i^* = \frac{\sum_{j=1}^{n} w_{ij}x_j - \bar{x} \sum_{j=1}^{n} w_{ij}}{\sqrt{\left(\sum_{j=1}^{n} w_{ij}^2\right) \left(\frac{\sum_{j=1}^{n} x_j^2}{n} - \bar{x}^2\right)}} \]

where:

  • \(G_i^*\) is the Gi* statistic for location i
  • n is the total number of locations
  • \(x_j\) is the attribute value at location j
  • \(\bar{x}\) is the mean of the attribute values
  • \(w_{ij}\) is the spatial weight between locations i and j
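The formula can be translated directly into a small function. This is a minimal sketch of the simplified form above for a toy dataset, not the implementation used by spdep/sfdep (which applies further finite-sample corrections):

```r
# Minimal sketch: Gi* for location i from the formula above,
# with a toy attribute vector x and a binary weight matrix W
gi_star <- function(i, x, W) {
  n <- length(x)
  xbar <- mean(x)
  w <- W[i, ]
  num <- sum(w * x) - xbar * sum(w)
  den <- sqrt(sum(w^2) * (sum(x^2) / n - xbar^2))
  num / den
}

x <- c(10, 12, 11, 2, 1, 3)  # a high cluster followed by a low cluster
W <- matrix(0, 6, 6)
W[1, 2] <- W[2, 1] <- W[2, 3] <- W[3, 2] <- 1  # neighbors among the high values
W[4, 5] <- W[5, 4] <- W[5, 6] <- W[6, 5] <- 1  # neighbors among the low values
diag(W) <- 1                 # Gi* includes the focal location itself
gi_star(1, x, W)  # positive: location 1 sits in a high-value cluster
gi_star(4, x, W)  # negative: location 4 sits in a low-value cluster
```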

globalG.test from the spdep package computes the global Getis-Ord G statistic and compares it with its expectation and variance under the null hypothesis of no spatial autocorrelation, reporting a standard deviate (z-score). It tests the null hypothesis of no spatial autocorrelation against the alternative hypothesis of positive spatial autocorrelation. A simple test with the data for July yields the following output:


    Getis-Ord global G statistic

data:  tes_subset$burn 
weights: tes_w_binary 

standard deviate = -2.1995, p-value = 0.9861
alternative hypothesis: greater
sample estimates:
Global G statistic        Expectation           Variance 
      9.031613e-04       9.054901e-04       1.121059e-12 

The output shows a standard deviate of -2.1995, indicating that the observed Global G statistic lies about 2.2 standard deviations below its expectation under the null hypothesis of no clustering. The associated p-value is 0.9861, so the observed clustering is not statistically significant at the 0.05 level. The alternative hypothesis is “greater,” meaning the test looks for clustering of high burned-area values.

The sample estimates of the Global G statistic, Expectation, and Variance are also reported. The Global G statistic is 9.031613e-04, which is the observed value of the clustering statistic for the entire dataset. The Expectation is 9.054901e-04, which is the expected value of the clustering statistic under the null hypothesis of no clustering. Finally, the Variance is 1.121059e-12, which is an estimate of the variance of the clustering statistic under the null hypothesis.
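As a sanity check on these numbers, the standard deviate is just the observed G standardized by its expectation and variance:

```r
# The standard deviate is (G - E[G]) / sqrt(Var[G]),
# using the sample estimates reported above
G  <- 9.031613e-04
EG <- 9.054901e-04
VG <- 1.121059e-12
(G - EG) / sqrt(VG)  # approximately -2.1995, matching the output
```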

Overall, the output for July suggests that there is no statistically significant clustering of burned areas. What if we repeat the procedure for another month, say August?


    Getis-Ord global G statistic

data:  tes_subset$burn 
weights: tes_w_binary 

standard deviate = 4.1566, p-value = 1.615e-05
alternative hypothesis: greater
sample estimates:
Global G statistic        Expectation           Variance 
      8.942119e-04       8.907281e-04       7.024831e-13 

The output shows a standard deviate of 4.1566, indicating that the observed Global G statistic lies about 4.2 standard deviations above its expectation under the null hypothesis of no clustering. The associated p-value is 1.615e-05, so the observed clustering is statistically significant at the 0.05 level. The alternative hypothesis is “greater,” meaning the test looks for clustering of high burned-area values.

Overall, the output suggests that there is statistically significant clustering of burned areas in the month of August. We can also calculate the local Gi* to get a sense of the local hotspots of wildfires.

4 Conclusion

Spatial statistics can help identify areas where wildfires are clustered (hotspots) or areas where there are fewer occurrences than expected (coldspots). This information is crucial for policymakers to prioritize resources and interventions. By understanding the spatial distribution of wildfires using Gi* statistics, policymakers can allocate resources more effectively. High-risk areas identified as hotspots may require more attention in terms of firefighting resources, prevention measures, and public awareness campaigns. Analyzing the spatial patterns of wildfires can inform land use planning decisions. If certain areas consistently exhibit high wildfire risk, policymakers may need to reconsider zoning regulations, building codes, or infrastructure development in those regions.

5 Appendix

This is a technical appendix for the operations performed to create this memo.

5.1 Part 1: Reading the Data

library(sf)
library(stars)
library(dplyr)
library(sfdep)
library(tidyr)
library(ggplot2)
# Step 1: Turn spherical geometry off
sf::sf_use_s2(FALSE)
# Step 2: Read world borders
borders <- st_read(dsn = "/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/week9/data/boundaries/all-country-boundaries.shp", quiet = TRUE)
# Step 3: Simplify borders
borders2 <- st_simplify(borders, dTolerance = 0.1)
# Step 4: Read the July wildfire rasters (western and eastern tiles)
left_hem <- read_stars("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/example/data/july/july-0000000000-0000000000.tif")
right_hem <- read_stars("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/example/data/july/july-0000000000-0000023296.tif")
# Step 5: Mosaic the July wildfire rasters
july <- st_mosaic(left_hem, right_hem)

# Step 6: Read the August wildfire rasters (western and eastern tiles)
left_hem <- read_stars("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/example/data/august/august-0000000000-0000000000.tif")
right_hem <- read_stars("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/example/data/august/august-0000000000-0000023296.tif")
# Step 7: Mosaic the August wildfire rasters
august <- st_mosaic(left_hem, right_hem)

# Step 8: Reduce resolution. Higher dx/dy values mean lower resolution
burn2_july <- st_warp(july, st_as_stars(st_bbox(borders), dx = 0.1, dy = 0.1))
# Step 9: Turn the raster cells into polygons
burn3_july <- st_as_sf(burn2_july, as_points = FALSE, merge = FALSE)
names(burn3_july)[1] <- "burn"

# Step 10: Repeat Steps 8-9 for August
burn2_august <- st_warp(august, st_as_stars(st_bbox(borders), dx = 0.1, dy = 0.1))
burn3_august <- st_as_sf(burn2_august, as_points = FALSE, merge = FALSE)
names(burn3_august)[1] <- "burn"

5.2 Part 2: Doing the Neighbor Analysis

library(spdep)
# Step 1: Create a neighbor list based on queen contiguity
list_nb <- spdep::poly2nb(burn3_july, queen = TRUE)
# Step 2: Check for empty neighbor sets (polygons without neighbors)
empty_nb <- which(card(list_nb) == 0)
# Step 3: Remove polygons with empty neighbor sets from the data
# (the guard avoids dropping every row when empty_nb is empty)
tes_subset <- if (length(empty_nb) > 0) burn3_july[-empty_nb, ] else burn3_july
# Step 4: Identify neighbors with queen contiguity (edge/vertex touching)
tes_nb <- poly2nb(tes_subset, queen = TRUE)
# Step 5: Binary weighting assigns a weight of 1 to all neighboring features
# and a weight of 0 to all other features
tes_w_binary <- nb2listw(tes_nb, style = "B")
# Step 6: Calculate the spatial lag of burn
tes_lag <- lag.listw(tes_w_binary, tes_subset$burn)
# Step 7: Test for the global G statistic of burned areas
globalG.test(tes_subset$burn, tes_w_binary)

    Getis-Ord global G statistic

data:  tes_subset$burn 
weights: tes_w_binary 

standard deviate = -2.1995, p-value = 0.9861
alternative hypothesis: greater
sample estimates:
Global G statistic        Expectation           Variance 
      9.031613e-04       9.054901e-04       1.121059e-12 

6 Global G Test

globalG.test from the spdep package computes the global Getis-Ord G statistic and compares it with its expectation and variance under the null hypothesis of no spatial autocorrelation, reporting a standard deviate (z-score). It tests the null hypothesis of no spatial autocorrelation against the alternative hypothesis of positive spatial autocorrelation.

This is how we perform the test:

# Test for global G statistic of burned areas
globalG.test(tes_subset$burn, tes_w_binary)

    Getis-Ord global G statistic

data:  tes_subset$burn 
weights: tes_w_binary 

standard deviate = -2.1995, p-value = 0.9861
alternative hypothesis: greater
sample estimates:
Global G statistic        Expectation           Variance 
      9.031613e-04       9.054901e-04       1.121059e-12 

The output shows a standard deviate of -2.1995, indicating that the observed Global G statistic lies about 2.2 standard deviations below its expectation under the null hypothesis of no clustering. The associated p-value is 0.9861, so the observed clustering is not statistically significant at the 0.05 level. The alternative hypothesis is “greater,” meaning the test looks for clustering of high burned-area values.

The sample estimates of the Global G statistic, Expectation, and Variance are also reported. The Global G statistic is 9.031613e-04, which is the observed value of the clustering statistic for the entire dataset. The Expectation is 9.054901e-04, which is the expected value of the clustering statistic under the null hypothesis of no clustering. Finally, the Variance is 1.121059e-12, which is an estimate of the variance of the clustering statistic under the null hypothesis.

Overall, the output suggests that there is no statistically significant clustering of burned areas. What if we repeat the procedure for another month, say August?

library(spdep)
# Step 1: Create a neighbor list based on queen contiguity
list_nb <- spdep::poly2nb(burn3_august, queen = TRUE)
# Step 2: Check for empty neighbor sets (polygons without neighbors)
empty_nb <- which(card(list_nb) == 0)
# Step 3: Remove polygons with empty neighbor sets from the data
# (the guard avoids dropping every row when empty_nb is empty)
tes_subset <- if (length(empty_nb) > 0) burn3_august[-empty_nb, ] else burn3_august
# Step 4: Identify neighbors with queen contiguity (edge/vertex touching)
tes_nb <- poly2nb(tes_subset, queen = TRUE)
# Step 5: Binary weighting assigns a weight of 1 to all neighboring features
# and a weight of 0 to all other features
tes_w_binary <- nb2listw(tes_nb, style = "B")
# Step 6: Calculate the spatial lag of burn
tes_lag <- lag.listw(tes_w_binary, tes_subset$burn)
# Step 7: Test for the global G statistic of burned areas
globalG.test(tes_subset$burn, tes_w_binary)

    Getis-Ord global G statistic

data:  tes_subset$burn 
weights: tes_w_binary 

standard deviate = 4.1566, p-value = 1.615e-05
alternative hypothesis: greater
sample estimates:
Global G statistic        Expectation           Variance 
      8.942119e-04       8.907281e-04       7.024831e-13 

The output shows a standard deviate of 4.1566, indicating that the observed Global G statistic lies about 4.2 standard deviations above its expectation under the null hypothesis of no clustering. The associated p-value is 1.615e-05, so the observed clustering is statistically significant at the 0.05 level. The alternative hypothesis is “greater,” meaning the test looks for clustering of high burned-area values.

Overall, the output suggests that there is statistically significant clustering of burned areas in August.

7 Calculating the Local Gi*

We can calculate the local Gi* to get a sense of the local hotspots of wildfires:

# Identify neighbors, create weights, calculate spatial lag
tes_nbs <- tes_subset %>%
  mutate(
    nb = st_contiguity(geometry),     # neighbors share a border/vertex
    wt = st_weights(nb),              # row-standardized weights
    tes_lag = st_lag(burn, nb, wt)    # spatial lag of burn
  )
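For row-standardized weights, the spatial lag computed above is simply the average of each feature's neighbors' values. A toy illustration in base R:

```r
# Toy illustration: spatial lag as a weighted average of neighbors
x <- c(4, 8, 6)
W <- rbind(c(0, 0.5, 0.5),  # feature 1 neighbors features 2 and 3
           c(0.5, 0, 0.5),
           c(0.5, 0.5, 0))
as.vector(W %*% x)  # 7 5 6; e.g. lag of feature 1 = (8 + 6) / 2 = 7
```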

We can calculate the Gi statistic using local_g_perm(). The Gi statistic compares the sum of values in a feature’s neighborhood with the sum of values across the study area. A positive Gi value indicates that a feature and its neighbors have high values, while a negative Gi value indicates that they have low values. The magnitude of the Gi value indicates the strength of the clustering.

# Calculate the Gi using local_g_perm
tes_hot_spots <- tes_nbs %>%
  mutate(
    Gi = local_g_perm(burn, nb, wt, nsim = 999)
    # nsim = number of Monte Carlo simulations (999 is the default)
  ) %>%
  # The new 'Gi' column is a nested data frame,
  # so we 'unnest' it into regular columns
  unnest(Gi)

Let’s do a cursory visualization of the Gi values.
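Such a map can be sketched as follows; the `gi` column name is assumed to follow sfdep's local_g_perm output (worth verifying against your sfdep version):

```r
# Sketch: map the Gi values from tes_hot_spots
# (column name 'gi' assumed from sfdep::local_g_perm output)
library(ggplot2)
ggplot(tes_hot_spots) +
  geom_sf(aes(fill = gi), color = NA) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       name = "Gi*") +
  theme_void()
```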

We can zoom in even further

We’re getting an idea of spatial patterns of burned values in Africa. But are they statistically significant? We will consider p-values in the next step.

The hotspot analysis map assessing burned areas across Africa reveals a clear pattern of spatial clustering, with hotspots in the East and coldspots in the West.