Wildfires in 2023

Spatial Clustering using Satellite Imagery

Bogdan G. Popescu

John Cabot University

Statistical Discrimination

Statistical Discrimination - the use of statistical information or group-based averages to make decisions or judgments about individuals.

It can serve as a decision-making tool, particularly when individual-level information is limited or expensive to obtain.

This approach assumes that certain statistical patterns or averages observed within a group can provide insights into the likely characteristics of an individual belonging to that group.

However, statistical discrimination can perpetuate biases and inequalities.

Hot Spot Analysis and Statistical Discrimination

Hot spot analysis involves identifying geographic areas or regions where certain characteristics or behaviors are concentrated.

Hot spot analysis aims to uncover spatial patterns and concentrations of specific variables or attributes within a given area.

By employing statistical methods, researchers can identify areas where certain phenomena are more prevalent than in surrounding regions

This approach is valuable for understanding spatial patterns and making informed decisions based on the distribution of specific factors.

Applications of Hot Spot Analysis

Criminal Justice - Identifying hot spots of criminal activity to allocate resources effectively

Public Health - Hot spot analysis in public health helps identify regions with elevated risks of disease outbreaks or health issues.

Education - Hot spot analysis can be applied to educational data to identify areas with lower academic performance, high dropout rates, or other education-related challenges.

How Does It Work?

Hot spot analysis in statistical discrimination relies on geospatial data, statistical techniques, and mapping tools to visualize and interpret patterns within a given area.

It enables decision-makers to tailor interventions and policies to the specific needs of certain geographic regions, addressing disparities and optimizing resource allocation

In a similar way to statistical discrimination, it is essential to consider ethical implications and potential biases in data collection and analysis to avoid reinforcing stereotypes or discriminating against certain communities.

Intro

  • In 2023, the world experienced a significant increase in the frequency of wildfires

  • Wildfires pose challenges to ecosystems, communities, and firefighting efforts

Objectives

This exercise aims to map all wildfires that happened in 2023 on a month-by-month basis

It also performs statistical analyses to identify spatial clusters.

Data

  • The data come from the Terra and Aqua combined MCD64A1 Version 6.1 Burned Area data product

  • This is a monthly, global gridded 500 m product containing per-pixel burned-area and quality information; a minimal reading example follows
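
As a quick illustration (the file name below is a placeholder; the full reading pipeline is in the Technical Appendix), one monthly tile exported as a GeoTIFF can be read and inspected with the stars package:

library(stars)
# Read one monthly burned-area tile (placeholder file name)
burn <- read_stars("mcd64a1_2023_07.tif")
# Quick look at the per-pixel burn values
plot(burn)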

Maps for the Whole World: January 2023

Maps for the Whole World: July 2023

Maps for the Whole World: August 2023

Maps for the Whole World: September 2023

Maps for Europe: June 2023

Maps for Europe: July 2023

Maps for Europe: August 2023

Maps for Europe: September 2023

Analyzing Spatial Clusters

To analyze clusters, I use the Getis-Ord \(G_i^*\) statistic.

The Getis-Ord \(G_i^*\) statistic produces z-scores and p-values, showing where features with either high or low values cluster spatially.

The Getis-Ord Gi* Statistic

\[ G_i^* = \frac{\sum_{j=1}^{n} w_{ij}x_j - \bar{x} \sum_{j=1}^{n} w_{ij}}{S\sqrt{\frac{n\sum_{j=1}^{n} w_{ij}^2 - \left(\sum_{j=1}^{n} w_{ij}\right)^2}{n-1}}} \] where:

  • \(G_i^*\) is the Gi* statistic for location i
  • n is the total number of locations
  • \(x_j\) is the attribute value at location j
  • \(\bar{x}\) is the mean of the attribute values
  • \(w_{ij}\) is the spatial weight between locations i and j
  • \(S = \sqrt{\frac{\sum_{j=1}^{n} x_j^2}{n} - \bar{x}^2}\) is the sample standard deviation
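
To make the formula concrete, here is a toy computation in base R; the data vector and the line-graph neighbor structure below are invented purely for illustration:

# Seven locations on a line, with high values in the middle
x <- c(2, 3, 10, 12, 11, 3, 2)
# Binary weights: each location neighbors the previous and next one
W <- matrix(0, 7, 7)
for (i in 1:6) W[i, i + 1] <- W[i + 1, i] <- 1
diag(W) <- 1  # Gi* includes location i itself

gi_star <- function(i, x, W) {
  n    <- length(x)
  w    <- W[i, ]
  xbar <- mean(x)
  S    <- sqrt(sum(x^2) / n - xbar^2)
  num  <- sum(w * x) - xbar * sum(w)
  den  <- S * sqrt((n * sum(w^2) - sum(w)^2) / (n - 1))
  num / den
}

round(sapply(seq_along(x), gi_star, x = x, W = W), 2)
# The middle locations (the high values) get large positive z-scores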

The Global G Test: Data in July


    Getis-Ord global G statistic

data:  tes_subset$burn 
weights: tes_w_binary 

standard deviate = -2.1995, p-value = 0.9861
alternative hypothesis: greater
sample estimates:
Global G statistic        Expectation           Variance 
      9.031613e-04       9.054901e-04       1.121059e-12 

The output shows a standard deviate of -2.1995:

  • The observed clustering of wildfires is 2.1995 standard deviations below what would be expected under the null hypothesis of no clustering.

  • This value is associated with a p-value of 0.9861, so the observed clustering is not statistically significant at the 0.05 level.
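
Since the alternative hypothesis is "greater", the p-value follows directly from the standard deviate. As a quick sanity check in R (the same arithmetic applies to the August output below):

# One-sided ("greater") p-values implied by the reported standard deviates
pnorm(-2.1995, lower.tail = FALSE)  # July: ~0.9861 (not significant)
pnorm(4.1566, lower.tail = FALSE)   # August: ~1.6e-05 (significant)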

The Global G Test: Data in August


    Getis-Ord global G statistic

data:  tes_subset$burn 
weights: tes_w_binary 

standard deviate = 4.1566, p-value = 1.615e-05
alternative hypothesis: greater
sample estimates:
Global G statistic        Expectation           Variance 
      8.942119e-04       8.907281e-04       7.024831e-13 

The output shows a standard deviate of 4.1566:

  • The observed clustering of wildfires is 4.1566 standard deviations above what would be expected under the null hypothesis of no clustering.

  • This value is associated with a p-value of 1.615e-05, so the observed clustering is statistically significant at the 0.05 level.

The Hotspots in August

Conclusion

  • Spatial statistics can help identify areas where wildfires are clustered (hotspots)

  • This information is crucial for policymakers to prioritize resources and interventions.

  • Mapping the spatial distribution of wildfires using the Gi* statistic can help policymakers allocate resources more effectively

Technical Appendix

  • The full code to replicate this study is in the appendix of the memo.

Part 1: Reading the Data

library(sf)
library(stars)
library(dplyr)
library(sfdep)
library(tidyr)
library(ggplot2)
#Step1: Turning spherical geometry off
sf::sf_use_s2(FALSE)
#Step2: Reading world borders
borders <- st_read(dsn="/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/week9/data/boundaries/all-country-boundaries.shp", quiet = TRUE)
#Step3: Simplifying borders
borders2<-st_simplify(borders, dTolerance = 0.1)
#Step4: Reading rasters with wildfires (July)
left_hem<-read_stars("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/example/data/july/july-0000000000-0000000000.tif")
right_hem<-read_stars("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/example/data/july/july-0000000000-0000023296.tif")
#Step5: Mosaicking the wildfire rasters (July)
july<-st_mosaic(left_hem, right_hem)

#Step6: Reading rasters with wildfires (August)
left_hem<-read_stars("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/example/data/august/august-0000000000-0000000000.tif")
right_hem<-read_stars("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/big_data/example/data/august/august-0000000000-0000023296.tif")
#Step7: Mosaicking the wildfire rasters (August)
august<-st_mosaic(left_hem, right_hem)


#Step8: Reducing resolution (July). Higher dx/dy values mean lower resolution
burn2_july<-st_warp(july, st_as_stars(st_bbox(borders), dx = 0.1, dy = 0.1))
#Step7: Turning the rasters into points
burn3_july<-st_as_sf(burn2_july, as_points = FALSE, merge = FALSE)
names(burn3_july)[1]<-"burn"


#Step10: Reducing resolution (August). Higher dx/dy values mean lower resolution
burn2_august<-st_warp(august, st_as_stars(st_bbox(borders), dx = 0.1, dy = 0.1))
#Step11: Turning the raster cells into polygons (August)
burn3_august<-st_as_sf(burn2_august, as_points = FALSE, merge = FALSE)
names(burn3_august)[1]<-"burn"

Part 2: Neighbor Analysis: July Data

library(spdep)
#Step1: Create a neighbor list based on queen contiguity
list_nb <- spdep::poly2nb(burn3_july, queen = TRUE)
#Step2: Checking for empty neighbor sets (polygons without neighbors).
empty_nb <- which(card(list_nb) == 0)
#Step3: Remove polygons with empty neighbor sets from the data
# (guarded: subsetting with an empty negative index would drop all rows)
tes_subset <- if (length(empty_nb) > 0) burn3_july[-empty_nb, ] else burn3_july
#Step4: Identify neighbors with queen contiguity (edge/vertex touching)
tes_nb <- poly2nb(tes_subset, queen = TRUE)
#Step5: Binary weighting assigns a weight of 1 to all neighboring features 
# and a weight of 0 to all other features
tes_w_binary <- nb2listw(tes_nb, style="B")
#Step6:  Calculate spatial lag for burn
tes_lag <- lag.listw(tes_w_binary, tes_subset$burn)
#Step7: Test for global G statistic of burned areas
globalG.test(tes_subset$burn, tes_w_binary)

    Getis-Ord global G statistic

data:  tes_subset$burn 
weights: tes_w_binary 

standard deviate = -2.1995, p-value = 0.9861
alternative hypothesis: greater
sample estimates:
Global G statistic        Expectation           Variance 
      9.031613e-04       9.054901e-04       1.121059e-12 

Part 2: Neighbor Analysis: August Data

library(spdep)
#Step1: Create a neighbor list based on queen contiguity
list_nb <- spdep::poly2nb(burn3_august, queen = TRUE)
#Step2: Checking for empty neighbor sets (polygons without neighbors).
empty_nb <- which(card(list_nb) == 0)
#Step3: Remove polygons with empty neighbor sets from the data
# (guarded: subsetting with an empty negative index would drop all rows)
tes_subset <- if (length(empty_nb) > 0) burn3_august[-empty_nb, ] else burn3_august
#Step4: Identify neighbors with queen contiguity (edge/vertex touching)
tes_nb <- poly2nb(tes_subset, queen = TRUE)
#Step5: Binary weighting assigns a weight of 1 to all neighboring features 
# and a weight of 0 to all other features
tes_w_binary <- nb2listw(tes_nb, style="B")
#Step6: Calculate spatial lag for burn
tes_lag <- lag.listw(tes_w_binary, tes_subset$burn)
#Step7: Test for global G statistic of burned areas
globalG.test(tes_subset$burn, tes_w_binary)

    Getis-Ord global G statistic

data:  tes_subset$burn 
weights: tes_w_binary 

standard deviate = 4.1566, p-value = 1.615e-05
alternative hypothesis: greater
sample estimates:
Global G statistic        Expectation           Variance 
      8.942119e-04       8.907281e-04       7.024831e-13 

Calculating the Local Gi*

We can calculate the local Gi* to get a sense of the local wildfire hotspots

# Identify neighbors, create weights, calculate spatial lag
tes_nbs <- tes_subset%>% 
  mutate(
    nb = st_contiguity(geometry),        # neighbors share border/vertex
    wt = st_weights(nb),                 # row-standardized weights
    tes_lag = st_lag(burn, nb, wt)    # calculate spatial lag of burn
  ) 

Calculating the Local Gi*

# Calculate the Gi* using local_g_perm
tes_hot_spots <- tes_nbs %>%
  mutate(
    Gi = local_g_perm(burn, nb, wt, nsim = 999)
    # nsim = number of Monte Carlo permutations
  ) %>% 
  # The new 'Gi' column itself contains a dataframe 
  # We can't work with that, so we need to 'unnest' it
  unnest(Gi)

Let’s do a cursory visualization of Gi values
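
A minimal sketch of such a plot, assuming the tes_hot_spots object created above (the diverging color scale is an illustrative choice):

# Map the raw Gi values: negative = cold spots, positive = hot spots
ggplot(tes_hot_spots) +
  geom_sf(aes(fill = gi), color = NA) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red") +
  theme_void()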

Calculating the Local Gi*

We can zoom in even further

Final Hotspots in August

tes_hot_spots2<-subset(tes_hot_spots, select = c(gi, p_folded_sim))
tes_hot_spots3<-tes_hot_spots2%>%
  mutate(
    # Add a new column called "classification"
    classification = case_when(
      # Classify based on the following criteria:
      gi > 0 & p_folded_sim <= 0.01 ~ "Very hot",
      gi > 0 & p_folded_sim <= 0.05 ~ "Hot",
      gi > 0 & p_folded_sim <= 0.1 ~ "Somewhat hot",
      gi < 0 & p_folded_sim <= 0.01 ~ "Very cold",
      gi < 0 & p_folded_sim <= 0.05 ~ "Cold",
      gi < 0 & p_folded_sim <= 0.1 ~ "Somewhat cold",
      TRUE ~ "Insignificant"
    ),
    # Convert 'classification' into a factor for easier plotting
    classification = factor(
      classification,
      levels = c("Very hot", "Hot", "Somewhat hot",
                 "Insignificant",
                 "Somewhat cold", "Cold", "Very cold")
    )
  )

# Longitude bounds for the zoomed-in map
# (illustrative values; the original bounds are not defined in this excerpt)
min_lon <- 10
max_lon <- 40
ggplot() +
  geom_sf(data = tes_hot_spots3, color = NA, aes(fill = classification)) +
  scale_fill_brewer(type = "div", palette = 5) +
  geom_sf(data = borders, fill = NA) +
  coord_sf(xlim = c(min_lon, max_lon), ylim = c(-15, -5), expand = FALSE)

Final Hotspots in August