L16: Rasters and Raster Operations 2

Zonal Statistics, Vectorizing, Cropping

Bogdan G. Popescu

bogdan.popescu@johncabot.edu

John Cabot University

Today

Today’s agenda includes:

cropping raster to a polygon

zonal statistics: mean

converting rasters (stars) to polygons

subtracting rasters

Luminosity

Previously, we used satellite luminosity.

As discussed, this has been used as a proxy for economic activity.

Working with Luminosity

Let us download the files that we have worked with.

Modern Borders
Luminosity for the whole world 1992 (300MB)
Luminosity for the whole world 2012 (300MB)

Working with Luminosity 1992

Let us make a map of luminosity by focusing on Europe in 1992 and 2012

#Step1: Loadning relevant libraries
library(sf)
library(stars)
library(ggplot2)
sf_use_s2(FALSE)
#Step2: Read country borders
borders <- st_read(dsn="./data/boundaries/all-country-boundaries.shp", quiet = TRUE)
#Step3: Simplifying border to make them easier to use on our machine
borders<-st_simplify(borders, dTolerance=0.1)
#Step4: Creating a folder called "F101992"
target_dir <- "./data/luminosity/F101992"
#Step5: Create the target directory if it doesn't exist
if (!dir.exists(target_dir)) {
  dir.create(target_dir)
}
#Step6: Unzip the file
untar("./data/luminosity/F101992.v4.tar", exdir = target_dir)
#Step7: Unzip the file further
R.utils::gunzip("./data/luminosity/F101992/F101992.v4b_web.avg_vis.tif.gz", remove = FALSE, overwrite=TRUE)
#Step8: Read the raster
f1992<-read_stars("./data/luminosity/F101992/F101992.v4b_web.avg_vis.tif")
#Step9: Make sure the two files have the same CRS
st_crs(f1992)<-st_crs(borders)
#Step10: Reduce the resolution of the raster: Higher dx/day values mean lower resolution
f1992b<-st_warp(f1992, st_as_stars(st_bbox(f1992), dx = 0.3, dy = 0.3))
#Step11: Rename the raster
names(f1992b)<-"lights_1992"
#Step12:Defining the countries defining the regions of interest
norway<-subset(borders, SOVEREIGNT == "Norway")
max_lat_y_eu<-st_bbox(norway)["ymax"]
greece<-subset(borders, SOVEREIGNT == "Greece")
min_lat_y_eu<-st_bbox(greece)["ymin"]
ukraine<-subset(borders, SOVEREIGNT == "Ukraine")
max_lon_x_eu<-st_bbox(ukraine)["xmax"]
portugal<-subset(borders, SOVEREIGNT == "Portugal")
min_lon_x_eu<-st_bbox(portugal)["xmin"]
#Step13: Making the map
fig1992<-ggplot()+
  geom_stars(data=f1992b)+
  scale_fill_gradientn(name = "", colours=c("black","white"))+
  geom_sf(data=borders, linewidth = 0.1, fill = NA, color = "white", alpha=0.5)+
  coord_sf(xlim = c(min_lon_x_eu-3, max_lon_x_eu+3), ylim = c(min_lat_y_eu-1, max_lat_y_eu-6))+
  theme_bw()+
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())

Working with Luminosity 1992

Let us make a map of luminosity by focusing on Europe in 1992 and 2012

Working with Luminosity 2012

Let us do the same for 2012

#Step1: Creating a folder called "F182012"
target_dir <- "./data/luminosity/F182012"
#Step2: Create the target directory if it doesn't exist
if (!dir.exists(target_dir)) {
  dir.create(target_dir)
}
#Step3: Unzip the file
untar("./data/luminosity/F182012.v4.tar", exdir = target_dir)
#Step4: Unzip the file further
R.utils::gunzip("./data/luminosity/F182012/F182012.v4c_web.avg_vis.tif.gz", remove = FALSE, overwrite=TRUE)
#Step5: Read the raster
f2012<-read_stars("./data/luminosity/F182012/F182012.v4c_web.avg_vis.tif")
#Step6: Make sure the two files have the same CRS
st_crs(f2012)<-st_crs(borders)
#Step7: Reduce the resolution of the raster: Higher dx/day values mean lower resolution
f2012b<-st_warp(f2012, st_as_stars(st_bbox(f2012), dx = 0.3, dy = 0.3))
#Step8: Rename the raster
names(f2012b)<-"lights_2012"
#Step9: Making the map
fig2012<-ggplot()+
  geom_stars(data=f2012b)+
  scale_fill_gradientn(name = "",
                       colours=c("black","white"))+
  geom_sf(data=borders, linewidth = 0.1, fill = NA, color = "white", alpha=0.5)+
  coord_sf(xlim = c(min_lon_x_eu-3, max_lon_x_eu+3), ylim = c(min_lat_y_eu-1, max_lat_y_eu-6))+
  theme_bw()+
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())

Working with Luminosity 2012

fig2012

Working with Luminosity 2012

We can obviously put them side by side for comparison:

library(ggpubr)
ggarrange(fig1992, fig2012, common.legend = TRUE)

Context

The Fall of the Berlin wall is one of the crucial moments in recent European History.

It has marked the fall of communism throughout Eastern Europe.

It would be interesting if communism might have left a lingering legacy on development.

Looking at Germany alone as opposed to West vs. Eastern Europe reduces possibility of confounding variables:

institutions
legal systems
cultural norms

Germany facilitates direct examination of lingering disparities induced by communism.

Germany

Let us first download the county and country shape files for Germany

Germany

Let us first load the Germany county shape files.

#Step1: Read country borders
ger2 <- st_read(dsn="./data/gadm41_DEU_shp/gadm41_DEU_2.shp", quiet = TRUE)
ger2<-st_simplify(ger2, dTolerance=0.005)
ggplot()+
  geom_sf(data=ger2)

Germany

Let us also load the Germany country shape file.

#Step1: Read country borders
ger0 <- st_read(dsn="./data/gadm41_DEU_shp/gadm41_DEU_0.shp", quiet = TRUE, promote_to_multi = FALSE)
ger0<-st_simplify(ger0, dTolerance=0.02)
ggplot()+
  geom_sf(data=ger0)

Germany

Let us now crop the lights based on the Germany shape.

Note that we are reloading the raster to work with the raster’s original resolution.

#Step1: Read the raster
f1992<-read_stars("./data/luminosity/F101992/F101992.v4b_web.avg_vis.tif")
#Step2: Ensuring that the CRS is the same
st_crs(f1992)<-st_crs(ger0)
#Step3: Cropping raster
f1992_cropped<-st_crop(f1992, ger0)

Germany

Let us now compare the two

plot(f1992)

plot(f1992_cropped)

Germany

We can now provide more user-friendly variable names

#Renaming raster
names(f1992_cropped)<-"lights_1992"

We also need to ensure that we work with a stars object

class(f1992_cropped)

[1] "stars_proxy" "stars"

Transforming the object

#Turning it to a stars object (it is a proxy object)
f1992_cropped <- stars::st_as_stars(f1992_cropped)

Checking if we now have a stars object:

class(f1992_cropped)

[1] "stars"

Germany

We can now map the stars object

ggplot()+
  geom_stars(data=f1992_cropped)+
  scale_fill_gradientn(colours=c("black","white"), na.value=NA)

Germany

What if we tried to superimpose counties (kreise) on top of the raster?

ggplot()+
  geom_stars(data=f1992_cropped)+
  geom_sf(data=ger2, fill=NA, color="white")+
  scale_fill_gradientn(colours=c("black","white"), na.value=NA)

Zonal Statiscs

Zonal statistics is set of techniques used to calculate statistics for specific zones.

The process involves overlaying a raster or vector dataset representing the zones of interest with another dataset containing the attribute values (such as elevation, temperature, population density, etc.).

Common statistics calculated in zonal statistics analysis include:

mean
median
sum

Zonal Statiscs

This is for example what mean zonal statistics means in the context of luminosity

Zonal Statiscs

This is for example what mean zonal statistics maximum in the context of luminosity

Germany

So, let us now aggregate the pixels at a county level

#Step1: Extracting only the relevant variable
ger2b<-subset(ger2, select = c("NAME_2", "NAME_1"))
#Step2: Aggregating lights by polygon
agg_1992<- raster::aggregate(f1992_cropped, ger2b, mean, na.rm=TRUE)

Germany

We now look at the output:

ggplot()+
  geom_stars(data=agg_1992)

Stars to Polygon

We now merge the data from the stars object back to the sf obejct

library(dplyr)
#Step1: Turning the stars object into an sf
agg_sf_1992<-st_as_sf(agg_1992)
#Step2: Creating a numeric id based on dataframe index for the stars object turned sf
agg_sf_1992$id<-as.numeric(rownames(agg_sf_1992))
#Step3: Creating a numeric id based on dataframe index
ger2b$id<-as.numeric(rownames(ger2b))
#Step4: Dropping the geometry
agg_sf_1992<-st_drop_geometry(agg_sf_1992)
#Step5: Perfoming a left join by id
agg_sf2_1992<-left_join(ger2b, agg_sf_1992, by =c("id"="id"))

Stars to Polygon

We then do the same for 2012

library(dplyr)
#Step1: Read the raster
f2012<-read_stars("./data/luminosity/F182012/F182012.v4c_web.avg_vis.tif")
#Step2: Ensuring that the CRS is the same
st_crs(f2012)<-st_crs(ger0)
#Step3: Cropping raster
f2012_cropped<-st_crop(f2012, ger0)
#Step4: Renaming rasters
names(f2012_cropped)<-"lights_2012"
#Step5: Turning the file into a stars object
f2012_cropped <- stars::st_as_stars(f2012_cropped)
#Step2: Aggregating lights by polygon
agg_2012<- raster::aggregate(f2012_cropped, ger2b, mean, na.rm=TRUE)
#Step1: Turning the stars object into an sf
agg_sf_2012<-st_as_sf(agg_2012)
#Step2: Creating a numeric id based on dataframe index for the stars object turned sf
agg_sf_2012$id<-as.numeric(rownames(agg_sf_2012))
#Step3: Dropping geometry
agg_sf_2012<-st_drop_geometry(agg_sf_2012)
#Step3: Creating a numeric id based on dataframe index
#ger2b$id<-as.numeric(rownames(ger2b))
#Step4: Dropping the geometry
#ger2c<-st_drop_geometry(ger2b)
#Step5: Perfoming a left join by id
agg_sf2<-left_join(agg_sf2_1992, agg_sf_2012, by =c("id"="id"))

Stars to Polygon

Let us now look at the output: the difference between starts and sf objects

ggplot()+
  geom_sf(data=agg_sf2, aes(fill=lights_2012))

Stars to Polygon

Visually, there is no difference between stars and sf

Show the code

ggplot()+
  geom_stars(data=agg_1992)

Show the code

ggplot()+
  geom_sf(data=agg_sf2_1992, aes(fill=lights_1992))

Stars to Polygon

There is however a difference in our ability to work with the data

class(agg_1992)

[1] "stars"

class(agg_sf2_1992)

[1] "sf"         "data.frame"

Germany: East vs. West

So, is there a difference between East and West?

More precisely:

Was there a difference in development in 1992?
Is that difference still visible in 2012?
Did the East grow more than the West?

Germany: East vs. West

So, is there a difference between East and West?

Germany: East vs. West

So, is there a difference between East and West?

Steps to Finding the Difference

Section 1: Identifying counties in the East and West

Section 2: Making a East-West Division Line

Section 3: Comparing East and West

Section 1: Identifying counties in the East and West

Step1: Defining a list with states in the East

Step2: Subsetting based on that list

Step3: Dissolving the county polygons to one polygon

Step4: Creating a 500m buffer to eliminate internal geometry errors

Step5: Create county centroids to identify their location in East or in the West

Step6: Perfoming the spatial join
Step7: Merging the spatial attribute back to polygons

Step1: Defining a list with states in the East

Let us identify the counties in East Germany

The five states in East Germany are: Brandenburg, Mecklenburg-Vorpommern, Saxony, Saxony-Anhalt and Thuringia.

Here is the list:

#Step1: Defining list
list_states_east<-c("Berlin",
                    "Brandenburg",
                    "Mecklenburg-Vorpommern",
                    "Sachsen",
                    "Sachsen-Anhalt",
                    "Thüringen")

Step2: Subsetting based on that list

We can then do a subset just based on those states of NAME_1

#Step2: Subsetting based on the list
east_ger<-subset(ger2b, NAME_1 %in% list_states_east)

Let us also plot them in the context of Germany.

Show the code

#Step3: Defining Germany extent
xmin<-st_bbox(ger2b)["xmin"]
xmax<-st_bbox(ger2b)["xmax"]
ymin<-st_bbox(ger2b)["ymin"]
ymax<-st_bbox(ger2b)["ymax"]

ggplot()+
  geom_sf(data=ger2b)+
  coord_sf(ylim=c(ymin,ymax), 
           xlim=c(xmin, xmax))

Show the code

xmin<-st_bbox(ger2b)["xmin"]
xmax<-st_bbox(ger2b)["xmax"]
ymin<-st_bbox(ger2b)["ymin"]
ymax<-st_bbox(ger2b)["ymax"]

ggplot()+
  geom_sf(data=east_ger)+
  coord_sf(ylim=c(ymin,ymax), 
           xlim=c(xmin, xmax))

Step3: Dissolving the county polygons to one polygon

We then dissolve the county polygons

#Step4: Dissolving the county polygons to one polygon
east_ger_diss<-st_union(east_ger)

This is what that union looks like:

Show the code

ggplot()+
  geom_sf(data=ger0)+
  geom_sf(data=east_ger_diss, color="red", fill=NA)

Step4: Creating a 500m buffer to eliminate internal geometry errors

#Step5: Changing the coordinate system for Germany CRS
east_ger_diss2 = st_transform(east_ger_diss, 6875)
#Step6: Creating a 500m buffer to eliminate internal geometry errors
east_ger_diss3 <- st_buffer(east_ger_diss2, dist = 500)
#Step7: Turning back to WGS84
east_ger_diss4<-st_transform(st_as_sf(east_ger_diss3), st_crs(east_ger))

Here is how the two polygons compare:

Show the code

pic1<-ggplot()+
  geom_sf(data=east_ger_diss, color="black", fill=NA, linewidth=0.9)

pic2<-ggplot()+
  geom_sf(data=east_ger_diss4, color="black", fill=NA, linewidth=0.5)

ggarrange(pic1, pic2)

Step5: Create county centroids to identify their location in East or in the West

We then create the polygon centroid:

##Marking counties in the East vs. not based on spatial location
#Step8: Create centroid
ger2_centr<-st_centroid(agg_sf2)

Here is what the difference look like:

Show the code

ggplot()+
  geom_sf(data=ger2b)

Show the code

ggplot()+
  geom_sf(data=ger2_centr)

Step6: Perfoming the spatial join

We then add a column to indicate East and West.

#Step9: Adding a column to the polygon whose attribute we want
east_ger_diss4$region<-"East"
#Step10: Perfoming the spatial join
ger_centr_joined<-st_join(ger2_centr, east_ger_diss4)
#Step11: Replace NA with "West"
ger_centr_joined$region[is.na(ger_centr_joined$region)]<-"West"

Here is what the points look like after creating that column.

Show the code

ggplot()+
  geom_sf(data=ger0)+
  geom_sf(data=ger_centr_joined, aes(color=region))

Step7: Merging the spatial attribute back to polygons

We then merge the region column from the points to the polygon.

#Step12: Dropping Geometry
ger_centr_joined_df<-st_drop_geometry(ger_centr_joined)
#Step13: Keeping only relevant variable resulting from the spatial merge
ger_centr_joined_df2<-subset(ger_centr_joined_df, select=c("NAME_2", "region"))
#Step14: Merging it back to the polygon
ger2c<-left_join(agg_sf2, ger_centr_joined_df2, by = c("NAME_2" = "NAME_2"))

Step7: Merging the spatial attribute back to polygons

Here is what the output looks like:

Show the code

ggplot()+
  geom_sf(data=ger0)+
  geom_sf(data=ger2c, aes(color=region))

Section 2: Making a East-West Division Line

Step 1: Turning the multipolygon to POLYGON

Step 2: Turning the polygon to line

Step 3: Making neg. buffer to make the polygon size smaller so that we crop the internal eastern line

Step 4: Intersecting line with the smaller Germany polygon

Step 5: Calculating distance to line

Step 1: Turning the multipolygon to POLYGON

We need to first turn the multipolygon to a polygon

#Step1: Turning the multipolygon to POLYGON
east_ger_pol<-st_cast(east_ger_diss4, "POLYGON")

Let us examime the difference.

head(east_ger_pol)

Simple feature collection with 3 features and 1 field
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 9.871475 ymin: 50.16888 xmax: 15.04897 ymax: 54.68908
Geodetic CRS:  WGS 84
    region                              x
1     East POLYGON ((10.02385 50.84353...
1.1   East POLYGON ((13.77548 54.1987,...
1.2   East POLYGON ((13.52367 54.32939...

head(east_ger_diss4)

Simple feature collection with 1 feature and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 9.871475 ymin: 50.16888 xmax: 15.04897 ymax: 54.68908
Geodetic CRS:  WGS 84
                               x region
1 MULTIPOLYGON (((10.02385 50...   East

Step 2: Turning the polygong to line

Let us turn the polygon into line

east_ger_line = st_cast(east_ger_pol,"LINESTRING")

This is what the output looks like:

ggplot()+
  geom_sf(data=east_ger_pol)

ggplot()+
  geom_sf(data=east_ger_line)

Step 3: Making neg. buffer to make the polygon size smaller so that we crop the internal eastern line

We now create an inside buffer

ger0b<-st_buffer(ger0, dist=-0.03)

Step 3: Making neg. buffer to make the polygon size smaller so that we crop the internal eastern line

Here is the difference:

ggplot()+
  geom_sf(data=ger0b, color="red", fill=NA, linewidth=0.5)+
  geom_sf(data=ger0, color="yellow", fill=NA, linewidth=0.5)

Step 4: Intersecting line with the smaller Germany polygon

We then intersect the line with the polygon:

#Intersecting line with the smaller Germany polygon
intersection_lp <- st_intersection(east_ger_line, ger0b)

This is what it looks like:

ggplot()+
  geom_sf(data=intersection_lp, color="red", fill=NA, linewidth=0.5)

Step 4: Intersecting line with the smaller Germany polygon

Here is what the attributes of the output look like:

head(intersection_lp)

Simple feature collection with 2 features and 3 fields
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 9.871475 ymin: 50.20032 xmax: 12.07138 ymax: 53.92832
Geodetic CRS:  WGS 84
     region GID_0 COUNTRY                              x
1      East   DEU Germany MULTILINESTRING ((10.02385 ...
1.24   East   DEU Germany LINESTRING (10.91872 53.916...

Step 4: Intersecting line with the smaller Germany polygon

It seems that we have two lines:

line1<-head(intersection_lp, n=1)

line2<-tail(intersection_lp, n=1)

Here is what they look like:

ggplot()+
  geom_sf(data=line1, color="red")

ggplot()+
  geom_sf(data=line2, color="blue")

Step 4: Intersecting line with the smaller Germany polygon

Plotting them together

ggplot()+
  geom_sf(data=line1, color="red")+
  geom_sf(data=line2, color="blue", linewidth=3)

We will thus keep only line1.

Step 5: Calculating distance to line

We now calculate the distance of every polygon centroid to the East-West line

##Distances
library(units)
#Step1: Calculating distance to line
distances_east_df<-data.frame(set_units(st_distance(ger2_centr, line1), 'km'))
#Step2: Providing user-friendly variable names
names(distances_east_df)<-"dist_east"
#Step3: Turning variables to numeric
distances_east_df$dist_east<-as.numeric(distances_east_df$dist_east)
#Step4: Creating id
distances_east_df$id<-as.numeric(rownames(distances_east_df))
#Step5: Merging
ger2d<-left_join(ger2c, distances_east_df, by = c("id"="id"))
#Step6: Selecting 100km-distance districts
ger_100km<-subset(ger2d, dist_east<100)

Step 5: Calculating distance to line

Here is what the distances look like:

ggplot()+
  geom_sf(data=ger2d, aes(fill=dist_east))+
  geom_sf(data=line1, color="red")

Section 3: Comparing East and West

Comparison 1992

Let us draw a boxplot to see the differences:

Show the code

library(gridExtra)

pic1<-ggplot() +
  geom_sf(data=ger2d, aes(fill=lights_1992),
          linewidth=0.01)+
  scale_fill_gradientn(colours=c("black","white"))+
  geom_sf(data = east_ger_diss4, fill = NA, color = "red", linewidth = 1) +
  theme_bw()+
  theme(axis.title = element_blank(),   # Remove axis titles
        axis.ticks = element_blank(),
        legend.box.background = element_rect(fill = "white", 
                                             linewidth=0.5, linetype="solid",
                                             colour = "black"),
        legend.position = c(0, 1),
        legend.justification = c(0, 1))


pic2<-ggplot(ger2d, aes(x = region, 
                         y = lights_1992, 
                         color=region)) + 
  geom_boxplot()+ 
  theme_bw()+
    theme(axis.title = element_blank(),   # Remove axis titles
        legend.box.background = element_rect(fill = "white", 
                                             linewidth=0.5, linetype="solid",
                                             colour = "black"),
        legend.position = c(0, 1),
        legend.justification = c(0, 1))

grid.arrange(pic1, pic2, ncol=2)

Comparison 1992

But are these differences statistically significant?

ger2d_east<-subset(ger2d, region=="East")
ger2d_west<-subset(ger2d, region=="West")

test_twosided<-t.test(ger2d_east$lights_1992, ger2d_west$lights_1992,
        alternative = c("two.sided"))
test_twosided


    Welch Two Sample t-test

data:  ger2d_east$lights_1992 and ger2d_west$lights_1992
t = -4.4277, df = 145.75, p-value = 1.857e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -9.042632 -3.461273
sample estimates:
mean of x mean of y 
 13.25170  19.50365

Comparison 1992

Another way to get the same result:

test_twosided<-t.test(lights_1992~region, data=ger2d,
        alternative = c("two.sided"))
test_twosided


    Welch Two Sample t-test

data:  lights_1992 by region
t = -4.4277, df = 145.75, p-value = 1.857e-05
alternative hypothesis: true difference in means between group East and group West is not equal to 0
95 percent confidence interval:
 -9.042632 -3.461273
sample estimates:
mean in group East mean in group West 
          13.25170           19.50365

In this case, we reject the null hypothesis (H0) because p < 0.05 and conclude that there is a significant difference between the means of the two groups.

Comparison 2012

What about 2012?

Show the code

library(gridExtra)

pic1<-ggplot() +
  geom_sf(data=ger2d, aes(fill=lights_2012),
          linewidth=0.01)+
  scale_fill_gradientn(colours=c("black","white"))+
  geom_sf(data = east_ger_diss4, fill = NA, color = "red", linewidth = 1) +
  theme_bw()+
  theme(axis.title = element_blank(),   # Remove axis titles
        axis.ticks = element_blank(),
        legend.box.background = element_rect(fill = "white", 
                                             linewidth=0.5, linetype="solid",
                                             colour = "black"),
        legend.position = c(0, 1),
        legend.justification = c(0, 1))


pic2<-ggplot(ger2d, aes(x = region, 
                         y = lights_2012, 
                         color=region)) + 
  geom_boxplot()+ 
  theme_bw()+
    theme(axis.title = element_blank(),   # Remove axis titles
        legend.box.background = element_rect(fill = "white", 
                                             linewidth=0.5, linetype="solid",
                                             colour = "black"),
        legend.position = c(0, 1),
        legend.justification = c(0, 1))

grid.arrange(pic1, pic2, ncol=2)

Comparison 2012

But are these differences statistically significant?

ger2d_east<-subset(ger2d, region=="East")
ger2d_west<-subset(ger2d, region=="West")

test_twosided<-t.test(ger2d_east$lights_2012, ger2d_west$lights_2012,
        alternative = c("two.sided"))
test_twosided


    Welch Two Sample t-test

data:  ger2d_east$lights_2012 and ger2d_west$lights_2012
t = -3.6888, df = 134.52, p-value = 0.0003261
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -9.324930 -2.815704
sample estimates:
mean of x mean of y 
 19.06810  25.13842

In this case, we reject the null hypothesis (H0) because p < 0.05 and conclude that there is a significant difference between the means of the two groups.

Observations

Thus, even after 20 years (1992-2012), there is still a lingering difference in development between East and West.

What if we were interested in changes in lumnosity?

For example, what if we wanted to examine which are the counties which experiences the most vs. least growth?

We can calculate the difference between 2012 and 1992

a large difference means that counties developed substantially

a small difference means that counties did not develop as much (or maybe even shrunk)

Difference

Let’s calculate the difference between 2012 and 1992.

ger2d$diff<-ger2d$lights_2012-ger2d$lights_1992

Show the code

library(viridis)

x3<-ggplot() +
  geom_sf(data=ger2d, aes(fill=diff), color="white", linewidth=0.01)+
  scale_fill_viridis(option="turbo",
                      name = "Lum. \n2012\n-1992")+
  geom_sf(data = east_ger_diss4, fill = NA, color = "red", linewidth = 1) +
  theme_bw()+
  theme(axis.title = element_blank(),   # Remove axis titles
        axis.ticks = element_blank(),
        legend.box.background = element_rect(fill = "white", 
                                             linewidth=0.5, linetype="solid",
                                             colour = "black"),
        legend.position = c(0, 1),
        legend.justification = c(0, 1))
x3

Comparison Difference

We can finally examine the change between 2012 and 1992.

Show the code

library(viridis)

x3<-ggplot() +
  geom_sf(data=ger2d, aes(fill=diff), color="white", linewidth=0.01)+
  scale_fill_viridis(option="turbo",
                      name = "Lum. \n2012\n-1992")+
  geom_sf(data = east_ger_diss4, fill = NA, color = "red", linewidth = 1) +
  theme_bw()+
  theme(axis.title = element_blank(),   # Remove axis titles
        axis.ticks = element_blank(),
        legend.box.background = element_rect(fill = "white", 
                                             linewidth=0.5, linetype="solid",
                                             colour = "black"),
        legend.position = c(0, 1),
        legend.justification = c(0, 1))



pic2<-ggplot(ger2d, aes(x = region, 
                         y = diff, 
                         color=region)) + 
  geom_boxplot()+ 
  theme_bw()+
    theme(axis.title = element_blank(),   # Remove axis titles
        legend.box.background = element_rect(fill = "white", 
                                             linewidth=0.5, linetype="solid",
                                             colour = "black"),
        legend.position = c(0, 1),
        legend.justification = c(0, 1))

grid.arrange(x3, pic2, ncol=2)

Difference

Is the difference statistically significant?

ger2d_east<-subset(ger2d, region=="East")
ger2d_west<-subset(ger2d, region=="West")

test_twosided<-t.test(ger2d_east$diff, ger2d_west$diff,
        alternative = c("two.sided"))
test_twosided


    Welch Two Sample t-test

data:  ger2d_east$diff and ger2d_west$diff
t = 0.48112, df = 115.94, p-value = 0.6313
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.5661080  0.9293796
sample estimates:
mean of x mean of y 
 5.816408  5.634772

In this case, we do not reject the null hypothesis (H0) because p ≥ 0.05 and conclude that there is no significant difference between the means of the two groups.

Refining the Analysis

One criticism to our analysis is that we are comparing units that are too different.

For example, it feels strange to compare units like Lörrach or Uckermark.

Show the code

ger_sel<-subset(ger2d, NAME_2=="Lörrach" | NAME_2=="Uckermark")
ggplot() +
  geom_sf(data=ger2d, aes(fill=diff), color="white", linewidth=0.01)+
  scale_fill_viridis(option="turbo",
                      name = "Lum. \n2012\n-1992")+
  geom_sf(data = east_ger_diss4, fill = NA, color = "red", linewidth = 1) +
  theme_bw()+
  theme(axis.title = element_blank(),   # Remove axis titles
        axis.ticks = element_blank(),
        legend.box.background = element_rect(fill = "white", 
                                             linewidth=0.5, linetype="solid",
                                             colour = "black"),
        legend.position = c(0, 1),
        legend.justification = c(0, 1))+
  geom_sf(data=ger_sel, color="red", linewidth=1)+
  geom_sf_label(data=ger_sel, aes(label=NAME_2), alpha=0.6, size=2.5)

Refining the Analysis

Counties like Lörrach or Uckermark may be different for other reasons that have nothing to do with exposure to communism.

Thus, it may be more appropriate to examine counties that are close the East-West border

Show the code

ger_100km$diff<-ger_100km$lights_2012-ger_100km$lights_1992
ggplot()+
  geom_sf(data=ger0, fill=NA)+
  geom_sf(data=ger_100km, aes(fill=diff), color="white", linewidth=0.01)+
  scale_fill_viridis(option="turbo",
                      name = "Lum. \n2012\n-1992")+
  geom_sf(data = line1, color = "red", linewidth = 1) +
  theme_bw()+
  theme(axis.title = element_blank(),   # Remove axis titles
        axis.ticks = element_blank(),
        legend.box.background = element_rect(fill = "white", 
                                             linewidth=0.5, linetype="solid",
                                             colour = "black"),
        legend.position = c(0, 1),
        legend.justification = c(0, 1))

Comparison 1992

We can now perform the analysis only based on the 100km sample around the border

Show the code

library(gridExtra)

pic1<-ggplot() +
  geom_sf(data=ger0, fill=NA)+
  geom_sf(data=ger_100km, aes(fill=lights_1992), color="white", linewidth=0.01)+
  scale_fill_gradientn(colours=c("black","white"))+
  geom_sf(data = line1, color = "red", linewidth = 1) +
  theme_bw()+
  theme(axis.title = element_blank(),   # Remove axis titles
        axis.ticks = element_blank(),
        legend.box.background = element_rect(fill = "white", 
                                             linewidth=0.5, linetype="solid",
                                             colour = "black"),
        legend.position = c(0, 1),
        legend.justification = c(0, 1))


pic2<-ggplot(ger_100km, aes(x = region, 
                         y = lights_1992, 
                         color=region)) + 
  geom_boxplot()+ 
  theme_bw()+
    theme(axis.title = element_blank(),   # Remove axis titles
        legend.box.background = element_rect(fill = "white", 
                                             linewidth=0.5, linetype="solid",
                                             colour = "black"),
        legend.position = c(0, 1),
        legend.justification = c(0, 1))

grid.arrange(pic1, pic2, ncol=2)

Comparison 1992

But are these differences statistically significant?

ger2d_east<-subset(ger_100km, region=="East")
ger2d_west<-subset(ger_100km, region=="West")

test_twosided<-t.test(ger2d_east$lights_1992, ger2d_west$lights_1992,
        alternative = c("two.sided"))
test_twosided


    Welch Two Sample t-test

data:  ger2d_east$lights_1992 and ger2d_west$lights_1992
t = -1.907, df = 125.23, p-value = 0.05881
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -7.2594941  0.1346275
sample estimates:
mean of x mean of y 
 12.77668  16.33911

In this case, we do not reject the null hypothesis (H0) because p ≥ 0.05 and conclude that there is no significant difference between the means of the two groups.

Comparison 2012

We can also perform the analysis only based on the 100km sample around the border for 2012

Show the code

library(gridExtra)

pic1<-ggplot() +
  geom_sf(data=ger0, fill=NA)+
  geom_sf(data=ger_100km, aes(fill=lights_2012), color="white", linewidth=0.01)+
  scale_fill_gradientn(colours=c("black","white"))+
  geom_sf(data = line1, color = "red", linewidth = 1) +
  theme_bw()+
  theme(axis.title = element_blank(),   # Remove axis titles
        axis.ticks = element_blank(),
        legend.box.background = element_rect(fill = "white", 
                                             linewidth=0.5, linetype="solid",
                                             colour = "black"),
        legend.position = c(0, 1),
        legend.justification = c(0, 1))


pic2<-ggplot(ger_100km, aes(x = region, 
                         y = lights_2012, 
                         color=region)) + 
  geom_boxplot()+ 
  theme_bw()+
    theme(axis.title = element_blank(),   # Remove axis titles
        legend.box.background = element_rect(fill = "white", 
                                             linewidth=0.5, linetype="solid",
                                             colour = "black"),
        legend.position = c(0, 1),
        legend.justification = c(0, 1))

grid.arrange(pic1, pic2, ncol=2)

Comparison 2012

But are these differences statistically significant?

ger2d_east<-subset(ger_100km, region=="East")
ger2d_west<-subset(ger_100km, region=="West")

test_twosided<-t.test(ger2d_east$lights_2012, ger2d_west$lights_2012,
        alternative = c("two.sided"))
test_twosided


    Welch Two Sample t-test

data:  ger2d_east$lights_2012 and ger2d_west$lights_2012
t = -1.0559, df = 116.9, p-value = 0.2932
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -6.888198  2.097555
sample estimates:
mean of x mean of y 
 18.77059  21.16591

In this case, we do not reject the null hypothesis (H0) because p ≥ 0.05 and conclude that there is no significant difference between the means of the two groups.

Comparison Difference

Show the code

library(viridis)


x3<-ggplot() +
  geom_sf(data=ger0, fill=NA)+
  geom_sf(data=ger_100km, aes(fill=diff), color="white", linewidth=0.01)+
  scale_fill_viridis(option="turbo",
                      name = "Lum. \n2012\n-1992")+
  geom_sf(data = line1, color = "red", linewidth = 1) +
  theme_bw()+
  theme(axis.title = element_blank(),   # Remove axis titles
        axis.ticks = element_blank(),
        legend.box.background = element_rect(fill = "white", 
                                             linewidth=0.5, linetype="solid",
                                             colour = "black"),
        legend.position = c(0, 1),
        legend.justification = c(0, 1))



pic2<-ggplot(ger2d, aes(x = region, 
                         y = diff, 
                         color=region)) + 
  geom_boxplot()+ 
  theme_bw()+
    theme(axis.title = element_blank(),   # Remove axis titles
        legend.box.background = element_rect(fill = "white", 
                                             linewidth=0.5, linetype="solid",
                                             colour = "black"),
        legend.position = c(0, 1),
        legend.justification = c(0, 1))

grid.arrange(x3, pic2, ncol=2)

Difference

Is the difference (2012-1992) statistically significant?

ger2d_east<-subset(ger_100km, region=="East")
ger2d_west<-subset(ger_100km, region=="West")

test_twosided<-t.test(ger2d_east$diff, ger2d_west$diff,
        alternative = c("two.sided"))
test_twosided


    Welch Two Sample t-test

data:  ger2d_east$diff and ger2d_west$diff
t = 2.1301, df = 106.88, p-value = 0.03545
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.08094403 2.25328005
sample estimates:
mean of x mean of y 
 5.993908  4.826796

In this case, we reject the null hypothesis (H0) because p < 0.05 and conclude that there is a significant difference between the means of the two groups.

Observations about the Difference

When using the entire sample, there was a difference in luminosity in 1992 between the East and the West
When using the 100-km sample, there was no difference in luminosity in 1992 between the East and the West

When using the entire sample, there was a difference in luminosity in 2012 between the East and the West
When using the 100-km sample, there was no difference in luminosity in 2012 between the East and the West

When using the entire sample, there was no difference in change (2012-1992) between the East and the West
When using the 100-km sample, there was a difference in change (2012-1992) between the East and the West

Conclusion

We covered ways to work with rasters.

cropping raster to a polygon
zonal statistics: mean
converting rasters (stars) to polygons
subtracting rasters

We also reviewed/learned new ways to work with vectors

Dissolving polygons
Creating buffers to eliminate internal geometry errors
Turned polylines into line
Calculate centroid distance to a line