Relational Data

In the real world, people typically have to work with many tables
Multiple tables of data that need to be combined together, are called relational data

Examples: NYC Flights

We will first load two libraries

library(tidyverse)
library(nycflights13)

There are five databases (tibbles) inside nycflights13:

airlines
airports
planes
weather
flights

The `airlines` tibble

Create new objects for each tibble/data frame and print the first 5 entries

airlines<-airlines
head(airlines, n=5)

# A tibble: 5 × 2
  carrier name                  
  <chr>   <chr>                 
1 9E      Endeavor Air Inc.     
2 AA      American Airlines Inc.
3 AS      Alaska Airlines Inc.  
4 B6      JetBlue Airways       
5 DL      Delta Air Lines Inc.

The `airports` tibble

airports<-airports
head(airports, n=5)

# A tibble: 5 × 8
  faa   name                            lat   lon   alt    tz dst   tzone       
  <chr> <chr>                         <dbl> <dbl> <dbl> <dbl> <chr> <chr>       
1 04G   Lansdowne Airport              41.1 -80.6  1044    -5 A     America/New…
2 06A   Moton Field Municipal Airport  32.5 -85.7   264    -6 A     America/Chi…
3 06C   Schaumburg Regional            42.0 -88.1   801    -6 A     America/Chi…
4 06N   Randall Airport                41.4 -74.4   523    -5 A     America/New…
5 09J   Jekyll Island Airport          31.1 -81.4    11    -5 A     America/New…

The `weather` tibble

weather<-weather
head(weather, n=5)

# A tibble: 5 × 15
  origin  year month   day  hour  temp  dewp humid wind_dir wind_speed wind_gust
  <chr>  <int> <int> <int> <int> <dbl> <dbl> <dbl>    <dbl>      <dbl>     <dbl>
1 EWR     2013     1     1     1  39.0  26.1  59.4      270      10.4         NA
2 EWR     2013     1     1     2  39.0  27.0  61.6      250       8.06        NA
3 EWR     2013     1     1     3  39.0  28.0  64.4      240      11.5         NA
4 EWR     2013     1     1     4  39.9  28.0  62.2      250      12.7         NA
5 EWR     2013     1     1     5  39.0  28.0  64.4      260      12.7         NA
# ℹ 4 more variables: precip <dbl>, pressure <dbl>, visib <dbl>,
#   time_hour <dttm>

The `planes` tibble

planes<-planes
head(planes, n=5)

# A tibble: 5 × 9
  tailnum  year type               manufacturer model engines seats speed engine
  <chr>   <int> <chr>              <chr>        <chr>   <int> <int> <int> <chr> 
1 N10156   2004 Fixed wing multi … EMBRAER      EMB-…       2    55    NA Turbo…
2 N102UW   1998 Fixed wing multi … AIRBUS INDU… A320…       2   182    NA Turbo…
3 N103US   1999 Fixed wing multi … AIRBUS INDU… A320…       2   182    NA Turbo…
4 N104UW   1999 Fixed wing multi … AIRBUS INDU… A320…       2   182    NA Turbo…
5 N10575   2002 Fixed wing multi … EMBRAER      EMB-…       2    55    NA Turbo…

The `flights` tibble

flights<-flights
head(flights, n=5)

# A tibble: 5 × 19
   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
1  2013     1     1      517            515         2      830            819
2  2013     1     1      533            529         4      850            830
3  2013     1     1      542            540         2      923            850
4  2013     1     1      544            545        -1     1004           1022
5  2013     1     1      554            600        -6      812            837
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>

The tibbles together

This is how the five tibbles could be connected to one another

The tibbles together

flights connects to planes via a single variable, tailnum

The tibbles together

flights connects to planes via a single variable, tailnum

The tibbles together

flights connects to airlines through the carrier variable

The tibbles together

flights connects to airports through: the origin and dest variables

The tibbles together

flights connects to airports through: the origin and dest variables

The tibbles together

flights connects to weather via origin (the location), and year, month, day and hour (the time)

Keys

The variables used to connect the tibbles are keys which are used to uniquely identify the observations
Sometimes one variable uniquely identifies the observation - tailnum for planes
Other times, more than one variable is necessary - year, month, day, hour, and origin.

Keys

It is always good practice to make sure that the keys identified are in fact unique
This can be done this way for planes:

planes2<-subset(planes, duplicated(tailnum))
planes2

# A tibble: 0 × 9
# ℹ 9 variables: tailnum <chr>, year <int>, type <chr>, manufacturer <chr>,
#   model <chr>, engines <int>, seats <int>, speed <int>, engine <chr>

This means that tailnum uniquely identifies planes

Types of Joins

A left join keeps all the observations in x (left dataframe).
A right join keeps all the observations in y (right dataframe)
A full join keeps all the observations in both x and y

Left Join

Right Join

Full Join

Advice

The most common type of join that you will need in practive is a left join
This allows you to preserve the original observation even when there is not match

Caveat: Duplicate Keys

It is important to consider what happens when we have duplicate keys
Case 1: One Table has duplicated keys

Caveat: Duplicate Keys

It is important to consider what happens when we have duplicate keys
Case 2: Both Tables have duplicated keys
When you join duplicated keys, you get all possible combinations, the Cartesian product

Example of a Left Join

Left Join

flights connects to planes via a single variable, tailnum

Left Join

Step1: Selecting only relevant variables

library(tidyverse)
library(nycflights13)
flights2 <- subset(flights, select = c(day, hour, origin, dest, tailnum, carrier))

Left Join

Step2: Performing the Left Join

fights_planes<-left_join(flights2, planes, by = c("tailnum" = "tailnum"))
fights_planes

# A tibble: 336,776 × 14
     day  hour origin dest  tailnum carrier  year type        manufacturer model
   <int> <dbl> <chr>  <chr> <chr>   <chr>   <int> <chr>       <chr>        <chr>
 1     1     5 EWR    IAH   N14228  UA       1999 Fixed wing… BOEING       737-…
 2     1     5 LGA    IAH   N24211  UA       1998 Fixed wing… BOEING       737-…
 3     1     5 JFK    MIA   N619AA  AA       1990 Fixed wing… BOEING       757-…
 4     1     5 JFK    BQN   N804JB  B6       2012 Fixed wing… AIRBUS       A320…
 5     1     6 LGA    ATL   N668DN  DL       1991 Fixed wing… BOEING       757-…
 6     1     5 EWR    ORD   N39463  UA       2012 Fixed wing… BOEING       737-…
 7     1     6 EWR    FLL   N516JB  B6       2000 Fixed wing… AIRBUS INDU… A320…
 8     1     6 LGA    IAD   N829AS  EV       1998 Fixed wing… CANADAIR     CL-6…
 9     1     6 JFK    MCO   N593JB  B6       2004 Fixed wing… AIRBUS       A320…
10     1     6 LGA    ORD   N3ALAA  AA         NA <NA>        <NA>         <NA> 
# ℹ 336,766 more rows
# ℹ 4 more variables: engines <int>, seats <int>, speed <int>, engine <chr>

Left Join

Step2: Performing the Left Join

Because we have the same variable - tailnum in both dataframes, we can also simplify by writing

fights_planes<-left_join(flights2, planes, by = c("tailnum"))
fights_planes

# A tibble: 336,776 × 14
     day  hour origin dest  tailnum carrier  year type        manufacturer model
   <int> <dbl> <chr>  <chr> <chr>   <chr>   <int> <chr>       <chr>        <chr>
 1     1     5 EWR    IAH   N14228  UA       1999 Fixed wing… BOEING       737-…
 2     1     5 LGA    IAH   N24211  UA       1998 Fixed wing… BOEING       737-…
 3     1     5 JFK    MIA   N619AA  AA       1990 Fixed wing… BOEING       757-…
 4     1     5 JFK    BQN   N804JB  B6       2012 Fixed wing… AIRBUS       A320…
 5     1     6 LGA    ATL   N668DN  DL       1991 Fixed wing… BOEING       757-…
 6     1     5 EWR    ORD   N39463  UA       2012 Fixed wing… BOEING       737-…
 7     1     6 EWR    FLL   N516JB  B6       2000 Fixed wing… AIRBUS INDU… A320…
 8     1     6 LGA    IAD   N829AS  EV       1998 Fixed wing… CANADAIR     CL-6…
 9     1     6 JFK    MCO   N593JB  B6       2004 Fixed wing… AIRBUS       A320…
10     1     6 LGA    ORD   N3ALAA  AA         NA <NA>        <NA>         <NA> 
# ℹ 336,766 more rows
# ℹ 4 more variables: engines <int>, seats <int>, speed <int>, engine <chr>

Left Join by Multiple Keys

In the previous example, we looked at join by tailum
But we can also perform join by multiple variables

flights3 <-left_join(flights, weather, by = c("year", "month", "day", "hour", "origin"))
flights3

# A tibble: 336,776 × 29
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
 1  2013     1     1      517            515         2      830            819
 2  2013     1     1      533            529         4      850            830
 3  2013     1     1      542            540         2      923            850
 4  2013     1     1      544            545        -1     1004           1022
 5  2013     1     1      554            600        -6      812            837
 6  2013     1     1      554            558        -4      740            728
 7  2013     1     1      555            600        -5      913            854
 8  2013     1     1      557            600        -3      709            723
 9  2013     1     1      557            600        -3      838            846
10  2013     1     1      558            600        -2      753            745
# ℹ 336,766 more rows
# ℹ 21 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour.x <dttm>, temp <dbl>, dewp <dbl>,
#   humid <dbl>, wind_dir <dbl>, wind_speed <dbl>, wind_gust <dbl>,
#   precip <dbl>, pressure <dbl>, visib <dbl>, time_hour.y <dttm>

Filtering Joins

Filtering joins match observations in the same way as mutating joins
They affect observations, not variables
- semi_join(x, y) keeps all observations in x that have a match in y
- anti_join(x, y) drops all observations in x that have a match in y

The Semi-Join

Only the existence of a match is important
It doesn’t matter which observation is matched

The Anti-Join

It is the inverse of the semi-join
The anti-join keeps the rows that don’t have a match
They are useful for diagnosing join mismatches

Dates and Times

When working spatial data, it is often necessary to work with time dates and time series

In other words, spatial data may contain temporal information

Dates and Times

Some common used classes to repsent time in R include:

Times:

Date - To represent data at day-by-day level
POSIXct and POSIXlt to represent data at a second-by-second level

Time Series:

ts
zoo (with the zoo package)
xts (with the xts package)

Working with Date objects

The simplest data structure to represent time is Date

For example, we can turn an object from character to Date by using as.Date function

x = "2014-10-20"
x

[1] "2014-10-20"

class(x)

[1] "character"

x2 = as.Date(x)
x2

[1] "2014-10-20"

class(x2)

[1] "Date"

Working with Date objects

The character values are in the standard format known as ISO 8601 (YYYY-MM-DD)

The as.Date fuction works without additional arguments

When the character values are in a non-standard format, we need to specify the format definition

Working with Date objects

The following table lists the most commonly used symbols for specifying date in R.

Symbol	Example	Meaning
`%d`	“15”	Day
`%m`	“08”	Month, Numeric
`%b`	“Aug”	Month, 3-letter
`%B`	“August”	Month, full
`%y`	14	Year, 2-digit
`%Y`	2014	Year, 4-digit

Working with Date objects: Examples

For example, we need to specify the format for certain date types.

as.Date("07/Aug/12")
## Error in charToDate(x) : 
##  character string is not in a standard unambiguous format

To fix the error, we need to type:

as.Date("07/Aug/12", format = "%d/%b/%y")

Working with Date objects: Examples

For example, we need to specify the format for certain date types.

as.Date("2012-August-07")
## Error in charToDate(x) : 
##  character string is not in a standard unambiguous format

To fix the error, we need to type:

as.Date("2012-August-07", format = "%Y-%B-%d")

[1] "2012-08-07"

Obtaining “Parts” of a Date

format(as.Date("1955-11-30"), format = "%m/%Y")

[1] "11/1955"

format(as.Date("1955-11-30"), format = "%d")

[1] "30"

The following will not work

format("1955-11-30", format = "%m/%Y")

[1] "1955-11-30"

format("1955-11-30", format = "%d")

[1] "1955-11-30"

This is because the first two are Date objects while the last two are character objects

x1<-"1955-11-30"
class(x1)

[1] "character"

x2<-as.Date("1955-11-30")
class(x2)

[1] "Date"

Arithmetic operations with dates

Date objects act like numeric objects

We can thus perform:

conditional operations
addition and subtraction of/between dates
creating date sequences

Arithmetic operations with dates: subtraction

x2<-as.Date("1955-11-30")
x3<-as.Date("1955-11-01")

time_diff1<-x2-x3
time_diff1

Time difference of 29 days

Another example:

x2<-as.Date("1955-11-30")
x4<-as.Date("2023-12-20")

time_diff2<-x4-x2
time_diff2

Time difference of 24857 days

Arithmetic operations with dates: subtraction

If we look at time_diff1 and time_diff2, they are different classes

class(time_diff1)

[1] "difftime"

class(time_diff2)

[1] "difftime"

We can turn them to numeric classes and to specific units.

as.numeric(time_diff1, unit = "days")

[1] 29

as.numeric(time_diff2, unit = "weeks")

[1] 3551

Arithmetic operations with dates: sequences

We can also create a sequence of dates using seq.

For example, the following expression creates a sequence of dates.

x<-seq(from = as.Date("2018-10-14"), to = as.Date("2019-01-11"), by = 7)
x

 [1] "2018-10-14" "2018-10-21" "2018-10-28" "2018-11-04" "2018-11-11"
 [6] "2018-11-18" "2018-11-25" "2018-12-02" "2018-12-09" "2018-12-16"
[11] "2018-12-23" "2018-12-30" "2019-01-06"

Time Series: Subsestting

This is how we can subset our date vector.

x<-seq(from = as.Date("2018-10-14"), to = as.Date("2019-01-11"), by = 7)
x[x<"2019-01-06"]

 [1] "2018-10-14" "2018-10-21" "2018-10-28" "2018-11-04" "2018-11-11"
 [6] "2018-11-18" "2018-11-25" "2018-12-02" "2018-12-09" "2018-12-16"
[11] "2018-12-23" "2018-12-30"

Time Series: Consecutive Differences

The diff function can be used to create a vector of differences between consecutive elements

For example:

x = c(15, 8, 23, 24)
diff(x)

[1] -7 15  1

or

x<-seq(from = as.Date("2018-10-14"), to = as.Date("2019-01-11"), by = 7)
diff(x)

Time differences in days
 [1] 7 7 7 7 7 7 7 7 7 7 7 7

Time Series: Consecutive Differences

Note that the length of diff(x) is one element less that x

We don’t have the difference for the first (or last) element, depending how you look at it

length(x)

[1] 13

length(diff(x))

[1] 12

Exercise using Dates 1

Create the following dataframe:

library("tibble")
names <- c("Alex", "Joan", "Helen", "Mark", "Lydia")
birthdates <- c("1986-02-21", "1990-07-21", "1995-05-01", "1989-03-10", "1991-12-12")
df <- tibble(names = names, birthdates = birthdates)
df

# A tibble: 5 × 2
  names birthdates
  <chr> <chr>     
1 Alex  1986-02-21
2 Joan  1990-07-21
3 Helen 1995-05-01
4 Mark  1989-03-10
5 Lydia 1991-12-12

Exercise using Dates 1

Instructions

Define a function called calculate_age that accepts one argument: birthdate
Convert the birthdate string into a Date object using as.Date()
Calculate people’s age

# Add a new column 'age' to the dataframe using the function
df$age <- calculate_age(df$birthdates)

# Print the updated dataframe
print(df)

# A tibble: 5 × 3
  names birthdates   age
  <chr> <chr>      <dbl>
1 Alex  1986-02-21  37.9
2 Joan  1990-07-21  33.5
3 Helen 1995-05-01  28.8
4 Mark  1989-03-10  34.9
5 Lydia 1991-12-12  32.1

Exercise using Dates 1 ANSWER

# Define the function to calculate age
calculate_age <- function(birthdates) {
    # Convert birthdates to Date objects
    birthdates <- as.Date(birthdates)
    # Get current date
    #current_date <- Sys.Date()
    current_date <- as.Date("2024-01-31")
    # Calculate age
    ages <- as.numeric(difftime(current_date, birthdates, units = "days") / 365.25)
    return(ages)
}# Define the function to calculate age
calculate_age <- function(birthdates) {
    # Convert birthdates to Date objects
    birthdates <- as.Date(birthdates)
    # Get current date
    #current_date <- Sys.Date()
    current_date <- as.Date("2024-01-31")
    # Calculate age
    ages <- as.numeric(difftime(current_date, birthdates, units = "days") / 365.25)
    return(ages)
}# Define the function to calculate age
calculate_age <- function(birthdates) {
    # Convert birthdates to Date objects
    birthdates <- as.Date(birthdates)
    # Get current date
    #current_date <- Sys.Date()
    current_date <- as.Date("2024-01-31")
    # Calculate age
    ages <- as.numeric(difftime(current_date, birthdates, units = "days") / 365.25)
    return(ages)
}# Define the function to calculate age
calculate_age <- function(birthdates) {
    # Convert birthdates to Date objects
    birthdates <- as.Date(birthdates)
    # Get current date
    #current_date <- Sys.Date()
    current_date <- as.Date("2024-01-31")
    # Calculate age
    ages <- as.numeric(difftime(current_date, birthdates, units = "days") / 365.25)
    return(ages)
}# Define the function to calculate age
calculate_age <- function(birthdates) {
    # Convert birthdates to Date objects
    birthdates <- as.Date(birthdates)
    # Get current date
    #current_date <- Sys.Date()
    current_date <- as.Date("2024-01-31")
    # Calculate age
    ages <- as.numeric(difftime(current_date, birthdates, units = "days") / 365.25)
    return(ages)
}# Define the function to calculate age
calculate_age <- function(birthdates) {
    # Convert birthdates to Date objects
    birthdates <- as.Date(birthdates)
    # Get current date
    #current_date <- Sys.Date()
    current_date <- as.Date("2024-01-31")
    # Calculate age
    ages <- as.numeric(difftime(current_date, birthdates, units = "days") / 365.25)
    return(ages)
}# Define the function to calculate age
calculate_age <- function(birthdates) {
    # Convert birthdates to Date objects
    birthdates <- as.Date(birthdates)
    # Get current date
    #current_date <- Sys.Date()
    current_date <- as.Date("2024-01-31")
    # Calculate age
    ages <- as.numeric(difftime(current_date, birthdates, units = "days") / 365.25)
    return(ages)
}

Exercise using Dates 1 ANSWER

# Add a new column 'age' to the dataframe using the function
df$age <- calculate_age(df$birthdates)

# Print the updated dataframe
print(df)

# A tibble: 5 × 3
  names birthdates   age
  <chr> <chr>      <dbl>
1 Alex  1986-02-21  37.9
2 Joan  1990-07-21  33.5
3 Helen 1995-05-01  28.8
4 Mark  1989-03-10  34.9
5 Lydia 1991-12-12  32.1

Exercise using Dates 2

Instructions

Now, add a string column in which you have a sentence. For example: “Alex was born 21-02-1986. Alex is 37 years and 11 months old.”

The output should look like below

# Add a new column 'sentence' to the dataframe using the function
df$sentence <- calculate_age(df$names, df$birthdates)

print(df)

# A tibble: 5 × 4
  names birthdates   age sentence                                               
  <chr> <chr>      <dbl> <chr>                                                  
1 Alex  1986-02-21  37.9 "Alex was born on 1986-02-21. Alex is 37 years and 11 …
2 Joan  1990-07-21  33.5 "Joan was born on 1990-07-21. Joan is 33 years and 6 m…
3 Helen 1995-05-01  28.8 "Helen was born on 1995-05-01. Helen is 28 years and 9…
4 Mark  1989-03-10  34.9 "Mark was born on 1989-03-10. Mark is 34 years and 11 …
5 Lydia 1991-12-12  32.1 "Lydia was born on 1991-12-12. Lydia is 32 years and 2…

Exercise using Dates 2 ANSWER

Instructions

Now, add a string column in which you have a sentence. For example: “Alex was born 21-02-1986. Alex is 37 years and 11 months old.”

# Define the function to calculate age and create the sentence
calculate_age <- function(names, birthdates) {
    # Convert birthdates to Date objects
    birthdates <- as.Date(birthdates)
    # Specify the desired current date
    current_date <- as.Date("2024-01-31")  # Make sure to use the correct date format
    # Calculate age
    ages <- as.numeric(difftime(current_date, birthdates, units = "days") / 365.25)
    # Adjust for leap years
    months<-round((ages-floor(ages))*12)
    # Create sentences
    sentences <- paste0(names, " was born on ", birthdates, ". ", names, " is ", floor(ages), " years and ", months," months old.", sep = " ")
    return(sentences)
}# Define the function to calculate age and create the sentence
calculate_age <- function(names, birthdates) {
    # Convert birthdates to Date objects
    birthdates <- as.Date(birthdates)
    # Specify the desired current date
    current_date <- as.Date("2024-01-31")  # Make sure to use the correct date format
    # Calculate age
    ages <- as.numeric(difftime(current_date, birthdates, units = "days") / 365.25)
    # Adjust for leap years
    months<-round((ages-floor(ages))*12)
    # Create sentences
    sentences <- paste0(names, " was born on ", birthdates, ". ", names, " is ", floor(ages), " years and ", months," months old.", sep = " ")
    return(sentences)
}# Define the function to calculate age and create the sentence
calculate_age <- function(names, birthdates) {
    # Convert birthdates to Date objects
    birthdates <- as.Date(birthdates)
    # Specify the desired current date
    current_date <- as.Date("2024-01-31")  # Make sure to use the correct date format
    # Calculate age
    ages <- as.numeric(difftime(current_date, birthdates, units = "days") / 365.25)
    # Adjust for leap years
    months<-round((ages-floor(ages))*12)
    # Create sentences
    sentences <- paste0(names, " was born on ", birthdates, ". ", names, " is ", floor(ages), " years and ", months," months old.", sep = " ")
    return(sentences)
}# Define the function to calculate age and create the sentence
calculate_age <- function(names, birthdates) {
    # Convert birthdates to Date objects
    birthdates <- as.Date(birthdates)
    # Specify the desired current date
    current_date <- as.Date("2024-01-31")  # Make sure to use the correct date format
    # Calculate age
    ages <- as.numeric(difftime(current_date, birthdates, units = "days") / 365.25)
    # Adjust for leap years
    months<-round((ages-floor(ages))*12)
    # Create sentences
    sentences <- paste0(names, " was born on ", birthdates, ". ", names, " is ", floor(ages), " years and ", months," months old.", sep = " ")
    return(sentences)
}# Define the function to calculate age and create the sentence
calculate_age <- function(names, birthdates) {
    # Convert birthdates to Date objects
    birthdates <- as.Date(birthdates)
    # Specify the desired current date
    current_date <- as.Date("2024-01-31")  # Make sure to use the correct date format
    # Calculate age
    ages <- as.numeric(difftime(current_date, birthdates, units = "days") / 365.25)
    # Adjust for leap years
    months<-round((ages-floor(ages))*12)
    # Create sentences
    sentences <- paste0(names, " was born on ", birthdates, ". ", names, " is ", floor(ages), " years and ", months," months old.", sep = " ")
    return(sentences)
}# Define the function to calculate age and create the sentence
calculate_age <- function(names, birthdates) {
    # Convert birthdates to Date objects
    birthdates <- as.Date(birthdates)
    # Specify the desired current date
    current_date <- as.Date("2024-01-31")  # Make sure to use the correct date format
    # Calculate age
    ages <- as.numeric(difftime(current_date, birthdates, units = "days") / 365.25)
    # Adjust for leap years
    months<-round((ages-floor(ages))*12)
    # Create sentences
    sentences <- paste0(names, " was born on ", birthdates, ". ", names, " is ", floor(ages), " years and ", months," months old.", sep = " ")
    return(sentences)
}# Define the function to calculate age and create the sentence
calculate_age <- function(names, birthdates) {
    # Convert birthdates to Date objects
    birthdates <- as.Date(birthdates)
    # Specify the desired current date
    current_date <- as.Date("2024-01-31")  # Make sure to use the correct date format
    # Calculate age
    ages <- as.numeric(difftime(current_date, birthdates, units = "days") / 365.25)
    # Adjust for leap years
    months<-round((ages-floor(ages))*12)
    # Create sentences
    sentences <- paste0(names, " was born on ", birthdates, ". ", names, " is ", floor(ages), " years and ", months," months old.", sep = " ")
    return(sentences)
}# Define the function to calculate age and create the sentence
calculate_age <- function(names, birthdates) {
    # Convert birthdates to Date objects
    birthdates <- as.Date(birthdates)
    # Specify the desired current date
    current_date <- as.Date("2024-01-31")  # Make sure to use the correct date format
    # Calculate age
    ages <- as.numeric(difftime(current_date, birthdates, units = "days") / 365.25)
    # Adjust for leap years
    months<-round((ages-floor(ages))*12)
    # Create sentences
    sentences <- paste0(names, " was born on ", birthdates, ". ", names, " is ", floor(ages), " years and ", months," months old.", sep = " ")
    return(sentences)
}# Define the function to calculate age and create the sentence
calculate_age <- function(names, birthdates) {
    # Convert birthdates to Date objects
    birthdates <- as.Date(birthdates)
    # Specify the desired current date
    current_date <- as.Date("2024-01-31")  # Make sure to use the correct date format
    # Calculate age
    ages <- as.numeric(difftime(current_date, birthdates, units = "days") / 365.25)
    # Adjust for leap years
    months<-round((ages-floor(ages))*12)
    # Create sentences
    sentences <- paste0(names, " was born on ", birthdates, ". ", names, " is ", floor(ages), " years and ", months," months old.", sep = " ")
    return(sentences)
}

Exercise using Dates 2 ANSWER

The output should look like below

# Add a new column 'sentence' to the dataframe using the function
df$sentence <- calculate_age(df$names, df$birthdates)

print(df)

# A tibble: 5 × 4
  names birthdates   age sentence                                               
  <chr> <chr>      <dbl> <chr>                                                  
1 Alex  1986-02-21  37.9 "Alex was born on 1986-02-21. Alex is 37 years and 11 …
2 Joan  1990-07-21  33.5 "Joan was born on 1990-07-21. Joan is 33 years and 6 m…
3 Helen 1995-05-01  28.8 "Helen was born on 1995-05-01. Helen is 28 years and 9…
4 Mark  1989-03-10  34.9 "Mark was born on 1989-03-10. Mark is 34 years and 11 …
5 Lydia 1991-12-12  32.1 "Lydia was born on 1991-12-12. Lydia is 32 years and 2…

L6: Relational Data, Dates and Joins