Assignment 2

Author

Your Name

Published

February 18, 2024

Instructions

Please submit your assignment to me as an HTML file via email. Please name the file “yourname_assignment2.html”. For example, I would submit a file called “popescu_assignment2.html”. Please make sure that the file name is all in lower case. Please answer all the questions below.

1 Question

Suppose you have a dataframe called students with three columns: name, age, and grade. Write a function called increase_grade that takes a numeric vector of grades (such as the grade column of students) and a percentage increase as input, and returns the grades increased by the specified percentage for all students, so that they can be stored as a new column of the dataframe.

# Sample dataframe
students <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(20, 21, 22),
  grade = c(80, 75, 100)
)

Assume that the percentage increase is 10. Your output should look like this:

# Test the function
percentage_increase <- 10  # 10% increase in grades
students$new_grade <- increase_grade(students$grade, percentage_increase)
students
     name age grade new_grade
1   Alice  20    80      88.0
2     Bob  21    75      82.5
3 Charlie  22   100     110.0
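One possible answer, sketched under the assumption (taken from the test call above) that increase_grade receives the grade column as a numeric vector rather than the whole dataframe. Multiplying by 1 + percentage / 100 is vectorized in R, so no loop is needed:

```r
# Sample dataframe from the question
students <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(20, 21, 22),
  grade = c(80, 75, 100)
)

# Sketch: scale every grade by (1 + percentage / 100),
# e.g. a 10% increase multiplies each grade by 1.1
increase_grade <- function(grades, percentage) {
  grades * (1 + percentage / 100)
}

percentage_increase <- 10
students$new_grade <- increase_grade(students$grade, percentage_increase)
students
```

Printing students should reproduce the table above (88.0, 82.5 and 110.0 in new_grade).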

2 Question

Suppose you have two datasets:

# Sample data for employee_data
employee_data <- data.frame(
  employee_id = c(101, 102, 103, 104, 105),
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  department = c("HR", "Finance", "Marketing", "IT", "Operations")
)
# Sample data for performance_data
performance_data <- data.frame(
  employee_id = c(101, 102, 103, 104, 105),
  rating = c(4.5, 3.2, 2.9, 4.8, 3.7),
  bonus = c(1000, 800, 500, 1200, 900)
)

Instructions:

Create a function called employee_performance_analysis that takes the two datasets as inputs and performs the following tasks:

Step1: Merge employee_data with performance_data based on the employee_id.

Step2: Calculate a new column performance_grade based on the following criteria:

  • If rating is greater than or equal to 4, assign “High”.
  • If rating is greater than or equal to 3 but less than 4, assign “Medium”.
  • If rating is less than 3, assign “Low”.

Step3: Return the merged dataset with the added performance_grade column.

This is what your output should look like:

# Test the function
employee_performance <- employee_performance_analysis(employee_data, performance_data)
employee_performance
  employee_id    name department rating bonus performance_grade
1         101   Alice         HR    4.5  1000              High
2         102     Bob    Finance    3.2   800            Medium
3         103 Charlie  Marketing    2.9   500               Low
4         104   David         IT    4.8  1200              High
5         105     Eve Operations    3.7   900            Medium
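A minimal base R sketch of employee_performance_analysis, using merge() for Step1 and nested ifelse() for Step2. The cut-offs assume that “between 3 and 4” means at least 3 and below 4, matching the expected output:

```r
# Sample data from the question
employee_data <- data.frame(
  employee_id = c(101, 102, 103, 104, 105),
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  department = c("HR", "Finance", "Marketing", "IT", "Operations")
)
performance_data <- data.frame(
  employee_id = c(101, 102, 103, 104, 105),
  rating = c(4.5, 3.2, 2.9, 4.8, 3.7),
  bonus = c(1000, 800, 500, 1200, 900)
)

employee_performance_analysis <- function(employee_data, performance_data) {
  # Step1: join the two tables on their shared key column
  merged <- merge(employee_data, performance_data, by = "employee_id")

  # Step2: >= 4 -> "High", at least 3 but below 4 -> "Medium", < 3 -> "Low"
  merged$performance_grade <- ifelse(merged$rating >= 4, "High",
                                     ifelse(merged$rating >= 3, "Medium", "Low"))

  # Step3: return the merged data with the new column
  merged
}

employee_performance <- employee_performance_analysis(employee_data, performance_data)
employee_performance
```

merge() places the key column first, followed by the remaining columns of each input, which gives the column order shown above. cut() with right = FALSE would be an equivalent way to build the grades.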

3 Question

Create a function in R that calculates the number of days from a given date until the next holiday. Here’s the exercise:

Step1: You have the following dataframe:

dates_df <- data.frame(
  date_interest = c("2024-02-07", 
                    "2024-03-10", 
                    "2024-04-01")
)

Step2: Define a function called days_until_holiday that takes one argument: a character vector of dates of interest, each in the format “YYYY-MM-DD”.

Step3: Inside the function, create the following vector of holidays:

holidays <- c("New Year's Day" = "2024-01-01",
              "Easter" = "2024-04-21",
              "Christmas" = "2024-12-25")

Step4: For each date of interest, calculate the difference in days between that date and the next holiday that follows it, and return the number of days until that holiday.

Your output should look like:

dates_df$days_until_next_holiday <- days_until_holiday(dates_df$date_interest)
dates_df
  date_interest days_until_next_holiday
1    2024-02-07                      74
2    2024-03-10                      42
3    2024-04-01                      20
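A minimal base R sketch of days_until_holiday. It converts the strings with as.Date(), subtracts each date of interest from every holiday, and keeps the smallest non-negative gap. Two behaviours are assumptions not stated in the exercise: a holiday falling on the date of interest counts as 0 days away, and a date after the last listed holiday returns NA.

```r
# Sample data from Step1
dates_df <- data.frame(
  date_interest = c("2024-02-07",
                    "2024-03-10",
                    "2024-04-01")
)

days_until_holiday <- function(date_string) {
  # Holiday vector from Step3
  holidays <- c("New Year's Day" = "2024-01-01",
                "Easter" = "2024-04-21",
                "Christmas" = "2024-12-25")
  holiday_dates <- as.Date(holidays)
  current <- as.Date(date_string)

  # For each date of interest: holiday date minus that date, in days;
  # negative gaps are holidays already passed, so keep the smallest gap >= 0
  vapply(seq_along(current), function(i) {
    gaps <- as.numeric(holiday_dates - current[i])
    gaps <- gaps[gaps >= 0]
    if (length(gaps) == 0) NA_real_ else min(gaps)
  }, numeric(1))
}

dates_df$days_until_next_holiday <- days_until_holiday(dates_df$date_interest)
dates_df
```

Because 2024 is a leap year, the gap from 2024-02-07 to Easter (2024-04-21) is 22 + 31 + 21 = 74 days, as in the expected output.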

4 Question

Now add a new column that lists the name of that next holiday, using a function called name_holiday:

dates_df$holiday <- name_holiday(dates_df$date_interest)
dates_df
  date_interest days_until_next_holiday holiday
1    2024-02-07                      74  Easter
2    2024-03-10                      42  Easter
3    2024-04-01                      20  Easter
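name_holiday can reuse the same gap calculation as days_until_holiday and return the name of the holiday with the smallest non-negative gap instead of the gap itself. This is again a sketch, assuming NA when no listed holiday follows the date:

```r
dates_df <- data.frame(
  date_interest = c("2024-02-07", "2024-03-10", "2024-04-01")
)

name_holiday <- function(date_string) {
  holidays <- c("New Year's Day" = "2024-01-01",
                "Easter" = "2024-04-21",
                "Christmas" = "2024-12-25")
  holiday_dates <- as.Date(holidays)
  current <- as.Date(date_string)

  vapply(seq_along(current), function(i) {
    gaps <- as.numeric(holiday_dates - current[i])
    upcoming <- which(gaps >= 0)  # indices of holidays not yet passed
    if (length(upcoming) == 0) return(NA_character_)
    # name of the upcoming holiday with the smallest gap
    names(holidays)[upcoming[which.min(gaps[upcoming])]]
  }, character(1))
}

dates_df$holiday <- name_holiday(dates_df$date_interest)
dates_df
```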

5 Question

Load the life expectancy and urbanization data.

Create a scatterplot showing the relationship between life expectancy and urbanization in 1970.

Your output should look like this:

6 Question

Now create the same scatterplot for 2000.