Lab 2: Statistics

Lists, Numbers, Strings, Dataframes, Subsetting

In this lab, we cover the basics of R. We learn how to remove elements from lists, how to work with dataframes, how to load CSV files, how to merge data, and how to produce a scatter plot.
Lists
Dataframe
Merging Data
Author

Bogdan G. Popescu

Instructions

  • Download the *.qmd file from here
  • Place it in the relevant folder “week2”
  • Open the *.qmd file

Question 1

Extract the word “literature” from the following vector and create a new object

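list_fields <- c("politics", "philosophy", "literature", "chemistry")
# Keep every element that is not "literature"
new_list <- list_fields[list_fields!="literature"]
new_list
[1] "politics"   "philosophy" "chemistry" 
# The same approach removes "chemistry" instead
new_list <- list_fields[list_fields!="chemistry"]
new_list
[1] "politics"   "philosophy" "literature"
# Use == to keep only the matching element
new_list <- list_fields[list_fields=="philosophy"]
new_list
[1] "philosophy"
# Combine conditions with | (OR) to keep several elements
new_list <- list_fields[list_fields=="philosophy" | list_fields=="chemistry"]
new_list
[1] "philosophy" "chemistry" 
# length() counts the elements left in the vector
length(new_list)
[1] 2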

Question 2

Remove from the following list all the numbers between 13 and 15 (inclusive)

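list_no <- c(11, 12, 13, 14, 15, 16, 17)
# Option 1: keep values of 12 or less, or 16 or more
new_list_no <- list_no[list_no <= 12 | list_no >= 16]
new_list_no
[1] 11 12 16 17
# Option 2: exclude 13, 14 and 15 one by one
new_list_no2 <- list_no[list_no != 13 & list_no != 14 & list_no != 15]
new_list_no2
[1] 11 12 16 17
# Option 3: negate the condition "between 13 and 15 (inclusive)"
filtered_list <- list_no[!(list_no >= 13 & list_no <= 15)]
filtered_list
[1] 11 12 16 17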

Question 3

Remove “word”, “sentence”, and “books” from the following list using the %in% operator.

list_words <- c("random", "word", "sentence", "books")
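One possible solution, sketched in R: `%in%` returns a logical vector that is TRUE wherever an element of `list_words` appears among the words to drop, and `!` flips it so only the remaining words are kept. The names `to_remove` and `list_words_clean` are illustrative.

```r
list_words <- c("random", "word", "sentence", "books")
# Words to drop (illustrative helper vector)
to_remove <- c("word", "sentence", "books")
# %in% binds tighter than !, so this is !(list_words %in% to_remove)
list_words_clean <- list_words[!list_words %in% to_remove]
list_words_clean
# [1] "random"
```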

Question 4

Explain in your own words: why does the comparison “four” > “five” return TRUE?

"four" > "five"
[1] TRUE
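A short illustration, offered as a sketch: R compares character strings alphabetically (lexicographically, using the collating sequence of the current locale), not by the numbers the words name. Both words start with "f", so the second letter decides, and "o" comes after "i".

```r
# Strings are compared character by character, not by meaning
"f" == "f"        # first letters tie
# [1] TRUE
"o" > "i"         # second letters decide the result
# [1] TRUE
"four" > "five"
# [1] TRUE
# Sorting shows the same alphabetical order
sort(c("four", "five"))
# [1] "five" "four"
# Comparing the numbers themselves gives the intuitive answer
4 > 5
# [1] FALSE
```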

Question 5

Remove the missing data from the following list

list_no <- c(1, 2, 3, 4, NA, 5, 6)
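A minimal sketch of one way to do this: `is.na()` flags the missing entries and `!` keeps the rest. The name `list_no_clean` is illustrative.

```r
list_no <- c(1, 2, 3, 4, NA, 5, 6)
# is.na() is TRUE for missing values; ! keeps the non-missing ones
list_no_clean <- list_no[!is.na(list_no)]
list_no_clean
# [1] 1 2 3 4 5 6
```

`na.omit(list_no)` also drops the NA, but it attaches an `na.action` attribute that shows up when the result is printed.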

Question 6

Calculate the mean and median of the following list of numbers

list_no <- c(30, 12, NA, 14, NA)
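A sketch of the calculation: by default a single NA makes `mean()` and `median()` return NA, so the `na.rm = TRUE` argument is used to drop missing values first. The remaining values are 30, 12 and 14, so the mean is 56/3 and the median is 14.

```r
list_no <- c(30, 12, NA, 14, NA)
# Without na.rm = TRUE, the NA values propagate
mean(list_no)
# [1] NA
# na.rm = TRUE removes the missing values before computing
mean(list_no, na.rm = TRUE)
# [1] 18.66667
median(list_no, na.rm = TRUE)
# [1] 14
```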

Question 7

Create the following dataframe

df <- data.frame(student=c('Alex', 'Jane', 'Tom', 'Lilly', 'Turner', 'Ruby', 'Nick'),
                 grade=c(77, 81, 89, 83, 99, 92, 97))
df

Question 8

Create a new dataframe containing only Alex, Jane, and Turner
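One possible approach, sketched with base R row subsetting: `%in%` selects the rows whose `student` value is one of the three names, and the empty column index after the comma keeps all columns. The name `df_selected` is illustrative.

```r
df <- data.frame(student=c('Alex', 'Jane', 'Tom', 'Lilly', 'Turner', 'Ruby', 'Nick'),
                 grade=c(77, 81, 89, 83, 99, 92, 97))
# Rows matching the three names; all columns kept
df_selected <- df[df$student %in% c("Alex", "Jane", "Turner"), ]
df_selected
#   student grade
# 1    Alex    77
# 2    Jane    81
# 5  Turner    99
```

`subset(df, student %in% c("Alex", "Jane", "Turner"))` is an equivalent base R alternative.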

Question 9

What is the average grade for the entire class?
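A minimal sketch: `df$grade` extracts the grade column as a numeric vector, which `mean()` then averages (the grades sum to 618 over 7 students).

```r
df <- data.frame(student=c('Alex', 'Jane', 'Tom', 'Lilly', 'Turner', 'Ruby', 'Nick'),
                 grade=c(77, 81, 89, 83, 99, 92, 97))
# Average of the whole grade column
mean(df$grade)
# [1] 88.28571
```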

Question 10

What is the mean grade of Alex, Jane, and Turner?
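A sketch combining the two previous steps: subset the three students first, then average their grades, (77 + 81 + 99) / 3. The name `df_selected` is illustrative.

```r
df <- data.frame(student=c('Alex', 'Jane', 'Tom', 'Lilly', 'Turner', 'Ruby', 'Nick'),
                 grade=c(77, 81, 89, 83, 99, 92, 97))
# Keep only the three students, then average their grade column
df_selected <- df[df$student %in% c("Alex", "Jane", "Turner"), ]
mean(df_selected$grade)
# [1] 85.66667
```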

Question 11

What is the highest grade?
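A minimal sketch using `max()` on the grade column:

```r
df <- data.frame(student=c('Alex', 'Jane', 'Tom', 'Lilly', 'Turner', 'Ruby', 'Nick'),
                 grade=c(77, 81, 89, 83, 99, 92, 97))
# Largest value in the grade column
max(df$grade)
# [1] 99
```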

Question 12

What is the lowest grade?
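A minimal sketch using `min()` on the grade column:

```r
df <- data.frame(student=c('Alex', 'Jane', 'Tom', 'Lilly', 'Turner', 'Ruby', 'Nick'),
                 grade=c(77, 81, 89, 83, 99, 92, 97))
# Smallest value in the grade column
min(df$grade)
# [1] 77
```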

Question 13

Who is the student with the highest grade?
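One way to answer this, as a sketch: `which.max()` returns the position of the highest grade, and that position is used to index the `student` column.

```r
df <- data.frame(student=c('Alex', 'Jane', 'Tom', 'Lilly', 'Turner', 'Ruby', 'Nick'),
                 grade=c(77, 81, 89, 83, 99, 92, 97))
# Position of the highest grade, used to look up the name
df$student[which.max(df$grade)]
# [1] "Turner"
```

`which.max()` returns only the first position if several students tie; `df$student[df$grade == max(df$grade)]` would return all of them.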

Question 14

Who is the student with the lowest grade?
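The same pattern as for the highest grade, sketched with `which.min()`:

```r
df <- data.frame(student=c('Alex', 'Jane', 'Tom', 'Lilly', 'Turner', 'Ruby', 'Nick'),
                 grade=c(77, 81, 89, 83, 99, 92, 97))
# Position of the lowest grade, used to look up the name
df$student[which.min(df$grade)]
# [1] "Alex"
```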

Question 15

Extract the second element from the following list

list_new <- c("el1", "el2", "el3")
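A minimal sketch: R indexes vectors starting at 1, so the second element is at position 2.

```r
list_new <- c("el1", "el2", "el3")
# Square brackets select by position (1-based)
list_new[2]
# [1] "el2"
```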

Question 16

Extract the last element from the following list

list_new <- c("el1", "el2", "el3")
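A sketch of two options: `length()` gives the position of the last element, so it works as an index whatever the vector's size; `tail()` with `n = 1` does the same.

```r
list_new <- c("el1", "el2", "el3")
# The last position equals the number of elements
list_new[length(list_new)]
# [1] "el3"
# tail() returns the last n elements
tail(list_new, n = 1)
# [1] "el3"
```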

Question 17

Merging Datasets

  1. Load the life_expectancy and urbanization datasets
# Remove all objects from the current workspace
rm(list = ls())
# Set the working directory (replace this path with the location of your own week2/lab folder)
setwd("/Users/bgpopescu/Dropbox/john_cabot/teaching/stats/week2/lab")
# Read the two CSV files from the data subfolder
life_expectancy_df <- read.csv(file = './data/life-expectancy.csv')
urbanization_df <- read.csv(file = './data/share-of-population-urban.csv')
  2. Calculate the mean life expectancy by country
  3. Remove the countries with missing continents
  4. Calculate the mean urbanization by country
  5. Remove the countries with missing continents
  6. Perform a left merge
  7. Remove the NA values
  8. Create a scatter plot
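Steps 2 to 8 can be sketched in base R as follows. This is one possible approach under stated assumptions, not necessarily the course's intended solution. To keep it self-contained, it uses two small toy data frames with made-up values in place of the CSV files; the column names `country`, `continent`, `life_exp` and `urban_share` are assumptions (the real files will have different headers, so check them with `names()`), and `country_means()` is a hypothetical helper. Missing continents are assumed to be NA here.

```r
# Toy stand-ins for the two CSV files (made-up values, NOT the real data)
life_expectancy_df <- data.frame(
  country   = rep(c("A", "B", "D", "World"), each = 2),
  continent = rep(c("Europe", "Asia", "Africa", NA), each = 2),
  year      = rep(c(2000, 2010), times = 4),
  life_exp  = c(70, 74, 60, 66, 50, 56, 65, 69)
)
urbanization_df <- data.frame(
  country     = rep(c("A", "C", "D", "World"), each = 2),
  continent   = rep(c("Europe", "Americas", "Africa", NA), each = 2),
  year        = rep(c(2000, 2010), times = 4),
  urban_share = c(60, 64, 70, 80, 30, 40, 47, 52)
)

# Hypothetical helper for steps 2-5: mean of `value_col` per country,
# keeping only countries whose continent is not missing
country_means <- function(df, value_col) {
  means <- aggregate(df[[value_col]], by = list(country = df$country),
                     FUN = mean, na.rm = TRUE)
  names(means)[2] <- value_col
  # Attach each country's continent, then drop rows where it is NA
  continents <- unique(df[, c("country", "continent")])
  means <- merge(means, continents, by = "country")
  means[!is.na(means$continent), ]
}

life_mean  <- country_means(life_expectancy_df, "life_exp")
urban_mean <- country_means(urbanization_df, "urban_share")

# Step 6: left merge (all.x = TRUE) keeps every country in life_mean
merged <- merge(life_mean, urban_mean, by = c("country", "continent"),
                all.x = TRUE)

# Step 7: drop countries without an urbanization value (here "B")
merged <- na.omit(merged)
merged

# Step 8: scatter plot of the country-level means
plot(merged$urban_share, merged$life_exp,
     xlab = "Urban population (%)", ylab = "Life expectancy (years)",
     main = "Life expectancy vs. urbanization")
```

With the real files, `read.csv()` keeps a blank text field as an empty string rather than NA, so the continent filter may need `means$continent != ""` instead of `!is.na(means$continent)`.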