Dataframes, Lists, External Files, Paths
.csv and .xlsx
Press Cmd + A or Ctrl + A and then press Delete
Then type:
df is a dataframe
student and grade are variables within that dataframe
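The code for this step is not shown in the text, so the following is only a minimal sketch of what such a dataframe could look like. The names `df`, `student`, and `grade` come from the slides; the student names and grade values are invented for illustration.

```r
# Hypothetical example data: the names and grades are made up for illustration
df <- data.frame(
  student = c("Alice", "Bob", "Carla", "Dan"),
  grade   = c(90, 75, 82, 68)
)

df          # print the whole dataframe
names(df)   # the variables inside the dataframe
df$grade    # access a single variable with the $ operator
```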
This is how we select specific observations
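One possible way to select observations in base R is shown below. The dataframe values and the condition `grade > 80` are assumptions made for this sketch; `subset_df` is the name used later in the slides.

```r
# Hypothetical example data (values are made up)
df <- data.frame(
  student = c("Alice", "Bob", "Carla", "Dan"),
  grade   = c(90, 75, 82, 68)
)

# Keep only the rows (observations) where grade is above 80.
# The part before the comma selects rows; leaving the part after it empty keeps all columns.
subset_df <- df[df$grade > 80, ]

# Rows can also be selected by position
first_two <- df[1:2, ]

subset_df
first_two
```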
This is how we can calculate the average grade in each of the two dataframes, df and subset_df
df has two variables: student and grade
subset_df has two variables: student and grade
This is how we calculate the mean
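A short sketch of the mean calculation, assuming the same made-up data as before and a hypothetical subsetting rule (`grade > 80`):

```r
# Hypothetical example data (values are made up)
df <- data.frame(
  student = c("Alice", "Bob", "Carla", "Dan"),
  grade   = c(90, 75, 82, 68)
)
subset_df <- df[df$grade > 80, ]

# mean() takes a numeric vector, so we pass one column at a time
mean_all    <- mean(df$grade)         # average over all four students
mean_subset <- mean(subset_df$grade)  # average over the selected students only

mean_all
mean_subset
```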
This is how we identify Max & Min
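As an illustration, base R provides `max()`, `min()`, `which.max()`, and `which.min()`. The data below are the same invented values used for this sketch.

```r
# Hypothetical example data (values are made up)
df <- data.frame(
  student = c("Alice", "Bob", "Carla", "Dan"),
  grade   = c(90, 75, 82, 68)
)

highest <- max(df$grade)   # largest grade
lowest  <- min(df$grade)   # smallest grade

# which.max() / which.min() return the row position of the extreme value,
# which lets us look up the matching student
top_student    <- df$student[which.max(df$grade)]
bottom_student <- df$student[which.min(df$grade)]
```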
This is how we index lists
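A small sketch of list indexing in base R. The list `my_list` and its contents are hypothetical and only serve to show the difference between `[[ ]]`, `$`, and `[ ]`.

```r
# A hypothetical list mixing different kinds of elements
my_list <- list(name = "Alice", grades = c(90, 85, 77), passed = TRUE)

by_name    <- my_list[["grades"]]     # double brackets return the element itself
by_dollar  <- my_list$grades          # $ is a shortcut for the same thing
as_sublist <- my_list["grades"]       # single brackets return a list of length one
second     <- my_list[["grades"]][2]  # index into the extracted vector
first_elem <- my_list[[1]]            # elements can also be picked by position
```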
We can easily remove every object from R's memory (the workspace) with the following command:
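The command itself is not shown in the text. The usual base R idiom for this is `rm(list = ls())`, sketched below with two throwaway objects. Note that it removes objects from the current environment only; it does not unload packages.

```r
# Create two throwaway objects
x <- 1:10
y <- "hello"

before <- ls()    # names of the objects that exist now
print(before)

rm(list = ls())   # remove all objects in the current environment

ls()              # nothing is left
```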
Notice the difference before and after
.csv, .xlsx, .txt, or .tsv files

| File Type | Description | R Function |
|---|---|---|
| .csv | Comma-separated values | read.csv() |
| .tsv | Tab-separated values | read.delim() |
| .txt | Generic text file | read.table() |
| .xlsx | Excel spreadsheet | readxl::read_excel() |
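The functions in the table can be tried out without downloading anything. The sketch below writes a tiny made-up dataframe to temporary files and reads it back; the data and file locations are assumptions for illustration only. The Excel case is left as a comment because it needs the readxl package and an existing .xlsx file.

```r
# Hypothetical data written to temporary files so the sketch runs anywhere
example  <- data.frame(student = c("Alice", "Bob"), grade = c(90, 75))
csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".tsv")
write.csv(example, csv_path, row.names = FALSE)
write.table(example, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)

from_csv <- read.csv(csv_path)      # comma-separated values
from_tsv <- read.delim(tsv_path)    # tab-separated values
from_txt <- read.table(tsv_path,    # generic text: state the header and separator yourself
                       header = TRUE, sep = "\t")

# Excel files need the readxl package (path below is a placeholder):
# from_xlsx <- readxl::read_excel("path/to/file.xlsx")
```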
Download the following datasets from Dropbox:
Now put them in your working directory
Place them in a folder called “data” inside the working directory (e.g. “week2/lab/” below)
To open the file add a new chunk and type
This is what you should see
The part in red will differ from computer to computer
Notice how the path reflects your folder structure
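The command used at this step is not shown in the text. As an assumption, one base R call that prints a path of the kind described here is `getwd()`, which returns the absolute path of the current working directory.

```r
# Print the absolute path of the current working directory.
# The beginning of the path depends on the computer and user account.
wd <- getwd()
print(wd)

# List what R can see inside that directory
list.files(wd)
```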
We can now work with relative paths
Remember this?
This is how we read the csv file.
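A minimal sketch of reading a csv file through a relative path. To keep it runnable anywhere, it builds a throwaway project folder with a "data" subfolder first; the folder name `week2_lab_demo` and the file name `example.csv` are hypothetical stand-ins for your own folder and the file you downloaded. The `setwd()` calls are only there to make the demo self-contained.

```r
# Set up a throwaway project folder that contains a "data" subfolder
project_dir <- file.path(tempdir(), "week2_lab_demo")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
old_wd <- setwd(project_dir)   # make it the working directory for this demo

# Write a tiny csv file (hypothetical file name)
writeLines(c("Entity,Code,Year", "Afghanistan,AFG,1950"), "data/example.csv")

# Relative path: "./" means "start from the working directory"
life_df <- read.csv("./data/example.csv")
print(life_df)

setwd(old_wd)                  # restore the original working directory
```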
We can now also load the other dataframe
Notice the difference between relative and absolute paths
Relative Paths
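The difference can be made visible by building both kinds of path for the same file. The file name `example.csv` is hypothetical; `file.path()` and `getwd()` are base R.

```r
# Relative path: interpreted from the current working directory
relative_path <- file.path("data", "example.csv")   # hypothetical file name

# Absolute path: the working directory plus the relative part.
# It only works on a computer that has exactly this folder structure.
absolute_path <- file.path(getwd(), relative_path)

print(relative_path)
print(absolute_path)
```

A relative path keeps working when the whole project folder is moved or shared, which is why it is usually preferred in course material.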
One common error is the following
If you get that error, your path is not correct
Go back to the previous steps and identify the path to your file.
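One way to check a path before reading the file is sketched below. The file name `no_such_file.csv` is deliberately hypothetical so that the "not found" branch runs; replace it with your own file name.

```r
# Hypothetical path that (most likely) does not exist
path <- file.path("data", "no_such_file.csv")

if (file.exists(path)) {
  checked_df <- read.csv(path)
} else {
  # Show where R is looking and what it can actually see there
  message("File not found: ", path)
  message("Working directory: ", getwd())
  print(list.files("data"))   # empty if there is no "data" folder here
}
```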
Let us now investigate our two datasets
This is how we can examine the first five entries
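A minimal sketch using `head()`. The dataframe here is a made-up stand-in for the datasets loaded earlier. Note that `head()` shows six rows by default, so `n = 5` is needed to get exactly five.

```r
# Hypothetical stand-in for a loaded dataset
example_df <- data.frame(id = 1:8, value = (1:8)^2)

first_five <- head(example_df, n = 5)   # first five rows
print(first_five)

default_head <- head(example_df)        # default: first six rows
```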
You should see:
This is how you install a package in R
Notice that you need to have quotes: "tidyverse"
Once you are done, delete or comment out install.packages("tidyverse")
If you don’t delete it or comment it out, it will cause errors during rendering.
Once you install the package, it will always be on your machine.
As we progress, you might use commands that result in:
To use the commands associated with a package, you need to load it first
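The install-once, load-every-session pattern can be sketched as follows. The guard with `requireNamespace()` is an addition for this sketch so that it runs even where tidyverse is not installed; in a lab document, a plain `library(tidyverse)` is the usual form.

```r
# Install once, then keep this line commented out so rendering does not fail
# install.packages("tidyverse")

# Check whether the package is available on this machine
tidyverse_available <- requireNamespace("tidyverse", quietly = TRUE)

if (tidyverse_available) {
  library(tidyverse)   # load the package for the current session
} else {
  message("tidyverse is not installed yet; run install.packages(\"tidyverse\") once.")
}
```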
glimpse
We will examine our data using glimpse
Rows: 20,445
Columns: 4
$ Entity                                <chr> "Afghanistan", "Afghanistan", "A…
$ Code                                  <chr> "AFG", "AFG", "AFG", "AFG", "AFG…
$ Year                                  <int> 1950, 1951, 1952, 1953, 1954, 19…
$ Life.expectancy.at.birth..historical. <dbl> 27.7, 28.0, 28.4, 28.9, 29.2, 29…
or
Rows: 20,445
Columns: 4
$ Entity                                <chr> "Afghanistan", "Afghanistan", "A…
$ Code                                  <chr> "AFG", "AFG", "AFG", "AFG", "AFG…
$ Year                                  <int> 1950, 1951, 1952, 1953, 1954, 19…
$ Life.expectancy.at.birth..historical. <dbl> 27.7, 28.0, 28.4, 28.9, 29.2, 29…
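The same output appears twice, separated by "or", which suggests two equivalent ways of calling `glimpse`. The two forms below are an assumption about what was meant. The dataframe name `life_df` is hypothetical, and its five rows are taken from the values visible in the output above. `glimpse()` comes from dplyr (part of the tidyverse), and the native pipe `|>` needs R 4.1 or later.

```r
# Small excerpt rebuilt from the values visible in the glimpse output
life_df <- data.frame(
  Entity = rep("Afghanistan", 5),
  Code   = rep("AFG", 5),
  Year   = 1950:1954,
  Life.expectancy.at.birth..historical. = c(27.7, 28.0, 28.4, 28.9, 29.2)
)

if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr::glimpse(life_df)        # function-call form
  life_df |> dplyr::glimpse()    # pipe form, same output
} else {
  str(life_df)                   # base R fallback with similar information
}
```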
We have four variables within our dataframe:
Entity: string or character variable
Code: string or character variable
Year: numeric or integer variable
Life.expectancy.at.birth..historical.: numeric or double precision variable
We have four variables within our dataframe:
Entity: the country: “Afghanistan”, “Albania”, “Algeria”, etc.
Code: the country code: “AFG”, “ALB”, “DZA”, etc.
Year: year 1950, 1951, 1952, etc.
Life.expectancy.at.birth..historical.: life expectancy corresponding to that year
Popescu (JCU): Dataframes, Lists, External Files, Paths