Lab 5: Statistics

The Normal Distribution

Author

Bogdan G. Popescu

#Removing previous datasets in memory
rm(list = ls())
#Loading the relevant libraries
#install.packages("gghighlight")
library(ggplot2)
library(gridExtra)
library(gghighlight)

Before we get into coding, let us go back to a short explanation of how frequencies relate to probability mass functions (pmf) and cumulative distribution functions (cdf).

There are four functions in R for working with the normal distribution: rnorm (random draws), pnorm (cumulative probabilities), qnorm (quantiles), and dnorm (densities).
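As a quick orientation, the sketch below calls each of the four functions once on the standard normal distribution. The object names (draws, density_0, prob_196, quant_975) are illustrative and not part of the lab.

```r
# Illustrative sketch: one call to each normal-distribution function in base R
set.seed(123)                                  # make the random draws reproducible
draws     <- rnorm(5, mean = 0, sd = 1)        # r: five random draws
density_0 <- dnorm(0, mean = 0, sd = 1)        # d: height of the density curve at x = 0
prob_196  <- pnorm(1.96, mean = 0, sd = 1)     # p: P(X <= 1.96)
quant_975 <- qnorm(0.975, mean = 0, sd = 1)    # q: value with 97.5% of the distribution below it

print(draws)
print(density_0)  # equals 1 / sqrt(2 * pi), about 0.399
print(prob_196)   # about 0.975
print(quant_975)  # about 1.96
```

Note that pnorm and qnorm are inverses of each other: feeding the output of one into the other returns the starting value.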

1 Generating Random Values from a Standard Normal Distribution (mean: 0, sd: 1)

The rnorm function generates a vector of normally distributed random numbers. We can specify the mean \(\mu\) and the standard deviation \(\sigma\). Remember that for a standard normal distribution the mean is \(\mu = 0\) and the standard deviation is \(\sigma = 1\).
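To illustrate what the mean and sd arguments do, here is a minimal sketch: drawing from a normal distribution with mean \(\mu\) and standard deviation \(\sigma\) should give the same numbers as shifting and scaling standard normal draws, provided the seed is the same. The values 10 and 2 and the object names are assumptions chosen only for illustration.

```r
# Illustrative sketch: mean and sd shift and scale standard normal draws
mu    <- 10   # hypothetical mean
sigma <- 2    # hypothetical standard deviation

set.seed(123)
direct <- rnorm(5, mean = mu, sd = sigma)        # draws with mean 10 and sd 2

set.seed(123)                                    # same seed, so same underlying draws
rescaled <- mu + sigma * rnorm(5, mean = 0, sd = 1)

print(direct)
print(rescaled)
print(all.equal(direct, rescaled))               # checks the two vectors agree
```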

Let us see how it works by creating three samples with 10, 100, and 1000 observations.

1.1 Creating a sample of 10 observations

#Set seed allows us to make sure everyone gets the same results
set.seed(123)
#Step1: Creating 10 obs. with mean 0 and sd of 1
generated_data_10obs<-rnorm(10, mean=0, sd=1)
#Step2: Turning the created data into a data frame
generated_data_10obs<-data.frame(generated_data_10obs)
#Step3: Inspecting the first 10 obs.
head(generated_data_10obs, n=10)
#Step4: Renaming the variable inside the data frame
names(generated_data_10obs)<-"value"
head(generated_data_10obs, n=10)
#Step5: Plotting the data
figure_1<-ggplot(data = generated_data_10obs, aes(x=value))+
  geom_histogram(bins = 30)+ #30 bins is the default; setting it explicitly avoids a message
  theme_bw()+
  ggtitle("10 Obs.")
figure_1
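With only 10 observations, the sample mean and standard deviation will usually differ somewhat from the theoretical values of 0 and 1. A minimal sketch for checking this is given here; it regenerates the same draws with the same seed, and the object names sample_10, sample_mean, and sample_sd are hypothetical.

```r
# Illustrative sketch: comparing a small sample with the theoretical values
set.seed(123)                              # same seed as in the lab code
sample_10 <- rnorm(10, mean = 0, sd = 1)   # 10 draws from the standard normal

sample_mean <- mean(sample_10)             # theoretical value: 0
sample_sd   <- sd(sample_10)               # theoretical value: 1

print(sample_mean)
print(sample_sd)
```

The gap between these sample statistics and 0 and 1 tends to shrink as the number of observations grows.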