Visualization in Python is about presenting data in an aesthetically pleasing way.
More importantly, visualization is about presenting the data in a truthful way.
One good way to tell whether something is true is through science.
However, data visualization can affect perceptions of what is scientific.
A scientist can consciously or unconsciously make style choices in a graph that affect such perceptions.
Thus, the perception of science can be shaped by aesthetic choices and by the way content is presented.
At the most basic level, we can present snippets of the data or basic information about the data.
To demonstrate how we can present data, let us download the Life Expectancy and Urbanization data.
Once you load it into Python, the DataFrame looks like the following:
import os
import pandas as pd
# Step 0: Setting the working directory
os.chdir("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/text_analysis/week6/lecture6b/data")
# Step 1: Loading the data
life_exp_urb = pd.read_csv('./life_exp_urb.csv')
# Step 2: Examining the first three entries
life_exp_urb.head(3)
        Entity  life_exp_mean   urb_mean             type
0  Afghanistan      45.383333  18.611754  Everything Else
1      Albania      68.286111  40.444164  Everything Else
2      Algeria      57.530133  52.609213  Everything Else
In principle, we can just present basic information about the data:
# Step 1: Subset the DataFrame
df = life_exp_urb[['life_exp_mean', 'urb_mean']].copy()
# Step 2: Rename the columns
df.columns = ['life_exp', 'urb']
# Step 3: Display the first 10 entries
print(df.head(10))
    life_exp        urb
0  45.383333  18.611754
1  68.286111  40.444164
2  57.530133  52.609213
3  68.637500  79.762491
4  77.048611  87.043017
5  45.084658  37.539705
6  69.440278        NaN
7  71.209722  32.350656
8  65.408046  85.282148
9  67.156944  63.111394
All of these ways of examining the data are reasonable. There is a positive correlation between urbanization and life expectancy.
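The correlation can also be checked numerically. The following is a minimal sketch that assumes only the ten rows printed above rather than the full file, so the coefficient for the complete dataset will differ. The name `sample` is introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: only the ten rows printed above, not the full life_exp_urb data
sample = pd.DataFrame({
    "life_exp": [45.383333, 68.286111, 57.530133, 68.637500, 77.048611,
                 45.084658, 69.440278, 71.209722, 65.408046, 67.156944],
    "urb": [18.611754, 40.444164, 52.609213, 79.762491, 87.043017,
            37.539705, np.nan, 32.350656, 85.282148, 63.111394],
})

# Pearson correlation; pandas skips the row whose urb value is missing
r = sample["life_exp"].corr(sample["urb"])
print(f"Pearson r on the ten-row sample: {r:.2f}")
```

A single number like this summarizes the direction and strength of the relationship, but it hides outliers and the overall shape of the data.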
But the following scatterplot is more compelling.
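The scatterplot itself is not reproduced in this text. A minimal sketch of how one could draw it with matplotlib follows, again assuming only the ten rows printed above. The function name `plot_life_exp_vs_urb` and the axis labels are illustrative choices, not taken from the original lecture.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Assumption: only the ten rows printed above, not the full life_exp_urb data
sample = pd.DataFrame({
    "life_exp": [45.383333, 68.286111, 57.530133, 68.637500, 77.048611,
                 45.084658, 69.440278, 71.209722, 65.408046, 67.156944],
    "urb": [18.611754, 40.444164, 52.609213, 79.762491, 87.043017,
            37.539705, np.nan, 32.350656, 85.282148, 63.111394],
})


def plot_life_exp_vs_urb(df):
    """Scatter life expectancy against urbanization (hypothetical helper)."""
    # Drop rows with a missing value so every plotted point is complete
    complete = df.dropna(subset=["life_exp", "urb"])
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(complete["urb"], complete["life_exp"])
    ax.set_xlabel("Urbanization (urb)")
    ax.set_ylabel("Life expectancy (life_exp)")
    ax.set_title("Life expectancy vs. urbanization")
    return fig, ax


fig, ax = plot_life_exp_vs_urb(sample)
```

Calling `plt.show()` afterwards displays the figure in an interactive session. With the full dataset, the same function can be applied to `df` directly.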
What makes a visualization good? Edward Tufte puts it this way:
“Graphical excellence is the well-designed presentation of interesting data—a matter of substance, of statistics, and of design … [It] consists of complex ideas communicated with clarity, precision, and efficiency.[…] [It] is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space… And graphical excellence requires telling the truth about the data.”
Edward Tufte, The Visual Display of Quantitative Information, p. 51