Once you load it into Python, the DataFrame looks like the following:
```python
import os

import pandas as pd

# Step 0: Set the working directory (adjust this path to your own machine)
os.chdir("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/text_analysis/week6/lecture6b/data")

# Step 1: Load the data
life_exp_urb = pd.read_csv('./life_exp_urb.csv')

# Step 2: Examine the first three entries
life_exp_urb.head(3)
```
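Changing the working directory with `os.chdir` works, but it affects every relative path used later in the session. A minimal alternative sketch builds the full file path with `pathlib` instead. The helper name `load_life_exp_urb` is hypothetical. The real `life_exp_urb.csv` is not available here, so the demo writes a tiny CSV with the same two column names and invented values.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_life_exp_urb(data_dir: Path) -> pd.DataFrame:
    """Hypothetical helper: read life_exp_urb.csv from data_dir without os.chdir."""
    # Joining directory and file name leaves the working directory untouched
    csv_path = Path(data_dir) / "life_exp_urb.csv"
    return pd.read_csv(csv_path)


# Demo: write a tiny CSV with invented values into a temporary directory,
# then read it back through the helper
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame(
        {"life_exp_mean": [58.1, 71.4], "urb_mean": [22.0, 64.5]}
    ).to_csv(Path(tmp) / "life_exp_urb.csv", index=False)
    life_exp_urb = load_life_exp_urb(Path(tmp))

print(life_exp_urb.head(3))
```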
In principle, we could simply present the data as a table:
```python
# Step 1: Subset the DataFrame
df = life_exp_urb[['life_exp_mean', 'urb_mean']].copy()

# Rename the columns
df.columns = ['life_exp', 'urb']

# Display the first 10 entries
print(df.head(10))
```
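The renaming step can also be written with `DataFrame.rename`. It maps old names to new ones explicitly rather than relying on column order. `DataFrame.describe` then gives a basic numeric summary of each column. This is a sketch on an invented stand-in for `life_exp_urb`; the `country` column and all values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for life_exp_urb (all values invented)
life_exp_urb = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "life_exp_mean": [58.0, 66.0, 74.0, 80.0],
    "urb_mean": [20.0, 45.0, 70.0, 90.0],
})

# Keep the two variables of interest and rename them by name, not position
df = life_exp_urb[["life_exp_mean", "urb_mean"]].rename(
    columns={"life_exp_mean": "life_exp", "urb_mean": "urb"}
)

# count, mean, std, min, quartiles and max for each numeric column
summary = df.describe()
print(summary)
```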
“Graphical excellence is the well-designed presentation of interesting data—a matter of substance, of statistics, and of design … [It] consists of complex ideas communicated with clarity, precision, and efficiency.[…] [It] is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space… And graphical excellence requires telling the truth about the data.”
Edward Tufte, The Visual Display of Quantitative Information, p. 51
Bad Visualizations 1
The figure has far too many categories
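One common remedy for too many categories is to keep the largest few and lump the rest into an "Other" group. The sketch below assumes the data are category counts in a pandas Series. The function name `lump_small_categories` and the `top_n` cutoff are illustrative choices, not a standard API.

```python
import pandas as pd


def lump_small_categories(counts: pd.Series, top_n: int = 5,
                          other_label: str = "Other") -> pd.Series:
    """Keep the top_n largest categories and sum the rest into one bucket."""
    ordered = counts.sort_values(ascending=False)
    top = ordered.iloc[:top_n]
    rest = ordered.iloc[top_n:].sum()  # 0 when there is nothing left over
    if rest > 0:
        top = pd.concat([top, pd.Series({other_label: rest})])
    return top


# Invented counts for ten categories, for illustration only
counts = pd.Series({f"cat_{i}": v for i, v in
                    enumerate([40, 25, 10, 8, 5, 4, 3, 2, 2, 1])})
print(lump_small_categories(counts, top_n=4))
```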
Bad Visualizations 2
The categories in a pie chart should add up to 100%
The slice areas should be proportional to the percentages they represent
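Both rules can be checked numerically. Converting raw values to shares of the total makes them sum to 100. In a circular pie, each slice's angle (and therefore its area) is proportional to its share, 3.6 degrees per percentage point. The counts below are invented, and `pie_shares` is a hypothetical helper. Recent matplotlib versions also normalize wedge sizes by default, so any percentages written on the slices should be computed from the same data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def pie_shares(values: pd.Series) -> pd.Series:
    """Convert non-negative raw values into percentages that sum to 100."""
    if (values < 0).any():
        raise ValueError("pie slices cannot be negative")
    return values / values.sum() * 100


# Hypothetical category counts (invented for illustration)
values = pd.Series({"A": 50, "B": 30, "C": 20})
shares = pie_shares(values)

# Slice angle in degrees: 360 degrees spread over 100 percent
angles = shares * 3.6

fig, ax = plt.subplots()
ax.pie(shares, labels=shares.index, autopct="%1.0f%%")
ax.set_aspect("equal")  # keep the pie circular so area tracks angle
```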
Bad Visualizations 3
The categories in a pie chart should add up to 100%
Bad Visualizations 4
Bad Visualizations 5
Goals of Visualization
Python has a variety of libraries meant for data visualization:
matplotlib
seaborn
plotnine
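The sketch below is a minimal matplotlib scatter plot of life expectancy against urbanization. It assumes the renamed `df` from the earlier step. Here a small DataFrame with invented values stands in for it, and the Agg backend is selected so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for df (values invented)
df = pd.DataFrame({"life_exp": [58.0, 66.0, 74.0, 80.0],
                   "urb": [20.0, 45.0, 70.0, 90.0]})

fig, ax = plt.subplots(figsize=(5, 4))
ax.scatter(df["urb"], df["life_exp"])  # one point per row
ax.set_xlabel("Urbanization")
ax.set_ylabel("Life expectancy")
ax.set_title("Life expectancy vs. urbanization")
```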
plotnine is useful because it implements the grammar of graphics in Python with syntax that closely mirrors ggplot2, a library native to R.
ggplot2 makes it possible to build powerful, layered visualizations directly from a data frame, and plotnine carries this approach over to pandas DataFrames.