layout | root | permalink |
---|---|---|
lesson | . | index.html |
So you have a new data set. Before you dive into running models and tests, you need to inspect your data. John Tukey, a prominent statistician, coined the term "exploratory data analysis" for this kind of inspection. Data exploration can inform a number of decisions:
- what methods are appropriate to use on your data
- whether the data satisfy certain modeling assumptions
- whether the data need to be cleaned, reshaped, reduced, etc.
In this lesson, we begin with a messy version of the Gapminder data and explore it together. We will find some issues with the data and teach you how to correct them. After making the data tidy, you will be able to plot the variables in different ways and see patterns.
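To give a feel for what "finding issues" can look like, here is a minimal sketch of a first pass over a small table. The sample rows, the column names, and the `first_look` helper are all invented for illustration and are not the lesson's actual data or code. It uses only the Python standard library so that it runs anywhere; the lesson itself may use other tools.

```python
import csv
import io
from collections import Counter

# Hypothetical messy sample, invented for illustration (not the lesson's data).
RAW = """country,year,lifeExp
Canada,1997,78.6
canada,2002,79.8
Canada ,2007,
Chile,2007,78.5
"""

def first_look(text):
    """Return the row count, missing values per column, and distinct country labels."""
    rows = list(csv.DictReader(io.StringIO(text)))
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            # Treat empty or whitespace-only cells as missing.
            if value is None or value.strip() == "":
                missing[column] += 1
    # Labels are compared exactly, so case and whitespace variants stay distinct.
    labels = sorted({row["country"] for row in rows})
    return len(rows), dict(missing), labels

n_rows, missing, labels = first_look(RAW)
print(n_rows)    # number of data rows
print(missing)   # columns that contain empty cells
print(labels)    # spelling and whitespace variants appear as separate labels
```

In this made-up sample, the check surfaces two typical problems: an empty `lifeExp` cell, and three different spellings of the same country that would otherwise be treated as three separate groups.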
Some experience with Python is helpful, but not strictly needed. {: .prereq}
Data Exploration | Tidying, summarizing, and plotting data | Lesson narrative | Student notebook |