There are many ways in which we can organize data. Some of these ways can make for easy data analysis. Others lead to a lot of frustration. This is where tidy data comes in. Tidy data is a concept from Hadley Wickham’s 2014 paper Tidy Data.
In the framework of tidy data, every row is an observation, every column is a variable, and every cell of the data frame holds a single value. R for Data Science illustrates this with the same data organized in three different ways:
#table1
# A tibble: 6 × 4
      country  year  cases population
       <fctr> <int>  <int>      <int>
1 Afghanistan  1999    745   19987071
2 Afghanistan  2000   2666   20595360
3      Brazil  1999  37737  172006362
4      Brazil  2000  80488  174504898
5       China  1999 212258 1272915272
6       China  2000 213766 1280428583
#table2
# A tibble: 12 × 4
       country  year        key      value
        <fctr> <int>     <fctr>      <int>
 1 Afghanistan  1999      cases        745
 2 Afghanistan  1999 population   19987071
 3 Afghanistan  2000      cases       2666
 4 Afghanistan  2000 population   20595360
 5      Brazil  1999      cases      37737
 6      Brazil  1999 population  172006362
 7      Brazil  2000      cases      80488
 8      Brazil  2000 population  174504898
 9       China  1999      cases     212258
10       China  1999 population 1272915272
11       China  2000      cases     213766
12       China  2000 population 1280428583
#table3
# A tibble: 6 × 3
      country  year              rate
       <fctr> <int>             <chr>
1 Afghanistan  1999      745/19987071
2 Afghanistan  2000     2666/20595360
3      Brazil  1999   37737/172006362
4      Brazil  2000   80488/174504898
5       China  1999 212258/1272915272
6       China  2000 213766/1280428583
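Of these three tables, only table1 is tidy: each row is one country in one year, and each variable (country, year, cases, population) has its own column. In table2 each observation is split across two rows, and in table3 the rate column packs two variables (cases and population) into a single cell. We will consider how to create tidy data from the other two tables, as well as some other examples, as we move through this unit.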
To get the data set ready we will use the `tidyr` package. To then transform and work with the data so that we can model and graph it, we will use the `dplyr` package. Both are part of the `tidyverse`.
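To see why the tidy layout is convenient, here is a minimal sketch that rebuilds table1 by hand (as a plain data frame rather than the tibble shown above) and works on its columns with `dplyr`. The names `table1_rate` and `cases_by_year`, and the choice to express the rate per 10,000 people, are assumptions made for illustration only.

```r
library(dplyr)

# table1 from above, rebuilt by hand so the example is self-contained
table1 <- data.frame(
  country    = rep(c("Afghanistan", "Brazil", "China"), each = 2),
  year       = rep(c(1999L, 2000L), times = 3),
  cases      = c(745, 2666, 37737, 80488, 212258, 213766),
  population = c(19987071, 20595360, 172006362,
                 174504898, 1272915272, 1280428583),
  stringsAsFactors = FALSE
)

# Each variable has its own column, so a new variable is one mutate() call
# (rate per 10,000 people is an illustrative choice)
table1_rate <- table1 %>%
  mutate(rate = cases / population * 10000)

# Summaries work on the columns directly: total cases in each year
cases_by_year <- table1 %>%
  group_by(year) %>%
  summarise(total_cases = sum(cases))
```

Neither calculation could be written this directly against table2 or table3 without reshaping them first.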
tidyr Functions
For the `tidyr` package we will focus on the following 4 functions:
1. `gather()`
2. `spread()`
3. `separate()`
4. `unite()`
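As a rough preview of what these four functions do, the sketch below rebuilds table2 and table3 by hand (as plain data frames, so it does not depend on any bundled data set) and reshapes them. The result names `tidy_from_2`, `tidy_from_3`, `long_again`, and `rate_again` are made up for this example.

```r
library(tidyr)

# table2 and table3 from above, rebuilt by hand
table2 <- data.frame(
  country = rep(c("Afghanistan", "Brazil", "China"), each = 4),
  year    = rep(rep(c(1999L, 2000L), each = 2), times = 3),
  key     = rep(c("cases", "population"), times = 6),
  value   = c(745, 19987071, 2666, 20595360,
              37737, 172006362, 80488, 174504898,
              212258, 1272915272, 213766, 1280428583),
  stringsAsFactors = FALSE
)

table3 <- data.frame(
  country = rep(c("Afghanistan", "Brazil", "China"), each = 2),
  year    = rep(c(1999L, 2000L), times = 3),
  rate    = c("745/19987071", "2666/20595360", "37737/172006362",
              "80488/174504898", "212258/1272915272", "213766/1280428583"),
  stringsAsFactors = FALSE
)

# spread(): one column per distinct key, giving the tidy layout of table1
tidy_from_2 <- spread(table2, key = key, value = value)

# separate(): split "cases/population" into two columns;
# convert = TRUE turns the resulting strings into numbers
tidy_from_3 <- separate(table3, rate, into = c("cases", "population"),
                        sep = "/", convert = TRUE)

# gather(): the reverse of spread(), stacking columns into key/value pairs
long_again <- gather(tidy_from_2, key = "key", value = "value",
                     cases, population)

# unite(): the reverse of separate(), pasting two columns into one
rate_again <- unite(tidy_from_3, rate, cases, population, sep = "/")
```

Note that in tidyr 1.0.0 and later, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. The older functions still work, and they are the ones this unit uses.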
On Your Own: Swirl Practice
In order to learn R you must do R. Follow the steps below in your RStudio console:
1. Load the swirl package, then run this command to pick the course:
library(swirl)
swirl()
You will be prompted to choose a course. Type the number shown in front of 02 Getting Data. This takes you to a menu of lessons. For now we will just use lesson 6: type 6 to choose Looking at Data, then follow the instructions until you are finished.
Once you are finished with the lesson, come back to this course and continue.