The third tidyr function we will look into is separate(). Its name describes what it does: it takes one column that holds several pieces of information and splits that information out into multiple columns.
The picture above illustrates this idea. As an example, consider table3, a small dataset included with tidyr:
```
## # A tibble: 6 × 3
## country year rate
## <fctr> <int> <chr>
## 1 Afghanistan 1999 745/19987071
## 2 Afghanistan 2000 2666/20595360
## 3 Brazil 1999 37737/172006362
## 4 Brazil 2000 80488/174504898
## 5 China 1999 212258/1272915272
## 6 China 2000 213766/1280428583
```
Notice that the rate column does not actually contain a rate. Each value is a string of the form cases/population: the number of cases and the total population joined by a slash. It would be more useful to have one column for cases and another for population, which is exactly what separate() does. Its basic form is:
```r
separate(data, col, into, sep)
```
where:

- data is the data frame of interest.
- col is the column to be separated.
- into is a character vector of names for the new columns.
- sep is the separator at which the values are split.
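Two details of sep are easy to miss. It is optional: by default, separate() splits at any run of non-alphanumeric characters, so the / in rate would be found even without sep = "/". It can also be a number, in which case the split happens at that character position rather than at a delimiter. The sketch below, assuming a reasonably recent tidyr, shows both; the object names default_sep and by_position are only illustrative.

```r
library(tidyr)

# No sep given: separate() splits at any run of non-alphanumeric
# characters, so the "/" in rate is picked up automatically
default_sep <- table3 %>%
  separate(rate, into = c("cases", "population"))

# Numeric sep: split at a character position instead of a delimiter.
# sep = 2 splits year after its first two characters (1999 -> "19", "99")
by_position <- table3 %>%
  separate(year, into = c("century", "year"), sep = 2)

head(default_sep, 2)
head(by_position, 2)
```

Both results still hold character columns, just like the sep = "/" version.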
separate() Example
For table3, we split rate at the slash:
```r
library(tidyr)

table3 %>%
  separate(rate, c("cases", "population"), sep = "/")
```
```
## # A tibble: 6 × 4
## country year cases population
## * <fctr> <int> <chr> <chr>
## 1 Afghanistan 1999 745 19987071
## 2 Afghanistan 2000 2666 20595360
## 3 Brazil 1999 37737 172006362
## 4 Brazil 2000 80488 174504898
## 5 China 1999 212258 1272915272
## 6 China 2000 213766 1280428583
```
Instead of a single cases/population string, we now have one column for cases and one for population. Both are still character columns (note the <chr> type), so they need to be converted to numbers before we can create a rate column:
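```r
# Split rate into two (character) columns
table <- table3 %>%
  separate(rate, c("cases", "population"), sep = "/")
# Convert the new columns from character to numeric
table$cases <- as.numeric(table$cases)
table$population <- as.numeric(table$population)
# Compute the actual rate: cases divided by population
table$rate <- table$cases / table$population
head(table, 2)
```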
```
## # A tibble: 2 × 5
## country year cases population rate
## <fctr> <int> <dbl> <dbl> <dbl>
## 1 Afghanistan 1999 745 19987071 0.0000372741
## 2 Afghanistan 2000 2666 20595360 0.0001294466
```
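The two as.numeric() calls can also be folded into the split itself. separate() has a convert argument; with convert = TRUE it runs type.convert() on the new columns, so cases and population come back as numbers instead of strings. A minimal sketch, where table_num is only an illustrative name:

```r
library(tidyr)

# convert = TRUE applies type.convert() to the new columns,
# so no separate as.numeric() step is needed
table_num <- table3 %>%
  separate(rate, into = c("cases", "population"), sep = "/", convert = TRUE)

# The rate can now be computed directly
table_num$rate <- table_num$cases / table_num$population
head(table_num, 2)
```

Here type.convert() should return integer columns rather than the doubles produced by as.numeric(), but dividing two integers in R still gives a double, so the rate values are the same.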