Separate


The third tidyr function we will look at is separate(). Its name describes what it does: it takes one column that packs several pieces of information together and splits that information out into multiple columns.

The picture above illustrates this idea. As an example, consider table3, which comes with tidyr:

## # A tibble: 6 × 3
##       country  year              rate
##        <fctr> <int>             <chr>
## 1 Afghanistan  1999      745/19987071
## 2 Afghanistan  2000     2666/20595360
## 3      Brazil  1999   37737/172006362
## 4      Brazil  2000   80488/174504898
## 5       China  1999 212258/1272915272
## 6       China  2000 213766/1280428583

Notice that the rate column does not actually contain a rate. Each value is the number of cases and the total population joined by a "/", stored as a character string (<chr>). It would be more useful to have one column for cases and one for population, and the separate() function does exactly that. Its basic form is:

separate(data, col, into, sep)

where

data is the data frame of interest.

col is the column that needs to be separated.

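into is a character vector of names for the new columns the data will be split into.

sep is the separator at which to split the values. A character string is interpreted as a regular expression; a numeric vector is interpreted as positions within the string to split at.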
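The sep argument is more flexible than a single literal character. Its default, "[^[:alnum:]]+", splits at any run of non-alphanumeric characters, so the "/" in rate would be found even if sep were left out. A numeric sep is read as character positions instead. The minimal sketch below, using table3 from tidyr, shows both; the object names and the century/yr column names are hypothetical, chosen only for illustration.

```r
library(tidyr)

# No sep given: the default regular expression "[^[:alnum:]]+"
# matches the "/" between cases and population
by_default <- separate(table3, rate, into = c("cases", "population"))

# Numeric sep: split each year after its 2nd character,
# e.g. 1999 becomes "19" and "99" (hypothetical column names)
by_position <- separate(table3, year, into = c("century", "yr"), sep = 2)

head(by_default, 2)
head(by_position, 2)
```

Spelling out sep = "/" as in the example below is still clearer for readers, even though the default would work here.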

separate() Example

In our example, we split rate wherever a "/" appears:

library(tidyr)  # separate(), table3 and %>% all come from tidyr

table3 %>%
    separate(rate, c("cases", "population"), sep = "/")
## # A tibble: 6 × 4
##       country  year  cases population
## *      <fctr> <int>  <chr>      <chr>
## 1 Afghanistan  1999    745   19987071
## 2 Afghanistan  2000   2666   20595360
## 3      Brazil  1999  37737  172006362
## 4      Brazil  2000  80488  174504898
## 5       China  1999 212258 1272915272
## 6       China  2000 213766 1280428583
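By default, separate() drops the column it splits, which is why rate no longer appears in the output above. As a minimal sketch, passing remove = FALSE keeps the original column alongside the new ones, which makes it easy to check that the split went as intended. The object name checked is just for illustration.

```r
library(tidyr)

# remove = FALSE keeps the original rate column next to cases and population
checked <- separate(table3, rate, into = c("cases", "population"),
                    sep = "/", remove = FALSE)

head(checked, 2)
```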

Instead of a single cases/population string, we now have separate cases and population columns. Both are still character (<chr>) columns, though, so they need to be converted to numbers before we can compute a rate:

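# Split rate into cases and population
table <- table3 %>%
    separate(rate, c("cases", "population"), sep = "/")
# The new columns are character, so convert them before doing arithmetic
table$cases <- as.numeric(table$cases)
table$population <- as.numeric(table$population)
# Compute the rate as cases per person
table$rate <- table$cases / table$population
head(table, 2)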
## # A tibble: 2 × 5
##       country  year cases population         rate
##        <fctr> <int> <dbl>      <dbl>        <dbl>
## 1 Afghanistan  1999   745   19987071 0.0000372741
## 2 Afghanistan  2000  2666   20595360 0.0001294466
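The manual as.numeric() step can also be folded into the split itself. A minimal sketch, assuming a reasonably recent tidyr: setting convert = TRUE makes separate() run type.convert() on the new columns, so cases and population come back as numbers rather than strings. The object name rates is hypothetical.

```r
library(tidyr)

# convert = TRUE converts the new columns with type.convert(),
# so cases and population are numeric instead of character
rates <- separate(table3, rate, into = c("cases", "population"),
                  sep = "/", convert = TRUE)

# No as.numeric() calls are needed before computing the rate
rates$rate <- rates$cases / rates$population
head(rates, 2)
```

The explicit conversion shown above is still useful when you want full control over the resulting type; convert = TRUE simply picks whatever type.convert() considers most appropriate.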