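This section introduces chaining, a concept that helps greatly when working with data. The usual way to perform multiple operations in one line is by nesting function calls. Chaining is an alternative that lets us write the operations in the order in which they happen.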
To consider an example we will look at the data provided in the gapminder package:
library(dplyr)      # provides select(), filter() and the %>% operator used below
library(gapminder)
head(gapminder)
## # A tibble: 6 × 6
## country continent year lifeExp pop gdpPercap
## <fctr> <fctr> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.801 8425333 779.4453
## 2 Afghanistan Asia 1957 30.332 9240934 820.8530
## 3 Afghanistan Asia 1962 31.997 10267083 853.1007
## 4 Afghanistan Asia 1967 34.020 11537966 836.1971
## 5 Afghanistan Asia 1972 36.088 13079460 739.9811
## 6 Afghanistan Asia 1977 38.438 14880372 786.1134
Let’s say that we want the GDP per capita and life expectancy for Kenya. Traditionally, we could do this in a nested manner:
filter(select(gapminder, country, lifeExp, gdpPercap), country=="Kenya")
It is not easy to see exactly what this code is doing, because nested calls have to be read from the inside out. We can write it in a manner that follows our logic much better. The code below shows how to do this with chaining.
gapminder %>%
select(country, lifeExp, gdpPercap) %>%
filter(country=="Kenya")
We now have something that is much clearer to read. Here is what our chaining command says:
1. Take the gapminder data.
2. Select the variables country, lifeExp and gdpPercap.
3. Only keep information from Kenya.
The nested code says the same thing but it is hard to see what is going on if you have not been coding for very long. The result of this search is below:
## # A tibble: 12 × 3
## country lifeExp gdpPercap
## <fctr> <dbl> <dbl>
## 1 Kenya 42.270 853.5409
## 2 Kenya 44.686 944.4383
## 3 Kenya 47.949 896.9664
## 4 Kenya 50.654 1056.7365
## 5 Kenya 53.559 1222.3600
## 6 Kenya 56.155 1267.6132
## 7 Kenya 58.766 1348.2258
## 8 Kenya 59.339 1361.9369
## 9 Kenya 59.285 1341.9217
## 10 Kenya 54.407 1360.4850
## 11 Kenya 50.992 1287.5147
## 12 Kenya 54.110 1463.2493
What is %>%?
In the previous code we used %>% in the command. You can think of this operator as saying "then". For example:
gapminder %>%
select(country, lifeExp, gdpPercap) %>%
filter(country=="Kenya")
This translates to:
Take gapminder, then select the columns country, lifeExp and gdpPercap, then filter so that we only keep Kenya.
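To make the "then" reading concrete, the sketch below checks that piping a value into a function gives the same result as calling the function directly. It assumes the magrittr package is installed. The vector `values` is a made-up example and is not part of the gapminder data.

```r
library(magrittr)  # provides the %>% operator

# Hypothetical example vector, not taken from the gapminder data
values <- c(4, 9, 16)

# x %>% f() is evaluated as f(x)
piped_root  <- values %>% sqrt()
direct_root <- sqrt(values)

# x %>% f(y) is evaluated as f(x, y): the left side becomes the first argument
piped_round  <- values %>% sqrt() %>% round(digits = 1)
direct_round <- round(sqrt(values), digits = 1)

# Both comparisons should report that the two forms agree
identical(piped_root, direct_root)
identical(piped_round, direct_round)
```

The same rule explains the gapminder chain: the data frame on the left of each %>% is passed as the first argument of the function on the right.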
Why Chain?
We still might ask why we would want to do this. Chaining increases readability significantly when there are many commands, because each step sits on its own line instead of being buried inside another call. With many packages we can use it in place of nested function calls. The chaining operator comes from the magrittr package, and dplyr re-exports it, so loading dplyr makes %>% available.
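As a rough illustration of how the benefit grows with the number of steps, here is a longer chain on the same data. The question it answers (mean life expectancy per continent in 2007) and the column name `mean_lifeExp` are chosen only for this example. The sketch assumes the dplyr and gapminder packages are installed.

```r
library(dplyr)
library(gapminder)

# Chained version: one operation per line, read from top to bottom
life_by_continent <- gapminder %>%
  filter(year == 2007) %>%                      # keep a single year
  group_by(continent) %>%                       # one group per continent
  summarise(mean_lifeExp = mean(lifeExp)) %>%   # average within each group
  arrange(desc(mean_lifeExp))                   # highest average first

# Equivalent nested version: the same steps, read from the inside out
life_by_continent_nested <- arrange(
  summarise(group_by(filter(gapminder, year == 2007), continent),
            mean_lifeExp = mean(lifeExp)),
  desc(mean_lifeExp)
)

life_by_continent
```

Both versions should produce the same table. The chained one lists the steps in the order they are applied, while the nested one separates each function name from its remaining arguments.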
User Defined Function
Let’s say that we wish to find the Euclidean distance between two vectors, say x1 and x2. We could use the math formula:
$$d(x_1, x_2) = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2}$$
In the nested manner this would be:
x1 <- 1:5; x2 <- 2:6
sqrt(sum((x1-x2)^2))
However, if we chain this, the code follows the order in which we would perform the calculation mathematically.
# chaining method
(x1-x2)^2 %>% sum() %>% sqrt()
If we did it by hand, we would perform elementwise subtraction of x2 from x1, square each difference, sum the squared values, and then take the square root of the sum.
# chaining method
(x1-x2)^2 %>% sum() %>% sqrt()
## [1] 2.236068
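One detail worth checking when chaining arithmetic is operator precedence. In R, `^` binds more tightly than `%>%`, which is why `(x1-x2)^2` is computed in full before it is piped. Binary `-` binds less tightly than `%>%`, so dropping the parentheses changes the result. The sketch below illustrates this, assuming magrittr is installed. The variable names `piped_whole` and `piped_partial` are made up for this example.

```r
library(magrittr)

x1 <- 1:5; x2 <- 2:6

# The whole difference is piped into sum(): sum(x1 - x2)
piped_whole <- (x1 - x2) %>% sum()

# Without parentheses only x2 is piped: x1 - sum(x2)
piped_partial <- x1 - x2 %>% sum()

piped_whole     # a single number
piped_partial   # a vector of length 5
```

Wrapping the left-hand expression in parentheses, as the example above does, avoids this kind of surprise.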
Many of us have been performing calculations step by step in this way for years, so chaining really is more natural for us.
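The chain can also be wrapped in a user-defined function so that it can be reused for any pair of vectors. This is a minimal sketch: the function name `euclidean_distance`, its arguments `a` and `b`, and the length check are choices made for this example, and it assumes magrittr is installed.

```r
library(magrittr)

# Hypothetical helper: Euclidean distance between two numeric vectors
euclidean_distance <- function(a, b) {
  # Guard against vectors of different lengths (R would otherwise recycle)
  stopifnot(length(a) == length(b))

  # Subtract elementwise, square, then sum, then take the square root
  ((a - b)^2) %>% sum() %>% sqrt()
}

x1 <- 1:5; x2 <- 2:6
euclidean_distance(x1, x2)
## [1] 2.236068
```

Called on x1 and x2, the function returns the same value as the one-line chain.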