Sooner or later you will almost always need to add a new variable to your data. For example, most medical studies record height and weight, but researchers are often more interested in Body Mass Index (BMI), which is computed from the two. We would need to add BMI to the data ourselves.
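As a minimal sketch of the BMI case, the snippet below assumes a small made-up tibble called patients, with weight in kilograms (weight_kg) and height in metres (height_m). The data and column names are hypothetical and only illustrate the idea; the mutate() function it uses is described in the rest of this page.

```r
library(dplyr)

# Hypothetical example data: weight in kilograms, height in metres
patients <- tibble(
  id        = 1:3,
  weight_kg = c(70, 85, 54),
  height_m  = c(1.75, 1.80, 1.60)
)

# BMI is weight (kg) divided by height (m) squared
patients_bmi <- patients %>%
  mutate(bmi = weight_kg / height_m^2)

patients_bmi
```

The original columns are still there, and bmi is added as a new column at the end.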
Using the tidyverse, we can add new variables with either of two functions:
mutate()
transmute()
With mutate() we have:
mutate(.data, ...)
where
.data is your tibble of interest.
... is one or more name-expression pairs: the name of each new variable paired with the expression that computes it.
Then with transmute() we have:
transmute(.data, ...)
where
.data is your tibble of interest.
... is one or more name-expression pairs: the name of each new variable paired with the expression that computes it.
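To make the ... argument concrete, here is a small sketch that passes more than one name-expression pair in a single call. The tibble trips and its columns are made up for illustration. Note that a later pair can use a variable created by an earlier pair in the same call.

```r
library(dplyr)

# Hypothetical data: distance in miles, time in minutes
trips <- tibble(
  distance = c(100, 250),
  minutes  = c(120, 200)
)

trips_new <- trips %>%
  mutate(
    hours = minutes / 60,      # first name-expression pair
    speed = distance / hours   # second pair, uses the new hours column
  )

trips_new
```

transmute() accepts name-expression pairs in the same way.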
Differences Between mutate() and transmute()
There is only one major difference between mutate() and transmute(): which variables each one keeps in your data.
mutate() creates a new variable and keeps all existing variables.
transmute() creates a new variable and keeps only the new variables.
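A quick way to see this difference is to compare the column names each function returns. The sketch below uses a tiny hypothetical tibble df; the names x, y and z are placeholders.

```r
library(dplyr)

# Hypothetical two-column tibble
df <- tibble(x = 1:3, y = c(10, 20, 30))

kept_all <- df %>% mutate(z = x + y)     # x, y and the new z
kept_new <- df %>% transmute(z = x + y)  # only the new z

names(kept_all)
names(kept_new)
```

Both results have the same number of rows; only the set of columns differs.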
Example
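Let’s say we wish to have a variable called speed, measured in miles per hour. With distance in miles and air_time in minutes, we want to compute:
$$\text{speed} = \frac{\text{distance}}{\text{air\_time}} \times 60$$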
We can first do this with mutate():
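library(dplyr)         # provides %>%, select() and mutate()
library(nycflights13)  # flights data (assumed to be the nycflights13 version)
flights %>%
  select(flight, distance, air_time) %>%
  mutate(speed = distance / air_time * 60)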
## # A tibble: 336,776 × 4
##    flight distance air_time    speed
##     <int>    <dbl>    <dbl>    <dbl>
##  1   1545     1400      227 370.0441
##  2   1714     1416      227 374.2731
##  3   1141     1089      160 408.3750
##  4    725     1576      183 516.7213
##  5    461      762      116 394.1379
##  6   1696      719      150 287.6000
##  7    507     1065      158 404.4304
##  8   5708      229       53 259.2453
##  9     79      944      140 404.5714
## 10    301      733      138 318.6957
## # ... with 336,766 more rows
Notice that with mutate() we kept all of the variables we selected and added speed to them. Now we can do the same with transmute():
flights %>%
  select(flight, distance, air_time) %>%
  transmute(speed = distance / air_time * 60)
## # A tibble: 336,776 × 1
##       speed
##       <dbl>
##  1 370.0441
##  2 374.2731
##  3 408.3750
##  4 516.7213
##  5 394.1379
##  6 287.6000
##  7 404.4304
##  8 259.2453
##  9 404.5714
## 10 318.6957
## # ... with 336,766 more rows
In this example we have only kept speed; flight, distance and air_time were dropped.