Adding Variables


Sooner or later you will need a variable that is not already in your data. For example, most medical studies record height and weight, but researchers are often more interested in Body Mass Index (BMI), which has to be computed from those two measurements and added to the data.

Using the tidyverse, we can add new variables with two functions:

mutate()

transmute()

With mutate() we have:

mutate(.data, ...)

where

.data is your tibble of interest.

... is one or more name-value pairs: the name of each new variable paired with the expression that computes it.

Then with transmute() we have:

transmute(.data, ...)

where

.data is your tibble of interest.

... is one or more name-value pairs: the name of each new variable paired with the expression that computes it.
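As a minimal sketch of these name-expression pairs, the BMI example from the introduction could be written as follows. The tibble patients and its columns height_m and weight_kg are hypothetical and invented for illustration; BMI is taken to be weight in kilograms divided by the square of height in metres.

```r
library(dplyr)

# Hypothetical data, invented for illustration only
patients <- tibble(
  id        = 1:3,
  height_m  = c(1.60, 1.75, 1.80),
  weight_kg = c(64, 70, 81)
)

# mutate(): the pair is name (bmi) = expression (weight_kg / height_m^2);
# id, height_m and weight_kg are kept and bmi is appended
with_bmi <- patients %>%
  mutate(bmi = weight_kg / height_m^2)

# transmute(): same pair, but the result contains only bmi
only_bmi <- patients %>%
  transmute(bmi = weight_kg / height_m^2)

names(with_bmi)
names(only_bmi)
```

Both calls take exactly the same arguments; only the set of columns in the result differs.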

Differences Between mutate() and transmute()

There is only one major difference between mutate() and transmute(): which variables each one keeps in your data.

mutate()

Creates a new variable.

Keeps all existing variables.

transmute()

Creates a new variable.

Keeps only the new variables.

Example

Let’s say we want a variable called speed, computed from distance and flight time as:

$$\text{speed} = \frac{\text{distance}}{\text{time}} \times 60$$
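The factor of 60 is a unit conversion. Assuming distance is recorded in miles and air_time in minutes (which is how the nycflights13 documentation describes the flights data), the ratio is in miles per minute, and multiplying by 60 gives miles per hour. As a worked check, for a flight of 1400 miles with 227 minutes of air time:

$$\frac{1400\ \text{miles}}{227\ \text{min}} \times 60\ \frac{\text{min}}{\text{hour}} \approx 370.04\ \text{miles per hour}$$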

We can first do this with mutate():

library(dplyr)
library(nycflights13)  # provides the flights data

flights %>%
  select(flight, distance, air_time) %>%
  mutate(speed = distance / air_time * 60)
## # A tibble: 336,776 × 4
##    flight distance air_time    speed
##     <int>    <dbl>    <dbl>    <dbl>
## 1    1545     1400      227 370.0441
## 2    1714     1416      227 374.2731
## 3    1141     1089      160 408.3750
## 4     725     1576      183 516.7213
## 5     461      762      116 394.1379
## 6    1696      719      150 287.6000
## 7     507     1065      158 404.4304
## 8    5708      229       53 259.2453
## 9      79      944      140 404.5714
## 10    301      733      138 318.6957
## # ... with 336,766 more rows

Notice that mutate() kept all of the variables we selected and added speed to them. Now we can do the same with transmute():

flights %>%
  select(flight, distance, air_time) %>%
  transmute(speed = distance/air_time*60)
## # A tibble: 336,776 × 1
##       speed
##       <dbl>
## 1  370.0441
## 2  374.2731
## 3  408.3750
## 4  516.7213
## 5  394.1379
## 6  287.6000
## 7  404.4304
## 8  259.2453
## 9  404.5714
## 10 318.6957
## # ... with 336,766 more rows

In this example, only speed is kept; flight, distance and air_time are dropped.
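The same result can also be obtained from mutate() itself through its .keep argument. The sketch below assumes dplyr 1.0.0 or later, where that argument is available. The tibble trips is a hypothetical stand-in for flights, built from the first three rows of the output above so that the example runs without the full data set.

```r
library(dplyr)

# Hypothetical stand-in for flights: first three rows of the output above
trips <- tibble(
  flight   = c(1545L, 1714L, 1141L),
  distance = c(1400, 1416, 1089),
  air_time = c(227, 227, 160)
)

# Assumes dplyr >= 1.0.0: .keep = "none" drops the existing columns
via_mutate <- trips %>%
  mutate(speed = distance / air_time * 60, .keep = "none")

# transmute() for comparison
via_transmute <- trips %>%
  transmute(speed = distance / air_time * 60)

# Both results contain the single column speed
names(via_mutate)
names(via_transmute)
```

On ungrouped data like this, the two calls return the same single-column tibble.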