Previously we have worked with data in the form of:
Vectors
Lists
Arrays
Dataframes
What is a Tibble?
Tibbles are a modern take on the data frame. They keep many important features of the original data frame and drop many of the outdated ones. They come from the tibble package, part of Hadley Wickham's tidyverse, and we will use them throughout the tidyverse in place of the older data frames that we just learned about.
Compared to Data Frames
A tibble never changes the input type.
No more worry about strings being automatically turned into factors.
A tibble can have columns that are lists.
A tibble can have non-standard variable names.
Names can start with a number or contain spaces.
To refer to such a name, wrap it in backticks.
A tibble only recycles vectors of length 1.
It never creates row names.
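A few of the points above can be checked directly. The following is a minimal sketch, not part of the original lesson; the object names (tb, df_old, recycled) are made up for illustration. Since R 4.0, data.frame() no longer converts strings to factors by default, so stringsAsFactors = TRUE is set explicitly to reproduce the older behaviour.

```r
library(tibble)

# A tibble keeps strings as character vectors.
# n has length 1, so it is recycled to match the three names.
tb <- tibble(name = c("a", "b", "c"), n = 1)
class(tb$name)

# stringsAsFactors = TRUE reproduces the old (pre-R 4.0) data.frame() default,
# where strings were silently converted to factors.
df_old <- data.frame(name = c("a", "b", "c"), stringsAsFactors = TRUE)
class(df_old$name)

# Recycling a length-2 vector against a length-4 vector:
# data.frame() allows it, tibble() raises an error instead.
nrow(data.frame(x = 1:4, y = 1:2))
recycled <- tryCatch(
  tibble(x = 1:4, y = 1:2),
  error = function(e) "error"
)
recycled
```

The tryCatch() wrapper is only there so the script keeps running after the error. In interactive use you would simply see the error message.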
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.3.2
## Warning: package 'ggplot2' was built under R version 3.3.2
## Warning: package 'tidyr' was built under R version 3.3.2
try <- tibble(x = 1:3, y = list(1:5, 1:10, 1:20))
try
## # A tibble: 3 × 2
## x y
## <int> <list>
## 1 1 <int [5]>
## 2 2 <int [10]>
## 3 3 <int [20]>
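We can see that y is displayed as a list column. A traditional data frame does not accept this input directly. For example, combining the same pieces with c() and converting them gives an error, because each vector is treated as its own column and their lengths differ: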
try <- as_data_frame(c(x = 1:3, y = list(1:5, 1:10, 1:20)))
try
## Error: Variables must be length 1 or 20. Problem variables: 'y1', 'y2'
We can use a non-standard name in our tibble as well:
names(data.frame(`crazy name` = 1))
## [1] "crazy.name"
names(tibble(`crazy name` = 1))
## [1] "crazy name"
Notice that data.frame() changed the name we asked for. By default it converts column names into syntactically valid R names, so the space became a dot.
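Once a tibble has a non-standard name, the backticks are needed every time the column is referred to with $. A short sketch, using made-up column names and the hypothetical objects tb and df_keep:

```r
library(tibble)

# Column names with a space and with a leading digit.
tb <- tibble(`crazy name` = 1, `2nd col` = "a")

# With $, the name must be wrapped in backticks.
tb$`crazy name`

# With [[ ]], the name is an ordinary string, so no backticks are needed.
tb[["2nd col"]]

names(tb)

# data.frame() can also keep the name, but only if asked to.
df_keep <- data.frame(`crazy name` = 1, check.names = FALSE)
names(df_keep)
```

The check.names = FALSE argument switches off the name repair that data.frame() applies by default.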
Coercing into Tibbles
An existing object can be coerced into a tibble with as_tibble(). This works much like as.data.frame(), but it is considerably faster.
l <- replicate(26, sample(100), simplify = FALSE)
names(l) <- letters
microbenchmark::microbenchmark(
as_tibble(l),
as.data.frame(l)
)
## Unit: microseconds
##              expr      min       lq      mean    median       uq      max neval cld
##      as_tibble(l)  309.250  327.099  376.2002  344.7265  386.004 1689.046   100  a
##  as.data.frame(l) 1390.507 1464.361 1614.3087 1543.3465 1690.608 3104.097   100   b
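Microbenchmarking runs each expression many times (100 here, shown in the neval column) and summarizes how long the runs took. In this run the median time for as_tibble() is about 345 microseconds against about 1,543 microseconds for as.data.frame(), so creating the tibble is roughly 4.5 times faster. That difference can add up in an analysis that converts many objects.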
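Coercion works in both directions. The sketch below is an illustration only; the scores list and the other object names are hypothetical. It turns a named list into a tibble, converts it back to a plain data frame, and then into a tibble again:

```r
library(tibble)

# A small named list with equal-length elements (hypothetical data).
scores <- list(id = 1:3, grade = c("a", "b", "c"))

# List -> tibble
tb <- as_tibble(scores)
class(tb)

# Tibble -> plain data frame
df <- as.data.frame(tb)
class(df)

# Data frame -> tibble again
tb2 <- as_tibble(df)
dim(tb2)
```

A tibble is still a data frame underneath (its class vector includes "data.frame"), which is why most functions written for data frames accept tibbles as well.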
Tibbles vs Data Frames
There are a couple of key differences between tibbles and data frames:
Printing.
Subsetting.
Printing
Tibbles only print the first 10 rows and all the columns that fit on the screen.
Each column displays its data type.
You will not accidentally print too much.
tibble(
a = lubridate::now() + runif(1e3) * 86400,
b = lubridate::today() + runif(1e3) * 30,
c = 1:1e3,
d = runif(1e3),
e = sample(letters, 1e3, replace = TRUE)
)
## # A tibble: 1,000 × 5
## a b c d e
## <dttm> <date> <int> <dbl> <chr>
## 1 2017-02-19 09:02:23 2017-03-09 1 0.02150370 f
## 2 2017-02-19 01:42:10 2017-03-09 2 0.08031493 k
## 3 2017-02-19 05:36:59 2017-03-08 3 0.11670172 u
## 4 2017-02-19 18:49:56 2017-03-09 4 0.24552337 h
## 5 2017-02-19 04:15:06 2017-03-05 5 0.11232662 b
## 6 2017-02-19 10:00:27 2017-03-09 6 0.52834632 m
## 7 2017-02-19 13:42:43 2017-03-16 7 0.78928491 v
## 8 2017-02-19 17:02:27 2017-03-16 8 0.80388276 h
## 9 2017-02-19 15:09:33 2017-03-19 9 0.45767339 d
## 10 2017-02-19 09:14:04 2017-02-25 10 0.18177950 t
## # ... with 990 more rows
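When ten rows are not enough, the print method for tibbles takes n (number of rows) and width (Inf shows every column) arguments. A minimal sketch with a made-up tibble named big:

```r
library(tibble)

# A hypothetical tibble with 1,000 rows.
big <- tibble(id = 1:1000, value = id / 1000)

# Capture the printed lines so the two settings can be compared.
short <- capture.output(print(big, n = 3))
long <- capture.output(print(big, n = 20))
length(long) > length(short)

# Show 20 rows and every column, however wide the console is.
print(big, n = 20, width = Inf)
```

The capture.output() calls are only there to make the comparison explicit. In everyday use, print(big, n = 20) on its own is all that is needed.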
Subsetting
We can index a tibble in the ways we are used to:
df$x
df[["x"]]
df[[1]]
We can also use a pipe, which we will learn about later:
df %>% .$x
df %>% .[["x"]]
df <- tibble(
x = runif(5),
y = rnorm(5)
)
df$x
## [1] 0.6227033 0.7363213 0.8551199 0.9173554 0.5542486
df[["x"]]
## [1] 0.6227033 0.7363213 0.8551199 0.9173554 0.5542486
df[[1]]
## [1] 0.6227033 0.7363213 0.8551199 0.9173554 0.5542486
The above commands should seem very familiar after the previous work, but with piping (or chaining) we can do the same:
df %>% .$x
## [1] 0.6227033 0.7363213 0.8551199 0.9173554 0.5542486
df %>% .[["x"]]
## [1] 0.6227033 0.7363213 0.8551199 0.9173554 0.5542486
df %>% .[[1]]
## [1] 0.6227033 0.7363213 0.8551199 0.9173554 0.5542486
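The commands above behave the same for tibbles and data frames. One place where the two differ is single-bracket subsetting. The sketch below uses made-up objects tb and df holding the same data. Selecting one column with [ gives a plain vector from a data frame but keeps a tibble a tibble:

```r
library(tibble)

# The same data as a tibble and as a data frame.
tb <- tibble(x = 1:3, y = c("a", "b", "c"))
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# A data frame drops to a vector when a single column is selected.
class(df[, "x"])

# A tibble stays a tibble.
class(tb[, "x"])

# To get the vector out of a tibble, use [[ ]] or $.
tb[["x"]]
```

This makes the result of [ predictable: with a tibble you always get a tibble back, whether you select one column or several.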