Tidy chapter updates
hadley committed Jul 25, 2016
1 parent f1cc208 commit a2ff3ec
Showing 4 changed files with 382 additions and 250 deletions.
1 change: 1 addition & 0 deletions DESCRIPTION
@@ -35,6 +35,7 @@ Imports:
tidyr
Remotes:
hadley/modelr,
hadley/tidyr,
hadley/readr,
hadley/stringr,
rstudio/bookdown,
32 changes: 32 additions & 0 deletions tibble.Rmd
@@ -32,6 +32,17 @@ tibble(x = 1:5, y = 1, z = x ^ 2 + y)

`tibble()` automatically recycles inputs of length 1, and you can refer to variables that you just created. Compared to `data.frame()`, `tibble()` does much less: it never changes the type of the inputs (e.g. it never converts strings to factors!), it never changes the names of variables, and it never creates row names.
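As a rough illustration of the differences described above, the sketch below builds the same data with `data.frame()` and `tibble()`. The object names `df_base` and `tb_new` and the column name `my var` are made up for this example. It deliberately does not show the strings-to-factors behaviour of `data.frame()`, because that depends on the R version and the `stringsAsFactors` setting.

```{r}
library(tibble)

# data.frame() rewrites non-syntactic names by default (check.names = TRUE)
df_base <- data.frame(`my var` = 1:3, y = 1)
names(df_base)

# tibble() keeps the names as given and recycles the length-1 input `y`
tb_new <- tibble(`my var` = 1:3, y = 1, s = c("a", "b", "c"))
names(tb_new)

# Strings stay as character vectors in a tibble
class(tb_new$s)
```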

It's possible for a tibble to have column names that are not valid R variable names, also known as __non-syntactic__ names. For example, they might not start with a letter, or they might contain unusual characters like a space. To refer to these variables, you need to surround them with backticks, `` ` ``:

```{r}
tb <- tibble(
`:)` = "smile",
` ` = "space",
`2000` = "number"
)
tb
```
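A short sketch of pulling those columns back out may help. It rebuilds the same `tb` so that it runs on its own. The point is that backticks are only needed where R expects a bare name, as with `$`; with `[[` the name is passed as an ordinary string.

```{r}
library(tibble)

tb <- tibble(
  `:)` = "smile",
  ` ` = "space",
  `2000` = "number"
)

# With `$`, the non-syntactic name must be wrapped in backticks
tb$`:)`

# With `[[`, the name is a plain string, so no backticks are needed
tb[["2000"]]
tb[[" "]]
```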

Another way to create a tibble is with `frame_data()`, which is customised for data entry in R code. Column headings are defined by formulas (i.e. they start with `~`), and entries are separated by commas:

```{r}
@@ -48,6 +59,21 @@ frame_data(

1. What does `enframe()` do? When might you use it?

1. Practice referring to non-syntactic names by:

    1. Plotting a scatterplot of `1` vs `2`.

    1. Creating a new column called `3` which is `2` divided by `1`.

    1. Renaming the columns to `one`, `two` and `three`.

```{r}
annoying <- tibble(
`1` = 1:10,
`2` = `1` * 2 + rnorm(length(`1`))
)
```

## Tibbles vs. data frames

There are two main differences in the usage of a data frame vs a tibble: printing and subsetting.
@@ -84,6 +110,12 @@ nycflights13::flights %>%

You can see a complete list of options by looking at the package help: `package?tibble`.

Remember, you can also get a nicer view of the data set using RStudio's built-in data viewer. This is often useful at the end of a long chain of manipulations.

```{r, eval = FALSE}
nycflights13::flights %>% View()
```

### Subsetting

Tibbles are stricter about subsetting. If you try to access a variable that does not exist, you'll get a warning. Unlike data frames, tibbles do not use partial matching on column names:
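A minimal sketch of the partial-matching difference, using a made-up one-column data set (`df_abc` and `tb_abc` are names invented for this example). The exact wording of the warning, and whether it is a warning at all rather than an error, is assumed to vary between tibble versions.

```{r}
library(tibble)

df_abc <- data.frame(abc = 1)
tb_abc <- tibble(abc = 1)

# A data frame partially matches `a` to the column `abc`
df_abc$a

# A tibble does not: in recent tibble versions this returns NULL with a warning
tb_abc$a
```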