# Facetting {#facet}
```{r, include = FALSE}
source("common.R")
```
You first encountered facetting in Section \@ref(qplot-facetting). Facetting generates small multiples, each showing a different subset of the data. Small multiples are a powerful tool for exploratory data analysis: you can rapidly compare patterns in different parts of the data and see whether they are the same or different. This section discusses how you can fine-tune facets, particularly the way in which they interact with position scales. \index{Facetting} \index{Positioning!facetting}
There are three types of facetting:
* `facet_null()`: a single plot, the default. \indexf{facet\_null}
* `facet_wrap()`: "wraps" a 1d ribbon of panels into 2d.
* `facet_grid()`: produces a 2d grid of panels defined by variables which
form the rows and columns.
The differences between `facet_wrap()` and `facet_grid()` are illustrated in Figure \@ref(fig:facet-sketch).
```{r facet-sketch, echo = FALSE, out.width = "75%", fig.cap="A sketch illustrating the difference between the two facetting systems. `facet_grid()` (left) is fundamentally 2d, being made up of two independent components. `facet_wrap()` (right) is 1d, but wrapped into 2d to save space."}
knitr::include_graphics("diagrams/position-facets.png", dpi = 300, auto_pdf = TRUE)
```
Faceted plots can take up a lot of space, so for this chapter we will use a subset of the `mpg` dataset that has a manageable number of levels: three cylinder counts (4, 6, 8), two types of drive train (4 and f), and six classes.
```{r mpg2}
mpg2 <- subset(mpg, cyl != 5 & drv %in% c("4", "f") & class != "2seater")
```
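As a quick sanity check on those counts, the following sketch recomputes the subset under a separate, hypothetical name (`mpg2_check`) and counts the distinct values of each facetting variable. It assumes only that ggplot2 is installed, since `mpg` ships with it.

```{r}
library(ggplot2)

# Same filter as above, stored under a hypothetical name so `mpg2` is untouched
mpg2_check <- subset(mpg, cyl != 5 & drv %in% c("4", "f") & class != "2seater")

# Number of distinct values in each variable used for facetting in this chapter
level_counts <- sapply(
  mpg2_check[c("cyl", "drv", "class")],
  function(x) length(unique(x))
)
level_counts
```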
## Facet wrap {#facet-wrap}
`facet_wrap()` makes a long ribbon of panels (generated by any number of variables) and wraps it into 2d. This is useful if you have a single variable with many levels and want to arrange the plots in a more space-efficient manner. \index{Facetting!wrapped} \indexf{facet\_wrap} \indexc{\textasciitilde}
You can control how the ribbon is wrapped into a grid with `ncol`, `nrow`, `as.table` and `dir`. `ncol` and `nrow` control the number of columns and rows (you only need to set one). `as.table` controls whether the facets are laid out like a table (`TRUE`), with the highest values at the bottom-right, or like a plot (`FALSE`), with the highest values at the top-right. `dir` controls the direction of wrap: **h**orizontal (`"h"`) or **v**ertical (`"v"`).
`r columns(2, 2/3)`
```{r}
base <- ggplot(mpg2, aes(displ, hwy)) +
  geom_blank() +
  xlab(NULL) +
  ylab(NULL)

base + facet_wrap(~class, ncol = 3)
base + facet_wrap(~class, ncol = 3, as.table = FALSE)
```
```{r}
base + facet_wrap(~class, nrow = 3)
base + facet_wrap(~class, nrow = 3, dir = "v")
```
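Because the ribbon can be generated by more than one variable, a short sketch may help. Here `cyl` and `drv` are combined in a single `facet_wrap()` call, and the panel count is compared with the number of combinations observed in the data. The names `mpg2_sketch` and `wrap_two` are illustrative and not part of the chapter's own code.

```{r}
library(ggplot2)

# Hypothetical copy of the chapter's subset, so the sketch stands alone
mpg2_sketch <- subset(mpg, cyl != 5 & drv %in% c("4", "f") & class != "2seater")

# One panel per observed (cyl, drv) combination, wrapped into two columns
wrap_two <- ggplot(mpg2_sketch, aes(displ, hwy)) +
  geom_point() +
  facet_wrap(~ cyl + drv, ncol = 2)
wrap_two

# Panels that contain data vs. combinations observed in the data
n_panels <- length(unique(layer_data(wrap_two)$PANEL))
n_combos <- nrow(unique(mpg2_sketch[c("cyl", "drv")]))
c(panels = n_panels, combinations = n_combos)
```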
## Facet grid
`facet_grid()` lays out plots in a 2d grid, as defined by a formula: \index{Facetting!grid} \indexf{facet\_grid}
* `. ~ a` spreads the values of `a` across the columns. This direction
facilitates comparisons of y position, because the vertical scales are
aligned.
`r columns(1, 1 / 3, 0.75)`
```{r grid-v}
base + facet_grid(. ~ cyl)
```
* `b ~ .` spreads the values of `b` down the rows. This direction
facilitates comparison of x position because the horizontal scales are
aligned. This makes it particularly useful for comparing distributions.
`r columns(1, 3 / 2, 0.30)`
```{r mpg2-h}
base + facet_grid(drv ~ .)
```
* `b ~ a` spreads `a` across columns and `b` down rows (rows go on the left
  of the `~`, columns on the right). You'll usually want to put the variable
  with the greatest number of levels in the columns, to take advantage of
  the aspect ratio of your screen.
`r columns(1, 2 / 3, 0.75)`
```{r grid-vh}
base + facet_grid(drv ~ cyl)
```
You can use multiple variables in the rows or columns by "adding" them together, e.g. `a + b ~ c + d`. Variables appearing together on the rows or columns are nested, in the sense that only combinations that appear in the data will appear in the plot. Variables that are specified on rows and columns are crossed: all combinations will be shown, including those that didn't appear in the original dataset, which may result in empty panels.
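To make the crossing behaviour concrete, the sketch below cross-tabulates a row variable against a column variable before plotting: every zero cell in the table corresponds to an empty panel in the grid. `mpg2_sketch`, `counts` and `n_empty` are hypothetical names introduced only for this illustration.

```{r}
library(ggplot2)

# Hypothetical copy of the chapter's subset, so the sketch stands alone
mpg2_sketch <- subset(mpg, cyl != 5 & drv %in% c("4", "f") & class != "2seater")

# Cross-tabulate the row and column variables; a zero cell is an empty panel
counts <- table(drv = mpg2_sketch$drv, class = mpg2_sketch$class)
n_empty <- sum(counts == 0)
n_empty

# drv on the rows is crossed with class on the columns, so every cell is drawn
ggplot(mpg2_sketch, aes(displ, hwy)) +
  geom_point() +
  facet_grid(drv ~ class)
```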
## Controlling scales {#controlling-scales}
For both `facet_wrap()` and `facet_grid()` you can control whether the position scales are the same in all panels (fixed) or allowed to vary between panels (free) with the `scales` parameter: \index{Facetting!interaction with scales} \index{Scales!interaction with facetting} \index{Facetting!controlling scales}
* `scales = "fixed"`: x and y scales are fixed across all panels.
* `scales = "free_x"`: the x scale is free, and the y scale is fixed.
* `scales = "free_y"`: the y scale is free, and the x scale is fixed.
* `scales = "free"`: x and y scales vary across panels.
`facet_grid()` imposes an additional constraint on the scales: all panels in a column must have the same x scale, and all panels in a row must have the same y scale. This is because each column shares an x axis, and each row shares a y axis.
Fixed scales make it easier to see patterns across panels; free scales make it easier to see patterns within panels.
`r columns(1, 1 / 2.5, 0.75)`
```{r fixed-vs-free}
p <- ggplot(mpg2, aes(cty, hwy)) +
  geom_abline() +
  geom_jitter(width = 0.1, height = 0.1)

p + facet_wrap(~cyl)
p + facet_wrap(~cyl, scales = "free")
```
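One way to see why free scales help within panels is to compare the data range in each panel with the range across all panels. This is only a rough sketch: the axis limits ggplot2 actually draws are slightly wider because of jittering and scale expansion. The names `mpg2_sketch` and `panel_ranges` are assumptions made for the example.

```{r}
library(ggplot2)

# Hypothetical copy of the chapter's subset, so the sketch stands alone
mpg2_sketch <- subset(mpg, cyl != 5 & drv %in% c("4", "f") & class != "2seater")

# Range of cty within each cyl panel: roughly what a free x scale has to cover
panel_ranges <- sapply(split(mpg2_sketch$cty, mpg2_sketch$cyl), range)
rownames(panel_ranges) <- c("min", "max")
panel_ranges

# Range over all panels: roughly what a fixed x scale has to cover
range(mpg2_sketch$cty)
```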
Free scales are also useful when we want to display multiple time series that were measured on different scales. To do this, we first need to change from 'wide' to 'long' data, stacking the separate variables into a single column. An example of this is shown below with the long form of the `economics` data. \index{Data!economics\_long@\texttt{economics\_long}}
`r columns(1, 1.2 / 1, 0.75)`
```{r time}
economics_long
ggplot(economics_long, aes(date, value)) +
  geom_line() +
  facet_wrap(~variable, scales = "free_y", ncol = 1)
```
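The wide-to-long step itself can be sketched in base R, starting from the wide `economics` data that also ships with ggplot2. This is one possible approach rather than how `economics_long` was actually built; `series` and `econ_long` are hypothetical names, and `tidyr::pivot_longer()` would do the same reshaping.

```{r}
library(ggplot2)

# Every column except the date is a separate series measured on its own scale
series <- setdiff(names(economics), "date")

# Stack the series into one `value` column, labelled by a `variable` column
econ_long <- data.frame(
  date = rep(economics$date, times = length(series)),
  variable = factor(rep(series, each = nrow(economics)), levels = series),
  value = unlist(economics[series], use.names = FALSE)
)

ggplot(econ_long, aes(date, value)) +
  geom_line() +
  facet_wrap(~variable, scales = "free_y", ncol = 1)
```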
`facet_grid()` has an additional parameter called `space`, which takes the same values as `scales`. When `space = "free"`, each column (or row) will have width (or height) proportional to the range of the scale for that column (or row). This makes the scaling equal across the whole plot: 1 cm on each panel maps to the same range of data. (This is somewhat analogous to the 'sliced' axis limits of lattice.) For example, if panel a had range 2 and panel b had range 4, one-third of the space would be given to a, and two-thirds to b. This is most useful for categorical scales, where we can assign space proportionally based on the number of levels in each facet, as illustrated below.
```{r discrete-free}
mpg2$model <- reorder(mpg2$model, mpg2$cty)
mpg2$manufacturer <- reorder(mpg2$manufacturer, -mpg2$cty)
ggplot(mpg2, aes(cty, model)) +
  geom_point() +
  facet_grid(manufacturer ~ ., scales = "free", space = "free") +
  theme(strip.text.y = element_text(angle = 0))
```
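The proportional allocation in the two-panel example above (ranges of 2 and 4) is simple arithmetic, sketched here with hypothetical names. The sizes ggplot2 actually draws may differ slightly because of scale expansion.

```{r}
# Hypothetical scale ranges for two panels, a and b
panel_range <- c(a = 2, b = 4)

# With space = "free", each panel's share of the space is proportional to its range
panel_share <- panel_range / sum(panel_range)
panel_share
```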
## Missing facetting variables {#missing-facetting-columns}
If you are using facetting on a plot with multiple datasets, what happens when one of those datasets is missing the facetting variables? This situation commonly arises when you are adding contextual information that should be the same in all panels. For example, imagine you have a spatial display of disease faceted by gender. What happens when you add a map layer that does not contain the gender variable? Here ggplot2 will do what you expect: it will display the map in every facet, because missing facetting variables are treated as if they have all values. \index{Facetting!missing data}
Here's a simple example. Note how the single red point from `df2` appears in both panels.
`r columns(1, 1 / 2, 0.75)`
```{r}
df1 <- data.frame(x = 1:3, y = 1:3, gender = c("f", "f", "m"))
df2 <- data.frame(x = 2, y = 2)
ggplot(df1, aes(x, y)) +
  geom_point(data = df2, colour = "red", size = 2) +
  geom_point() +
  facet_wrap(~gender)
```
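To spell out what "treated like they have all values" means, the sketch below builds the equivalent data by hand: the context row is repeated once for each value of the facetting variable, using a cross join. It should draw the same picture as the example above. The data frame names are hypothetical and chosen to avoid clashing with `df1` and `df2`.

```{r}
library(ggplot2)

# Hypothetical stand-ins for the faceted data and the context layer
points_sketch <- data.frame(x = 1:3, y = 1:3, gender = c("f", "f", "m"))
context_sketch <- data.frame(x = 2, y = 2)

# With no shared columns, merge() returns every pairing of rows (a cross join),
# so the context row is repeated once per value of the facetting variable
context_all <- merge(
  context_sketch,
  data.frame(gender = unique(points_sketch$gender))
)
context_all

ggplot(points_sketch, aes(x, y)) +
  geom_point(data = context_all, colour = "red", size = 2) +
  geom_point() +
  facet_wrap(~gender)
```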
This technique is particularly useful when you add annotations to make it easier to compare between facets, as shown in the next section.
## Grouping vs. facetting {#group-vs-facet}
Facetting is an alternative to using aesthetics (like colour, shape or size) to differentiate groups. Both techniques have strengths and weaknesses, based around the relative positions of the subsets. \index{Facetting!vs. grouping} \index{Grouping!vs. facetting} With facetting, each group is quite far apart in its own panel, and there is no overlap between the groups. This is good if the groups overlap a lot, but it does make small differences harder to see. When using aesthetics to differentiate groups, the groups are close together and may overlap, but small differences are easier to see.
`r columns(1, 2/3)`
```{r}
df <- data.frame(
  x = rnorm(120, c(0, 2, 4)),
  y = rnorm(120, c(1, 2, 1)),
  z = letters[1:3]
)

ggplot(df, aes(x, y)) +
  geom_point(aes(colour = z))
```
`r columns(1, 1 / 3, 1)`
```{r}
ggplot(df, aes(x, y)) +
  geom_point() +
  facet_wrap(~z)
```
Comparisons between facets often benefit from some thoughtful annotation. For
example, in this case we could show the mean of each group in every panel. To
do this we group and summarise the data using the dplyr package, which is
covered in R for Data Science at <https://r4ds.had.co.nz>. Note that we need
two "z" variables: `z` for the facets and `z2` for the colours. Renaming `z`
to `z2` in the summary leaves it without the facetting variable, so the means
are drawn in every panel.
\index{Facetting!adding annotations}
```{r}
df_sum <- df %>%
  group_by(z) %>%
  summarise(x = mean(x), y = mean(y)) %>%
  rename(z2 = z)

ggplot(df, aes(x, y)) +
  geom_point() +
  geom_point(data = df_sum, aes(colour = z2), size = 4) +
  facet_wrap(~z)
```
Another useful technique is to put all the data in the background of each panel:
```{r}
df2 <- dplyr::select(df, -z)
ggplot(df, aes(x, y)) +
  geom_point(data = df2, colour = "grey70") +
  geom_point(aes(colour = z)) +
  facet_wrap(~z)
```
## Continuous variables {#continuous-variables}
To facet continuous variables, you must first discretise them. ggplot2 provides three helper functions to do so: \index{Facetting!by continuous variables}
* Divide the data into `n` bins each of the same length: `cut_interval(x, n)`.
  \indexf{cut\_interval}
* Divide the data into bins of width `width`: `cut_width(x, width)`.
  \indexf{cut\_width}
* Divide the data into `n` bins each containing (approximately) the same
  number of points: `cut_number(x, n = 10)`. \indexf{cut\_number}
They are illustrated below:
`r columns(1, 1/4, 1)`
```{r discretising}
# Bins of width 1
mpg2$disp_w <- cut_width(mpg2$displ, 1)
# Six bins of equal length
mpg2$disp_i <- cut_interval(mpg2$displ, 6)
# Six bins containing equal numbers of points
mpg2$disp_n <- cut_number(mpg2$displ, 6)
plot <- ggplot(mpg2, aes(cty, hwy)) +
  geom_point() +
  labs(x = NULL, y = NULL)

plot + facet_wrap(~disp_w, nrow = 1)
plot + facet_wrap(~disp_i, nrow = 1)
plot + facet_wrap(~disp_n, nrow = 1)
```
Note that the facetting formula does not evaluate functions, so you must first create a new variable containing the discretised data.
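It can be easier to see what the three helpers do by applying them to a small vector and tabulating the result. The vector `x_sketch` and the bin settings below are arbitrary choices made for illustration, not values from the chapter.

```{r}
library(ggplot2)

# A small illustrative vector (not from the chapter's data)
x_sketch <- 1:20

by_width    <- cut_width(x_sketch, 5)      # bins that are each 5 units wide
by_interval <- cut_interval(x_sketch, 4)   # 4 bins of equal length
by_number   <- cut_number(x_sketch, 4)     # 4 bins with equal counts

# Tabulating each factor shows the bin boundaries and how many values fall in each
table(by_width)
table(by_interval)
table(by_number)
```

Each helper returns a factor, which is why the result can be stored as a new column and used directly as a facetting variable.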
## Exercises
1. Diamonds: display the distribution of price conditional
on cut and carat. Try facetting by cut and grouping by carat. Try
facetting by carat and grouping by cut. Which do you prefer?
1. Diamonds: compare the relationship between price and carat for
each colour. What makes it hard to compare the groups? Is grouping better
or facetting? If you use facetting, what annotation might you add to
make it easier to see the differences between panels?
1. Why is `facet_wrap()` generally more useful than `facet_grid()`?
1. Recreate the following plot. It facets `mpg2` by class, overlaying
a smooth curve fit to the full dataset.
`r columns(1, 2/3, 0.75)`
```{r, echo = FALSE}
ggplot(mpg2, aes(displ, hwy)) +
  geom_smooth(data = select(mpg2, -class), se = FALSE) +
  geom_point() +
  facet_wrap(~class, nrow = 2)
```