# (PART) Wrangle {.unnumbered}
# Introduction {#wrangle-intro}
In this part of the book, you'll learn about data wrangling, the art of getting your data into R in a useful form for visualisation and modelling.
Data wrangling is very important: without it you can't work with your own data!
There are three main parts to data wrangling:
```{r echo = FALSE, out.width = "75%"}
knitr::include_graphics("diagrams/data-science-wrangle.png")
```
<!--# TO DO: Redo the diagram without highlighting import. -->
This part of the book proceeds as follows:
- In Chapter \@ref(tibbles), you'll learn about the variant of the data frame that we use in this book: the **tibble**.
You'll learn what makes them different from regular data frames, and how you can construct them "by hand".
- In Chapter \@ref(tidy-data), you'll learn about tidy data, a consistent way of storing your data that makes transformation, visualisation, and modelling easier.
You'll learn the underlying principles, and how to get your data into a tidy form.
- In Chapter \@ref(rectangle-data), you'll learn about hierarchical data formats and how to turn them into rectangular data via unnesting.
- Chapter \@ref(column-wise-operations) will give you tools for performing the same operation on multiple columns.
- Chapter \@ref(row-wise-operations) will give you tools for performing operations over rows.
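
To make the ideas in this list concrete, here is a minimal sketch of building a tibble "by hand", reshaping it into a tidy form, and applying one operation to several columns.
The data are made up for illustration, and the choice of `pivot_longer()` and `across()` is an assumption about the kind of tools these chapters cover, not a summary of them.

```{r}
library(tibble)
library(tidyr)
library(dplyr)

# A small tibble built "by hand"; the names and values are hypothetical.
scores <- tibble(
  student = c("a", "b"),
  test_1 = c(90, 75),
  test_2 = c(85, 80)
)

# Tidy form: one row per student-test observation.
scores_long <- pivot_longer(
  scores,
  cols = c(test_1, test_2),
  names_to = "test",
  values_to = "score"
)

# The same operation (here, the mean) applied to multiple columns at once.
test_means <- summarise(scores, across(c(test_1, test_2), mean))
```

In the wide form, each test is its own column; in the long form, the test name becomes a value in the `test` column, which is usually easier to transform, plot, and model.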
Data wrangling also encompasses data transformation, which you've already learned a little about.
Now we'll focus on new skills for several specific kinds of data you will frequently encounter in practice:
- Chapter \@ref(relational-data) will give you tools for working with multiple interrelated datasets.
- Chapter \@ref(list-columns) will give you tools for working with list columns --- data stored in columns of a tibble as lists.
- Chapter \@ref(strings) will give you tools for working with strings and introduce regular expressions, a powerful tool for manipulating strings.
- Chapter \@ref(factors) will introduce factors --- how R stores categorical data.
Factors are useful when a variable has a fixed set of possible values, or when you want to order the values of a character vector non-alphabetically.
- Chapter \@ref(dates-and-times) will give you the key tools for working with dates and date-times.
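
As a small illustration of the point about factors and non-alphabetical ordering, the sketch below uses base R's `factor()` with a hypothetical vector of month abbreviations.
It is only meant to show the idea, not the approach the factors chapter takes.

```{r}
# Hypothetical month abbreviations stored as strings.
months <- c("Dec", "Apr", "Jan", "Mar")

# Sorting the strings orders them alphabetically, which is rarely useful here.
sorted_strings <- sort(months)

# A factor with explicit levels sorts in the order of those levels instead.
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
months_fct <- factor(months, levels = month_levels)
sorted_factor <- sort(months_fct)
```

The levels also record the full set of valid values, so a value outside `month_levels` would become `NA` rather than being silently accepted.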
<!--# TO DO: Revisit bullet points about new chapters. -->