forked from irskep/mrjob_course
syllabus.txt
Day 1:

* What is MapReduce (15m)
* What is mrjob (15m)
* Writing and running simple local jobs for testing (30m)
    * E: wc
    * E: word_freq_count
* mapper_init, mapper_final optimizations (15m)
    * E: wc_optimized
* Writing multi-step jobs (20m)
    * E: most_used_words
    * Together: select, union, intersect, difference, groupby+aggregate
* Using different protocols (30m)
    * E: common_friends

Homework: most_unique_review
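
For orientation before the wc exercise, here is a minimal, library-free sketch of the map, shuffle, and reduce phases behind a word count. It does not use mrjob; the names run_job and WORD_RE are hypothetical and exist only for this illustration.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer for this sketch: runs of word characters or apostrophes.
WORD_RE = re.compile(r"[\w']+")


def mapper(line):
    """Emit (word, 1) for every word in one input line."""
    for word in WORD_RE.findall(line):
        yield word.lower(), 1


def reducer(word, counts):
    """Sum all counts emitted for a single word."""
    yield word, sum(counts)


def run_job(lines):
    """Simulate map -> shuffle (group by key) -> reduce in one process."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)  # the "shuffle": collect values per key
    results = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            results[out_key] = out_value
    return results


if __name__ == "__main__":
    print(run_job(["the quick brown fox", "the lazy dog"]))
```

In an mrjob job the mapper and reducer become methods on an MRJob subclass, and the framework performs the grouping that run_job fakes here.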

Day 2:

* What is EMR (30m)
* Running on EMR (20m)
    * E: run homework on EMR
* Passing command line arguments to tasks (30m)
    * E: grep
* Fun detour: grep with subprocess (20m)
    * E: subprocess_grep
* Joins (20m)

Homework: Get the user with the most consecutive daily checkins
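
As background for the Joins topic, here is a minimal sketch of a reduce-side inner join, written in plain Python rather than mrjob. The record layouts (user_id, name) and (user_id, day) and all function names are assumptions made up for this illustration, not the course's data format.

```python
from collections import defaultdict


def map_users(record):
    """Tag each user record so the reducer knows which input it came from."""
    user_id, name = record
    yield user_id, ("user", name)


def map_checkins(record):
    """Tag each checkin record the same way, keyed on the join column."""
    user_id, day = record
    yield user_id, ("checkin", day)


def reduce_join(user_id, tagged_values):
    """Pair every user value with every checkin value sharing this key."""
    names = [value for tag, value in tagged_values if tag == "user"]
    days = [value for tag, value in tagged_values if tag == "checkin"]
    for name in names:
        for day in days:
            yield user_id, (name, day)


def run_join(users, checkins):
    """Simulate the shuffle by grouping tagged values per key, then reduce."""
    groups = defaultdict(list)
    for record in users:
        for key, value in map_users(record):
            groups[key].append(value)
    for record in checkins:
        for key, value in map_checkins(record):
            groups[key].append(value)
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_join(key, groups[key]))
    return joined
```

Keys that appear in only one input produce no output, which makes this an inner join; emitting a placeholder for the missing side would turn it into an outer join.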

Day 3:

* Including support files (20m)
    * E: pgp
* When to [not] use MapReduce to solve a problem
* How to go about optimization
* Combiners
    * E: wc_super_optimized
* Managing dependencies
* Tools for managing EMR jobs, pitfalls (job flow pools)
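
To make the Combiners topic concrete, here is a small plain-Python sketch that counts how many (word, count) pairs cross the shuffle with and without a combiner. It is a single-process simulation under assumed names (run_job, splits, use_combiner), not mrjob code, and each inner list in splits stands in for the input of one map task.

```python
from collections import Counter, defaultdict


def mapper(line):
    """Emit (word, 1) for every whitespace-separated word."""
    for word in line.split():
        yield word, 1


def combiner(pairs):
    """Pre-sum the counts produced by one map task before the shuffle."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def reducer(word, counts):
    """Sum the (possibly pre-combined) counts for one word."""
    return word, sum(counts)


def run_job(splits, use_combiner):
    """Return (final counts, number of pairs sent through the shuffle)."""
    groups = defaultdict(list)
    shuffled = 0
    for split in splits:  # each split stands in for one map task
        pairs = [pair for line in split for pair in mapper(line)]
        if use_combiner:
            pairs = combiner(pairs)
        shuffled += len(pairs)
        for word, count in pairs:
            groups[word].append(count)
    counts = dict(reducer(word, values) for word, values in groups.items())
    return counts, shuffled


if __name__ == "__main__":
    data = [["a a b", "a"], ["b b"]]
    print(run_job(data, use_combiner=False))
    print(run_job(data, use_combiner=True))
```

The final counts are identical either way; only the volume of intermediate data changes. That works here because addition is associative and commutative, so partial sums can be merged safely.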