Using ElasticSearch Flume integration
Pre-conditions:
* Have Flume installed, or at least cloned from the Flume git repo and built.
If not, go to http://github.com/cloudera/flume and build it (currently built with 'ant', but follow their docs).
From here on, this Flume directory will be referred to as FLUME_HOME.
* Have an ElasticSearch server running locally. From a Getting Started point of view we'll assume a single local
instance; if you don't have one yet, go to http://github.com/elasticsearch/elasticsearch (a quick way to check it's running is shown below).
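To confirm the local server is up (assuming the default HTTP port of 9200), hit its root endpoint:
curl -XGET 'http://localhost:9200/'
If ElasticSearch is running you should get back a small JSON blob with the node name and version.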
Getting Started with elasticflume
0. First, set up some environment variables pointing at your local paths, to make the following steps simpler:
export FLUME_HOME=<path to where you have Flume checked out/installed>
export ELASTICSEARCH_HOME=<path to where you have ElasticSearch checked out>
export ELASTICFLUME_HOME=<path to where you have elasticflume checked out>
(Be careful with the last two env vars because they are deceptively similar.)
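For example (hypothetical paths, adjust to wherever your checkouts actually live):
export FLUME_HOME=$HOME/src/flume
export ELASTICSEARCH_HOME=$HOME/src/elasticsearch
export ELASTICFLUME_HOME=$HOME/src/elasticflume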
1. Build it using Maven:
1.1 Install the Flume library into your local Maven repo (because it is not available in Maven Central).
Note: the command below assumes you have done a 'git clone' of the Flume source and have built it.
mvn install:install-file -DgroupId=com.cloudera -DartifactId=flume -Dversion=0.9.1-dev -Dclassifier=core -Dfile=$FLUME_HOME/build/flume-0.9.1-dev-core.jar -Dpackaging=jar
1.2 Build elasticflume
cd $ELASTICFLUME_HOME
mvn package
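If the build succeeds, the bundled jar used in the next step should now exist (the version in the filename may differ for your checkout):
ls $ELASTICFLUME_HOME/target/elasticflume-1.0.0-SNAPSHOT-jar-with-dependencies.jar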
2. Now add the elasticflume jar to Flume's classpath. I personally do this with a symlink for testing, but copying is probably a better idea (a copy alternative is shown below the symlink command):
ln -s $ELASTICFLUME_HOME/target/elasticflume-1.0.0-SNAPSHOT-jar-with-dependencies.jar $FLUME_HOME/lib/
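Or, to copy instead of symlinking:
cp $ELASTICFLUME_HOME/target/elasticflume-1.0.0-SNAPSHOT-jar-with-dependencies.jar $FLUME_HOME/lib/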
3. Ensure your Flume config is correct: check that $FLUME_HOME/conf/flume-conf.xml correctly identifies your local master. You
may have to copy the template file that's in that directory to 'flume-conf.xml' (a copy command is shown at the end of this step) and then add the following:
<property>
  <name>flume.master.servers</name>
  <value>localhost</value>
  <description>A comma-separated list of hostnames, one for each
    machine in the Flume Master.
  </description>
</property>
... (the above may not be necessary, because localhost is the default, but I had to set it for some reason).
You will also need to register the elasticflume plugin by creating a new property block:
<property>
  <name>flume.plugin.classes</name>
  <value>org.elasticsearch.flume.ElasticSearchSink</value>
  <description>Comma separated list of plugins</description>
</property>
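If you don't have a flume-conf.xml yet, copy the template first (the exact template filename is assumed here and may differ in your Flume version):
cp $FLUME_HOME/conf/flume-conf.xml.template $FLUME_HOME/conf/flume-conf.xml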
4. Start up the Flume master and a Flume node; you will need two different shells here.
cd $FLUME_HOME
bin/flume master
VERIFY that you see the following line in the master's startup log; if you don't, you have missed at least Step 3:
2010-09-14 14:20:53,861 [main] INFO conf.SinkFactoryImpl: Found sink builder elasticSearchSink in org.elasticsearch.flume.ElasticSearchSink
Then, in the second shell, start a node:
bin/flume node_nowatch
5. Set up a basic console-based source so you can type in data manually and have it indexed (pretending to be a log message):
cd $FLUME_HOME
bin/flume shell -c localhost -e "exec config localhost 'console' 'elasticSearchSink'"
NOTE: For some reason my local testing Flume installation used my IP address as the default node name, rather than
'localhost' as is often the case. If things are not working properly, you can check with:
bin/flume shell -c localhost -e "getnodestatus"
If you see a node listed by IP address, you may need to map that IP to localhost inside Flume with
a logical name, like so:
bin/flume shell -c localhost -e "map <IP ADDRESS> localhost"
6. NOW FOR THE TEST! :) In the console window where you started "node_nowatch" above,
type (and yes, straight after all those log messages, just start typing, trust me..):
hello world
hello there good sir
(i.e. type the two lines above, pressing return after each)
7. Verify you can search for your "hello world" log. In another console, use curl to search your local ElasticSearch node:
curl -XGET 'http://localhost:9200/flume/_search?pretty=true' -d '
{
  "query" : {
    "term" : { "message" : "hello" }
  }
}
'
You should get pretty-printed, JSON-formatted search results, something like:
{
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.1976817,
    "hits" : [ {
      "_index" : "flume",
      "_type" : "LOG",
      "_id" : "4e5a6f5b-1dd3-4bb6-9fd9-c8d785f39680",
      "_score" : 1.1976817, "_source" : {"message":"hello world","timestamp":"2010-09-14T03:19:36.857Z","host":"192.168.1.170","priority":"INFO"}
    }, {
      "_index" : "flume",
      "_type" : "LOG",
      "_id" : "c77c18cc-af40-4362-b20b-193e5a3f6ff5",
      "_score" : 0.8465736, "_source" : {"message":"hello there good sir","timestamp":"2010-09-14T03:28:04.168Z","host":"192.168.1.170","priority":"INFO"}
    } ]
  }
}
8. Go to the ElasticSearch website and learn all about the REST and other APIs for searching an ElasticSearch index.
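As a small taste (just a sketch; the field names come from the sample output above), the same search can also be run as a URI query without a JSON body:
curl -XGET 'http://localhost:9200/flume/_search?pretty=true&q=message:hello'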