Based on a MapReduce project (cleanup for general use is a work in progress).
Mainly for personal use, to quickly set up the project without repeatedly copying files over.
Run the install.sh script. It will do the following:
- Install kubectl
- Install protobuf
- Install Kind
- Install Helm
- Install gRPC
- Set up conservator
- Install etcd/ZooKeeper
Let us know if you have any issues.
CMakeLists.txt is provided to build your code. We recommend reading up on CMake to understand how it works and how to use it to build your services.
- Create the kind cluster:
./init.sh
- Compile the sources, build the Docker image, and load it into kind:
./build.sh
- Start the k8s cluster with master and workers, 2 nodes each:
./cluster.sh up
- Get the master logs to inspect activity between nodes:
  1. Find the pod named mr-master-xxxx:
     kubectl get pods -n ws1
  2. Get its logs (add the -f flag to stream them):
     kubectl logs [mr-master-xxxx] -n ws1
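The two log steps above can also be scripted. This is a minimal sketch, not part of the project: find_master_pod and stream_master_logs are hypothetical helpers, and the sketch assumes kubectl is on the PATH and that `kubectl get pods -o name` prints one "pod/<name>" entry per line.

```python
import subprocess
from typing import Optional


def find_master_pod(names_output: str, prefix: str = "mr-master-") -> Optional[str]:
    """Return the first pod name starting with `prefix`, or None.

    `names_output` is the text printed by `kubectl get pods -n ws1 -o name`,
    which lists one "pod/<name>" entry per line.
    """
    for line in names_output.splitlines():
        name = line.strip()
        if name.startswith("pod/"):
            name = name[len("pod/"):]
        if name.startswith(prefix):
            return name
    return None


def stream_master_logs(namespace: str = "ws1") -> None:
    """Equivalent to running the two kubectl commands above by hand."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "name"],
        capture_output=True, text=True, check=True,
    ).stdout
    pod = find_master_pod(out)
    if pod is None:
        raise RuntimeError(f"no mr-master pod found in namespace {namespace}")
    # -f streams the logs until interrupted (Ctrl+C).
    subprocess.run(["kubectl", "logs", pod, "-n", namespace, "-f"], check=True)
```

Parsing is kept separate from the kubectl call so the name lookup can be checked without a running cluster.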
We use three containers: the mapreduce container for the inputs, one for the intermediate files from the map phase, and an output container for the final results from the reduce phase. Once the input blobs are in the mapreduce container, the master reads their sizes and produces a list of Shards, each consisting of ShardFragments. Each ShardFragment points to a blob along with start and end offsets within it. This means a shard can span multiple blobs: when a blob cannot be split evenly into the predefined shard size, its remainder is combined with the start of the next blob.
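The sharding step can be sketched as follows. Shard and ShardFragment are the names used above, but their fields here, the byte-based shard_size, and the make_shards helper are assumptions rather than the project's actual code. The sketch also splits on raw byte offsets and does not align shards to line or record boundaries.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ShardFragment:
    blob: str
    start: int  # inclusive byte offset into the blob
    end: int    # exclusive byte offset into the blob


@dataclass
class Shard:
    fragments: List[ShardFragment] = field(default_factory=list)

    def size(self) -> int:
        return sum(f.end - f.start for f in self.fragments)


def make_shards(blobs: List[Tuple[str, int]], shard_size: int) -> List[Shard]:
    """Split (blob_name, blob_size) pairs into shards of at most shard_size bytes.

    A shard is filled fragment by fragment; when a blob runs out before the
    shard is full, the shard continues into the next blob.
    """
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    shards: List[Shard] = []
    current = Shard()
    for name, size in blobs:
        offset = 0
        while offset < size:
            room = shard_size - current.size()
            end = min(size, offset + room)
            current.fragments.append(ShardFragment(name, offset, end))
            offset = end
            if current.size() == shard_size:
                shards.append(current)
                current = Shard()
    # The last shard may be smaller than shard_size.
    if current.fragments:
        shards.append(current)
    return shards
```

For example, blobs of 250 and 100 bytes with shard_size=100 give four shards, and the third one spans both blobs (a.txt bytes 200 to 250 plus b.txt bytes 0 to 50).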
Once sharding is done, we create the MapJobs and ReduceJobs that make up the workload for the workers. First we run the map phase, completing all MapJobs; along the way, we feed each map job's outputs to the ReduceJobs as inputs. Then we start the reduce phase on the available workers.
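The job flow above can be sketched like this. It is a minimal illustration under several assumptions: one MapJob per shard, one ReduceJob per reduce partition, hash partitioning of intermediate keys (the classic MapReduce scheme, not confirmed for this project), and a hypothetical intermediate_name scheme for files in the intermediate container.

```python
import zlib
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MapJob:
    job_id: int
    shard_index: int  # assumption: one MapJob per shard
    done: bool = False


@dataclass
class ReduceJob:
    partition: int
    inputs: List[str] = field(default_factory=list)  # intermediate blob names


def partition_for(key: str, num_reducers: int) -> int:
    """Route an intermediate key to a reduce partition.

    crc32 is used instead of hash() because Python salts str hashes per process,
    which would send the same key to different reducers on different workers.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def intermediate_name(map_id: int, partition: int) -> str:
    # Hypothetical naming scheme for files in the intermediate container.
    return f"map-{map_id}-part-{partition}"


def create_jobs(num_shards: int, num_reducers: int) -> Tuple[List[MapJob], List[ReduceJob]]:
    map_jobs = [MapJob(job_id=i, shard_index=i) for i in range(num_shards)]
    reduce_jobs = [ReduceJob(partition=r) for r in range(num_reducers)]
    return map_jobs, reduce_jobs


def on_map_complete(job: MapJob, reduce_jobs: List[ReduceJob]) -> None:
    """Mark a map job done and feed its per-partition output to every reducer."""
    job.done = True
    for rj in reduce_jobs:
        rj.inputs.append(intermediate_name(job.job_id, rj.partition))


def reduce_phase_can_start(map_jobs: List[MapJob]) -> bool:
    # Reducers need every map output for their partition, so wait for all maps.
    return all(j.done for j in map_jobs)
```

Feeding inputs as each map job completes means the reduce inputs are already assembled when the last map job finishes.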
Once the reduce jobs finish, the results are stored in the output container in Azure.
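Submit a job by sending a request with M and R (the number of map and reduce tasks), the input files, and the file with the MapReduce functions. The address below is a pod IP from our kind cluster; substitute your own:
curl "10.244.0.9:50049?M=2&R=4&files=lorem.txt&mr_functions=mr_functions.py"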
Alternatively, port-forward the mr-proxy service and send the same request to localhost:
kubectl port-forward svc/mr-proxy -n ws1 8080:8080
curl "http://localhost:8080?M=2&R=4&files=lorem.txt&mr_functions=mr_functions.py"
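The same request can be built from Python with the standard library. This is a sketch: job_url and submit_job are hypothetical helpers, and the sketch only assumes the proxy accepts the query parameters shown in the curl commands above. The explicit "/" before "?" matches the request path curl sends.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def job_url(base: str, m: int, r: int, files: str, mr_functions: str) -> str:
    """Build the job-submission URL with the same parameters as the curl example."""
    query = urlencode({"M": m, "R": r, "files": files, "mr_functions": mr_functions})
    return f"{base.rstrip('/')}/?{query}"


def submit_job(base: str = "http://localhost:8080", m: int = 2, r: int = 4,
               files: str = "lorem.txt", mr_functions: str = "mr_functions.py") -> str:
    # Requires the port-forward above (or another reachable address) to be active.
    with urlopen(job_url(base, m, r, files, mr_functions)) as resp:
        return resp.read().decode("utf-8")
```

Using urlencode instead of string concatenation escapes any file names that contain spaces or other reserved characters.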
In the newest version, submit jobs with the Python client instead:
python3.7 user_client.py -d folder -m mapper.py -r reducer.py -M 10 -R 3
For the -d option, don't include a trailing '/'. The script takes all files in that directory and uploads them.
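The file collection behind -d can be sketched as follows. This is not the actual user_client.py code: files_to_upload is a hypothetical helper, and the sketch assumes a non-recursive listing of regular files. It strips a trailing slash and joins paths with os.path.join, so both forms of the directory name yield the same list.

```python
import os
from typing import List


def files_to_upload(directory: str) -> List[str]:
    """List regular files directly inside `directory` (non-recursive), sorted."""
    directory = directory.rstrip("/") or "/"
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )
```

Subdirectories are skipped, which matches "all files in that dir" under the non-recursive assumption.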