
CS 441 - Engineering Distributed Objects for Cloud Computing

NAME: JEET MEHTA

UIN: 668581235

Homework 2 - Log File Generator

Overview

The objective of this homework is to process the generated log file using the Hadoop MapReduce framework, which enables parallel processing of the log data.

Instructions

My Environment

The project was developed in the following environment:

  1. Windows OS
  2. IDE: IntelliJ IDEA 2021
  3. VMware Workstation 16 Pro
  4. Hortonworks HDP 3.0.1 Sandbox

Prerequisites

  1. Java 1.8 needs to be installed on the system
  2. The HDP Sandbox needs to be set up
  3. SBT needs to be installed on the system

Demo of how to deploy the MapReduce jobs on AWS EMR

Click on this link to see how to deploy the MapReduce jobs on AWS EMR.
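
If you prefer the command line over the walkthrough, the same deployment can be sketched with the AWS CLI. This is a hypothetical example: the bucket name, jar name, and cluster ID are placeholders, not values from this project.

    # Hypothetical sketch; bucket, jar, and cluster ID are placeholders.
    aws s3 cp homework2-assembly.jar s3://my-bucket/homework2.jar
    aws emr add-steps --cluster-id j-XXXXXXXXXXXXX \
      --steps "Type=CUSTOM_JAR,Name=LogJobs,ActionOnFailure=CONTINUE,Jar=s3://my-bucket/homework2.jar,Args=[s3://my-bucket/input,s3://my-bucket/output]"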

Working of the MapReduce jobs

We start by creating the log dataset: a generated log file containing 50,000 log messages. This file serves as the input for all four MapReduce jobs described below.

MapReduce Jobs

  1. Job 1:

    Mapper Class: Mapper_Job1

    Reducer Class: Reducer_Job1

    Goal: To show the counts of the different message types (ERROR, DEBUG, INFO, WARN) across predefined time intervals, along with the string instances that match the designated regex pattern.

  2. Job 2:

    Mapper Class: Mapper_Job2

    Reducer Class: Reducer_Job2

    Goal: To display the time intervals containing ERROR messages whose string instances match the designated regex pattern, sorted in descending order of the message count.

  3. Job 3:

    Mapper Class: Mapper_Job3

    Reducer Class: Reducer_Job3

    Goal: To compute an aggregate count of the messages produced per type, e.g. (ERROR, 16), (INFO, 22). A minimal sketch of this job appears after this list.

  4. Job 4:

    Mapper Class: Mapper_Job4

    Reducer Class: Reducer_Job4

    Goal: For each message type, compute the total number of characters contained in the string instances that match the designated regex pattern.
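
To make the job structure concrete, below is a minimal Scala sketch of what Job 3 could look like. Only the class names Mapper_Job3 and Reducer_Job3 come from this README; the regex, the driver object Job3Runner, and the rest of the wiring are assumptions (Scala 2.13, standard Hadoop MapReduce API), not the project's actual code.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{IntWritable, Text}
    import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
    import scala.jdk.CollectionConverters._ // assumes Scala 2.13

    // Mapper: emits (messageType, 1) for every log line containing a level token.
    // The regex is an assumption about the log format, not the project's pattern.
    class Mapper_Job3 extends Mapper[Object, Text, Text, IntWritable] {
      private val one = new IntWritable(1)
      private val levelPattern = "(ERROR|WARN|INFO|DEBUG)".r

      override def map(key: Object, value: Text,
                       context: Mapper[Object, Text, Text, IntWritable]#Context): Unit =
        levelPattern.findFirstIn(value.toString).foreach { level =>
          context.write(new Text(level), one)
        }
    }

    // Reducer: sums the 1s to produce the total count per message type.
    class Reducer_Job3 extends Reducer[Text, IntWritable, Text, IntWritable] {
      override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                          context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit =
        context.write(key, new IntWritable(values.asScala.map(_.get).sum))
    }

    // Driver: wires the mapper and reducer into a job, as run via `hadoop jar`.
    object Job3Runner {
      def main(args: Array[String]): Unit = {
        val job = Job.getInstance(new Configuration(), "message type aggregation")
        job.setJarByClass(Job3Runner.getClass)
        job.setMapperClass(classOf[Mapper_Job3])
        job.setReducerClass(classOf[Reducer_Job3])
        job.setOutputKeyClass(classOf[Text])
        job.setOutputValueClass(classOf[IntWritable])
        FileInputFormat.addInputPath(job, new Path(args(0)))   // inp_dir
        FileOutputFormat.setOutputPath(job, new Path(args(1))) // out_dir
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }

The other three jobs would follow the same skeleton, differing in the keys and values they emit and in any sorting applied.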

Running these jobs

  1. Clone this repo onto your system

  2. Open your OS's command line and browse to the project directory

  3. Build using (in the IntelliJ terminal, or cmd on Windows):

    sbt clean compile assembly

  4. Using VSCode, open the folder containing the generated jar file and click on Go Live so the jar can be fetched from inside the sandbox.

  5. Start the HDP Sandbox in VMware Workstation Pro

  6. Run using (see the example after this list):

    hadoop jar jarname.jar inp_dir out_dir
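
For reference, a complete run on the sandbox might look like the following; the jar name, log file name, and HDFS paths are placeholders, not this project's actual values:

    # Hypothetical example; jar, log file, and paths are placeholders.
    hadoop fs -mkdir -p /user/hw2/input
    hadoop fs -put generated_logs.log /user/hw2/input
    hadoop jar homework2-assembly.jar /user/hw2/input /user/hw2/output
    hadoop fs -cat /user/hw2/output/part-r-00000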

Output

Below are samples of the output I received for each of the jobs.

Output for MapReduce Job 1 (each line has the form timeInterval=messageType,count):

    14610=DEBUG,1
    14611=INFO,1
    14611=WARN,1
    14612=DEBUG,1
    14614=INFO,1
    14614=WARN,1
    14615=ERROR,1
    14617=ERROR,1
    14617=INFO,1
    14617=WARN,1

Output for MapReduce Job 2 (each line has the form count,timeInterval, sorted in descending order of ERROR count):

    7,14618
    7,14669
    7,14886
    7,14881
    7,14848
    7,14761
    7,14641
    4,14740
    4,14671
    4,14692

Output for MapReduce Job 3:

    DEBUG=10,8737
    ERROR=10,843

Output for MapReduce Job 4:

    INFO,10
    WARN,10