Session 1: How to solve the Big Data problem Assignment – I
- Various Sources of Big Data • Archives • Documents • Media • Business Applications • Social Media • Public Web • Data Storage • Machine log data • Sensor data
- 3 V’s of Big Data • Volume • Variety • Velocity
- Horizontal Scaling and Vertical Scaling Horizontal Scaling Vertical Scaling • Horizontal scalability is the ability to increase capacity by connecting multiple hardware or software entities to work as a single logical unit. • Vertical scalability is to increase capacity by adding more memory or an additional CPU to a machine. • Example: Cassandra, MongoDB. • Example: Amazon RDS (Cloud version of MySQL) • Pros: Much cheaper than Scaling-up as it takes advantage of small systems. • Pros: It consumes less power on comparison with multiple servers and reduced software cost. • Cons: The licensing fees are more. • It has a bigger footprint in the Data Center. • Cons: Severe vendor lock-in. • The overall cost of implementing is really expensive.
- Need and Working of Hadoop • Need: The complexity of modern analytics needs is outstripping the available computing power of legacy systems. With its distributed processing, Hadoop can handle large volumes of structured and unstructured data more efficiently than the traditional enterprise data warehouse. Hadoop has a robust Apache community behind it that continues to contribute to its advancement. • Working: With the Hadoop Distributed File System the data is written once on the server and subsequently read and re-used many times thereafter. When contrasted with the repeated read/write actions of most other file systems it explains part of the speed with which the speed with which Hadoop operates.