Vigneshwarankanagarathinam/session5

How to Solve Big Data Problem

  1. Explain the main sources of the data flood.
     The flood of data comes from many sources:
     • The New York Stock Exchange generates about 4-5 terabytes of data every day.
     • Facebook hosts more than 240 billion photos, growing at 7 petabytes per month.
     • Ancestry.com, the genealogy site, stores around 10 petabytes of data.
     • The Internet Archive stores around 18.5 petabytes of data.
     • The Large Hadron Collider near Geneva produces about 30 petabytes of data every year.
  2. What is the difference between Data and Big Data?
     • Big Data: Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools. The challenges include capture, curation, storage, search, sharing, analysis, and visualization. The trend toward larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data. Analyzing one large set allows correlations to be found to "spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions."
     • Data: In computing, data is information that has been translated into a form that is more convenient to move or process. In computer component interconnection and network communication, data is often distinguished from "control information," "control bits," and similar terms to identify the main content of a transmission unit. In telecommunications, data sometimes means digitally encoded information, to distinguish it from analog-encoded information such as conventional telephone voice calls. More generally, and in science, data is a gathered body of facts.
  3. What are the main reasons behind Hadoop becoming the solution for the data explosion?
     The following are the main reasons Hadoop became the solution for exploding data volumes:
     • Scalable: Hadoop is a highly scalable storage platform, because it can store and distribute very large data sets across hundreds of inexpensive servers that operate in parallel. Traditional relational database management systems (RDBMS), by contrast, cannot scale to process such large amounts of data.
     • Cost effective: Hadoop also offers a cost-effective storage solution for businesses' exploding data sets. The problem with traditional relational database management systems is that scaling them to process such massive volumes of data is extremely cost-prohibitive. As a result, raw data would often be deleted, because it was too expensive to keep. Hadoop, on the other hand, is designed as a scale-out architecture that can affordably store all of a company's data for later use. The cost savings are substantial: instead of costing thousands to tens of thousands of pounds per terabyte, Hadoop offers computing and storage capabilities for hundreds of pounds per terabyte.
     • Flexible: Hadoop enables businesses to easily access new data sources and tap into different types of data (both structured and unstructured) to generate value from that data. In addition, Hadoop can be used for a wide variety of purposes, such as log processing, recommendation systems, data warehousing, marketing campaign analysis, and fraud detection.
     • Fast: Hadoop's storage method is based on a distributed file system that 'maps' data wherever it is located on a cluster. The tools for data processing often run on the same servers where the data is located, which avoids moving data over the network and results in much faster processing. For large volumes of unstructured data, Hadoop can efficiently process terabytes of data in minutes, and petabytes in hours.
     • Resilient to failure: A key advantage of using Hadoop is its fault tolerance. When data is sent to an individual node, that data is also replicated to other nodes in the cluster, which means that in the event of a failure, another copy is available for use.

     When it comes to handling large data sets in a safe and cost-effective manner, Hadoop has the advantage over relational database management systems, and its value for businesses of any size will continue to increase as unstructured data continues to grow.
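
The "Fast" and "Resilient to failure" points above can be illustrated with a small simulation. This is only a sketch, not Hadoop's actual placement logic: the node names, block IDs, the round-robin placement, and the helper functions `place_blocks`, `readable_blocks`, and `pick_local_node` are all hypothetical, and the replication factor of 3 is an assumption chosen to match the usual HDFS default.

```python
REPLICATION_FACTOR = 3  # assumption: matches the usual HDFS default


def place_blocks(block_ids, nodes, replication=REPLICATION_FACTOR):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified stand-in for a real placement policy (no rack awareness).
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return [
        block
        for block, holders in placement.items()
        if any(node not in failed for node in holders)
    ]


def pick_local_node(placement, block, failed_nodes=()):
    """Pick a live node that already stores the block, so the work runs
    where the data is instead of copying the data over the network."""
    failed = set(failed_nodes)
    for node in placement[block]:
        if node not in failed:
            return node
    return None  # every replica is on a failed node


# Hypothetical four-node cluster holding five blocks.
nodes = ["node-a", "node-b", "node-c", "node-d"]
blocks = ["blk-0", "blk-1", "blk-2", "blk-3", "blk-4"]
placement = place_blocks(blocks, nodes)

# With three replicas per block, losing two nodes leaves every block readable.
print(readable_blocks(placement, failed_nodes=["node-a", "node-b"]))

# Processing for blk-0 is scheduled on a surviving node that holds a copy.
print(pick_local_node(placement, "blk-0", failed_nodes=["node-a"]))
```

In this toy setup, data is lost only when every node holding a block's replicas fails at the same time, which is why a higher replication factor trades extra storage for more resilience.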
