This repository has been archived by the owner on Aug 23, 2024. It is now read-only.

GranData/spark-poc

spark-poc

Several use cases for experimenting with Apache Spark.

Trying Spark DataFrames

This project is a small proof of concept using Apache Spark DataFrames.

It contains two tests:

  1. JoinUsingPlainRdd, a join of a table (a gzipped file of about 10 MB) with itself, using plain RDDs.
  2. JoinUsingDataFramesMain, the same join, but using DataFrames instead of RDDs. This uses spark-csv (a third-party library) to import the file with its schema.
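
For orientation, here is a minimal sketch of what the two variants might look like. It is not the code from this repository. It assumes Spark 2.x or later, where the CSV reader that spark-csv provided is built into `spark.read`. The file path, the two-column layout (`id`, `value`), the join key, and the names `SelfJoinSketch` and `timed` are all hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object SelfJoinSketch {

  // Hypothetical helper: runs a block and prints its wall-clock time.
  def timed[A](label: String)(block: => A): A = {
    val start = System.nanoTime()
    val result = block
    println(f"$label took ${(System.nanoTime() - start) / 1e9}%.2f s")
    result
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("self-join-sketch")
      .master("local[*]")
      .getOrCreate()

    // Assumed input: a gzipped CSV with exactly two columns and no header.
    val path = "data/table.csv.gz"

    // Variant 1, plain RDDs: parse each line by hand, key by the first
    // column, then join the keyed RDD with itself.
    val rddCount = timed("RDD join") {
      val keyed = spark.sparkContext
        .textFile(path) // .gz input is decompressed transparently
        .map(_.split(",", -1))
        .map(fields => (fields(0), fields(1)))
      keyed.join(keyed).count()
    }

    // Variant 2, DataFrames: declare the schema up front and join the
    // DataFrame with an aliased copy of itself on the key column.
    val schema = StructType(Seq(
      StructField("id", StringType, nullable = true),
      StructField("value", StringType, nullable = true)
    ))
    val dfCount = timed("DataFrame join") {
      val df = spark.read.schema(schema).csv(path)
      df.as("a").join(df.as("b"), col("a.id") === col("b.id")).count()
    }

    println(s"RDD rows: $rddCount, DataFrame rows: $dfCount")
    spark.stop()
  }
}
```

Both variants end in `count()` so that Spark actually executes the join. Timing a single run in local mode is only a rough indication. The two counts can differ if the key column contains empty values, because the CSV reader loads them as nulls, which never match in a join.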

Observations: the DataFrame join runs in less than half the time of the RDD join.

Come and see...
