Longwinter93/Apache_Spark_Structured_Streaming_Data_Projects

Spark_Structured_Streaming_Data_Projects


This repository contains a single project built on Spark Structured Streaming.
Structured Streaming is a scalable and fault-tolerant stream processing engine built on the Spark SQL engine.
It processes data from input sources in near real time and writes the results to external storage systems.


CSV files written to a directory are read as a stream of data.
The input source is a file source.
The result of the streaming read is held in a streaming DataFrame.
The streaming DataFrame is processed both with and without aggregation.
The streaming DataFrame is saved as JSON and Parquet files.
The output sink is a file sink that writes both Parquet and JSON files.
The Spark Structured Streaming engine processes these steps.
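
The steps above can be sketched in PySpark roughly as follows. This is a minimal illustration, not the project's actual code: the directory paths, the column names in the schema, and the helper names `sink_options` and `start_queries` are all assumptions made for the example. It shows the non-aggregated path only, reading CSV files from a directory as a stream and writing the same streaming DataFrame to a Parquet sink and a JSON sink.

```python
from typing import Dict

# Hypothetical locations; the repository does not state its actual paths.
INPUT_DIR = "data/input_csv"
OUTPUT_DIR = "data/output"


def sink_options(fmt: str, base_dir: str = OUTPUT_DIR) -> Dict[str, str]:
    """Return the options a file sink needs for one output format."""
    if fmt not in ("parquet", "json"):
        raise ValueError(f"unsupported file sink format: {fmt}")
    return {
        "path": f"{base_dir}/{fmt}",
        # Each streaming query needs its own checkpoint directory.
        "checkpointLocation": f"{base_dir}/_checkpoints/{fmt}",
    }


def start_queries(input_dir: str = INPUT_DIR, base_dir: str = OUTPUT_DIR):
    """Read CSV files as a stream and write them to Parquet and JSON sinks."""
    # Imported here so sink_options can be used without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import (
        DoubleType, StringType, StructField, StructType, TimestampType,
    )

    spark = SparkSession.builder.appName("csv-file-stream-sketch").getOrCreate()

    # Streaming file sources expect an explicit schema.
    # These columns are placeholders, not the project's real columns.
    schema = StructType([
        StructField("event_time", TimestampType(), True),
        StructField("category", StringType(), True),
        StructField("amount", DoubleType(), True),
    ])

    # File source: every new CSV file in input_dir becomes part of the stream.
    stream_df = (
        spark.readStream
        .schema(schema)
        .option("header", "true")
        .csv(input_dir)
    )

    # File sink: one query per output format, both in append mode.
    queries = []
    for fmt in ("parquet", "json"):
        query = (
            stream_df.writeStream
            .format(fmt)
            .outputMode("append")
            .options(**sink_options(fmt, base_dir))
            .start()
        )
        queries.append(query)
    return spark, queries
```

With Spark installed, calling `spark, queries = start_queries()` followed by `spark.streams.awaitAnyTermination()` keeps both queries running while new CSV files arrive. The aggregated variant is left out of this sketch: file sinks support only append output mode, so a streaming aggregation generally needs an event-time watermark before it can be written this way.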

About

Streaming Data Project
