
aamtaa/python-spark-streaming


Python Spark Streaming

Overview

Project source code for James Lee's Apache Spark with Python (PySpark) course.

Description

Tools like Spark are incredibly useful for processing data that is continuously appended. PySpark, Spark's Python bindings, not only lets you do that but also lets you combine Spark Streaming with other Python tools for data science and machine learning. This course covers the basics of using Apache Spark as well as more advanced concepts: accumulators, combining PySpark with Apache Kafka, using PySpark with AWS tools like Kinesis, streaming data from sources like Twitter, and getting the most out of the Structured Streaming paradigm in the then recently released Spark 2.3.0.

This course is a one-stop shop for all your PySpark streaming education needs.

What's in this Repo?

This repo contains the notebooks, data files, exercise files, and everything else you need to learn the streaming capabilities of PySpark.

More content like this

Check out the full list of DevOps and Big Data courses that James and Tao teach here.


Languages

  • Jupyter Notebook 99.4%
  • Python 0.6%