This project demonstrates data engineering tasks using basketball data within a Databricks environment. The main goals are to process raw data, cleanse it, and analyze it using SQL to gain insights into player demographics across different teams.
- Databricks Workspace
- PySpark
- SQL
Purpose: Handle the initial data processing tasks, including reading raw data from a CSV file, performing data cleansing, and writing the cleaned data to a Parquet file.
- Import basketball data from a CSV file.
- Rename columns for better readability and consistency.
- Replace blank or null values with appropriate placeholders or default values.
- Add new columns that may be required for further analysis.
- Write the processed data to the Parquet file format for efficient querying and storage (a PySpark sketch of these steps follows this list).
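A minimal PySpark sketch of the cleansing pipeline is shown below. The file paths, the original and renamed column names (such as `player_name`, `team`, `age`, `height_inches`), and the default fill values are assumptions for illustration; adjust them to match the actual basketball CSV schema. The `spark` session is pre-created in Databricks notebooks.

```python
from pyspark.sql import functions as F

# Read the raw basketball data (path and schema are assumed for illustration).
raw_df = spark.read.csv("/FileStore/tables/basketball.csv",
                        header=True, inferSchema=True)

# Rename columns for readability and consistency (hypothetical original names).
df = (raw_df
      .withColumnRenamed("PLAYER", "player_name")
      .withColumnRenamed("TEAM", "team")
      .withColumnRenamed("AGE", "age")
      .withColumnRenamed("HT_IN", "height_inches"))

# Replace blank or null values with placeholder defaults.
df = df.na.fill({"player_name": "unknown", "team": "unassigned"})
df = df.na.fill(0, subset=["age", "height_inches"])

# Add a derived column that later analysis can use, e.g. height in feet.
df = df.withColumn("height_feet", F.col("height_inches") / 12)

# Write the cleansed data to Parquet for efficient querying and storage.
df.write.mode("overwrite").parquet("/FileStore/tables/basketball_clean.parquet")
```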
Purpose: Create a table from the Parquet file generated in the first notebook and perform SQL queries to extract specific insights.
- Create a table in Databricks using the Parquet file as the data source.
- Query 1: View the oldest player on each team.
- Query 2: Display the players on each team with a height greater than 6 feet (see the sketch after this list).
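The sketch below illustrates the table creation and both queries, run through `spark.sql` from a Databricks notebook. The table name `basketball_players`, the Parquet path, and the column names carry over from the assumed schema in the earlier sketch.

```python
# Create a table backed by the Parquet file produced by the first notebook.
spark.sql("""
    CREATE TABLE IF NOT EXISTS basketball_players
    USING PARQUET
    LOCATION '/FileStore/tables/basketball_clean.parquet'
""")

# Query 1: the oldest player on each team, found via a per-team maximum age.
spark.sql("""
    SELECT p.team, p.player_name, p.age
    FROM basketball_players p
    JOIN (SELECT team, MAX(age) AS max_age
          FROM basketball_players
          GROUP BY team) m
      ON p.team = m.team AND p.age = m.max_age
""").show()

# Query 2: players on each team taller than 6 feet.
spark.sql("""
    SELECT team, player_name, height_feet
    FROM basketball_players
    WHERE height_feet > 6
    ORDER BY team, height_feet DESC
""").show()
```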
- Databricks environment setup
- Required libraries installed (e.g., pandas and pyspark)
- Access to the basketball CSV data file
- Upload the basketball CSV data file to Databricks.
- Execute the first notebook to perform data cleansing and generate the Parquet file.
- Verify the output Parquet file in the Databricks file system (see the verification sketch after these steps).
- Execute the second notebook to create a table from the Parquet file.
- Run the provided SQL queries to extract insights and verify the results.
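For the verification steps, `dbutils.fs.ls` can confirm that the Parquet output exists, and a quick row count checks that the table was created. The path and table name below match the assumed examples above, not confirmed project values.

```python
# List the Parquet output directory to confirm the first notebook's write succeeded.
display(dbutils.fs.ls("/FileStore/tables/basketball_clean.parquet"))

# Sanity-check the table created by the second notebook.
spark.sql("SELECT COUNT(*) AS row_count FROM basketball_players").show()
```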