🫀 Data Modelling in Postgres

This project is part of the Udacity Nanodegree® on Data Engineering.

It demonstrates building a Postgres database from a JSON store: creating appropriate database structures and populating them with correct data through an ETL process.

🧘🏽 “Data that is loved tends to survive.” - Kurt Bollacker

Core Technologies

Usage

Create the database and its tables with:

python create_tables.py

Then run the ETL process with:

python etl.py

The data is then loaded into the newly created tables. Verify the load with the test.ipynb notebook in the repository.

Database Architecture

Created using Creately. Apologies for the attribute stacking in the SONGPLAYS table; Creately allows only 60 elements in its free version.

The schema followed for this database is a star schema.

ARTIST_ID appears in both SONGS and ARTISTS; this duplication gives a more read-oriented structure, which is generally preferred in analytics. Since the objective is to optimise the analytical workflow, this structure supports that goal.
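As a rough illustration of the read-oriented layout, here is a minimal sketch of a fact table joined to a dimension table. It runs on Python's built-in `sqlite3` instead of Postgres so that it works without a server, and the column sets are assumptions: only the table names SONGPLAYS, SONGS and ARTISTS and the ARTIST_ID column come from this README.

```python
import sqlite3

# Hypothetical, trimmed-down columns; the real tables in sql_queries.py
# may differ. Every table gets a PRIMARY KEY, as described in this README.
DDL = """
CREATE TABLE artists (
    artist_id TEXT PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE songs (
    song_id   TEXT PRIMARY KEY,
    title     TEXT NOT NULL,
    artist_id TEXT NOT NULL      -- repeated here for cheaper reads
);
CREATE TABLE songplays (
    songplay_id INTEGER PRIMARY KEY,
    song_id     TEXT,
    artist_id   TEXT             -- assumed column on the fact table
);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.execute("INSERT INTO artists VALUES ('A1', 'Demo Artist')")
    conn.execute("INSERT INTO songs VALUES ('S1', 'Demo Song', 'A1')")
    conn.executemany(
        "INSERT INTO songplays VALUES (?, ?, ?)",
        [(1, "S1", "A1"), (2, "S1", "A1")],
    )
    conn.commit()
    return conn


def plays_per_artist(conn):
    """Count plays per artist with a single fact-to-dimension join."""
    query = """
        SELECT a.name, COUNT(*) AS plays
        FROM songplays AS sp
        JOIN artists AS a ON a.artist_id = sp.artist_id
        GROUP BY a.name
        ORDER BY a.name
    """
    return conn.execute(query).fetchall()
```

Because the fact table carries the artist key directly in this sketch, the analytical query needs one join rather than a chain through SONGS.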

Every table has a PRIMARY KEY. Column data types and sizes follow the observed business data. LATITUDE and LONGITUDE are capped at 5 digits after the decimal point, which gives roughly metre-level accuracy (about 1.1 m at the equator). (Source)
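A quick way to see what five decimal places mean on the ground is to convert one step in the last kept digit into metres. The sketch below assumes a spherical Earth with a mean radius of 6,371 km; the function names are illustrative and not part of this repository.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, a spherical approximation


def metres_per_step(decimal_places, latitude_deg=0.0):
    """Ground distance covered by one unit in the last kept decimal place.

    For longitude the distance shrinks with cos(latitude); at the
    equator (the default) latitude and longitude steps are equal.
    """
    step_deg = 10 ** -decimal_places
    metres_per_degree = 2 * math.pi * EARTH_RADIUS_M / 360
    return step_deg * metres_per_degree * math.cos(math.radians(latitude_deg))


def cap_coordinate(value, decimal_places=5):
    """Round a coordinate to a fixed number of decimal places."""
    quantum = Decimal(1).scaleb(-decimal_places)  # e.g. 0.00001 for 5 places
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)
```

With these assumptions, `metres_per_step(5)` comes out a little above one metre, and `cap_coordinate(35.149684)` keeps five digits after the decimal point.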

File Dictionary

create_tables.py - Resets or creates the database and the tables within it.
sql_queries.py - Queries for creating and dropping tables and for upserting data.
etl.py - Reads the JSON files, converts types where necessary and loads the data into Postgres.
etl.ipynb - Interactive version of etl.py
test.ipynb - Notebook to check if the data was correctly loaded.
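The read, convert and upsert steps described for etl.py and sql_queries.py can be sketched as follows. This is only an illustration under stated assumptions: the JSON field names, the ARTISTS column list and the `%s` placeholder style (as used by drivers such as psycopg2) are hypothetical and may not match the actual files.

```python
import json

# Hypothetical upsert statement; ON CONFLICT ... DO NOTHING is standard
# Postgres syntax, but the column list here is an assumption.
ARTIST_UPSERT = """
INSERT INTO artists (artist_id, name, latitude, longitude)
VALUES (%s, %s, %s, %s)
ON CONFLICT (artist_id) DO NOTHING;
"""


def to_float(value):
    """Convert a JSON value to float, keeping missing values as None."""
    return None if value is None else float(value)


def artist_row(json_line):
    """Turn one JSON record into a tuple ordered like ARTIST_UPSERT."""
    raw = json.loads(json_line)
    return (
        raw["artist_id"],                       # assumed field name
        raw["artist_name"],                     # assumed field name
        to_float(raw.get("artist_latitude")),   # assumed field name
        to_float(raw.get("artist_longitude")),  # assumed field name
    )
```

With a DB-API cursor, a row built this way would typically be passed as `cursor.execute(ARTIST_UPSERT, artist_row(line))`, so re-running the load does not fail on rows that already exist.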

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT


asonthalia/Sparkify-Data-Modelling-With-Postgres
