Crack Detection model using yolov7
Updated Jul 2, 2023 - Jupyter Notebook
A movie recommender written in Go that suggests movies based on various factors in a dataset of users, movies, and movie ratings.
BigQuery data pipeline with dbt, Spark, Docker, Airflow, Terraform, GCP
Setting up a Spark cluster in a Docker environment for improved repeatability and reliability. This project includes a simple transformation on a dataset containing approximately 31 million rows.
Solved tasks of the master's degree courses of speciality "Algorithms and Systems for Big Data Processing".
Provides tools for parallel pipeline processing of large data structures.
Software based on artificial-intelligence methods for automating big-data analysis.
Degree diploma project
A Docker Compose Template to deploy Airflow with sync from a remote repository
Tech blog / notes from my various endeavours and exploits
Building Data Lake and ETL pipelines using Amazon EMR, S3, and Apache Spark
Experiment to record as much data as possible in a given amount of time using a distributed timeseries database.
rock-solid pillars for enterprise-grade solutions
Data modeling with Cassandra, building Data Warehouse using Redshift and creation of Data Lake using Spark and Airflow
Analyzing classified ads data from the used motorcycles market. Tasks involve utilizing Redis Bitmaps for analytics on seller actions and MongoDB for analyzing bike listings. Includes data installation, cleaning, and analysis.
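The Redis Bitmaps approach mentioned above (one bit per seller ID, flipped when that seller acts) can be sketched in plain Python. This is a minimal model of the idea, not the project's actual code; with a real Redis instance you would call `SETBIT key offset 1` and `BITCOUNT key` instead, and the seller IDs below are hypothetical.

```python
# Minimal model of Redis Bitmap analytics: one bit per seller ID marks
# whether that seller performed an action on a given day.
# (With a real Redis server: r.setbit(key, seller_id, 1) / r.bitcount(key).)

class Bitmap:
    def __init__(self):
        self.bits = 0  # arbitrary-precision int acts as the bit array

    def setbit(self, offset):
        self.bits |= 1 << offset

    def bitcount(self):
        # Number of set bits = number of distinct active sellers
        return bin(self.bits).count("1")

    def __and__(self, other):
        # Intersection of two days = sellers active on both
        result = Bitmap()
        result.bits = self.bits & other.bits
        return result

# Hypothetical activity: sellers 3, 7, 42 posted on Monday; 7, 42 on Tuesday.
monday, tuesday = Bitmap(), Bitmap()
for seller_id in (3, 7, 42):
    monday.setbit(seller_id)
for seller_id in (7, 42):
    tuesday.setbit(seller_id)

print(monday.bitcount())              # → 3 sellers active Monday
print((monday & tuesday).bitcount())  # → 2 sellers active both days
```

The appeal of the bitmap representation is memory: millions of seller IDs fit in a few hundred kilobytes, and cross-day queries reduce to bitwise AND/OR.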
datasets-toolbox is a set of scripts for generating, transforming, and validating large dataset files that are too big to open in an editor. It also provides a ping script.
Implementation of algorithms for big data using python, numpy, pandas.
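As one illustration of the kind of big-data algorithm such a repo might contain (an assumption, not the repo's actual contents), reservoir sampling draws a uniform sample of k items from a stream of unknown length in O(k) memory:

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Uniformly sample k items from a stream using O(k) memory (Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items
            reservoir.append(item)
        else:
            # Keep item i with probability k/(i+1) by choosing a random slot
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

# Sample 10 values from a stream of 100,000 without holding it in memory.
sample = reservoir_sample(range(100_000), 10, seed=42)
print(len(sample))  # → 10
```

Because each incoming item replaces a reservoir slot with probability k/(i+1), every item in the stream ends up in the sample with equal probability, regardless of the stream's length.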
Collection of homework (mostly Spark-based) from the course "Big Data Computing" - University of Padua.