I'm a Data Engineer with a passion for exploring and applying new technologies.
I enjoy researching innovative solutions and incorporating them into my projects to improve data processes and infrastructure.
- Languages: Python, SQL
- Big Data: PySpark, Spark, Databricks
- Cloud: AWS, GCP
- Containerization: Docker
- Orchestration: Airflow
I primarily build data pipelines and ETL processes, extracting data from a variety of sources and moving it through every stage of a data lake. My expertise includes:
- Data Pipeline/ETL Creation: Designing and implementing efficient pipelines that move and transform data across systems (a minimal PySpark sketch follows this list).
- Lakehouse Architecture: Building data solutions on the Lakehouse pattern, combining the flexibility of data lakes with the management and performance of data warehouses for scalable storage and analytics.
- Complex Process Orchestration: Orchestrating multi-step workflows with tools like Airflow to ensure reliable, efficient execution (see the Airflow example below).
- Optimization & Performance: Continuously tuning processes within the data lake for faster, more efficient data retrieval and processing.
- Data Quality: Enforcing high data quality standards through validation and monitoring (a simple fail-fast check is sketched below).
- CI/CD for Pipelines: Automating the testing and deployment of data pipelines with CI/CD workflows.
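
To give a flavour of the pipeline work, here is a minimal sketch of a bronze-to-silver step in a medallion-style Lakehouse. The paths, column names, and app name are all illustrative, not taken from a real project:

```python
# Minimal bronze-to-silver ETL sketch; all paths and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("bronze_to_silver").getOrCreate()

# Read raw events landed in the bronze layer (path is illustrative).
raw = spark.read.json("s3://my-lake/bronze/events/")

# Basic cleansing: deduplicate on the key and normalise the timestamp.
clean = (
    raw.dropDuplicates(["event_id"])
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .filter(F.col("event_ts").isNotNull())
)

# Write to the silver layer, partitioned by date for efficient retrieval.
(clean.withColumn("event_date", F.to_date("event_ts"))
      .write.mode("overwrite")
      .partitionBy("event_date")
      .parquet("s3://my-lake/silver/events/"))
```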
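For orchestration, a multi-step process like the one above maps naturally onto an Airflow DAG. This sketch assumes a recent Airflow 2.x release (the `schedule` argument replaced `schedule_interval` in 2.4) and uses placeholder task bodies:

```python
# Minimal Airflow DAG sketch; task logic and schedule are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    ...  # pull data from a source system

def transform():
    ...  # clean and enrich the extracted data

def load():
    ...  # write the result into the lake

with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)

    t1 >> t2 >> t3  # enforce ordering across the steps
```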
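On the data quality side, even a simple fail-fast gate on each batch catches many issues before they propagate downstream. The sketch below assumes an illustrative PySpark DataFrame with `order_id` and `amount` columns:

```python
# Simple fail-fast quality gate; column names are hypothetical.
from pyspark.sql import functions as F

def run_quality_checks(df):
    """Raise if basic expectations on the batch are violated."""
    total = df.count()
    null_keys = df.filter(F.col("order_id").isNull()).count()
    negative = df.filter(F.col("amount") < 0).count()

    failures = []
    if total == 0:
        failures.append("batch is empty")
    if null_keys > 0:
        failures.append(f"{null_keys} rows with null order_id")
    if negative > 0:
        failures.append(f"{negative} rows with negative amount")

    if failures:
        raise ValueError("data quality checks failed: " + "; ".join(failures))
```

A gate like this would typically run just before the write step, so a bad batch stops the pipeline instead of landing in the silver layer.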