This course introduces students to recent research and industrial issues in data engineering, database systems, and related technologies. It explores and discusses topics of interest that directly or indirectly affect, or are influenced by, these fields.

drshahizan/special-topic-data-engineering

Repository files navigation

Don't forget to hit the ⭐ if you like this repo.

Special Topic Data Engineering

Course Synopsis

This course introduces students to recent research and industrial issues in data engineering, database systems, and related technologies. It explores and discusses topics of interest that directly or indirectly affect, or are influenced by, these fields. Students are encouraged to take part in forums and in face-to-face interaction with researchers and practitioners on these topics, so that they can then conduct their own investigations and draw their own conclusions. The course also exposes students to industry experience in managing database systems and technologies through knowledge-sharing sessions and work-based learning activities with selected organizations.

🔥 Important things ⚡

  1. Course Information
  2. Task 1: Additional Notes

Project

Notes

| No | Module | Description | Notes |
|----|--------|-------------|-------|
| 1 | Data Engineer, Data Engineering, Data Science, Data Scientist | A data engineer designs, builds, and maintains data systems and infrastructure. Data engineering covers the processes and tools used to transform and store data. Data science analyzes and interprets data to extract insights, and a data scientist applies statistical and machine learning techniques for predictive modeling and decision-making. | |
| 2 | Application Programming Interface (API) | An Application Programming Interface (API) is a set of rules and protocols that lets software applications communicate and interact with each other. It defines the methods, data formats, and authentication mechanisms for accessing and manipulating the functionality or data that a service or platform provides. | |
| 3 | Data Scraping | Data scraping is the automated extraction of data from websites or other online sources. Specialized tools or scripts retrieve specific information, such as text, images, or structured data, for analysis or other purposes. | |
| 4 | Django | Django is a free and open-source framework for building web applications in Python. It follows the Model-Template-View (MTV) pattern, Django's variant of Model-View-Controller (MVC), and provides an Object-Relational Mapping (ORM) layer for interacting with databases. Built-in features such as authentication, URL routing, and template rendering make it a popular choice for scalable and maintainable web applications. | |
| 5 | Data Integration | Data integration in data science is the process of combining data from multiple sources into a unified and consistent format. It involves transforming, cleaning, and merging diverse data to create a comprehensive dataset for analysis and modeling. | |
| 6 | Types of Data & NoSQL Database | Data can be structured, semi-structured, or unstructured. MongoDB is a popular NoSQL database that uses a flexible document model: data is stored and retrieved as JSON-like documents, which suits diverse data types and flexible schema designs. | |
| 7 | Data Wrangling | Data wrangling, also known as data cleaning or data preprocessing, is the process of cleaning, transforming, and preparing raw data for analysis. It involves identifying and fixing issues such as missing or inconsistent values, formatting errors, and duplicates. Popular tools that automate parts of this work include OpenRefine, Trifacta, DataWrangler, KNIME, and Talend; they can handle large datasets, automate cleaning tasks, and visualize data for exploration. Data wrangling is an essential step because it helps ensure that the data is accurate, consistent, and relevant for analysis. | |
| 8 | Feature Engineering | Feature engineering is the process of selecting, creating, and transforming variables (features) in a dataset to improve the performance of a machine learning model. It involves identifying relevant variables, transforming them to make them more useful, and creating new variables that capture important information. Popular tools that automate parts of this work include Featuretools, TPOT, AutoML tools, and H2O.ai; they can automate feature selection and creation, identify important variables, and optimize feature pipelines. Feature engineering is an important step because it helps the model learn from relevant data and make accurate predictions. | |
| 9 | Artificial Intelligence vs Machine Learning vs Deep Learning | Artificial intelligence (AI), machine learning, and deep learning are related areas of computer science concerned with enabling computers to learn and make decisions from data. AI involves building systems that perform tasks that typically require human intelligence, such as language understanding, decision-making, and problem-solving. Machine learning is a subset of AI that builds algorithms that learn patterns and make decisions from data without being explicitly programmed. Deep learning is a subset of machine learning that uses neural networks with multiple layers to learn and identify patterns in data. | |
| 10 | Visualization | Data visualization is the process of representing data graphically to help people understand complex data. Visualization tools let users create charts, graphs, and maps that are easy to interpret and analyze. Popular tools include Tableau, Power BI, Google Data Studio, and D3.js; they support interactive dashboards, real-time data exploration, and collaboration. Visualization helps uncover patterns, trends, and insights that might not be apparent from raw data alone. | |
| 11 | AWS | AWS Academy is a global program that offers educational institutions a comprehensive cloud computing curriculum based on Amazon Web Services (AWS). It gives institutions access to up-to-date learning materials, hands-on labs, and assessments for teaching a range of cloud computing topics. | |
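
The data wrangling and feature engineering steps described in Modules 7 and 8 can be illustrated with a small sketch. It is not taken from the course materials: it uses only the Python standard library rather than the tools named above, and the records, field names, and the helper functions `wrangle` and `engineer` are hypothetical. Filling missing values with the mean and using min-max scaling with one-hot encoding are just one possible set of choices.

```python
from statistics import mean

# Hypothetical raw records; names and values are illustrative only.
raw_rows = [
    {"name": " Alice ", "age": "34", "city": "kuala lumpur"},
    {"name": "Bob", "age": "", "city": "Johor Bahru"},
    {"name": "alice", "age": "34", "city": "Kuala Lumpur"},
    {"name": "Chen", "age": "30", "city": "JOHOR BAHRU"},
]


def wrangle(rows):
    """Fix formatting, drop duplicates, then fill missing ages."""
    unique, seen = [], set()
    for row in rows:
        age_text = row["age"].strip()
        record = {
            "name": row["name"].strip().title(),          # trim and normalize case
            "age": int(age_text) if age_text else None,   # empty string -> missing
            "city": row["city"].strip().title(),
        }
        key = (record["name"], record["age"], record["city"])
        if key not in seen:                               # drop exact duplicates
            seen.add(key)
            unique.append(record)
    # Fill missing ages with the mean of the known ages (one simple strategy).
    known = [r["age"] for r in unique if r["age"] is not None]
    fill = round(mean(known))
    for r in unique:
        if r["age"] is None:
            r["age"] = fill
    return unique


def engineer(rows):
    """Create model-ready features: min-max scaled age and one-hot city."""
    cities = sorted({r["city"] for r in rows})
    ages = [r["age"] for r in rows]
    lo, hi = min(ages), max(ages)
    span = (hi - lo) or 1                                 # avoid division by zero
    features = []
    for r in rows:
        vec = {"age_scaled": (r["age"] - lo) / span}
        for c in cities:
            vec[f"city_{c}"] = 1 if r["city"] == c else 0
        features.append(vec)
    return features


clean = wrangle(raw_rows)
features = engineer(clean)
```

In this sketch the two "Alice" rows collapse into one record after normalization, Bob's missing age is filled from the remaining known ages, and each cleaned record becomes a numeric feature dictionary. In practice a library such as pandas would usually handle these steps.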

Visual Studio Code

Video

Useful Links

Tools

Diagrams are visual representations of information or data that help convey complex concepts, processes or systems in a clear and concise manner. Flowcharts are diagrams that use shapes and arrows to illustrate the steps in a process or algorithm [More info...].

| No | Tools | File |
|----|-------|------|
| 1 | Figma | |
| 2 | Draw.io | |
| 3 | GitHub Pages | |
| 4 | Behance | |
| 5 | Visual Studio Code | |
| 6 | Bootstrap Studio | |

Submission

| No | Topic | File |
|----|-------|------|
| 1 | Proposal | |
| 2 | Application Programming Interface (API) | |
| 3 | Data Scraping | |
| 4 | Django | |
| 5 | MongoDB | |
| 6 | AWS Certification | |
| 7 | Final Project | |

Contribution 🛠️

Please create an Issue for any improvements, suggestions or errors in the content.

You can also contact me on LinkedIn with any other queries or feedback.
