The aim is to understand and apply statistical methods to analyze data and draw meaningful conclusions, accompanied by visualizations.
Continuous Distribution:
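The exercises themselves live in the reference material; as a hedged sketch of the kind of exploration this task involves, the exponential distribution (an arbitrary illustrative choice) can be examined with scipy.stats:

```python
import numpy as np
from scipy import stats

# An exponential distribution as an illustrative continuous distribution.
dist = stats.expon(scale=2.0)  # mean = scale = 2.0 for the exponential

x = np.linspace(0, 10, 200)
pdf = dist.pdf(x)              # probability density function, for plotting
cdf = dist.cdf(x)              # cumulative distribution function

# For any continuous distribution, P(X <= median) = 0.5.
print(dist.median())            # 2 * ln(2) ≈ 1.386
print(dist.cdf(dist.median()))  # 0.5

# Sampling lets the theory be checked empirically.
samples = dist.rvs(size=10_000, random_state=0)
print(samples.mean())           # close to the theoretical mean of 2.0
```

Swapping `stats.expon` for `stats.norm`, `stats.gamma`, and so on reuses the same `pdf`/`cdf`/`rvs` interface, so one script can cover several of the exercises.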
Implement the exercises from the reference material to explore continuous probability distributions in Python. Answer all associated questions and explain the results clearly.

Normal Distribution:
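The lap-times data is not reproduced here; assuming the times arrive as a plain array of seconds (the values below are made up), the analysis could start along these lines:

```python
import numpy as np
from scipy import stats

# Hypothetical lap times in seconds -- stand-ins for the provided data.
lap_times = np.array([92.1, 93.4, 91.8, 94.0, 92.7, 93.1, 92.5, 93.8, 92.2, 93.0])

mu, sigma = lap_times.mean(), lap_times.std(ddof=1)  # sample mean and std
print(f"mean={mu:.2f}s, std={sigma:.2f}s")

# Probability of a lap faster than 92 seconds under the fitted normal model.
p_fast = stats.norm(mu, sigma).cdf(92.0)
print(f"P(lap < 92s) = {p_fast:.3f}")

# A quick normality check; with the real data, also plot a histogram
# of the lap times against the fitted normal PDF.
stat, p_value = stats.shapiro(lap_times)
print(f"Shapiro-Wilk p-value: {p_value:.3f}")
```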
Analyze the normal distribution using the provided lap times data. Implement the task in Python, addressing all questions without manual sketches. Present results with appropriate graphs and interpretations.

Central Limit Theorem (CLT):
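A minimal sketch of the demonstration this task asks for: draw repeated samples from a clearly non-normal population (a uniform distribution, chosen arbitrarily) and watch the distribution of sample means concentrate as the sample size grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Population: uniform on [0, 1] -- clearly non-normal, mean 0.5, variance 1/12.
for n in (2, 10, 50):
    # 5,000 independent samples of size n; one mean per sample.
    means = rng.uniform(0, 1, size=(5000, n)).mean(axis=1)
    # CLT: the means concentrate at 0.5 with std ≈ sqrt(1/12/n).
    print(f"n={n:3d}  mean of means={means.mean():.3f}  "
          f"std={means.std():.3f}  theory={np.sqrt(1 / 12 / n):.3f}")
# In the write-up, a histogram of `means` for each n shows the shape
# becoming increasingly bell-like as n grows.
```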
Implement the exercises related to the CLT in Python. Answer all associated questions, focusing on demonstrating how the sampling distribution of the sample mean approaches normality as the sample size increases.

Melbourne Real-Estate Dataset Exploration:
Perform three creative data-analysis tasks that go beyond simple statistics such as the mean and standard deviation. Examples include: analyzing the relationship between distance from the CBD and property prices; investigating house sizes versus prices; examining suburbs by the number of properties and their price distributions; analyzing home-construction trends by year and by council; and exploring price points and house sizes across regions or sellers. Present the findings with relevant plots and detailed explanations.
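One of the suggested analyses above, distance from the CBD versus price, could be sketched as follows. The column names `Suburb`, `Distance`, and `Price` are assumptions based on the common Kaggle release of the Melbourne housing data, and the rows here are synthetic stand-ins:

```python
import pandas as pd

# Synthetic stand-in rows; with the real dataset, load the CSV instead.
df = pd.DataFrame({
    "Suburb":   ["Carlton", "Carlton", "Glen Iris", "Glen Iris", "Werribee"],
    "Distance": [2.5, 2.5, 9.2, 9.2, 14.7],       # km from the CBD (assumed)
    "Price":    [1_200_000, 1_150_000, 1_400_000, 1_350_000, 650_000],
})

# Correlation between distance from the CBD and price; with the full
# dataset, follow up with a scatter plot of the two columns.
corr = df["Distance"].corr(df["Price"])
print(f"distance-price correlation: {corr:.3f}")

# Price distribution per suburb: property count, median, spread.
summary = df.groupby("Suburb")["Price"].agg(["count", "median", "std"])
print(summary)
```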
The focus is to understand and implement gradient descent for optimizing parameters in a linear model, using the Melbourne Real-Estate Dataset.
Linear Regression Model:
Use a simple linear regression model y = mx + c, where m is the slope and c is the intercept. Analyze the relationship between the number of bedrooms and property prices for a specific suburb (e.g., Caulfield North) and visualize it.

Gradient Descent Implementation:
Implement the gradient descent algorithm in Python to optimize m and c. Explore the effect of different learning rates on convergence and on the resulting parameter values, and the behavior of the cost function during optimization.

Visualization:
Plot the relationship between bedrooms and prices using the optimized model. Include graphs showing the best-fit line and learning rate effects.
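The steps above can be sketched end to end; the (bedrooms, price) pairs below are made-up stand-ins for the Caulfield North subset:

```python
import numpy as np

# Hypothetical (bedrooms, price-in-$M) pairs standing in for one suburb's data.
x = np.array([1, 2, 2, 3, 3, 4, 4, 5], dtype=float)
y = np.array([0.45, 0.70, 0.65, 0.95, 1.00, 1.20, 1.25, 1.50])

def gradient_descent(x, y, lr=0.01, epochs=5000):
    """Fit y = m*x + c by minimizing the mean squared error."""
    m, c = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        y_hat = m * x + c
        # Gradients of MSE = (1/n) * sum((y_hat - y)^2) w.r.t. m and c.
        dm = (2 / n) * np.sum((y_hat - y) * x)
        dc = (2 / n) * np.sum(y_hat - y)
        m -= lr * dm
        c -= lr * dc
    return m, c

m, c = gradient_descent(x, y)
print(f"m={m:.3f}, c={c:.3f}")  # slope and intercept after convergence

# A learning rate that is too large makes the cost diverge instead of
# shrink; comparing lr=0.001, 0.01, 0.1 on a cost-vs-epoch plot shows this.
```

Plotting the data with the line `m * x + c` overlaid gives the best-fit visualization the brief asks for.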
The aim is to understand and implement one of three classic supervised ML algorithms using specific datasets and APIs. The focus is on practical implementation, clear explanation, and analysis of model outcomes with visualizations. The options include:
Naive Bayes:
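A hedged sketch of this option using scikit-learn's GaussianNB (one of several Naive Bayes variants; the UCI wine data ships with scikit-learn):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

# The UCI wine dataset: 178 samples, 13 features, 3 cultivar classes.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Gaussian Naive Bayes assumes each feature is normally distributed
# within each class, and that features are conditionally independent.
model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
print(f"accuracy: {acc:.3f}")
print(confusion_matrix(y_test, y_pred))  # plot as a heatmap in the write-up
```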
Use scikit-learn to implement the Naive Bayes classifier on the UCI wine dataset. Understand and apply the principles of Naive Bayes, including any step-by-step implementations suggested in the resources. Present the results in detail, with plots and interpretation.

Decision Trees:
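A sketch of this option, with one caveat: scikit-learn's bundled diabetes dataset has a continuous target (disease progression), so a regression tree is shown here; if the intended dataset is the Pima Indians classification set, `DecisionTreeClassifier` is used the same way:

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = load_diabetes(return_X_y=True)
feature_names = load_diabetes().feature_names
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# max_depth limits overfitting; an unconstrained tree memorizes training data.
tree = DecisionTreeRegressor(max_depth=3, random_state=42).fit(X_train, y_train)
r2 = tree.score(X_test, y_test)
print(f"R^2 on held-out data: {r2:.3f}")

# The fitted tree is directly inspectable -- useful for the interpretation
# part of the task; plot_tree gives the same structure graphically.
print(export_text(tree, feature_names=feature_names))
```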
Use scikit-learn to implement Decision Trees on the Diabetes dataset. Explore how Decision Trees work, including a potential manual implementation. Clearly present the outcomes with visualizations for interpretation.

K-Nearest Neighbors (KNN):
Use scikit-learn to implement KNN on the UCI wine dataset. Follow the provided code and explanations to understand the model's functionality. Evaluate and visualize the results.
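The KNN task above can be sketched as follows; the scaling step is a recommended addition rather than something the brief mandates, since KNN's distance computations are sensitive to feature scale:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# KNN is distance-based, so features must be on comparable scales;
# StandardScaler inside the pipeline handles that without leakage.
for k in (1, 5, 15):
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    knn.fit(X_train, y_train)
    acc = knn.score(X_test, y_test)
    print(f"k={k:2d}  test accuracy={acc:.3f}")
# Plotting accuracy against k illustrates the bias-variance trade-off.
```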
The goal is to explore modern ML techniques, focusing on either Artificial Neural Networks (ANN) or clustering methods. Practical implementation and analysis with visualizations remain key. The options include:
Artificial Neural Networks:
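Before reaching for Keras, the core operation of a dense network, alternating affine maps and nonlinearities, can be sketched in plain NumPy (layer sizes chosen to mirror the wine dataset's 13 features and 3 classes; this is an illustration, not the Keras API):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

# A 13 -> 8 -> 3 network: weights and biases for two dense layers.
W1, b1 = rng.normal(0, 0.1, (13, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 0.1, (8, 3)), np.zeros(3)

def forward(X):
    hidden = relu(X @ W1 + b1)        # hidden layer: affine map + ReLU
    return softmax(hidden @ W2 + b2)  # output layer: class probabilities

X = rng.normal(size=(4, 13))          # a batch of 4 fake samples
probs = forward(X)
print(probs.sum(axis=1))              # each row sums to 1
```

In Keras the same structure is two `Dense` layers, with `fit()` handling the backpropagation that this sketch omits.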
Implement an ANN using Keras on either Zillow's Home Value Prediction dataset or the UCI wine dataset. Understand the neural network's structure, operation, and predictive capabilities. Present the results with relevant visualizations and explanations.

Clustering:
Study and implement DBSCAN clustering, experimenting with: Spend Score vs Annual Income data, adjusting the epsilon and min-points parameters to observe their effect; and AWS CloudWatch time-series data to detect anomalies, specifically analyzing EC2 CPU-utilization patterns. Focus on discovering patterns or anomalies and on discussing the impact of parameter tuning on the results.
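The first experiment can be sketched on synthetic stand-in data (two dense blobs plus two planted outliers; real Spend Score / Annual Income values would replace them):

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Synthetic stand-in for Spend Score vs Annual Income: two dense blobs
# plus two planted outliers.
cluster_a = rng.normal([20, 30], 3, size=(50, 2))
cluster_b = rng.normal([80, 75], 3, size=(50, 2))
outliers = np.array([[50.0, 5.0], [5.0, 95.0]])
X = np.vstack([cluster_a, cluster_b, outliers])

# eps is the neighborhood radius, min_samples the density threshold;
# the assignment asks how varying both changes the clustering.
labels = DBSCAN(eps=5.0, min_samples=5).fit_predict(X)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"clusters found: {n_clusters}, noise points: {np.sum(labels == -1)}")
# On CloudWatch CPU-utilization series, points labeled -1 (noise) are
# the anomaly candidates.
```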