Contains data cleaning, visualization and modelling, incl. ML-based predictions of happiness levels in the next 2 years

cyrilby/world-happiness

World happiness insights

This project consists of data collection & cleaning, visualization and modelling, including generating ML-based predictions of happiness levels for the next 2 years. The project was done in R using RMarkdown notebooks; short descriptions of the custom R script and of each individual Rmd file are provided below.

Custom_functions.R

Contains a series of R functions developed specifically for use in this project. These functions facilitate data prep, dimension reduction and ridge regression analysis, as well as building and using an ML-based model to generate predictions of future happiness.

Data_prep.Rmd

The purpose of this file is to import, clean up and prepare data from multiple sources for the analysis of world happiness data. This file must be run before any of the visualization or statistical notebooks. Some of the data may still need to be downloaded from the respective sources (please see the Data/Raw/ folder).

Dealing_with_missing_data.Rmd

In this file, we deal with missing data by performing an imputation based on a random forest model. For each missing value, 10 possible values are generated, of which the average is kept and used for further analysis. Some columns containing too many missing values may be dropped entirely and therefore not used for analytical purposes.
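The "generate 10 candidate values and keep the average" step can be sketched in base R as follows. This is only an illustration: the toy data, the column names and the use of a bootstrapped linear model as a stand-in for the random forest are all assumptions, not the code used in the notebook.

```r
# Toy data: y depends on x; three y values are missing (all values hypothetical)
set.seed(1)
df <- data.frame(x = 1:20)
df$y <- 2 * df$x + rnorm(20)
df$y[c(3, 8, 15)] <- NA

missing_rows <- which(is.na(df$y))
observed <- df[!is.na(df$y), ]
n_draws <- 10  # number of candidate values per missing cell

# Each draw refits the stand-in model on a bootstrap resample of the
# observed rows, so the candidate values differ from draw to draw
candidates <- sapply(seq_len(n_draws), function(i) {
  boot <- observed[sample(nrow(observed), replace = TRUE), ]
  fit <- lm(y ~ x, data = boot)  # stand-in for a random forest (assumption)
  predict(fit, newdata = df[missing_rows, , drop = FALSE])
})

# candidates has one row per missing cell and one column per draw;
# the row-wise average is kept as the imputed value
df$y[missing_rows] <- rowMeans(candidates)
```

Averaging the draws gives a single completed data set, which is simpler to pass on to later notebooks than ten separate ones.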

Happinness_around_the_world.Rmd

In this file, we take a look at the current state of happiness as well as what happiness looked like in the past and how much development has taken place since the base year. We look at both world happiness as well as at happiness by country and region. Finally, we explore whether happiness is rather stable or rather dynamic.

Happinness_global_correlations.Rmd

In this file, we explore the correlations between happiness and different economic, political, societal, environmental and health-related factors. We do this both for the most recent year in the data and for all available years in the data. Finally, we explore how the correlations have evolved over time.
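As a rough illustration of comparing pooled and per-year correlations, the base R sketch below uses a made-up panel; the column names `year`, `gdp` and `happiness` are assumptions rather than the names used in the project data.

```r
# Hypothetical panel: 30 observations per year for three years
set.seed(42)
panel <- data.frame(
  year = rep(2019:2021, each = 30),
  gdp  = rnorm(90)
)
panel$happiness <- 0.6 * panel$gdp + rnorm(90, sd = 0.5)

# Pearson correlation computed separately within each year
cor_by_year <- sapply(split(panel, panel$year), function(d) {
  cor(d$happiness, d$gdp)
})

# Correlation pooled over all available years
cor_all <- cor(panel$happiness, panel$gdp)
```

Plotting `cor_by_year` against the year is one simple way to see whether a relationship strengthens or weakens over time.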

Dimensions_reduction.Rmd

In this file, we import the already clean data (where we've also made imputations for missing values). After that, we try to construct a simplified set of variables that reflects the overall categories, i.e. economic, political, social, environmental and health-related factors. We use principal component analysis (PCA) as our preferred dimension reduction technique.
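A minimal sketch of collapsing one category of indicators into a single component with base R's `prcomp()` is shown below. The indicator names and the simulated data are hypothetical, and keeping only the first component is an assumption about how the simplified variables might be built.

```r
# Three hypothetical economic indicators driven by one hidden factor
set.seed(7)
n <- 50
wealth <- rnorm(n)
economic <- data.frame(
  gdp_per_capita = wealth + rnorm(n, sd = 0.3),
  employment     = wealth + rnorm(n, sd = 0.3),
  trade_openness = wealth + rnorm(n, sd = 0.3)
)

# Centre and scale so that no indicator dominates because of its units
pca <- prcomp(economic, center = TRUE, scale. = TRUE)

# Share of total variance captured by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Scores on the first component serve as a single "economic" variable
economic_index <- pca$x[, 1]
```

Repeating this per category would yield one summary variable each for the economic, political, social, environmental and health-related factors.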

Ridge_regression_analysis.Rmd

In this file, we import the already clean data (where we've also made imputations for missing values) as well as our dimensions-reduced data (from the PCA). Then, we build models to explain what drives world happiness levels. We use ridge regression rather than standard OLS because some of the explanatory variables are highly correlated with each other. All predictors are standardized using their mean and standard deviation, since the ridge penalty is sensitive to the scale of the predictors.
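To illustrate why standardization and the ridge penalty matter with correlated predictors, here is a hand-rolled closed-form ridge fit in base R. The helper `ridge_coef()` and the simulated data are hypothetical; the notebook may well rely on a dedicated package whose penalty is scaled differently.

```r
# Two nearly collinear predictors (simulated)
set.seed(3)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
y  <- 1 + x1 + x2 + rnorm(n)

# Standardize predictors (mean 0, sd 1) and centre the response,
# so the penalty treats every coefficient on the same scale
X  <- scale(cbind(x1, x2))
yc <- y - mean(y)

# Closed-form ridge solution: (X'X + lambda * I)^{-1} X'y
ridge_coef <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

beta_ols   <- ridge_coef(X, yc, lambda = 0)   # no penalty, equals OLS
beta_ridge <- ridge_coef(X, yc, lambda = 10)  # penalized coefficients
```

With `lambda = 0` the two coefficients can differ widely from each other because the predictors are almost interchangeable; the penalty shrinks the coefficient vector and pulls the two estimates towards one another.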

ML_modelling.Rmd

In this file, we import the already clean data (where we've also made imputations for missing values) containing all individual input variables. Then, we build a machine learning (ML) model that we can use to predict global happiness levels. The data is split into training, validation and test sets, and the best-performing model (found through a grid search) is exported to make predictions in a separate notebook.
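The split-then-grid-search workflow could look roughly like the base R sketch below. The 60/20/20 split, the polynomial regression used as the model and the grid over its degree are all assumptions chosen to keep the example self-contained; they are not the model or the hyperparameters used in the notebook.

```r
# Simulated data with a non-linear signal
set.seed(11)
n <- 300
dat <- data.frame(x = runif(n, -2, 2))
dat$y <- sin(dat$x) + rnorm(n, sd = 0.2)

# Random 60/20/20 split into training, validation and test sets
idx   <- sample(rep(c("train", "valid", "test"), times = c(180, 60, 60)))
train <- dat[idx == "train", ]
valid <- dat[idx == "valid", ]
test  <- dat[idx == "test", ]

rmse <- function(fit, d) sqrt(mean((d$y - predict(fit, newdata = d))^2))

# Grid search: fit on the training set, score on the validation set
grid <- data.frame(degree = 1:6)
grid$valid_rmse <- sapply(grid$degree, function(k) {
  rmse(lm(y ~ poly(x, k), data = train), valid)
})
best_degree <- grid$degree[which.min(grid$valid_rmse)]

# Refit the winning setting on train + validation, then score once on test
final_fit <- lm(y ~ poly(x, best_degree), data = rbind(train, valid))
test_rmse <- rmse(final_fit, test)
```

Keeping the test set out of the grid search means `test_rmse` is an estimate of performance on data that played no part in choosing the model.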

Predicting_future_happiness.Rmd

In this file, we import the already clean data (where we've also made imputations for missing values) containing all individual input variables. Then, we import the parameters of the best-performing machine learning model, which we use to predict future happiness. Before we generate the predictions, we apply a series of country-level regression models that project the historical values onto the next two years, thus providing the inputs we need for applying our machine learning model.
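One way to picture the country-level projection step is a separate linear trend per country, extrapolated two years ahead. The base R sketch below assumes a simple `gdp ~ year` regression and uses made-up countries and values; the actual regression specification in the notebook may differ.

```r
# Hypothetical history for two countries and one input variable
history <- data.frame(
  country = rep(c("A", "B"), each = 5),
  year    = rep(2017:2021, times = 2),
  gdp     = c(10, 11, 12, 13, 14,  20, 19, 18, 17, 16)
)

# The two years following the last observed year
future_years <- max(history$year) + 1:2

# Fit one linear trend per country and extrapolate it
projections <- do.call(rbind, lapply(split(history, history$country), function(d) {
  fit <- lm(gdp ~ year, data = d)
  data.frame(
    country = d$country[1],
    year    = future_years,
    gdp     = unname(predict(fit, newdata = data.frame(year = future_years)))
  )
}))
rownames(projections) <- NULL
```

Because the toy series are exactly linear, country A is projected to 15 and 16 and country B to 15 and 14. The projected rows then play the role of the future inputs that the ML model needs.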
