World happiness insights

This project covers data collection and cleaning, visualization and modelling, including ML-based predictions of happiness levels for the next two years generated with a machine learning (ML) model. The project was built with R code and R Markdown notebooks; a short description of each file is provided below.

Custom_functions.R

Contains a series of R functions developed specifically for this project. These functions support data preparation, dimension reduction and ridge regression analysis, as well as building and applying an ML model to predict future happiness.

Data_prep.Rmd

The purpose of this file is to import, clean and prepare data from multiple sources for the analysis of world happiness. Running this file is a prerequisite for producing the data visualizations and statistical analyses. Some of the data may still need to be downloaded from the respective sources (please see the Data/Raw/ folder).
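Combining several sources usually comes down to harmonizing country names and joining on a (country, year) key. The project does this in R; the Python sketch below only illustrates the idea, and the alias table, field names and the choice of an inner join are all hypothetical assumptions rather than the project's actual rules.

```python
# Hypothetical sketch: harmonize country names across sources and
# inner-join them on (country, year). Aliases and field names are illustrative.

# Assumed alias table mapping source-specific spellings to one canonical name
COUNTRY_ALIASES = {
    "Czech Republic": "Czechia",
    "Russian Federation": "Russia",
}


def canonical(country):
    """Return the canonical spelling of a country name."""
    name = country.strip()
    return COUNTRY_ALIASES.get(name, name)


def index_by_key(rows):
    """Index a list of dict records by (canonical country, integer year)."""
    return {(canonical(r["country"]), int(r["year"])): r for r in rows}


def merge_sources(*sources):
    """Inner-join several sources, keeping only keys present in every source."""
    indexed = [index_by_key(rows) for rows in sources]
    common = set(indexed[0]).intersection(*indexed[1:])
    merged = []
    for key in sorted(common):
        record = {"country": key[0], "year": key[1]}
        for idx in indexed:
            # Copy every non-key field from this source into the merged record
            record.update({k: v for k, v in idx[key].items()
                           if k not in ("country", "year")})
        merged.append(record)
    return merged
```

An inner join silently drops country-years missing from any source; an outer join that keeps them as missing values would hand that problem to the imputation step instead.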

Dealing_with_missing_data.Rmd

In this file, we deal with missing data by imputing it with a random forest model. For each missing value, 10 candidate values are generated, and their average is used in the further analysis. Columns with too many missing values may be dropped entirely and are then excluded from the analysis.
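The "generate 10 values, keep the average" idea can be sketched independently of the imputation model. In the Python sketch below, a simple nearest-neighbour donor draw is swapped in for the random forest purely to keep the example self-contained; the 50% missing-share threshold, `k=3` and the function names are hypothetical assumptions, not the project's settings.

```python
import random


def drop_sparse_columns(rows, columns, max_missing_share=0.5):
    """Keep columns whose share of missing (None) values is at most the threshold.

    The 0.5 default is a hypothetical cut-off, not the project's value.
    """
    kept = []
    for col in columns:
        missing = sum(r[col] is None for r in rows)
        if missing / len(rows) <= max_missing_share:
            kept.append(col)
    return kept


def impute_column(rows, target, predictors, n_imputations=10, k=3, seed=0):
    """Fill missing values in `target` with the average of n_imputations draws.

    Stand-in for a random-forest imputer: each draw picks one donor at random
    from the k complete rows closest on the available predictor columns.
    """
    rng = random.Random(seed)
    donors = [r for r in rows
              if r[target] is not None and all(r[p] is not None for p in predictors)]
    filled = []
    for r in rows:
        if r[target] is not None:
            filled.append(dict(r))
            continue
        usable = [p for p in predictors if r[p] is not None]
        # Rank donors by squared Euclidean distance on the usable predictors
        ranked = sorted(donors, key=lambda d: sum((d[p] - r[p]) ** 2 for p in usable))
        nearest = ranked[:k]
        draws = [rng.choice(nearest)[target] for _ in range(n_imputations)]
        new_row = dict(r)
        new_row[target] = sum(draws) / n_imputations  # average of the draws
        filled.append(new_row)
    return filled
```

Averaging the draws gives a single completed dataset, which is simpler to use downstream than keeping all 10 imputed datasets, at the cost of understating the uncertainty in the imputed values.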

Happinness_around_the_world.Rmd

In this file, we look at the current state of happiness, at what happiness looked like in the past and at how much it has changed since the base year. We look at happiness both globally and by country and region. Finally, we explore whether happiness tends to be stable or dynamic over time.
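Two simple measures capture the questions above: the change in each country's score between the base year and the latest year, and the correlation of country scores between two years (a value close to 1 means countries largely keep their relative positions). This Python sketch assumes scores are stored as `{country: {year: score}}`; the data layout, the function names and the use of a year-to-year correlation as the stability measure are assumptions, not necessarily what the notebook does.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def change_since_base(scores, base_year, latest_year):
    """Return {country: latest - base}, skipping countries missing either year."""
    return {c: ys[latest_year] - ys[base_year]
            for c, ys in scores.items()
            if base_year in ys and latest_year in ys}


def year_to_year_stability(scores, year_a, year_b):
    """Correlate country scores between two years (close to 1 = stable ranking)."""
    common = [c for c, ys in scores.items() if year_a in ys and year_b in ys]
    return pearson([scores[c][year_a] for c in common],
                   [scores[c][year_b] for c in common])
```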

Happinness_global_correlations.Rmd

In this file, we explore the correlations between happiness and different economic, political, societal, environmental and health-related factors. We do this both for the most recent year and for all available years in the data. Finally, we explore how the correlations have evolved over time.
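Tracking how a correlation evolves amounts to computing it separately for each year across countries. The Python sketch below assumes a long table of `{"year", "happiness", <factor>}` records; the field names and the minimum of three countries per year are hypothetical choices for illustration.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def correlations_by_year(records, factor, outcome="happiness", min_countries=3):
    """Return {year: r} between `factor` and `outcome` across countries per year."""
    by_year = {}
    for rec in records:
        by_year.setdefault(rec["year"], []).append(rec)
    result = {}
    for year, recs in sorted(by_year.items()):
        pairs = [(r[factor], r[outcome]) for r in recs
                 if r[factor] is not None and r[outcome] is not None]
        if len(pairs) >= min_countries:  # very few countries give unstable estimates
            xs, ys = zip(*pairs)
            result[year] = pearson(xs, ys)
    return result
```

The pooled "all years" correlation is `pearson` applied to all records at once; comparing it with the per-year values shows whether a relationship is persistent or driven by particular years.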

Dimensions_reduction.Rmd

In this file, we import the already cleaned data (including the imputed values for missing data). We then try to construct a simplified set of variables that reflects the overall categories, i.e. economic, political, social, environmental and health-related factors. We use principal component analysis (PCA) as our preferred dimension reduction technique.
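For a single category, PCA replaces several related indicators with one composite score: standardize the indicators, find the leading eigenvector of their correlation matrix, and project each observation onto it. The Python sketch below uses power iteration to find that eigenvector; it is an illustration of the technique under that assumption, not the project's R implementation, and the function names are hypothetical. The sign of the loadings is arbitrary.

```python
import math


def standardize(columns):
    """Center each column on its mean and scale by its sample standard deviation."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        out.append([(v - mean) / sd for v in col])
    return out


def first_principal_component(columns, iterations=200):
    """Return (loadings, share of variance explained, scores) for PC1.

    Uses power iteration on the correlation matrix of the standardized columns.
    """
    z = standardize(columns)
    p, n = len(z), len(z[0])
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
            for i in range(p)]
    vec = [1.0 / (i + 1) for i in range(p)]  # asymmetric start vector
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient gives the leading eigenvalue
    eigenvalue = sum(vec[i] * sum(corr[i][j] * vec[j] for j in range(p))
                     for i in range(p))
    explained = eigenvalue / p  # trace of a correlation matrix equals p
    scores = [sum(vec[i] * z[i][k] for i in range(p)) for k in range(n)]
    return vec, explained, scores
```

The share of variance explained indicates how well one composite summarizes the category; a low share suggests the category's indicators do not move together and may need more than one component.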

Ridge_regression_analysis.Rmd

In this file, we import the already cleaned data (including the imputed values for missing data) as well as the dimension-reduced data from the PCA. We then build models to explain what drives world happiness levels. We use ridge regression rather than standard OLS because some of the explanatory variables are highly correlated with each other. All predictors are standardized (centered on their mean and divided by their standard deviation), as required by ridge regression, since the ridge penalty depends on the scale of each predictor.
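With standardized predictors Z and a centered outcome, the ridge coefficients have the closed form (Z'Z + λI)⁻¹ Z'(y − ȳ), with ȳ as the intercept. The Python sketch below solves that system directly to show why standardization matters: λ penalizes every coefficient equally, which is only fair when all predictors share one scale. The project presumably uses an R package and likely chooses λ by cross-validation; the function names here are hypothetical.

```python
import math


def standardize(col):
    """Center a column on its mean and divide by its sample standard deviation."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ridge(columns, y, lam):
    """Return (intercept, coefficients on standardized predictors) for penalty lam."""
    z = [standardize(col) for col in columns]
    p, n = len(z), len(y)
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    # Penalized Gram matrix Z'Z + lam * I
    gram = [[sum(a * b for a, b in zip(z[i], z[j])) + (lam if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(a * b for a, b in zip(z[i], yc)) for i in range(p)]
    return y_mean, solve(gram, rhs)
```

With `lam=0` this reduces to OLS on the standardized predictors; increasing `lam` shrinks the coefficients towards zero, which stabilizes them when predictors are highly correlated.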

ML_modelling.Rmd

In this file, we import the already cleaned data (including the imputed values for missing data) containing all individual input variables. We then build a machine learning (ML) model that can be used to predict global happiness levels. The data is split into training, validation and test sets, and the best-performing model (selected through a grid search) is exported so that it can be used to make predictions in a separate notebook.
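The split-then-search workflow can be sketched independently of the model type, which this README does not specify. In the Python sketch below, k-nearest-neighbour regression with a grid over `k` is a hypothetical stand-in for the project's model, and the 60/20/20 split and seed are assumptions: candidates are compared on the validation set, and only the chosen one is scored once on the test set.

```python
import random


def split_indices(n, train=0.6, valid=0.2, seed=42):
    """Shuffle row indices and cut them into train, validation and test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train, n_valid = int(n * train), int(n * valid)
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]


def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    ranked = sorted(range(len(train_x)),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    return sum(train_y[i] for i in ranked[:k]) / k


def mse(xs, ys, eval_idx, train_idx, k):
    """Mean squared error on eval_idx of a k-NN model fitted on train_idx."""
    tx = [xs[i] for i in train_idx]
    ty = [ys[i] for i in train_idx]
    errors = [(knn_predict(tx, ty, xs[i], k) - ys[i]) ** 2 for i in eval_idx]
    return sum(errors) / len(errors)


def grid_search(xs, ys, grid=(1, 3, 5, 7)):
    """Pick k by validation error, then report the test error of that k only."""
    train_idx, valid_idx, test_idx = split_indices(len(xs))
    scores = {k: mse(xs, ys, valid_idx, train_idx, k) for k in grid}
    best_k = min(scores, key=scores.get)
    return best_k, scores, mse(xs, ys, test_idx, train_idx, best_k)
```

Keeping the test set out of the search is what makes its error an honest estimate of how the exported model will perform on new data.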

Predicting_future_happiness.Rmd

In this file, we import the already cleaned data (including the imputed values for missing data) containing all individual input variables. We then import the parameters of the best-performing ML model, which we use to predict future happiness. Before generating the predictions, we apply a series of country-level regression models that project the historical values of the input variables onto the next two years, providing the inputs the ML model needs.
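A country-level projection can be as simple as fitting a straight-line trend of each input variable on year, separately per country, and extrapolating it. The README does not say which regression form the project uses, so the linear trend, the nested-dict data layout and the function names in this Python sketch are assumptions.

```python
def linear_trend(years, values):
    """Return (intercept, slope) of an OLS fit of values on years."""
    n = len(years)
    my, mv = sum(years) / n, sum(values) / n
    sxx = sum((t - my) ** 2 for t in years)
    if sxx == 0:  # a single year: fall back to a flat projection
        return mv, 0.0
    slope = sum((t - my) * (v - mv) for t, v in zip(years, values)) / sxx
    return mv - slope * my, slope


def project_country(history, horizon=2):
    """history: {year: value} for one country and one input variable."""
    years = sorted(history)
    intercept, slope = linear_trend(years, [history[y] for y in years])
    last = years[-1]
    return {last + h: intercept + slope * (last + h) for h in range(1, horizon + 1)}


def project_all(panel, horizon=2):
    """panel: {country: {variable: {year: value}}} -> projections of the same shape."""
    return {country: {var: project_country(hist, horizon)
                      for var, hist in variables.items()}
            for country, variables in panel.items()}
```

The projected inputs can then be fed to the trained model, so any error in these trend projections carries over into the happiness predictions.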

Repository: cyrilby/world-happiness