GitHub - KAR-NG/Human-Resource-Data-Mining: Five analytical tasks completed using VAT-validated Gower-PAM clustering, correspondence analysis (CA), asymmetric biplot, multiple correspondence analysis (MCA), chi-squared test, regression, and predictive classification models with KNN, SVM, and Random Forest.

Summary

This project applies a series of data mining techniques, including clustering, principal component methods, regression, and classification algorithms, to study trends hidden in the dataset. Numerous visualisations support each of these methods. Five analytical tasks were completed using VAT-validated Gower-PAM clustering, correspondence analysis (CA), asymmetric biplot, multiple correspondence analysis (MCA), chi-squared test, regression, and predictive classification models with KNN, SVM, and Random Forest.
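Gower distance is what lets PAM cluster records that mix numeric and categorical columns. The sketch below illustrates the idea only and is not the project's code, which is written in R. It uses Python, and the function name, feature names, and records are all hypothetical. It assumes numeric features are scaled by their range and categorical features use simple matching.

```python
def gower_distance(a, b, numeric_ranges):
    """Gower distance between two records with mixed feature types.

    a, b: dicts mapping feature name -> value (same keys in both).
    numeric_ranges: dict mapping each numeric feature -> (max - min) over the data.
    Features not listed in numeric_ranges are treated as categorical.
    """
    total = 0.0
    for feature in a:
        if feature in numeric_ranges:
            spread = numeric_ranges[feature]
            # Range-normalised absolute difference; a constant feature contributes 0.
            total += abs(a[feature] - b[feature]) / spread if spread else 0.0
        else:
            # Simple matching for categorical features: 0 if equal, 1 otherwise.
            total += 0.0 if a[feature] == b[feature] else 1.0
    # Average the per-feature dissimilarities.
    return total / len(a)


# Hypothetical records, not taken from the HR dataset.
emp_a = {"age": 30, "department": "IT"}
emp_b = {"age": 40, "department": "Sales"}
print(gower_distance(emp_a, emp_b, {"age": 40}))  # 0.625
```

A matrix of such pairwise distances is the kind of input a PAM (k-medoids) routine would then cluster.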

The outputs show no statistical evidence (p-value = 0.249) that some managers are better, or worse, than others at training their employees. Instead, most managers train their subordinates well enough to reach the “fully meet” standard. The company does actively hire employees from diverse backgrounds, and its overall diversity level is good at 76%. 40% of the employees are female and 40% are from diverse backgrounds. The company recruits employees from 8 sources. The diversity job fair is the best source if the company is keen to hire an employee from a diverse background, and employee referral is the worst (chi-squared test for independence: X-squared = 21.989, df = 5, p-value = 0.0005).
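The chi-squared test for independence compares observed counts in a contingency table with the counts expected if rows and columns were unrelated. The following is a minimal sketch of that statistic in Python. It is not the project's R code. The function name and the toy table are hypothetical and do not reproduce the project's figures.

```python
def chi_squared_independence(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    table: list of rows, each a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical counts: rows are recruitment sources, columns are diverse / not diverse.
toy_table = [[10, 20], [20, 10]]
statistic, df = chi_squared_independence(toy_table)
print(statistic, df)
```

The p-value is then the upper-tail probability of a chi-squared distribution with `df` degrees of freedom at `statistic`. The sketch leaves that step out to stay within the standard library.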

Inferential regression was applied to study the relationships between salary and numerous factors (variables) that could relate to unequal pay, such as age, years of service, race, and gender. The results show that the company pays its employees equally, supported by extensive visualisation and p-values higher than 0.05. Finally, the dataset provides sufficient data to train a model with strong predictive power. K-Nearest Neighbor (KNN), polynomial-kernel Support Vector Machine (SVM), and Random Forest were selected as the modeling candidates. The output shows that a Random Forest model with a probability cut-off of 0.405 is the best algorithm for predicting who will leave the company. It has an overall accuracy of 95.7%, a sensitivity of 93.5% (the metric of most interest), and a specificity of 96.7%.
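To show how a probability cut-off turns model scores into accuracy, sensitivity, and specificity, here is a small Python sketch. It is illustrative and not the project's R code. The function name and the toy scores are hypothetical. It assumes that a probability at or above the cut-off counts as a predicted leaver, and that both classes appear in the labels.

```python
def classification_metrics(probabilities, labels, cutoff=0.405):
    """Compute accuracy, sensitivity, and specificity at a probability cut-off.

    probabilities: predicted probability of the positive class (leaving).
    labels: true classes, 1 = left the company, 0 = stayed.
    """
    tp = fp = tn = fn = 0
    for p, y in zip(probabilities, labels):
        # Assumption: scores at or above the cut-off are predicted positive.
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # share of true leavers caught
        "specificity": tn / (tn + fp),  # share of stayers correctly kept
    }


# Hypothetical scores and labels, not the project's predictions.
print(classification_metrics([0.9, 0.41, 0.40, 0.1], [1, 1, 1, 0]))
```

Lowering the cut-off generally catches more leavers (higher sensitivity) at the cost of more false alarms (lower specificity). This is the trade-off behind choosing a cut-off other than 0.5.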

Highlight

References

Clustering and dimensionality reduction techniques on the Berlin Airbnb data and the problem of mixed data (n.d.), viewed 15 May 2022, https://rstudio-pubs-static.s3.amazonaws.com/579984_6b9efbf84ee24f00985c29e24265d2ba.html

Forest picture in section 8.5.2, credit: Michael Thirnbeck 2010, https://www.flickr.com/photos/thirnbeck/4547405603

Kassambara, A 2017, Practical Guide To Principal Component Methods in R, Edition 1, sthda.com

Lovelytics 2020, HR Diversity Scorecard, viewed 15 May 2022, https://www.youtube.com/watch?v=oaLp5eBi6E8

Nancy Chelaru 2019, Factor analysis of mixed data, viewed 4 June 2022, https://rpubs.com/nchelaru/famd

Rich Huebner 2020, Human Resources Data Set, viewed 2 May 2022, https://www.kaggle.com/datasets/rhuebner/human-resources-data-set?resource=download

Rich Huebner 2021, Codebook - HR Dataset v14, viewed 3 May 2022, https://rpubs.com/rhuebner/hrd_cb_v14

Wicked Good Data - r 2016, viewed 8 May 2022, https://www.r-bloggers.com/2016/06/clustering-mixed-data-types-in-r/

Will Tracz 2021, HR Tech Is the Key: Here’s How to Get It Right, viewed 14 May 2022, https://hrdailyadvisor.blr.com/2021/08/03/hr-tech-is-the-key-heres-how-to-get-it-right/

