If GitHub fails to render the .ipynb file, please click here to view the project on nbviewer.
A machine learning project to detect municipality-level corruption from audit reports. The project is motivated by the article 'The Political Resource Curse', which manually labeled selected municipalities as corrupt or non-corrupt based on their audit reports. It aims to build a prediction model that labels the remaining unlabeled municipalities from their audit reports. The first part of the project cleans and preprocesses the Portuguese text data using libraries such as NLTK. The second part covers basic analysis of the text data. The third part implements the machine learning algorithm to detect corruption.
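As a rough illustration of the cleaning step described above, here is a minimal sketch using only the Python standard library. It is not the code from the notebook: the function names `strip_accents` and `preprocess`, the short `STOPWORDS_PT` list, and the example sentence are all assumptions made for this example. In practice, a fuller Portuguese stopword list (for instance the one shipped with NLTK) would likely replace the small set shown here.

```python
import re
import unicodedata

# Tiny illustrative stopword set (an assumption, not the project's list).
# A fuller list, such as NLTK's Portuguese stopwords, would be used in practice.
STOPWORDS_PT = {
    "a", "o", "as", "os", "de", "da", "do", "das", "dos",
    "e", "em", "que", "para", "com", "no", "na", "um", "uma",
}


def strip_accents(text):
    """Remove diacritics so that 'aplicação' becomes 'aplicacao'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def preprocess(text):
    """Lowercase, strip accents, tokenize on letters, and drop stopwords."""
    normalized = strip_accents(text.lower())
    tokens = re.findall(r"[a-z]+", normalized)
    return [t for t in tokens if t not in STOPWORDS_PT and len(t) > 1]


if __name__ == "__main__":
    # Hypothetical sentence in the style of an audit report.
    sample = "A prefeitura não comprovou a aplicação dos recursos."
    print(preprocess(sample))
```

The resulting token lists could then be turned into features (for example word counts) for the classifier in the third part; which features and model the notebook actually uses is not stated here.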
The text document can be accessed here.