Identify Fraud from Enron Email and Financial datasets using Python

July-August 2017, by Jude Moon

Overview

In 2000, Enron was one of the largest companies in the United States. By 2002, it had collapsed into bankruptcy due to widespread corporate fraud. In the resulting federal investigation, a significant amount of typically confidential information entered the public record, including tens of thousands of emails and detailed financial data for top executives.

In this project, I played detective and put my new skills to use by building a person of interest (POI) identifier based on financial and email data made public as a result of the Enron scandal. I used the dataset provided by Udacity's Intro to Machine Learning course, which was combined with a hand-generated list of POIs in the fraud case. POIs are individuals who were indicted, reached a settlement or plea deal with the government, or testified in exchange for prosecution immunity.

Files

final_project_dataset.pkl: main data file; data dictionary is stored as a pickle file
poi_names.txt: supplementary data file
enron61702insiderpay.pdf: supplementary information file
enron_project.ipynb: documentation of algorithm analysis and answers to a series of questions
poi_id.py: Python script that creates three pickle files (my_dataset.pkl, my_classifier.pkl, my_feature_list.pkl) for the finalized classifier
tester.py: Python script that evaluates the classifier using the three pickle files
feature_format.py: Python module that converts the data dictionary to a NumPy array for sklearn modules
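To make the data flow between these files more concrete, here is a minimal sketch of how a pickled data dictionary might be restored and flattened into rows of numbers. It is not the code in feature_format.py: the helper name dict_to_rows, the sample people and values, and the use of the string "NaN" as a missing-value marker are all assumptions made for illustration.

```python
import pickle

# Hypothetical miniature of the data dictionary: person name -> feature dict.
# The names, values, and the "NaN" missing-value marker are assumptions.
sample_data = {
    "PERSON A": {"poi": True, "salary": 250000, "bonus": "NaN"},
    "PERSON B": {"poi": False, "salary": "NaN", "bonus": 100000},
}


def dict_to_rows(data, features):
    """Return one list of floats per person, with columns ordered as in `features`."""
    rows = []
    for name in sorted(data):  # sort keys so the row order is reproducible
        row = []
        for feature in features:
            value = data[name].get(feature, "NaN")
            # Treat the assumed missing-value marker as zero.
            row.append(0.0 if value == "NaN" else float(value))
        rows.append(row)
    return rows


# Round-trip through pickle in memory, mimicking how a .pkl file stores the dictionary.
restored = pickle.loads(pickle.dumps(sample_data))

features = ["poi", "salary", "bonus"]  # label first, then the input features
rows = dict_to_rows(restored, features)
labels = [row[0] for row in rows]   # 1.0 marks a POI, 0.0 a non-POI
inputs = [row[1:] for row in rows]  # remaining columns are the model inputs
```

Passing rows to numpy.array would give the array form that sklearn estimators expect. Replacing missing values with zero is only one possible choice and may not suit every feature.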
