This project detects fraudulent credit card applications using Self-Organizing Maps (SOMs), then combines the SOM with an Artificial Neural Network (ANN) to form a hybrid model.
- The intuition behind Self-Organizing Maps (SOMs) and their use in unsupervised learning.
- How to transition from unsupervised learning to supervised learning using deep learning techniques.
- How to visualize fraud detection results and evaluate predictions.
SOMs are unsupervised neural networks that project high-dimensional data into a lower-dimensional space, preserving the structure and identifying clusters.
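To make the intuition concrete, here is a minimal NumPy sketch of SOM training. It is an illustration, not the code used in this project (the project relies on the `minisom` library). The function names `train_som` and `winner`, the linear decay schedule, and the synthetic two-cluster data are all assumptions made for the example.

```python
import numpy as np

def train_som(data, rows=10, cols=10, n_iter=1000, sigma=1.0, lr=0.5, seed=0):
    """Train a small SOM; returns weights of shape (rows, cols, n_features)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every node, used to measure distance on the map itself.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for t in range(n_iter):
        x = data[rng.integers(len(data))]  # pick one random sample
        # Best matching unit (BMU): node whose weight vector is closest to x.
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Assumed schedule: learning rate and radius shrink linearly over time.
        decay = 1.0 - t / n_iter
        radius = sigma * decay + 1e-3
        grid_dist_sq = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        # Gaussian neighborhood: nodes near the BMU on the grid move the most.
        h = np.exp(-grid_dist_sq / (2 * radius ** 2))
        weights += (lr * decay) * h[..., None] * (x - weights)
    return weights

def winner(weights, x):
    """Grid position (row, col) of the node closest to sample x."""
    dists = np.linalg.norm(weights - x, axis=-1)
    return tuple(int(i) for i in np.unravel_index(np.argmin(dists), dists.shape))

# Synthetic data: two well-separated clusters in 3 dimensions, already in [0, 1].
rng = np.random.default_rng(42)
cluster_a = rng.uniform(0.05, 0.15, size=(50, 3))
cluster_b = rng.uniform(0.85, 0.95, size=(50, 3))
data = np.vstack([cluster_a, cluster_b])

som_weights = train_som(data, rows=5, cols=5, n_iter=500)
node_a = winner(som_weights, cluster_a[0])
node_b = winner(som_weights, cluster_b[0])
print(node_a, node_b)  # samples from different clusters land on different nodes
```

Each sample pulls its best matching unit and that unit's grid neighbors toward itself, which is how nearby nodes end up representing similar customers.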
- Feature Scaling:
  - Min-Max scaling normalizes every feature to the range 0 to 1.
- Training the SOM:
  - A 10x10 SOM is trained to identify clusters in the data.
  - Outliers (potential frauds) show up on the distance map as nodes that lie far from their neighbors.
- Visualization:
  - A heatmap is plotted to show the SOM clusters and highlight fraud-prone areas.
- Fraud Detection:
  - Customers mapped to the outlying winning nodes are extracted as fraud entries.
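The steps above can be sketched in plain NumPy as follows. This is a stand-in for illustration only: in the project, `minisom` supplies the trained map, its `distance_map()` method, and its `winner()` method. The helper names, the hand-built 3x3 weight grid, the toy data, and the `0.9` threshold are all assumptions.

```python
import numpy as np

def min_max_scale(X):
    """Rescale each column to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (X - lo) / span

def distance_map(weights):
    """Mean distance from each node to its grid neighbors, scaled to [0, 1]."""
    rows, cols, _ = weights.shape
    dmap = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = [
                np.linalg.norm(weights[i, j] - weights[a, b])
                for a in range(max(i - 1, 0), min(i + 2, rows))
                for b in range(max(j - 1, 0), min(j + 2, cols))
                if (a, b) != (i, j)
            ]
            dmap[i, j] = np.mean(dists)
    peak = dmap.max()
    return dmap / peak if peak > 0 else dmap

def find_suspects(X_scaled, weights, threshold=0.9):
    """Row indices of samples whose winning node is an outlier on the map."""
    dmap = distance_map(weights)
    suspects = []
    for idx, x in enumerate(X_scaled):
        d = np.linalg.norm(weights - x, axis=-1)
        node = np.unravel_index(np.argmin(d), d.shape)  # winning node for x
        if dmap[node] > threshold:
            suspects.append(idx)
    return suspects

# Hand-built map (assumption): every node sits at 0.2 except the center node.
weights = np.full((3, 3, 2), 0.2)
weights[1, 1] = [0.9, 0.9]

# Toy data (assumption): three similar rows and one very different row.
raw = np.array([[10.0, 100.0], [12.0, 110.0], [11.0, 105.0], [50.0, 500.0]])
X = min_max_scale(raw)

dmap = distance_map(weights)
suspects = find_suspects(X, weights, threshold=0.9)
print(suspects)  # [3]
```

The threshold controls how strict the fraud list is: lowering it flags more nodes, so it is worth choosing after looking at the heatmap.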
Combining the strengths of SOMs and ANNs, the hybrid model uses the SOM to label potential frauds and the ANN to predict a fraud probability for each customer.
- Fraud labels are assigned based on SOM outputs.
- The features (`customers`) are scaled using Standard Scaling.
- The ANN has:
  - Input Layer: Accepts 15 features.
  - Hidden Layer: Uses ReLU activation for non-linearity.
  - Output Layer: Uses Sigmoid activation for predicting probabilities.
- Optimized using the Adam optimizer and binary cross-entropy loss function.
- Trained on fraud-labeled data for 2 epochs.
- Outputs probabilities between 0 and 1, suitable for binary classification tasks.
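The following NumPy sketch shows what the hybrid step computes: SOM-derived labels, standard scaling, one forward pass through a 15-input network with a ReLU hidden layer and a sigmoid output, and the binary cross-entropy loss. It is not the project's Keras code. The hidden layer size (2 units), the random weights, the customer IDs, and all helper names are assumptions. In Keras, the same architecture would be built from `Dense` layers and compiled with the `adam` optimizer and `binary_crossentropy` loss.

```python
import numpy as np

def make_labels(customer_ids, fraud_ids):
    """1.0 for customers the SOM flagged as potential frauds, else 0.0."""
    return np.isin(customer_ids, fraud_ids).astype(float)

def standard_scale(X):
    """Center each column to mean 0 and scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (X - X.mean(axis=0)) / std

def forward(X, W1, b1, W2, b2):
    """Forward pass: ReLU hidden layer, then a sigmoid output unit."""
    hidden = np.maximum(0.0, X @ W1 + b1)
    logits = hidden @ W2 + b2
    return 1.0 / (1.0 + np.exp(-logits))

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped for stability."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return float(
        -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    )

rng = np.random.default_rng(0)

# Hypothetical customer IDs and a hypothetical fraud list from the SOM step.
customer_ids = np.array([101, 102, 103, 104, 105, 106])
fraud_ids = np.array([102, 105])
is_fraud = make_labels(customer_ids, fraud_ids)

# Six customers with 15 features each (random placeholder values).
customers = standard_scale(rng.normal(size=(6, 15)))

# Untrained weights (assumption): 15 inputs -> 2 hidden units -> 1 output.
W1, b1 = rng.normal(scale=0.1, size=(15, 2)), np.zeros(2)
W2, b2 = rng.normal(scale=0.1, size=(2, 1)), np.zeros(1)

probs = forward(customers, W1, b1, W2, b2)
loss = binary_cross_entropy(is_fraud[:, None], probs)
print(probs.shape, loss)
```

Training would adjust `W1`, `b1`, `W2`, and `b2` to reduce this loss; the sketch only evaluates it once with random weights.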
- Red areas represent clusters with higher potential fraud risk.
- Fraud probabilities for each customer are displayed in descending order for better evaluation.
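A small sketch of the ranking step, assuming the customer IDs and the ANN's predicted probabilities are available as arrays. The function name `rank_by_probability` and the sample values are hypothetical.

```python
import numpy as np

def rank_by_probability(customer_ids, probabilities):
    """Pair each customer ID with its probability, highest probability first."""
    ids = np.asarray(customer_ids)
    probs = np.asarray(probabilities, dtype=float).ravel()
    order = np.argsort(-probs, kind="stable")  # descending, ties keep input order
    return list(zip(ids[order].tolist(), probs[order].tolist()))

# Hypothetical IDs and probabilities for illustration.
ranked = rank_by_probability([101, 102, 103], [[0.12], [0.87], [0.45]])
for customer_id, prob in ranked:
    print(customer_id, prob)
```

Sorting this way puts the customers most worth a manual review at the top of the list.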
- Install Python 3.x and the required libraries: `pip install numpy pandas matplotlib keras minisom`
- Place the dataset (`Credit_Card_Applications.csv`) in the project directory.
- Run `code2.py` to train the SOM and visualize fraud clusters.
- Run `mega_case_study.py` to train the ANN and predict fraud probabilities.
- Fraud List:
  - Extracted from SOM mappings.
- SOM Heatmap:
  - Clustering visualization.
- Fraud Probabilities:
  - ANN outputs for each customer.
- The SOM implementation uses the `minisom` library.
- The ANN is built using Keras with a TensorFlow backend.
Feel free to enhance the scripts and contribute to the project!