
Self-Explaining Neural Networks: A review with extensions

This repository contains the code for reproducing and extending the paper "Towards Robust Interpretability with Self-Explaining Neural Networks" [1]. The authors propose a framework called SENN (Self-Explaining Neural Network) that is transparent by design. We study the reproducibility and validity of the proposed framework and identify several weaknesses of the approach. Most notably, we find that the model rarely generates good explanations, and that enforcing stable explanations compromises performance more than the authors report. We put forward improvements to the framework that address these weaknesses in a principled way, and show that they enhance the interpretability of the generated explanations.
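At its core, a SENN generalizes a linear model: a conceptizer h(x) encodes the input into interpretable concepts, a parameterizer θ(x) produces a relevance score for each concept, and an aggregator combines them into the prediction f(x) = Σᵢ θ(x)ᵢ h(x)ᵢ. Stability is then encouraged by regularizing f to behave locally like this linear form. Below is a minimal, illustrative sketch of that forward pass; the ToySENN class and its layer choices are ours for exposition, not the repository's actual Conceptizer/Parameterizer/Aggregator implementations.

    # Minimal sketch of a SENN forward pass (illustrative only).
    import torch
    import torch.nn as nn

    class ToySENN(nn.Module):
        def __init__(self, input_dim, num_concepts, num_classes):
            super().__init__()
            self.num_concepts = num_concepts
            self.num_classes = num_classes
            # conceptizer h(x): encodes the input into concept activations
            self.conceptizer = nn.Linear(input_dim, num_concepts)
            # parameterizer theta(x): one relevance score per concept and class
            self.parameterizer = nn.Linear(input_dim, num_concepts * num_classes)

        def forward(self, x):
            concepts = self.conceptizer(x)                # (batch, concepts)
            relevances = self.parameterizer(x).view(
                -1, self.num_concepts, self.num_classes)  # (batch, concepts, classes)
            # aggregator g: f(x) = sum_i theta_i(x) * h_i(x), per class
            logits = torch.einsum("bk,bkc->bc", concepts, relevances)
            return logits, concepts, relevances

    model = ToySENN(input_dim=28 * 28, num_concepts=5, num_classes=10)
    logits, concepts, relevances = model(torch.randn(8, 28 * 28))
    print(logits.shape)  # torch.Size([8, 10])

Because each class score is just a relevance-weighted sum of concept activations, the products θ(x)ᵢ·h(x)ᵢ can be read off directly as the explanation for a prediction.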

Table of Contents

  1. Project Structure
  2. How to run?
  3. Results
  4. Documentation
  5. Authors
  6. References

Project Structure

[Project structure diagram]

How to run?

  1. Clone the repository and activate our conda environment:

     git clone https://github.com/AmanDaVinci/SENN
     cd SENN
     conda env create -f environment.yml
     conda activate senn

  2. To reproduce our results using trained models, run the Report Notebook.

  3. To train a model using one of our experiment configurations:

     python main.py --config "configs/compas_lambda1e-4_seed555.json"

  4. To train a new model or perform a new experiment:

     python main.py --config "./config.json"

Where config.json is prepared according to the template below:

{
  "exp_name": "exp001",                         (str, the name of the experiment, used to save the checkpoints and csv results)
  "data_path": "datasets/data/mnist_data",      (str, the path where the data is to be saved)
  "model_class": "SENN"/"DiSENN",               (str, whether to create a SENN or a DiSENN model)
  "pretrain_epochs": 1,                         (int, the number of epochs  to pretrain a beta-VAE for)
  "pre_beta": 1.0,                              (float, the beta to be used in case of DiSENN (VAE pretraining))
  "beta": 4.0,                                  (float, the beta to be used in case of DiSENN (DiSENN training))
  "train": true/false,                          (bool, whether to train the model or not)
  "dataloader": "compas"/"mnist",               (str, the name of the dataloader to be used)
  "conceptizer": "Conceptizer",                 (str, the name of the conceptizer class to be used)
  "parameterizer": "Parameterizer",             (str, the name of the parameterizer class to be used)
  "aggregator": "Aggregator",                   (str, the name of the aggregator class to be used)
  "image_size": 28,                             (int, the size of the input images)
  "num_concepts": 5,                            (int, the number of concepts to be used in training)
  "num_classes": 10,                            (int, the number of output classes)
  "dropout": 0.5,                               (float, the dropout value to be used during training)
  "device": "cuda:0"/"cpu",                     (str, which device to be used for the model)
  "lr": 2e-4,                                   (float, the learning rate)
  "epochs": 100,                                (int, the number of epochs)
  "batch_size" : 200,                           (int, the size of each batch of data)
  "print_freq": 100,                            (int, how often to print metrics for the trainint set)
  "eval_freq" : 30,                             (int, how often to evaluate the model and print metrics for the validation set)
  "robustness_loss": "compas_robustness_loss",  (str, the name of the robustness loss function from the losses package)
  "robust_reg": 1e-1,                           (float, the robustness regularization hyperparameter)
  "concept_reg": 1,                             (float, the concept regularization hyperparameter)
  "sparsity_reg": 2e-5,                         (float, the sparsity regularization hyperparameter)
  "manual_seed": 42                             (int, the seed to be used for reproducibility)
  "accuracy_vs_lambda": ['c1.json','c2.json']   (list of str or list of lists where the inner lists need to have the same lengths, containing the name of the config files for the accuracy vs lambda plots)
  "num_seeds": 1                                (int, number of seeds used for the accuracy_vs_lambda plot, needs to be equal to the lengths of the inner lists passed in accuracy_vs_lambda, default = 1)
}

Note: The architectures of the parameterizer and conceptizer classes can also be specified via config parameters. For brevity, these are not shown here; please refer to the docstrings of the specific classes for the parameters they accept.
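As a concrete illustration, the snippet below writes a minimal config for the COMPAS experiment using only fields from the template above. The values are examples rather than the exact settings of our reported experiments, and the beta fields are omitted since they apply only to DiSENN's beta-VAE pretraining [2].

    # Illustrative only: builds a minimal config.json for a SENN run on COMPAS.
    # Field names follow the template above; the values are example settings.
    import json

    config = {
        "exp_name": "compas_example",              # hypothetical experiment name
        "data_path": "datasets/data/compas_data",  # assumed data directory
        "model_class": "SENN",
        "train": True,
        "dataloader": "compas",
        "conceptizer": "Conceptizer",
        "parameterizer": "Parameterizer",
        "aggregator": "Aggregator",
        "num_concepts": 5,
        "num_classes": 2,  # COMPAS recidivism prediction is binary
        "dropout": 0.5,
        "device": "cpu",
        "lr": 2e-4,
        "epochs": 100,
        "batch_size": 200,
        "print_freq": 100,
        "eval_freq": 30,
        "robustness_loss": "compas_robustness_loss",
        "robust_reg": 1e-1,
        "concept_reg": 1,
        "sparsity_reg": 2e-5,
        "manual_seed": 42,
    }

    with open("config.json", "w") as f:
        json.dump(config, f, indent=2)

    # Then run: python main.py --config "./config.json"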

Results

The Report Notebook reproduces all the results of our experiments. Here we present the major results:

  1. Reproduced MNIST Test Accuracy: 98.9%
  2. Reproduced COMPAS Test Accuracy: 80.9%
  3. SENN Explanations: [figure; see the Report Notebook]
  4. SENN Prototypes: [figure; see the Report Notebook]

Documentation

The documentation of our SENN package is available at https://senn.readthedocs.io/en/latest/

Authors

Supervisor:
Simon Passenheim

References

[1] David Alvarez-Melis and Tommi S. Jaakkola.
"Towards Robust Interpretability with Self-Explaining Neural Networks." NeurIPS 2018.
[2] Irina Higgins et al.
"β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework." ICLR 2017.