This is the artifact to accompany the ASE technical track submission "Distribution Models for Falsification and Verification of DNNs".
The artifact is split into several directories.
-
Directory
rq1
contains the code and data from our experiments for research question 1. The environment models used to evaluate DFV, verifiers and falsifiers outputs, and processed data can be downloaded by runningpython download.py
from within therq1
directory. -
Directory
rq2
contains the code and data from our experiments for research question 2. The environment models used to evaluate DFV, falsifiers outputs, and processed data can be downloaded by runningpython download.py
from within therq2
directory. -
Directory
rq3
contains the code and data from our experiments for research question 3. The environment models used to evaluate DFV are available in theRQ3 Models
directory at https://drive.google.com/drive/folders/1E1-keCoZ8bvKFkud_jofXtYApKT-nTJz. The models and data trained to use the models can be downloaded by runningpython download.py
from within therq3
directory.
The instructions to reproduce our experiments are in the INSTALL.md file under the 'Reproducing the study' section.
This directory contains the code to replicate the evaluation of our first research question.
Use train_vae.py
to train the Fashion MNIST network and VAE. Use train_vae_mrs.py
to train the VAE MRS. Models will be output to the directory ./saved_models/
.
Use verify.sh
to verify all the properties on the Fashion MNIST network with and without DFV. By default the script will run all the tools used in the experiment. However a specific tool can be specified by passing neurify
, nnenum
, verinet
as the first argument. Logs and counter-examples will be output to the directory ./output/Verifiers/
.
Use falsify.sh
to falsify all the properties on the Fashion MNIST network with and without DFV. By default the script will run all the tools used in the experiment. However a specific tool can be specified by passing deepfool
, bim
, fgsm
, pgd
as the first argument. Logs and counter-examples will be output to the directory ./output/Falsifiers/
.
The data generated by verify.sh
in ./output/Verifiers
and falsify.sh
in ./output/Falsifiers
need to be processed. To do so, run process_data.py
. The processed data will be output to the directory ./processed_data/
.
The images used for the paper can be generated by running print_images.py
. Images will be output to the directory ./images/
.
Running all the above scripts can be time-consuming, therefore by running download.py
three folders will be downloaded processed_data
, saved_models
and output
. These folders contain all the data needed to execute print_images.py
without running the other scripts.
./common
contains files needed to run theverify.sh
,falsify.sh
andprocess_data.py
../images
contains the png images generated for the paper../properties
contains the properties used for the verification and falsification of our model with and without DFV.
./data
will be created when runningtrain_vae.py
ortrain_vae_mrs.py
to store MNIST Fashion dataset../saved_models
will be created when runningtrain_vae.py
ortrain_vae_mrs.py
to store the state dictionaries of the models../output
will be created when runningverify.sh
orfalsify.sh
to store the results../processed_data
will be created when runningprocess_data.py
to store the calculations made on the obtained data.
This directory contains the code to replicate the evaluation of our second research question.
Use train_models.py
to train the 90 Fashion MNIST VAEs. Use train_vae_mrs.py
to train the VAE MRS. Models will be output to the directory ./saved_models/
.
Use falsify_multidim.sh
to falsify all the properties on the 90 Fashion MNIST models with DFV. Use falsify_ls.sh
to falsify all the properties on the Fashion MNIST DFV 8-2-256 model, varying the radius of the latent space. Logs and counter-examples from falsify_multidim.sh
will be output to the directory ./output/multidim_study/
, and the results from falsify_ls.sh
will be output to the directory ./output/ls_study/
.
The data generated by falsify_multidim.sh
and falsify_ls.sh
need to be processed. To do so, run process_data.py
. The processed data will be output to the directory ./processed_data/
.
The images used for the paper can be generated by running print_images.py
. Images will be output to the directory ./images/
.
Running all the above scripts can be time-consuming, therefore by running download.py
three folders will be downloaded processed_data
, saved_models
and output
. These folders contain all the data needed to execute print_images.py
without running the other scripts.
./common
contains files needed to run thefalsify_multidim.sh
,falsify_ls.sh
andprocess_data.py
../images
contains the png images generated for the paper../properties/multidim_study
contains the properties used for the falsification of the 90 DFV models../properties/ls_study
contains the properties used for variating the latent space radius of DFV 8-2-256 model.
./data
will be created when runningtrain_models.py
ortrain_vae_mrs.py
to store MNIST Fashion dataset../saved_models
will be created when runningtrain_models.py
ortrain_vae_mrs.py
to store the state dictionaries of the models../output
will be created when runningfalsify_multidim.sh
orfalsify_ls.sh
to store the results../processed_data
will be created when runningprocess_data.py
to store the calculations made on the obtained data.
This directory contains the code to replicate the evaluation of our third research question.
To train environment models, first download the pre-processed DroNet dataset by running python download_data.py
.
Environment models can be trained by running ./train_fc.py
to train FC-VAE_{DroNet}, ./train_vae.py
to train Conv-VAE_{DroNet}, and ./train_dcgan.py
to train GAN_{DroNet}. Models will be output to the directory ./models/
.
The PGD falsifier can then be run on the DroNet network, both without DFV and with DFV using the VAE and GAN models by running ./falsify.sh
. By default this script will run all 3 methods. A specific treatment can be specified by passing dnn
, vae
, or gan
as the first argument to this script. If the first argument is all
then all treatments will be run. This script will save all counter-examples to the directory cex/
. The second argument to ./falsify.sh
can be used to specify a different name for this directory. The third argument accepted by this script specifies a timeout for each job in seconds. By default this timeout is 1 hour, or 3600 seconds. The fourth argument specifies where to save the logs, by default this is the directory logs/falsification_logs
.
After running the falsifier, counter-examples can be converted to png formatted images and the plots for the paper can be generated by running python npy_to_png.py cex logs/falsification_logs
, where cex
is the directory containing the counter-examples and logs/falsification_logs
is the directory containing the logs from running falsify.sh
. This script will convert all counter-examples from the numpy .npy
format to PNG images, will create CSV files containing the times and MRS values for each counter-example, and generate the plots shown in the paper.
Running all of the above scripts can be time-consuming. We provide a script download.py
to download our data to allow you to start from any stage of this process. This script will download our trained environment models and logs from running the falsifiers.
./benchmark
contains the GHPR-DroNet benchmark. Within this directory,onnx/
contains the DroNet model,properties/
contains the standard GHPR-DroNet properties, andvae_properties/
contains the properties modified to accept an environment model prefix../cex
contains the counter-examples found, with the sub-directorynpy/
containing counter-examples in.npy
format, andpng/
containing the counter-examples in.png
format.
./models
will be created when running any of the training scripts (or downloading the models)../logs/falsification_logs
will be created when runningfalsify.sh
../Data
will be created when runningpython download_data.py
, and contains the pre-processed DroNet dataset.