Membership inference attacks (MIAs) have recently become a popular class of privacy attacks in machine learning. This repository implements VMIAP, which attacks 20 (or 40) target models trained on different splits of the same dataset with 77 MIAs, and then analyzes the vulnerability of each data point under those target models and MIAs. We measure the vulnerability of each data point mainly with two metrics: the Average Member Exposure Rate (AMER) and the Average Non-Member Exposure Rate (ANMER). The AMER of a data point is the average, over the target models whose training data contain the point, of the percentage of MIAs correctly predicting that the point is a member. The ANMER is defined analogously over the target models whose test data contain the point. For more details, please refer to our paper.
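To make the two metrics concrete, here is a small sketch computing AMER and ANMER from a boolean prediction matrix; the matrix shape, variable names, and the reading of "exposure" as being predicted a member are illustrative assumptions, not code from this repository:

```python
import numpy as np

n_attacks, n_models = 77, 20
rng = np.random.default_rng(0)

# predicts_member[a, m] == True if MIA a predicts the target point is in
# target model m's training data (random stand-in predictions)
predicts_member = rng.random((n_attacks, n_models)) < 0.6
# Alternate member / non-member status of the point across the 20 models
is_member = np.arange(n_models) % 2 == 0

# AMER: over the models where the point is a member, average the
# fraction of MIAs that (correctly) predict "member"
amer = predicts_member[:, is_member].mean(axis=0).mean()
# ANMER: the analogous average over the models where the point is a
# non-member (here a "member" prediction exposes the point incorrectly)
anmer = predicts_member[:, ~is_member].mean(axis=0).mean()
```

A point with a high AMER is consistently exposed whenever it is used for training, regardless of which split it lands in.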
.
├── dataset # Make sure the datasets exist before running
│ ├── mnist-in-csv # https://www.kaggle.com/datasets/oddrationale/mnist-in-csv
│ │ ├── mnist_test.csv # mnist in CSV format
│ │ └── mnist_train.csv #
│ ├── purchase-100-in-csv # https://www.kaggle.com/datasets/datamaters/purchase1000
│ │ └── purchase100.npz # purchase100 with npz format
│ ├── cifar-10-train-in-csv # https://www.kaggle.com/datasets/fedesoriano/cifar10-python-in-csv
│ │ └── train.csv # cifar-10 with CSV format
├── models # The library of models
├── privacy_risk_score # from https://github.com/inspire-group/membership-inference-evaluation
├── setting # each yaml file corresponds to one run setting
├── shapley_value # measure the Shapley value of data points
├── analyze_result.py # analyze the results of multiple MIAs on multiple target models
├── attack_feature_and_metric.py # extract attack features from the model and datasets
├── classifier_based_MIA.py # the classifier-based MIA
├── main.py
├── model.py # model definitions for mnist, purchase100, cifar-10
├── non_classifier_based_MIA.py # non-classifier-based MIA
├── obtain_dataset.py # load datasets from storage
├── README.md
├── train_process.py # train the model (target, shadow, attack)
└── utils.py # some utility functions
Requirements
- Python 3.8.17
- Libraries: conda install --yes --file requirements.txt
Usage
Default run command (the argument is the path of a configuration yaml file):
python3 main.py './XXXX/XXXX/config.yaml'
With this repository, you can run experiments under the following settings.
- default_exploration_cifar (default setting)
- default_exploration_mnist
- default_exploration_purchase
- default_hyper_for_target
- more_data_for_training (more data for training)
- repeat_experiments (retrain target, shadow, and attack models with different random seeds to initialize the parameters and shuffle the training data)
- shadow_from_other_dis_cifar (the data used for the shadow models comes from a different distribution)
- shadow_from_other_dis_mnist
- shadow_from_other_dis_purchase
Example: attack 20 target models (LetNet on CIFAR-10) with 77 MIAs and analyze the vulnerability of each data point:
python3 main.py './setting/default_exploration_cifar/LetNet_20_0.5_0.5_0.5.yaml'
Analysis results are saved in the last split (split-19 for 20 target models) and may contain the following files.
- infer_result_XXX.txt (the membership prediction results of 77 MIAs on 20 or 40 target models)
- member_non_member_time.txt (how many times each data point appears as a member and as a non-member across the target models)
- per_result_XXX.txt (the performance of target models and MIAs)
- sv_val.txt (the Shapley values of data points)
- prs_val.txt (the privacy risk scores of data points)
- vul_as_train_avg.txt (vulnerable data points according to AMER)
- vul_as_test_avg.txt (vulnerable data points according to ANMER)
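Across the 20 resplits, each data point is a member of some target models' training sets and a non-member of the rest; member_non_member_time.txt records these counts. A minimal sketch of that bookkeeping, with random half/half splits standing in for the repository's actual split logic:

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_models = 100, 20

# is_member[m, i] == True if point i is in target model m's training split
is_member = np.zeros((n_models, n_points), dtype=bool)
for m in range(n_models):
    train_idx = rng.choice(n_points, size=n_points // 2, replace=False)
    is_member[m, train_idx] = True

# Per-point member / non-member counts; every point's counts sum to
# the number of target models
member_times = is_member.sum(axis=0)
non_member_times = n_models - member_times
```

These counts determine how many target models contribute to each point's AMER and ANMER averages.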
To reanalyze the results of 20 target models with 77 MIAs, first locate infer_result_XXX.txt and per_result_XXX.txt, then run the example shown in analyze_result.py.
If you have any questions, please feel free to post an issue or contact me via email ([email protected]).