Marco Anisetti, Claudio A. Ardagna, Nicola Bena, Ernesto Damiani, Paolo G. Panero
Preprint: arXiv
Modern applications are increasingly driven by Machine Learning (ML) models, and their non-deterministic behavior affects the entire application life cycle, from design to operation. The pervasive adoption of ML urgently calls for approaches that guarantee a stable non-functional behavior of ML-based applications over time and across model changes. To this aim, non-functional properties of ML models, such as privacy, confidentiality, fairness, and explainability, must be monitored, verified, and maintained. This need is even more pressing when modern applications operate in the cloud-edge continuum, increasing their complexity and dynamicity. Existing approaches mostly focus on i) implementing classifier selection solutions according to the functional behavior of ML models, and ii) finding new algorithmic solutions to this need, such as continuous re-training. In this paper, we propose a multi-model approach built on dynamic classifier selection, where multiple ML models showing similar non-functional properties are made available to the application and one model is selected at a time according to (dynamic and unpredictable) contextual changes. Our solution goes beyond the state of the art by providing an architectural and methodological approach that continuously guarantees a stable non-functional behavior of ML-based applications, is applicable to any ML model, and is driven by non-functional properties assessed on the models themselves. It consists of a two-step process working during application operation, where model assessment verifies the non-functional properties of ML models trained and selected at development time, and model substitution guarantees continuous and stable support of non-functional properties. We experimentally evaluate our solution in a real-world scenario focusing on the non-functional property of fairness.
This repository contains the source code, input dataset, intermediate results, and detailed results of our experimental evaluation. Note: the dataset description refers to the uncompressed dataset; some directories are compressed for storage reasons.
- 2.1. Details: Code Organization
- 2.2. Details: Input
- 2.3. Details: Output
The code is written in Python 3 and was tested on a laptop with an Intel Core i7 processor at 2.60 GHz, 16 GB of RAM, running Microsoft Windows 10.
The aim of the experimental evaluation is to show the validity of our multi-model MAB-based approach by applying it to a challenging scenario considering the non-functional property fairness. We used the well-known dataset of the Connecticut State Department of Correction available at https://data.ct.gov/Public-Safety/Accused-Pre-Trial-Inmates-in-Correctional-Faciliti/b674-jy6w. It provides a daily updated list of people detained in the Department facilities awaiting trial. This dataset anonymously discloses data of individual people detained in the correctional facilities every day starting from July 1st, 2016. It contains attributes such as last admission date, race, gender, age, type of offense, and facility description, in more than 4 million data points (at the download date). We divided this dataset into training and test sets. The training set includes more than 3 million points.
Note: all CSV files use `;` as separator.
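As an example, one of the CSV files in `Data` can be loaded with pandas as follows (a minimal sketch: the file name and the column list are placeholders based on the dataset description above, not the actual names in the repository):

```python
# Illustrative sketch only: the file name below is a placeholder.
import pandas as pd

# All CSV files in this repository use ';' as separator.
df = pd.read_csv("Data/accused_pretrial.csv", sep=";")

print(df.shape)             # number of data points and attributes
print(df.columns.tolist())  # inspect the actual attribute names
```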
This repository is organized in the following directories and files:
- `Data`: contains the CSV files to be fed to the ML models. Files are compressed for storage reasons.
- `Models`: contains the models used by the MAB (saved in the joblib format).
- `Csv_MAB-OutputFiles.zip`: contains the results of the experiment executions. For each execution with a different memory size, different files are created with different statistics, including raw data used for experiment evaluation, such as:
  - beta distributions (files starting with `beta_detail_noreset`)
  - estimate of each model's wins (files starting with `est_models_detail_noreset`)
  - window information about the winner, service performance, and so on (files starting with `shift_detail_noreset`)
  - Thompson sampling (files starting with `thommy_detail_noreset`)
  - values remaining in the experiment (files starting with `values_remaining`). For storage reasons, this file is omitted from the repository.
- `le_modelv_v2.pkl`: contains a dictionary for categorical fields used by the MAB experiment, in pickle format. The same dictionary is provided in JSON format (`le_modelv_v2.json`).
- `MAB.py`, `PostProcessData_Fig10.py`, and `PostProcessData_Fig11-12.py`: contain the experiment code.
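For reference, the serialized artifacts can be loaded as follows (a minimal sketch: the model file name `model_0.joblib` is a placeholder, check the `Models` directory for the actual names):

```python
# Illustrative sketch only: the model file name is a placeholder.
import json
import pickle

import joblib

# Load one of the models used as an arm by the MAB.
model = joblib.load("Models/model_0.joblib")

# Load the dictionary for categorical fields (pickle format).
with open("le_modelv_v2.pkl", "rb") as f:
    categorical_fields = pickle.load(f)

# The same dictionary is also provided in JSON format.
with open("le_modelv_v2.json", "r") as f:
    categorical_fields_json = json.load(f)
```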
The code consists of the following files:
- `MAB.py`: contains the main algorithm implementing the MAB experiment.
- `PostProcessData_Fig10.py`: contains the postprocessing instructions used to produce Figure 10.
- `PostProcessData_Fig11-12.py`: contains the postprocessing instructions used to produce Figures 11 and 12.
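For readers unfamiliar with the underlying technique, the following is a minimal, self-contained sketch of Thompson sampling over a set of models (arms) with Beta posteriors; it illustrates the general idea only and is not the code in `MAB.py` (the number of models and the reward probabilities are made up):

```python
# Generic Thompson-sampling sketch (not the implementation in MAB.py).
import numpy as np

rng = np.random.default_rng(42)
true_rates = [0.60, 0.70, 0.55]     # made-up per-model success probabilities
wins = np.zeros(len(true_rates))    # observed successes per model
losses = np.zeros(len(true_rates))  # observed failures per model

for step in range(1000):
    # Sample from each model's Beta posterior and pick the best sample.
    samples = rng.beta(wins + 1, losses + 1)
    chosen = int(np.argmax(samples))

    # Observe a binary reward, e.g., whether the non-functional property holds.
    reward = rng.random() < true_rates[chosen]
    wins[chosen] += reward
    losses[chosen] += 1 - reward

print("Posterior mean per model:", (wins + 1) / (wins + losses + 2))
```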
The inputs vary according to the script. For the sake of simplicity, some settings are hard-coded directly into the scripts:
- `MAB.py`: uses as input the datasets in the `Data` directory, the serialized models in the `Models` directory, and the dictionary for categorical fields in `le_modelv_v2.pkl`. The memory percentage is set in the variable `memory` and the variance threshold is specified in the constant `VARIANCE`.
- `PostProcessData_Fig10.py`: uses as input some of the files produced by `MAB.py`, in particular `shift_detail_noreset` and `values_remaining` (converted to CSV format). File names are hard-coded in the variables `master_file*` and `input_file*` (one series for each memory size used) and the penalty values are set in the `residualError` variable.
- `PostProcessData_Fig11-12.py`: uses as input some of the files produced by `MAB.py`, in particular `shift_detail_noreset` and `values_remaining` (converted to CSV format). It also uses a temporary file from `PostProcessData_Fig10.py`, since the baseline is the same in executions with no memory. File names are hard-coded in the `master_file*` and `input_file*` variables (one series for each memory size used) and the penalty values are set in the `residualError` variable.
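For illustration, the hard-coded settings mentioned above look roughly like the following (the variable names come from the description above, but the values and file names are placeholders, not the ones used in our experiments; open the scripts to see the actual settings):

```python
# Placeholder values only: edit the actual assignments inside the scripts.

# MAB.py
memory = 10        # memory percentage
VARIANCE = 0.001   # variance threshold

# PostProcessData_Fig10.py and PostProcessData_Fig11-12.py
master_file_10 = "shift_detail_noreset_10.csv"   # one series per memory size
input_file_10 = "values_remaining_10.csv"
residualError = 0.05                             # penalty value(s)
```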
The code produces aggregated and detailed results in a custom TXT format containing serialized data. These data should usually be converted to CSV format for ease of reading and to be fed to the postprocessing scripts, as we did here.
During execution, the code writes its output to the execution directory. In the hope that they may be useful, we include all our data: final results as well as intermediate data, converted to CSV.
The outputs vary according to the script:
- `MAB.py`: as mentioned, for each execution with a different memory size, different files are created with different statistics and raw data used for experiment evaluation, such as beta distributions (files starting with `beta_detail_noreset`), estimate of each model's wins (files starting with `est_models_detail_noreset`), window information about the winner, service performance, and so on (files starting with `shift_detail_noreset`), Thompson sampling (files starting with `thommy_detail_noreset`), and values remaining in the experiment (files starting with `values_remaining`).
- `PostProcessData_Fig10.py`: produces, in addition to temporary files, the output results in files starting with `Fig10_data` (one for each memory size). The output files provided here are already converted to CSV format.
- `PostProcessData_Fig11-12.py`: produces, in addition to temporary files, the output results in files starting with `Fig11-12_data` (one for each memory size). Output files are directly in CSV format.
Individual files have self-explanatory names. Those containing `expAll` contain data evaluating all the experiments together, while the digits 0, 5, 10, and 25 indicate the memory size used.
Note: the MAB entails randomness due to its exploratory behavior. This means that different executions may not produce identical results.
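If more repeatable runs are desired, one option is to fix the global random seeds before executing the experiment (a sketch under the assumption that `MAB.py` draws from Python's and NumPy's global generators; the script does not expose a seed parameter):

```python
# Assumption: the randomness comes from Python's and NumPy's global generators.
import random

import numpy as np

random.seed(42)
np.random.seed(42)
```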
To reproduce our results, you need to:
- install the necessary dependencies listed in `requirements.txt`
- modify the scripts according to the path where you placed the data files
- run the scripts; the scripts do not require parameters.
Experiments with `MAB.py` can take hours, while the postprocessing scripts usually take minutes.
According to our license, citing our publication is mandatory.
The paper can be cited as follows:
@article{AABDP.TSC2024,
  title   = {Continuous Management of Machine Learning-Based Application Behavior},
  author  = {Anisetti, Marco and Ardagna, Claudio A. and Bena, Nicola and Damiani, Ernesto and Panero, Paolo G.},
  journal = {IEEE Transactions on Services Computing},
  year    = {2024}
}
The work was partially supported by:
- project BA-PHERD, funded by the European Union – NextGenerationEU, under the National Recovery and Resilience Plan (NRRP) Mission 4 Component 2 Investment Line 1.1: "Fondo Bando PRIN 2022" (CUP G53D23002910006)
- MUSA - Multilayered Urban Sustainability Action - project, funded by the European Union - NextGenerationEU, under the National Recovery and Resilience Plan (NRRP) Mission 4 Component 2 Investment Line 1.5: Strengthening of research structures and creation of R&D "innovation ecosystems", set up of "territorial leaders in R&D" (CUP G43C22001370007, Code ECS00000037)
- 1H-HUB and SOV-EDGE-HUB funded by Università degli Studi di Milano – PSR 2021/2022 – GSA – Linea 6
- program "Piano di Sostegno alla Ricerca" funded by Università degli Studi di Milano
This work is licensed under a Creative Commons Attribution 4.0 International License.