Privacy guarantees are the most crucial requirement when it comes to analyse sensitive data. These requirements could be sometimes very stringent, so that it becomes a real barrier for the entire pipeline. Reasons for this are manifold, and involve the fact that data could not be shared nor moved from their silos of resident, let alone analysed in their raw form. As a result, data anonymisation techniques are sometimes used to generate a sanitised version of the original data. However, these techniques alone are not enough to guarantee that privacy will be completely preserved. Moreover, the memoisation effect of Deep learning models could be maliciously exploited to attack the models, and reconstruct sensitive information about samples used in training, even if these information were not originally provided.
Privacy-preserving machine learning (PPML) methods hold the promise to overcome all those issues, allowing to train machine learning models with full privacy guarantees.
This workshop will be mainly organised in three main parts. In the first part, we will introduce the main concepts of differential privacy: what is it, and how this method differs from more classical anonymisation techniques (e.g. k-anonymity
). In the second part, we will focus on Machine learning experiments. We will start by demonstrating how DL models could be exploited (i.e. inference attack ) to reconstruct original data solely analysing models predictions; and then we will explore how differential privacy can help us protecting the privacy of our model, with minimum disruption to the original pipeline. Finally, we will conclude the tutorial considering more complex ML scenarios to train Deep learning networks on encrypted data, with specialised distributed federated learning strategies.