A PyTorch implementation of missing data imputation using optimal transport.

BorisMuzellec/MissingDataOT

Missing Data Imputation using Optimal Transport

Overview

This repository accompanies the paper Missing Data Imputation using Optimal Transport (Muzellec B., Josse J., Boyer C., Cuturi M.):

  • experiment.py reproduces the imputation benchmark from the paper;
  • imputers.py contains the classes implementing Algorithms 1 and 3 of the paper;
  • data_loaders.py contains data loading utilities for the UCI ML repository datasets on which the experiments are run;
  • utils.py contains general-purpose utilities, in particular the implementations of the MAR and MNAR missing data mechanisms;
  • softimpute.py contains the implementation of the SoftImpute baseline.
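
To make the idea of a MAR (missing at random) mechanism concrete, here is a minimal NumPy sketch. It is not the code from utils.py: the function name mar_mask, its parameters p_miss and p_obs, and the logistic masking model are all assumptions chosen for illustration. A subset of columns is kept fully observed, and the probability that an entry in the remaining columns is missing depends only on those observed columns.

```python
import numpy as np


def mar_mask(X, p_miss=0.3, p_obs=0.5, seed=0):
    """Hypothetical helper: boolean mask (True = missing) under a simple MAR model."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    if d < 2:
        raise ValueError("need at least two columns")

    # Columns in idx_obs are never masked; the others may contain missing values.
    d_obs = min(max(int(p_obs * d), 1), d - 1)
    idx_obs = rng.choice(d, size=d_obs, replace=False)
    idx_na = np.setdiff1d(np.arange(d), idx_obs)

    # Logistic model: missingness depends only on the always-observed columns.
    W = rng.standard_normal((d_obs, idx_na.size))
    scores = X[:, idx_obs] @ W
    scores = (scores - scores.mean(axis=0)) / (scores.std(axis=0) + 1e-12)

    # The intercept only roughly controls the missing rate; it is not calibrated.
    intercept = np.log(p_miss / (1.0 - p_miss))
    probs = 1.0 / (1.0 + np.exp(-(scores + intercept)))

    mask = np.zeros((n, d), dtype=bool)
    mask[:, idx_na] = rng.random((n, idx_na.size)) < probs
    return mask


# Usage on synthetic data (not one of the UCI datasets).
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 6))
mask = mar_mask(X, p_miss=0.3, p_obs=0.5)
X_missing = np.where(mask, np.nan, X)  # NaN marks the entries to impute
print(mask.mean(axis=0))               # per-column fraction of missing entries
```

An MNAR variant would differ only in what the missingness probability is allowed to depend on: the masked values themselves, rather than the fully observed columns alone.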

An example notebook is also available: UCI_demo.ipynb.

References

Muzellec B., Josse J., Boyer C., Cuturi M.: Missing Data Imputation using Optimal Transport

@inproceedings{muzellec2020missing,
  title={Missing Data Imputation using Optimal Transport},
  author={Muzellec, Boris and Josse, Julie and Boyer, Claire and Cuturi, Marco},
  booktitle={International Conference on Machine Learning},
  pages={7130--7140},
  year={2020},
  organization={PMLR}
}

Dependencies

To use the data loading utilities in data_loaders.py, wget is also required.
