
Commit 09e750f: add mdi+ to readme
tiffanymtang committed Aug 16, 2023 (1 parent: 779ac00)
Showing 1 changed file: readme.md, with 11 additions and 2 deletions.
@@ -1,11 +1,11 @@
<p align="center">
<img align="center" width=75% src="https://yu-group.github.io/imodels-experiments/logo_experiments.svg?sanitize=True"> </img> <br/>
- Scripts for easily comparing different aspects of the <a href="https://github.com/csinva/imodels">imodels package.</a> Contains code to reproduce <a href="https://arxiv.org/abs/2201.11931">FIGS</a> + <a href="https://arxiv.org/abs/2202.00858">Hierarchical shrinkage</a> + <a href="https://arxiv.org/abs/2205.15135">G-FIGS</a>.
+ Scripts for easily comparing different aspects of the <a href="https://github.com/csinva/imodels">imodels package.</a> Contains code to reproduce <a href="https://arxiv.org/abs/2201.11931">FIGS</a> + <a href="https://arxiv.org/abs/2202.00858">Hierarchical shrinkage</a> + <a href="https://arxiv.org/abs/2205.15135">G-FIGS</a> + <a href="https://arxiv.org/pdf/2307.01932.pdf">MDI+</a>.
</p>

# Documentation

- Follow these steps to benchmark a new (supervised) model. If you want to benchmark something like feature importance or unsupervised learning, you will have to make more substantial changes (mostly in `01_fit_models.py`)
+ Follow these steps to benchmark a new (supervised) model.

1. Write the sklearn-compliant model (`__init__`, `fit`, `predict`, and `predict_proba` for classifiers) and add it somewhere in a local folder or in `imodels` (a minimal sketch follows this list)
2. Update configs - create a new folder mimicking an existing folder (e.g. `config.interactions`)
@@ -21,6 +21,8 @@ Follow these steps to benchmark a new (supervised) model.
5. Put scripts/notebooks into a subdirectory of the `notebooks` folder (e.g. `notebooks/interactions`)
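
A minimal sketch of what step 1 can look like. The `ThresholdClassifier` below is purely illustrative (a toy binary rule, not part of this repo or `imodels`); the point is the `__init__` / `fit` / `predict` / `predict_proba` surface that the benchmarking scripts call into.

```python
# Illustrative only: a toy, binary, sklearn-compliant classifier.
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.validation import check_array, check_X_y


class ThresholdClassifier(BaseEstimator, ClassifierMixin):
    """Predicts the positive class when one feature exceeds a threshold."""

    def __init__(self, feature_index=0, threshold=0.5):
        # sklearn convention: __init__ only stores hyperparameters.
        self.feature_index = feature_index
        self.threshold = threshold

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        self.classes_ = np.unique(y)  # assumes two classes for this toy rule
        return self

    def predict_proba(self, X):
        X = check_array(X)
        positive = (X[:, self.feature_index] > self.threshold).astype(float)
        return np.column_stack([1.0 - positive, positive])

    def predict(self, X):
        return self.classes_[self.predict_proba(X).argmax(axis=1)]
```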


Note: If you want to benchmark feature importances, go to [feature_importance/](https://github.com/Yu-Group/imodels-experiments/tree/master/feature_importance). For benchmarking other tasks such as unsupervised learning, you will have to make more substantial changes (mostly in `01_fit_models.py`).

## Config
- When running multiple seeds, we want to aggregate over all keys that are not the split_seed
- If a hyperparameter is not passed in `ModelConfig` (e.g. because we are using `partial`), it cannot be aggregated over seeds later on (see the sketch below)
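
A sketch of that caveat. The `ModelConfig` signature and import path below are assumed for illustration; mirror whatever the existing config folders (e.g. `config.interactions`) actually do.

```python
from functools import partial

from imodels import FIGSClassifier
from util import ModelConfig  # assumed import path; see the existing configs

# Hyperparameter recorded in the config: max_rules travels with each run,
# so results can still be grouped by it when aggregating over split_seed.
visible = ModelConfig("FIGS", FIGSClassifier, other_params={"max_rules": 12})

# Hyperparameter baked in via partial: nothing in the config records max_rules,
# so these runs cannot be separated or aggregated by it later on.
hidden = ModelConfig("FIGS", partial(FIGSClassifier, max_rules=12))
```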
@@ -77,3 +79,10 @@ Machine learning in high-stakes domains, such as healthcare, faces two critical ...
<p align="center">
<i>G-FIGS 2-step process explained.</i>
</p>


### MDI+: A Flexible Random Forest-Based Feature Importance Framework

[📄 Paper](https://arxiv.org/pdf/2307.01932.pdf), [📌 Citation](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C23&q=MDI%2B%3A+A+Flexible+Random+Forest-Based+Feature+Importance+Framework&btnG=#d=gs_cit&t=1690399844081&u=%2Fscholar%3Fq%3Dinfo%3Axc0LcHXE_lUJ%3Ascholar.google.com%2F%26output%3Dcite%26scirp%3D0%26hl%3Den)

MDI+ is a novel feature importance framework that generalizes the popular mean decrease in impurity (MDI) importance score for random forests. At its core, MDI+ expands upon a recently discovered connection between linear regression and decision trees. In doing so, MDI+ enables practitioners to (1) tailor the feature importance computation to the data/problem structure and (2) incorporate additional features or knowledge to mitigate known biases of decision trees. In both real data case studies and extensive real-data-inspired simulations, MDI+ outperforms commonly used feature importance measures (e.g., MDI, permutation-based scores, and TreeSHAP) by substantial margins.
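
For orientation, the classical MDI that MDI+ generalizes is what scikit-learn already exposes as `feature_importances_` on a fitted forest; a minimal sketch is below. The MDI+ pipelines themselves live under `feature_importance/` (see the note above) and are not reproduced here.

```python
# Baseline for comparison: classical MDI from scikit-learn. MDI+ generalizes the
# per-tree regression view behind these scores with tailored models/metrics.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=10, n_informative=3, random_state=0)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

mdi = rf.feature_importances_       # classical MDI (mean decrease in impurity)
print(np.argsort(mdi)[::-1][:3])    # indices of the three top-ranked features
```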
