User warnings and aligned data with SHAP output for low numbers of splits #34

Open
jason-bentley opened this issue Aug 27, 2020 · 0 comments
Assignees
Labels
API New feature or request

Comments

@jason-bentley
Contributor

Is your feature request related to a problem? Please describe.
The value of n_splits used in the crossfit determines how many observations are covered when SHAP values are calculated. With a low number of splits, coverage is incomplete and the consolidated SHAP matrix has fewer rows than there are observations.
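
To make "coverage" concrete, here is a minimal sketch in plain Python. It assumes SHAP values are only calculated for observations that fall into the test set of at least one split; `shap_coverage` and its arguments are hypothetical names for illustration, not part of the existing API.

```python
from typing import Iterable, Sequence


def shap_coverage(
    test_indices_per_split: Iterable[Sequence[int]], n_observations: int
) -> float:
    """Fraction of observations that appear in at least one test set.

    Assumption: an observation gets a SHAP row only if some split
    places it in its test set.
    """
    if n_observations <= 0:
        raise ValueError("n_observations must be positive")
    explained = set()
    for test_indices in test_indices_per_split:
        explained.update(test_indices)
    return len(explained) / n_observations


# Two splits over 10 observations, with overlapping test sets.
splits = [[0, 1, 2, 3], [2, 3, 4, 5]]
coverage = shap_coverage(splits, n_observations=10)
print(f"coverage: {coverage:.0%}")  # 6 of 10 observations are explained
```

Overlapping test sets do not add coverage, which is why a small n_splits can leave many observations without a SHAP row.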

Describe the solution you'd like
The ideal solution has a few elements:

  • A warning should appear for a low number of splits along with a message indicating the coverage of observations for SHAP value calculation.
  • The inspector should produce all the inputs required for utilising existing shap plotting functions. The inspector should automatically create a sample that contains only the observations that have been explained, so it is aligned with the SHAP outputs.
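
The two elements above could look roughly like the following sketch. It uses only the standard library and plain dictionaries keyed by observation index; `check_coverage`, `align_sample`, and the `threshold` parameter are hypothetical names chosen for illustration, and the real inspector would presumably work with data frames rather than dictionaries.

```python
import warnings
from typing import Dict, List, Tuple


def check_coverage(
    n_explained: int, n_observations: int, threshold: float = 1.0
) -> float:
    """Warn when fewer than `threshold` of the observations have SHAP values."""
    coverage = n_explained / n_observations
    if coverage < threshold:
        warnings.warn(
            f"SHAP values cover {n_explained} of {n_observations} "
            f"observations ({coverage:.0%}); consider increasing n_splits.",
            UserWarning,
            stacklevel=2,
        )
    return coverage


def align_sample(
    features: Dict[int, List[float]], shap_values: Dict[int, List[float]]
) -> Tuple[List[int], List[List[float]], List[List[float]]]:
    """Keep only explained observations, with features and SHAP rows in the same order."""
    index = sorted(shap_values)
    missing = [i for i in index if i not in features]
    if missing:
        raise KeyError(f"SHAP values refer to unknown observations: {missing}")
    aligned_features = [features[i] for i in index]
    aligned_shap = [shap_values[i] for i in index]
    return index, aligned_features, aligned_shap


# Four observations, of which only two were explained.
features = {0: [1.0, 2.0], 1: [3.0, 4.0], 2: [5.0, 6.0], 3: [7.0, 8.0]}
shap_values = {3: [0.1, -0.2], 1: [0.3, 0.0]}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    coverage = check_coverage(len(shap_values), len(features))

index, X_explained, shap_matrix = align_sample(features, shap_values)
print(caught[0].message)
print(index, X_explained, shap_matrix)
```

Returning the index alongside the two aligned tables lets the caller pass matching feature rows and SHAP rows to existing shap plotting functions without re-filtering the original sample.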

Describe alternatives you've considered
None - the above solution is the minimum requirement.

Additional context
As an example, with 500 simulated data points and the extreme case of n_splits = 1, the SHAP analysis covers only 40% of observations:
image
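
The figure can be approximated with a small simulation. This is a sketch under stated assumptions: each split draws its test set at random without replacement, and the test fraction is 0.4. That fraction is inferred from the 40% coverage reported above and is not stated in the issue. `simulate_coverage` is a hypothetical helper, not part of the library.

```python
import random


def simulate_coverage(
    n_observations: int, n_splits: int, test_fraction: float, seed: int = 0
) -> float:
    """Share of observations that land in at least one random test set."""
    rng = random.Random(seed)
    test_size = round(n_observations * test_fraction)
    explained = set()
    for _ in range(n_splits):
        # Assumption: every split samples its test set independently.
        explained.update(rng.sample(range(n_observations), test_size))
    return len(explained) / n_observations


for n_splits in (1, 5, 10):
    coverage = simulate_coverage(500, n_splits, test_fraction=0.4)
    print(f"n_splits={n_splits:>2}: coverage {coverage:.1%}")
```

Under these assumptions a single split explains exactly 200 of the 500 observations, and the expected coverage after k independent splits is 1 - 0.6^k, so it approaches but does not guarantee 100% as n_splits grows.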
