vimp: nonparametric variable importance assessment

License: MIT

Author: Brian Williamson

Introduction

In predictive modeling applications, it is often of interest to determine the relative contribution of subsets of features in explaining an outcome; this is often called variable importance. It is useful to consider variable importance as a function of the unknown, underlying data-generating mechanism rather than of the specific predictive algorithm used to fit the data. This package provides functions that, given fitted values from predictive algorithms, compute nonparametric estimates of deviance- and variance-based variable importance, along with asymptotically valid confidence intervals for the true importance.
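
As a rough guide to what variance-based importance means here, the quantity can be written as the share of the outcome variance that is explained by the features of interest beyond what the remaining features already explain. The notation below (`\mu`, `\mu_s`, `\psi_s`) is an assumption made for illustration and does not appear in this README; the tech report gives the authors' exact definition.

```latex
\psi_s = \frac{E\left[\{\mu(X) - \mu_s(X)\}^2\right]}{\mathrm{var}(Y)},
\qquad \mu(x) = E(Y \mid X = x),
\qquad \mu_s(x) = E(Y \mid X_{-s} = x_{-s})
```

Here `s` indexes the feature subset of interest and `X_{-s}` denotes the remaining features. Because `\psi_s` depends only on the joint distribution of the data, it does not change with the algorithm used to estimate `\mu` and `\mu_s`.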

More detail may be found in our tech report.

This method works on low-dimensional and high-dimensional data.

Issues

If you encounter any bugs or have any specific feature requests, please file an issue.

R installation

You may install a stable release of vimp from CRAN via install.packages("vimp"). You may also install a stable release of vimp from GitHub via devtools by running the following code (you may replace v1.1.3 with the tag for the specific release you wish to install):

## install.packages("devtools") # only run this line if necessary
devtools::install_github(repo = "bdwilliamson/vimp@v1.1.3")

You may install a development release of vimp from GitHub via devtools by running the following code:

## install.packages("devtools") # only run this line if necessary
devtools::install_github(repo = "bdwilliamson/vimp")
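
After installing, it can help to confirm that vimp and the packages used in the example are available. The following is a minimal sketch using only base R; the helper name `needs_install` is hypothetical, and the package list is an assumption based on the learners named in the example (`SL.randomForest` relies on the randomForest package).

```r
## hypothetical helper: TRUE if a package is not installed
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

## packages assumed to be needed for the example in this README
pkgs <- c("vimp", "SuperLearner", "xgboost", "glmnet", "randomForest")
missing_pkgs <- pkgs[vapply(pkgs, needs_install, logical(1))]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("vimp version: ", as.character(utils::packageVersion("vimp")))
}
```

Using `requireNamespace()` rather than `library()` checks availability without attaching the packages to the search path.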

Example

This example shows how to use vimp in a simple setting with simulated data, using SuperLearner to estimate the conditional mean functions. For more examples and detailed explanation, please see the vignette.

## load required functions and libraries
library("SuperLearner")
library("vimp")
library("xgboost")
library("glmnet")

## -------------------------------------------------------------
## problem setup
## -------------------------------------------------------------
## set up the data
set.seed(4747) # arbitrary seed, for reproducibility
n <- 100
p <- 2
s <- 1 # index of the feature of interest: importance is desired for X_1
x <- as.data.frame(replicate(p, runif(n, -1, 1)))
y <- (x[,1])^2*(x[,1]+7/5) + (25/9)*(x[,2])^2 + rnorm(n, 0, 1)

## -------------------------------------------------------------
## preliminary step: estimate the conditional means
## -------------------------------------------------------------
## set up the learner library, consisting of the mean, boosted trees,
## elastic net, and random forest
learner.lib <- c("SL.mean", "SL.xgboost", "SL.glmnet", "SL.randomForest")

## the full conditional mean: regress the outcome on all features
full_regression <- SuperLearner::SuperLearner(Y = y, X = x,
                                              family = gaussian(),
                                              SL.library = learner.lib)
full_fit <- full_regression$SL.predict

## the reduced conditional mean: regress the full fitted values
## on all features except the one of interest
reduced_regression <- SuperLearner::SuperLearner(Y = full_fit,
                                                 X = x[, -s, drop = FALSE],
                                                 family = gaussian(),
                                                 SL.library = learner.lib)
reduced_fit <- reduced_regression$SL.predict

## -------------------------------------------------------------
## get variable importance!
## -------------------------------------------------------------
## get the variable importance estimate, SE, and CI
## (the result is named vimp_est so that it does not share the package's name)
vimp_est <- vimp_regression(Y = y, f1 = full_fit, f2 = reduced_fit, indx = s, run_regression = FALSE)
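
To make the two-regression structure of the example concrete, the sketch below computes a naive plug-in version of a variance-based importance measure in base R. It rests on several assumptions: polynomial `lm` fits stand in for SuperLearner, the sample size is larger than in the example, and the final ratio is a simple plug-in quantity. It is not the estimator implemented in `vimp_regression`, and it produces no standard error or confidence interval.

```r
## simulate data from the same model as the example, with a larger n
set.seed(4747)
n <- 500
dat <- data.frame(V1 = runif(n, -1, 1), V2 = runif(n, -1, 1))
dat$y <- (dat$V1)^2 * (dat$V1 + 7/5) + (25/9) * (dat$V2)^2 + rnorm(n, 0, 1)

## full conditional mean: polynomial regression on both features
## (a stand-in for the SuperLearner fit)
full_mod <- lm(y ~ poly(V1, 3) + poly(V2, 2), data = dat)
dat$full_fit <- fitted(full_mod)

## reduced conditional mean: regress the full fitted values on V2 only
reduced_mod <- lm(full_fit ~ poly(V2, 2), data = dat)
dat$reduced_fit <- fitted(reduced_mod)

## naive plug-in ratio: average squared difference between the two fits,
## divided by the empirical variance of the outcome
naive_est <- mean((dat$full_fit - dat$reduced_fit)^2) /
  mean((dat$y - mean(dat$y))^2)
naive_est
```

Because the reduced model is nested in the full model, this ratio lies between 0 and 1; larger values indicate that V1 accounts for more of the outcome variance beyond V2.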
