- Add clustered bootstrap and associated unit tests
- Update software author list
- Fix roxygen2 CRAN bug for package documentation
- Fixed bugs introduced in 2.3.1 for `final_point_estimate = "average"`
- In cases where sample-splitting is used (which is required for valid inference under the null hypothesis of zero variable importance), there is now the option to report a point estimate that is based on the entire dataset, rather than only the split on which inference (confidence intervals and p-values) is performed. The point estimator (using either the single split, the full dataset, or the average of the two split-specific point estimates) is valid regardless of whether the null holds or not. If this option is chosen, there may be a discrepancy between the point estimate and the interval estimate; this is likely to occur only in small-sample (or small effective sample-size, for binary outcomes) settings.
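As a rough illustration of this option (a sketch, not taken from the package documentation: the simulated data and the arguments other than `final_point_estimate` and `sample_splitting` are assumed typical usage), a call might look like:

```r
# Illustrative sketch only; "average" is the option named above, while the other
# option strings and the remaining arguments are assumptions about typical usage.
library(vimp)
set.seed(1234)
x <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
y <- x$x1 + rnorm(200)
est <- vim(
  Y = y, X = x, indx = 1, type = "r_squared",
  run_regression = TRUE, SL.library = c("SL.glm", "SL.mean"),
  sample_splitting = TRUE,           # required for valid inference under the null
  final_point_estimate = "average"   # average the two split-specific point estimates
)
```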
- For predictiveness measures that lie in [0, 1] by definition (accuracy, ANOVA, R-squared, deviance, AUC), the default is now to compute confidence intervals on the logit scale, which guarantees that the interval will also lie in [0, 1]. Note that this means the interval will not be centered at the point estimate; however, it retains the desired level of coverage.
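As a rough sketch of this idea (the helper below is hypothetical, not the package's internal code), a Wald-type interval formed on the logit scale and back-transformed with `stats::plogis` necessarily lies in [0, 1]:

```r
# Hypothetical helper (not vimp's internal code) showing a delta-method Wald CI
# computed on the logit scale and back-transformed to [0, 1] with plogis().
logit_scale_ci <- function(est, se, level = 0.95) {
  z <- stats::qnorm(1 - (1 - level) / 2)
  se_logit <- se / (est * (1 - est))  # delta method: d/dx qlogis(x) = 1 / (x * (1 - x))
  stats::plogis(stats::qlogis(est) + c(-1, 1) * z * se_logit)
}
logit_scale_ci(est = 0.9, se = 0.08)  # lies in [0, 1], not centered at 0.9
```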
- Predictiveness measures now have their own S3 class, which makes internal code cleaner and facilitates simpler addition of new predictiveness measures.
- In this version, the default return value of `extract_sampled_split_predictions` is a vector, not a list. This facilitates proper use in the new version of the package.
- You can now specify `truncate = FALSE` in `vimp_ci`
- You can now compute variable importance using the average value under the optimal treatment rule. This includes the new function `measure_avg_value` (which computes the average value and efficient influence function) and updates to `vim`, `cv_vim`, and `sp_vim`.
- None
- None
- Specify `method` and `family` for weighted EIF estimation within the outer functions (`vim`, `cv_vim`, `sp_vim`) rather than the `measure*` functions. This allows compatibility with binary outcomes.
- Added a vignette for coarsened-data settings.
- None
- Allow for unequal numbers of cross-fitting folds between full and reduced predictiveness
- None
- Return objects in `sp_vim` that are necessary to compute the test statistics
- None
- Allow the `parallel` argument to be specified for calls to `CV.SuperLearner` but not for calls to `SuperLearner`
- None
- Allow different types of bootstrap interval (e.g., percentile) to be computed
- More precise documentation for `Z` in coarsened-data settings; allow case-insensitive specification of covariate names/positions when creating `Z`
- `V` defaults to 5 if no cross-fitting folds are specified externally
- More precise documentation for `cross_fitted_f1` and `cross_fitted_f2` in `cv_vim`
- Allow non-list `cross_fitted_f1` and `cross_fitted_f2` in `cv_vim`
- None
- Update how `cv_vim` handles an odd number of outer folds being passed with pre-computed regression function estimates. Now, you can use an odd number of folds (e.g., 5) to estimate the full and reduced regression functions and still obtain cross-validated variable importance estimates.
- None
- Allow for an odd number of folds in cross-fit and sample-split VIM estimation
- Add `vrc01` data as an exported object
- Change dataset for vignettes to `vrc01` data
- Updated computation of standard errors. Some of the changes in v2.2.0 (namely, that the efficient influence function can be estimated on the entire dataset regardless of whether or not sample-splitting was requested) do not match with the form of the standard error estimator that we use. In this update, we ensure that independent data are used to estimate both the predictiveness and the efficient influence function; however, the nuisance functions may still be estimated on a larger portion of the data than in versions prior to v2.2.0 when cross-fitting is used.
- Added explicit-value tests for point estimates throughout `testthat/`
- Harmonized vignettes with new SE computation
- Allow `C` to not be specified in `make_folds`
None
- Increased tolerance for AUC vs CV-AUC
- Updated the internals of `measure_auc` to hew more closely to `ROCR` and `cvAUC`, using computational tricks to speed up weighted AUC and EIF computation.
- Added tests for IPW AUC
- Added argument `cross_fitted_se` to `cv_vim` and `sp_vim`; this logical option allows the standard error to be estimated using cross-fitting. This can improve performance in cases where flexible algorithms are used to estimate the full and reduced regressions.
- Added bootstrap-based standard error estimates as an option to both `vim` and `cv_vim`; currently, this option is only available for non-sample-split calls (i.e., with `sample_splitting = FALSE`)
- Updated sample-splitting behavior to match more closely with theoretical results (and improve power!): namely, since estimation of the nuisance regression functions (i.e., the regression of the outcome on all covariates and of the outcome on the reduced set of covariates) can be treated as fixed in making inference, sample-splitting is only necessary for evaluating predictiveness. Thus, the final regression functions from a call to `vim` are based on the entire dataset, while the full and reduced predictiveness (`predictiveness_full` and `predictiveness_reduced`, along with the corresponding confidence intervals) are evaluated using separate portions of the data for the full and reduced regressions.
- Added argument `sample_splitting` to `vim`, `cv_vim`, and `sp_vim`; if `FALSE`, sample-splitting is not used to estimate predictiveness. Note that we recommend using the default, `TRUE`, in all cases, since inference using `sample_splitting = FALSE` will be invalid for variables with truly null variable importance.
- Updated cross-fitting (also referred to as cross-validation) behavior within `sample_splitting = TRUE` to match more closely with theoretical results (and improve power!). In this case, we first split the data into $2K$ cross-fitting folds and split these folds equally into two sample-splitting folds. For the nuisance regression using all covariates, for each $k \in \{1, \ldots, K\}$ we set aside the data in sample-splitting fold 1 and cross-fitting fold $k$ [this comprises $1 / (2K)$ of the data]; we train using the remaining observations [comprising $(2K - 1) / (2K)$ of the data] and test on the withheld data. We repeat this for the nuisance regression using the reduced set of covariates, but withhold data in sample-splitting fold 2. This update affects both `cv_vim` and `sp_vim`. If `sample_splitting = FALSE`, then we use standard cross-fitting.
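A small sketch of this fold layout (illustrative only; the particular assignment of cross-fitting folds to sample-splitting folds shown here is an assumption, not `vimp`'s internal code):

```r
# Illustrative fold layout for K = 2 (so 2K = 4 cross-fitting folds) and n = 40 observations.
set.seed(1234)
n <- 40
K <- 2
cross_fit_fold <- sample(rep(seq_len(2 * K), length.out = n))
# Assign cross-fitting folds 1..K to sample-splitting fold 1 (used for the full regression)
# and folds (K + 1)..2K to sample-splitting fold 2 (used for the reduced regression).
sample_split_fold <- ifelse(cross_fit_fold <= K, 1, 2)
table(cross_fit_fold, sample_split_fold)
# For the full-covariate regression and k = 1, the held-out test set is
# cross_fit_fold == 1 (about 1 / (2K) of the data); the remaining observations
# (about (2K - 1) / (2K) of the data) are used for training.
```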
- Use `>=` in computing the numerator of the AUC with inverse probability weights
- Update `roxygen2` documentation for wrappers (`vimp_*`) to inherit parameters and details from `cv_vim` (reduces the potential for documentation mismatches)
None
- Automatically determine the `family` if it isn't specified; use `stats::binomial()` if there are only two unique outcome values, otherwise use `stats::gaussian()`
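A minimal sketch of this rule (illustrative only, not the package's internal code):

```r
# Illustrative only: use a binomial family when the outcome has exactly two unique
# values, and a gaussian family otherwise (mirrors the rule described above).
guess_family <- function(y) {
  if (length(unique(y)) == 2) stats::binomial() else stats::gaussian()
}
guess_family(c(0, 1, 1, 0))$family  # "binomial"
guess_family(rnorm(10))$family      # "gaussian"
```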
None
- Update sensitivity and specificity to use weak inequalities rather than strict inequalities (better aligns with `cvAUC`)
- Add a test of CV-AUC estimation against `cvAUC`
- Borrow information across folds for empirically estimated quantities (e.g., the outcome variance or probability of a certain class); asymptotically equivalent to the prior procedure, but could result in small-sample differences
- Use fold-specific EIFs for cross-validated SE estimation (again, asymptotically equivalent to the prior procedure, but could result in small-sample differences)
None
- Allow the user to specify either an augmented inverse probability of coarsening (AIPW; the default) estimator or an IPW estimator in coarsened-at-random settings, using the new argument `ipc_est_type` (available in `vim`, `cv_vim`, and `sp_vim`, as well as the corresponding wrapper functions for each VIM and the corresponding internal estimation functions)
None
- Updated internals so that stratified estimation can be performed in the outer regression functions for binary outcomes; in the case of two-phase samples, the stratification is not used in any internal regressions with continuous outcomes
- Updated internals to allow stratification on both the outcome and observed status, so that there are sufficient cases per fold for both the phase 1 and phase 2 regressions (only used with two-phase samples)
None
- Updated links to DOIs and package vignettes throughout
- Updated all tests in `testthat/` to use `glm` rather than `xgboost` (increases speed)
- Updated all examples to use `glm` rather than `xgboost` or `ranger` (increases speed, even though the regression is now misspecified for the truth)
- Removed `forcats` from the vignette
None
- Fixed a bug where if the number of rows in the different folds (for cross-fitting or sample-splitting) differed, the matrix of fold-specific EIFs had the wrong number of rows
- Changes to the internals of `measure_accuracy` and `measure_auc` for project-wide consistency
- Update all tests in `testthat/` to not explicitly load `xgboost`
None
- Fixed a bug where if the number of rows in the different folds (for cross-fitting or sample-splitting) differed, the EIF had the wrong number of rows
None
- Compute logit transforms using `stats::qlogis` and `stats::plogis` rather than bespoke functions
None
- Bugfix from 2.1.1.1: compute the correction correctly
None
- Allow confidence interval (CI) and inverse probability of coarsening corrections on different scales (e.g., log) to ensure that estimates and CIs lie in the parameter space
- Compute one-step estimators of variable importance if inverse probability of censoring weights are entered. You input the weights, the indicator of coarsening, and the observed variables, and `vimp` will handle the rest.
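As a hedged sketch of such a call (the argument names `C`, `Z`, and `ipc_weights` appear elsewhere in this changelog; the simulated data, the value passed to `Z`, and the remaining arguments are assumptions about typical usage):

```r
# Illustrative two-phase-sampling sketch; only the weights, the coarsening indicator,
# and the observed variables are supplied, as described above.
library(vimp)
set.seed(4747)
n <- 1000
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- x$x1 + 0.5 * x$x2 + rnorm(n)
obs_prob <- 0.8                        # known phase-two sampling probability
C <- rbinom(n, 1, obs_prob)            # 1 = x2 measured, 0 = coarsened
x_obs <- x
x_obs$x2[C == 0] <- NA                 # x2 is only available for sampled observations
est <- vim(
  Y = y, X = x_obs, indx = 2, type = "r_squared",
  run_regression = TRUE, SL.library = c("SL.glm", "SL.mean"),
  C = C, Z = "Y",                      # Z: fully observed variables (value assumed here)
  ipc_weights = rep(1 / obs_prob, n)
)
```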
- Created new vignettes "Types of VIMs" and "Using precomputed regression function estimates in `vimp`"
- Updated the main vignette to only use `run_regression = TRUE`, for simplicity
- Added argument `verbose` to `sp_vim`; if `TRUE`, messages are printed throughout fitting that display progress, and `verbose` is passed to `SuperLearner`
- Changed the names of internal functions from `cv_predictiveness_point_est` and `predictiveness_point_est` to `est_predictiveness_cv` and `est_predictiveness`, respectively
- Removed functions `cv_predictiveness_update`, `cv_vimp_point_est`, `cv_vimp_update`, `predictiveness_update`, `vimp_point_est`, and `vimp_update`; this functionality is now in `est_predictiveness_cv` and `est_predictiveness` (for the `*update*` functions) or directly in `vim` or `cv_vim` (for the `*vimp*` functions)
- Removed functions `predictiveness_se` and `predictiveness_ci` (this functionality is now in `vimp_se` and `vimp_ci`, respectively)
- Changed the `weights` argument to `ipc_weights`, clarifying that these weights are meant to be used as inverse probability of coarsening (e.g., censoring) weights
- Added functions `sp_vim`, `sample_subsets`, `spvim_ics`, and `spvim_se`; these allow computation of the Shapley Population Variable Importance Measure (SPVIM)
None
- Removed function `sp_vim` and helper functions `run_sl`, `sample_subsets`, `spvim_ics`, and `spvim_se`; these will be added in a future release
- Removed function `cv_vim_nodonsker`, since `cv_vim` supersedes this function
- Modify examples to pass all CRAN checks
- Added new function `sp_vim` and helper functions `run_sl`, `sample_subsets`, `spvim_ics`, and `spvim_se`; these functions allow computation of the Shapley Population Variable Importance Measure (SPVIM)
- Both `cv_vim` and `vim` now use an outer layer of sample splitting for hypothesis testing
- Added new functions `vimp_auc`, `vimp_accuracy`, `vimp_deviance`, and `vimp_rsquared`
- `vimp_regression` is now deprecated; use `vimp_anova` instead
- Added new function `vim`; each variable importance function is now a wrapper function around `vim` with the `type` argument filled in
- `cv_vim_nodonsker` is now deprecated; use `cv_vim` instead
- Each variable importance function now returns a p-value based on the (possibly conservative) hypothesis test against the null of zero importance (with the exception of `vimp_anova`)
- Each variable importance function now returns the estimates of the individual risks (with the exception of `vimp_anova`)
- Added new functions to compute measures of predictiveness (and cross-validated measures of predictiveness), along with their influence functions
- Return tibbles in `cv_vim`, `vim`, `merge_vim`, and `average_vim`
None
- Changed tests to handle the `gam` package update by switching the library to `SL.xgboost`, `SL.step`, and `SL.mean`
- Added small unit tests for internal functions
None
- Attempt to handle the `gam` package update in unit tests
None
- `cv_vim` and `cv_vim_nodonsker` now return the cross-validation folds used within the function
None
- users may now only specify a `family` for the top-level SuperLearner if `run_regression = TRUE`; in all cases, the second-stage SuperLearner uses a `gaussian` family
- if the SuperLearner chooses `SL.mean` as the best-fitting algorithm, the second-stage regression is now run using the original outcome, rather than the first-stage fitted values
- added function `cv_vim_nodonsker`, which computes the cross-validated naive estimator and the update on the same, single validation fold. This does not allow for relaxation of the Donsker class conditions.
None
- added function `two_validation_set_cv`, which sets up folds for V-fold cross-validation with two validation sets per fold
- changed the functionality of `cv_vim`: now, the cross-validated naive estimator is computed on a first validation set, while the update for the corrected estimator is computed using the second validation set (both created from `two_validation_set_cv`); this allows for relaxation of the Donsker class conditions necessary for asymptotic convergence of the corrected estimator, while making sure that the initial CV naive estimator is not biased high (due to a higher R^2 on the training data)
None
None
- changed the functionality of `cv_vim`: now, the cross-validated naive estimator is computed on the training data for each fold, while the update for the corrected cross-validated estimator is computed using the test data; this allows for relaxation of the Donsker class conditions necessary for asymptotic convergence of the corrected estimator
- removed function `vim`, replaced with individual-parameter functions
- added function `vimp_regression` to match the Python package
- `cv_vim` now can compute regression estimators
- renamed all internal functions; these are now `vimp_ci`, `vimp_se`, `vimp_update`, `onestep_based_estimator`
- edited vignette
- added unit tests
None
Bugfixes etc.