Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix Lrnr_cv.R #422

Open
wants to merge 10 commits into
base: devel
Choose a base branch
from
Open

Bug fix Lrnr_cv.R #422

wants to merge 10 commits into from

Conversation

Larsvanderlaan
Copy link
Contributor

@Larsvanderlaan Larsvanderlaan commented Aug 12, 2023

Previously, the following code, which contains a Stack of one learner, outputted a data.table containing NULLs with some entries a list of some subset of predictions.

This bug is fixed here by replacing data.table(preds) with as.data.table(preds) in the .predict function of Lrnr_cv.
As a note here, I think we should move away from using data.table(object) unless we deliberately want to have a data.table containing lists (e..g, as with packed predictions). In other cases, as.data.table should be used.

n <- 500
W <- runif(n, -1 , 1)
Y <- rbinom(n, 1, plogis(W))
task <- sl3_Task$new(data.table(W,Y), covariates = "W", outcome = "Y") 
Lrnr_cv$new(Stack$new(Lrnr_glm$new() ))$train(task)$predict(task)

Previously, the following code, which contains a Stack of one learner, outputted a data.table containing NULLs with some entries a list of some subset of predictions.

This bug is fixed here by replacing data.table(preds) with as.data.table(preds)

n <- 500
W <- runif(n, -1 , 1)
Y <- rbinom(n, 1, plogis(W))
task <- sl3_Task$new(data.table(W,Y), covariates = "W", outcome = "Y")
Lrnr_cv$new(Stack$new(Lrnr_glm$new() ))$train(task)$predict(task)
@Larsvanderlaan
Copy link
Contributor Author

Larsvanderlaan commented Aug 12, 2023

Also, there are inconsistencies between the Lrnr_CV class method predict and the $base_predict method, which need to be addressed.

In the Lrnr_CV class, the following code is present within the .predict method:


 # don't convert to vector if learner is stack, as stack won't
      if ((ncol(predictions) == 1) && !inherits(self$params$learner, "Stack")) {
        predictions <- unlist(predictions)
      }

However, in $base_predict, the code is as follows:

if (!is.null(ncols) && (ncols == 1)) {
        predictions <- as.vector(predictions)
      }

If predictions is a data.table, then as.vector(predictions) would result in a list, not a vector, potentially leading to errors. This is good when predictions is a data.table with packed predictions but in settings where a numeric vector is expected, this is bad.

To address this issue without affecting current behavior much, I propose the following change:

if(!is.null(ncols) && (ncols == 1)) {
        if(is.data.table(predictions)) {
          # if a data.table of packed predictions, return a matrix.
          predictions <- as.matrix(predictions)
        }
        # if not packed predictions, return vector
        if(!inherits(predictions[[1]], "packed_predictions")) {
          predictions <- unlist(predictions)
        } 
      }

To preserve behavior, I try to output a numeric vector of predictions whenever possible.
The only exception is for packed predictions, where a matrix of the packed predictions is then outputted.
The following code handles this exception.

@Larsvanderlaan Larsvanderlaan changed the title Update Lrnr_cv.R Bug fix Lrnr_cv.R Aug 12, 2023
@Larsvanderlaan
Copy link
Contributor Author

Larsvanderlaan commented Aug 12, 2023

This was leading to errors with certain nested superlearners (Lrnr_sl).

As a reproducible example, the following nested superlearner previously had malformed predictions/errors in the outer-level NNLS meta-regression.

library(sl3)
library(data.table)
n <- 500
W <- runif(n, -1 , 1)
W2 <- runif(n, -1 , 1)
Y <- rbinom(n, 1, plogis(W))

task <- sl3_Task$new(data.table(W,W2, Y), covariates = c("W", "W2"), outcome = "Y")

sl_glm_oneway <- Lrnr_sl$new(learners = lapply(task$nodes$covariates, function(var) {
  var <- c(var)
  Lrnr_glm$new(  covariates = var, family = binomial())
}), metalearner = Lrnr_nnls$new())

# fine
data.table(sl_glm_oneway$train(task)$predict(task))
# previously errored
data.table(Lrnr_sl$new(sl_glm_oneway)$train(task)$predict(task))
 

@Larsvanderlaan
Copy link
Contributor Author

Im also going to fix some bugs with Lrnr_pooled_hazards and Lrnr_independent binomial in this pull request

@Larsvanderlaan
Copy link
Contributor Author

previously Lrnr_cv$new(Lrnr_independent_binomial) would error because packed predictions were getting unlished in Lrnr_cv$predict

To fix this, I changed

# don't convert to vector if learner is stack, as stack won't
      if ((ncol(predictions) == 1) && !inherits(self$params$learner, "Stack")) {
        predictions <- unlist(predictions)
      }

to

# don't convert to vector if learner is stack, as stack won't
      if ((ncol(predictions) == 1) && !inherits(self$params$learner, "Stack")) {
        # if packed_predictions dont unlist
        if(is.data.table(predictions)) predictions <- as.matrix(predictions)
        if(!inherits(predictions[[1]], "packed_predictions")) {
          predictions <- as.vector(predictions)
        }
      }

@rachaelvp
Copy link
Member

rachaelvp commented Aug 12, 2023

Thank you!! I found some issues w Lrnr_cv predictions recently (#404). Does this PR resolve this?

@Larsvanderlaan
Copy link
Contributor Author

Larsvanderlaan commented Aug 12, 2023

Thank you!! I found some issues w Lrnr_cv predictions recently (#404). Does this PR resolve this?

I don't think this affects that PR. I left a comment what I think the issue is. @rachaelvp

Added weights for GAM
@Larsvanderlaan
Copy link
Contributor Author

Added weights support for Lrnr_gam (I thought this was already supported.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants