time series sl3 r and rolling cross validation #248

Shafi2016 · 2019-11-07T21:37:54Z

I want to apply rolling-window cross-validation for time series. The data used below (washb_data) is not a time series; I am just treating it as one so that the example is reproducible and I can then apply the same approach to my own time series data. I get the same error with my actual time series data as well.
I have added one line of code from your time series example:

folds = origami::make_folds(washb_data, fold_fun=folds_rolling_window, window_size = 50, validation_size = 30, gap = 0, batch = 50)
However, when I reach sl_fit <- sl$train(washb_task), I get the following error, and I don't know how to fix it.

Error in set(private$.data, j = new_col_names, value = new_data) :
Supplied 570 items to be assigned to 1000 items of column 'd47fdc00-01a0-11ea-a044-4560ff6b69d1_Pipeline(Lrnr_pkg_SuperLearner_screener_screen.corP->Stack)_Lrnr_glm_TRUE'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code

The rest is your code:
library(data.table)
library(knitr)
library(kableExtra)
library(tidyverse)
library(origami)
library(SuperLearner)
library(sl3)

set.seed(7194)

# load data set and take a peek

washb_data <- fread("https://raw.githubusercontent.com/tlverse/tlverse-data/master/wash-benefits/washb_data.csv",
stringsAsFactors = TRUE)

washb_data <- washb_data[1:1000 ,]
head(washb_data) %>%
kable(digits = 4) %>%
kableExtra::kable_styling(fixed_thead = TRUE) %>%
scroll_box(width = "100%", height = "300px")

# specify the outcome and covariates

outcome <- "whz"
covars <- colnames(washb_data)[-which(names(washb_data) == outcome)]
folds = origami::make_folds(washb_data, fold_fun=folds_rolling_window, window_size = 50, validation_size = 30, gap = 0, batch = 50)

# create the sl3 task

washb_task <- make_sl3_Task(
data = washb_data,
covariates = covars,
outcome = outcome, folds = folds
)

# choose base learners

lrnr_glm <- make_learner(Lrnr_glm)
lrnr_mean <- make_learner(Lrnr_mean)
lrnr_glmnet <- make_learner(Lrnr_glmnet)

lrnr_ranger100 <- make_learner(Lrnr_ranger, num.trees = 100)
lrnr_hal_simple <- make_learner(Lrnr_hal9001, degrees = 1, n_folds = folds)
lrnr_gam <- Lrnr_pkg_SuperLearner$new("SL.gam")
lrnr_bayesglm <- Lrnr_pkg_SuperLearner$new("SL.bayesglm")

stack <- make_learner(
Stack,
lrnr_glm, lrnr_mean, lrnr_ranger100, lrnr_glmnet,
lrnr_gam, lrnr_bayesglm
)
metalearner <- make_learner(Lrnr_nnls)
screen_cor <- Lrnr_pkg_SuperLearner_screener$new("screen.corP")

# which covariates are selected on the full data?

screen_cor$train(washb_task)
cor_pipeline <- make_learner(Pipeline, screen_cor, stack)
fancy_stack <- make_learner(Stack, cor_pipeline, stack)

# we can visualize the stack

dt_stack <- delayed_learner_train(fancy_stack, washb_task)
plot(dt_stack, color = FALSE, height = "400px", width = "100%")
sl <- make_learner(Lrnr_sl,
learners = fancy_stack,
metalearner = metalearner
)

# we can visualize the super learner

dt_sl <- delayed_learner_train(sl, washb_task)
plot(dt_sl, color = FALSE, height = "400px", width = "100%")

sl_fit <- sl$train(washb_task)
sl_preds <- sl_fit$predict()
head(sl_preds)
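
For readers puzzling over the 570 in the error message: it matches the total number of validation rows that a rolling-window scheme with these settings produces on 1000 rows. The base-R sketch below only mimics the index arithmetic. `rolling_window_folds` is a hypothetical helper written for illustration, not origami's implementation, and it assumes each window starts `batch` rows after the previous one and that every fold must fit entirely inside the data.

```r
# Hypothetical helper (not part of origami): builds rolling-window fold indices.
# Assumption: windows start at 1, 1 + batch, 1 + 2 * batch, ... and a fold is kept
# only if its training window, gap, and validation block all fit within 1..n.
rolling_window_folds <- function(n, window_size, validation_size, gap = 0, batch = 1) {
  last_start <- n - (window_size + gap + validation_size) + 1
  starts <- seq.int(1, last_start, by = batch)
  lapply(starts, function(s) {
    train_end <- s + window_size - 1
    list(
      training = s:train_end,
      validation = (train_end + gap + 1):(train_end + gap + validation_size)
    )
  })
}

# Same settings as in the make_folds() call above, applied to 1000 rows.
folds_sketch <- rolling_window_folds(
  n = 1000, window_size = 50, validation_size = 30, gap = 0, batch = 50
)

n_folds <- length(folds_sketch)
n_validation <- sum(vapply(folds_sketch, function(f) length(f$validation), integer(1)))

cat("folds:", n_folds, " validation rows in total:", n_validation, "\n")
```

Under these assumptions there are 19 folds of 30 validation rows each, so cross-validated predictions cover 570 rows rather than all 1000, which is consistent with the length mismatch reported by `set()`.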

Shafi2016 · 2019-11-09T00:22:23Z

I get the same problem even with the sample code from https://github.com/tlverse/sl3_lecture/blob/master/sl3_timeseries.Rmd

library(data.table)
library(origami)
library(sl3)
library(xts)

# load data

data(bsds)

head(bsds)
#Create a time-series object:

tsdata<-xts(bsds$cnt, order.by=as.POSIXct(bsds$dteday))

#Visualize the time-series:

PerformanceAnalytics::chart.TimeSeries(tsdata, auto.grid = FALSE, main = "Count of total rental bikes")

#Final setup

folds = origami::make_folds(tsdata, fold_fun=folds_rolling_window, window_size = 50, validation_size = 30, gap = 0, batch = 50)

covars <- "cnt"

outcome <- "cnt"

# create the sl3 task and take a look at it

ts_uni_task <- sl3_Task$new(data = bsds, covariates = covars,
                            outcome = outcome, outcome_type = "continuous", folds = folds)

# let's take a look at the sl3 task

n_ahead_param <- 2
lrnr_arima <- Lrnr_arima$new(n.ahead = n_ahead_param)
fit_arima <- lrnr_arima$train(ts_uni_task)

# verify that the learner is fit

fit_arima$is_trained
pred_arima <- fit_arima$predict()

head(pred_arima)
lrnr_tsdyn_linear <- Lrnr_tsDyn$new(learner = "linear", m = 1,
                                    n.ahead = n_ahead_param)

lrnr_tsdyn_setar <- Lrnr_tsDyn$new(learner = "setar", m = 1, model = "TAR",
                                   n.ahead = n_ahead_param)

lrnr_tsdyn_lstar <- Lrnr_tsDyn$new(learner = "lstar", m = 1,
                                   n.ahead = n_ahead_param)

lrnr_garch <- Lrnr_rugarch$new(n.ahead = n_ahead_param)

lrnr_expsmooth <- Lrnr_expSmooth$new(n.ahead = n_ahead_param)

lrnr_harmonicreg <- Lrnr_HarmonicReg$new(n.ahead = n_ahead_param, K = 7,
                                         freq = 105)

ts_stack <- Stack$new(lrnr_arima, lrnr_tsdyn_linear, lrnr_tsdyn_setar,
                      lrnr_tsdyn_lstar)

ts_stack_fit <- ts_stack$train(ts_uni_task)

ts_stack_preds <- ts_stack_fit$predict()
Error in set(learner_preds, j = current_names, value = current_preds) :
Supplied 2 items to be assigned to 731 items of column 'Lrnr_arima_NULL_2'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
Failed on predict
Error in self$compute_step() :
Error in set(learner_preds, j = current_names, value = current_preds) :
Supplied 2 items to be assigned to 731 items of column 'Lrnr_arima_NULL_2'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
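
The 2-versus-731 mismatch has the shape one would expect if a learner returns only `n.ahead` forecasts while the stack expects one prediction per row. A minimal base-R sketch of that length difference follows, using `stats::arima` on simulated data. The series `y` and the model order are assumptions for illustration; this is not the bsds data and it says nothing definitive about sl3's internals.

```r
set.seed(1)

# Simulated random-walk series with the same length as the 731-row data (assumption).
y <- cumsum(rnorm(731))

# Fit a simple ARIMA(1, 1, 0) model; the order is arbitrary for this illustration.
fit <- arima(y, order = c(1, 1, 0))

# Forecast two steps ahead, mirroring n_ahead_param <- 2 in the code above.
forecast <- predict(fit, n.ahead = 2)$pred

n_forecast <- length(forecast)
n_rows <- length(y)

cat("forecast values:", n_forecast, " rows in data:", n_rows, "\n")
```

A forecast of length 2 cannot fill a 731-row prediction column without recycling, which is what the `set()` error complains about.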

jeremyrcoyle · 2019-11-11T15:52:34Z

There seems to be a recent bug in sl3 that prevents time series super learner from working correctly. Thanks for reporting this. We'll get it fixed ASAP

Shafi2016 · 2019-11-11T15:57:18Z

Thank you so much! I will be eagerly waiting for the update. The problem seems to be related to data.table.

Shafi2016 · 2019-11-20T18:21:23Z

Do you have any update on the above-mentioned problem?

imalenica · 2019-11-20T19:19:39Z

Hi, sorry for the delay. I was able to fix it and will push the updated version in the next few days (I need to check the other CV schemes as well).

Shafi2016 · 2019-11-20T19:23:59Z

Hello Ivana Malenica,
Thanks a lot! This is great news. I hope we will get the updated version soon.

jeremyrcoyle · 2019-12-04T22:47:25Z

This should now be fixed on devel. You can install the devel version by doing install_github("tlverse/sl3@devel"). It will be merged up to master shortly.

Shafi2016 · 2019-12-05T22:20:00Z

First, I removed the old version of sl3 and reinstalled it using the command you provided. I then checked again with my own data and code, and with this example: https://github.com/tlverse/sl3_lecture/blob/master/sl3_timeseries.Rmd.
When I reach the line ts_stack_preds <- ts_stack_fit$predict(), I still get the same problem. Am I making a mistake?

Thanks in advance.

Error in set(learner_preds, j = current_names, value = current_preds) :
Supplied 2 items to be assigned to 731 items of column 'Lrnr_arima_NULL_2'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
Failed on predict
Error in self$compute_step() :
Error in set(learner_preds, j = current_names, value = current_preds) :
Supplied 2 items to be assigned to 731 items of column 'Lrnr_arima_NULL_2'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.

imalenica added the bug label Nov 9, 2019

jeremyrcoyle assigned jeremyrcoyle and imalenica Nov 11, 2019

jeremyrcoyle mentioned this issue Dec 4, 2019

Meta-learning with time-series CV #254

Merged
