time series sl3 r and rolling cross validation #248

Open
Shafi2016 opened this issue Nov 7, 2019 · 8 comments

@Shafi2016

I want to apply rolling-window cross-validation to time series data. The data used below (washb_data) is not a time series; I am only treating it as one so that the example is reproducible and I can then apply the same approach to my own time series data. I get the same error with my actual time series data as well.
I have added one line of code from your time series example:

folds = origami::make_folds(washb_data, fold_fun=folds_rolling_window, window_size = 50, validation_size = 30, gap = 0, batch = 50)
However, when I reach sl_fit <- sl$train(washb_task), I get the following error, and I don't know how to fix it.

Error in set(private$.data, j = new_col_names, value = new_data) :
Supplied 570 items to be assigned to 1000 items of column 'd47fdc00-01a0-11ea-a044-4560ff6b69d1_Pipeline(Lrnr_pkg_SuperLearner_screener_screen.corP->Stack)_Lrnr_glm_TRUE'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code

The rest is your code:
library(data.table)
library(knitr)
library(kableExtra)
library(tidyverse)
library(origami)
library(SuperLearner)
library(sl3)

set.seed(7194)

# load data set and take a peek
washb_data <- fread("https://raw.githubusercontent.com/tlverse/tlverse-data/master/wash-benefits/washb_data.csv",
  stringsAsFactors = TRUE)

washb_data <- washb_data[1:1000, ]
head(washb_data) %>%
  kable(digits = 4) %>%
  kableExtra:::kable_styling(fixed_thead = T) %>%
  scroll_box(width = "100%", height = "300px")

# specify the outcome and covariates
outcome <- "whz"
covars <- colnames(washb_data)[-which(names(washb_data) == outcome)]
folds = origami::make_folds(washb_data, fold_fun=folds_rolling_window, window_size = 50, validation_size = 30, gap = 0, batch = 50)

# create the sl3 task
washb_task <- make_sl3_Task(
  data = washb_data,
  covariates = covars,
  outcome = outcome, folds = folds
)

# choose base learners
lrnr_glm <- make_learner(Lrnr_glm)
lrnr_mean <- make_learner(Lrnr_mean)
lrnr_glmnet <- make_learner(Lrnr_glmnet)

lrnr_ranger100 <- make_learner(Lrnr_ranger, num.trees = 100)
lrnr_hal_simple <- make_learner(Lrnr_hal9001, degrees = 1, n_folds = folds)
lrnr_gam <- Lrnr_pkg_SuperLearner$new("SL.gam")
lrnr_bayesglm <- Lrnr_pkg_SuperLearner$new("SL.bayesglm")

stack <- make_learner(
  Stack,
  lrnr_glm, lrnr_mean, lrnr_ranger100, lrnr_glmnet,
  lrnr_gam, lrnr_bayesglm
)
metalearner <- make_learner(Lrnr_nnls)
screen_cor <- Lrnr_pkg_SuperLearner_screener$new("screen.corP")

# which covariates are selected on the full data?
screen_cor$train(washb_task)
cor_pipeline <- make_learner(Pipeline, screen_cor, stack)
fancy_stack <- make_learner(Stack, cor_pipeline, stack)

# we can visualize the stack
dt_stack <- delayed_learner_train(fancy_stack, washb_task)
plot(dt_stack, color = FALSE, height = "400px", width = "100%")
sl <- make_learner(Lrnr_sl,
  learners = fancy_stack,
  metalearner = metalearner
)

# we can visualize the super learner
dt_sl <- delayed_learner_train(sl, washb_task)
plot(dt_sl, color = FALSE, height = "400px", width = "100%")

sl_fit <- sl$train(washb_task)
sl_preds <- sl_fit$predict()
head(sl_preds)
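
A note on the numbers in the error above: the 570 supplied items are consistent with the fold settings used here. The sketch below uses base R only and a hypothetical helper, rolling_window_summary (not part of origami), which assumes each rolling-window fold trains on window_size consecutive rows and validates on the validation_size rows that follow after gap rows, with window starts advancing by batch. Under that assumption, 1000 rows give 19 folds and 19 × 30 = 570 validation rows, which suggests (but does not prove) that the cross-validated predictions cover only the validation rows while the column being filled has all 1000 rows.

```r
# Hypothetical helper (an assumption for illustration, not origami's API):
# counts folds and validation rows for a rolling-window scheme in which each
# fold trains on `window_size` consecutive rows and validates on the
# `validation_size` rows that follow after `gap` rows.
rolling_window_summary <- function(n, window_size, validation_size,
                                   gap = 0, batch = 1) {
  # Last row at which a full training + gap + validation window still fits
  last_start <- n - (window_size + gap + validation_size) + 1
  starts <- seq(from = 1, to = last_start, by = batch)

  # Row indices of every validation set, concatenated across folds
  validation_rows <- unlist(lapply(starts, function(s) {
    s + window_size + gap + seq_len(validation_size) - 1
  }))

  list(
    n_folds = length(starts),
    n_validation = length(validation_rows),
    n_unique_validation = length(unique(validation_rows))
  )
}

# Same settings as the make_folds() call in this comment, with 1000 rows
res <- rolling_window_summary(
  n = 1000, window_size = 50, validation_size = 30, gap = 0, batch = 50
)
print(res)
# n_folds = 19, n_validation = 570, n_unique_validation = 570
```

With these settings the validation sets do not overlap (rows 51 to 80, 101 to 130, and so on up to 951 to 980), so 430 of the 1000 rows never appear in any validation set.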

@Shafi2016
Author

I get the same problem even with the sample code from https://github.com/tlverse/sl3_lecture/blob/master/sl3_timeseries.Rmd

library(data.table)
library(origami)
library(sl3)
library(xts)

# load data
data(bsds)

head(bsds)

# Create a time-series object:
tsdata <- xts(bsds$cnt, order.by = as.POSIXct(bsds$dteday))

# Visualize the time-series:
PerformanceAnalytics::chart.TimeSeries(tsdata, auto.grid = FALSE, main = "Count of total rental bikes")

# Final setup
folds = origami::make_folds(tsdata, fold_fun=folds_rolling_window, window_size = 50, validation_size = 30, gap = 0, batch = 50)

covars <- "cnt"
outcome <- "cnt"

# create the sl3 task and take a look at it
ts_uni_task <- sl3_Task$new(data = bsds, covariates = covars,
                            outcome = outcome, outcome_type = "continuous", folds = folds)

# fit a single time series learner first
n_ahead_param <- 2
lrnr_arima <- Lrnr_arima$new(n.ahead = n_ahead_param)
fit_arima <- lrnr_arima$train(ts_uni_task)

# verify that the learner is fit
fit_arima$is_trained
pred_arima <- fit_arima$predict()

head(pred_arima)

lrnr_tsdyn_linear <- Lrnr_tsDyn$new(learner = "linear", m = 1,
                                    n.ahead = n_ahead_param)
lrnr_tsdyn_setar <- Lrnr_tsDyn$new(learner = "setar", m = 1, model = "TAR",
                                   n.ahead = n_ahead_param)
lrnr_tsdyn_lstar <- Lrnr_tsDyn$new(learner = "lstar", m = 1,
                                   n.ahead = n_ahead_param)
lrnr_garch <- Lrnr_rugarch$new(n.ahead = n_ahead_param)
lrnr_expsmooth <- Lrnr_expSmooth$new(n.ahead = n_ahead_param)
lrnr_harmonicreg <- Lrnr_HarmonicReg$new(n.ahead = n_ahead_param, K = 7,
                                         freq = 105)

ts_stack <- Stack$new(lrnr_arima, lrnr_tsdyn_linear, lrnr_tsdyn_setar,
                      lrnr_tsdyn_lstar)

ts_stack_fit <- ts_stack$train(ts_uni_task)

ts_stack_preds <- ts_stack_fit$predict()
Error in set(learner_preds, j = current_names, value = current_preds) :
Supplied 2 items to be assigned to 731 items of column 'Lrnr_arima_NULL_2'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
Failed on predict
Error in self$compute_step() :
Error in set(learner_preds, j = current_names, value = current_preds) :
Supplied 2 items to be assigned to 731 items of column 'Lrnr_arima_NULL_2'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
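
The 2 versus 731 mismatch in this error lines up with n.ahead = 2 and the 731 rows reported for the prediction column. As a rough illustration only, the sketch below uses base R's stats::arima on a simulated series of the same length (a stand-in, not the bsds data). It assumes that Lrnr_arima returns one forecast per step ahead, which the thread does not confirm; if so, a forecast vector of length 2 cannot fill a 731-row column without explicit recycling.

```r
set.seed(1)

# Simulated stand-in for a 731-observation series (not the real bsds counts)
x <- arima.sim(model = list(ar = 0.5), n = 731)

# Fit a simple AR(1) model and forecast two steps ahead
fit <- arima(x, order = c(1, 0, 0))
fc <- predict(fit, n.ahead = 2)$pred

length(x)   # 731 observations in the series
length(fc)  # 2 forecasts, one per step ahead
```

A forecast of this kind has one value per future step, not one value per observed row, so any code that stores it next to the original rows has to decide how the two lengths are aligned.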

@jeremyrcoyle
Collaborator

There seems to be a recent bug in sl3 that prevents the time series super learner from working correctly. Thanks for reporting this. We'll get it fixed ASAP.

@Shafi2016
Author

Thank you so much! I will be eagerly waiting for the update. The problem seems to be related to data.table.

@Shafi2016
Author

Do you have any update on the above-mentioned problem?

@imalenica
Member

Hi, sorry for the delay. I was able to fix it and will push the updated version in the next few days (I need to check the other CVs as well).

@Shafi2016
Author

Hello Ivana Malenica,
Thanks a lot! This is great news. I hope we will get the updated version soon.

@jeremyrcoyle
Collaborator

This should now be fixed on devel. You can install the devel version by running install_github("tlverse/sl3@devel"). It will be merged into master shortly.

@Shafi2016
Author

First of all, I removed the old version of sl3 and reinstalled it using the command you provided. I then checked again using my own data and code, and using this example: https://github.com/tlverse/sl3_lecture/blob/master/sl3_timeseries.Rmd
When I reach the line ts_stack_preds <- ts_stack_fit$predict(), I still get the same problem. Am I making a mistake?

Thanks in advance.

Error in set(learner_preds, j = current_names, value = current_preds) :
Supplied 2 items to be assigned to 731 items of column 'Lrnr_arima_NULL_2'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
Failed on predict
Error in self$compute_step() :
Error in set(learner_preds, j = current_names, value = current_preds) :
Supplied 2 items to be assigned to 731 items of column 'Lrnr_arima_NULL_2'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
