Plot test dataset incorrectly classified time series #19

Open
lindsayplatt opened this issue Jun 12, 2024 · 1 comment
Labels
manuscript Something to do for the manuscript

Comments

@lindsayplatt
Owner

Look at the time series for the sites that were incorrectly classified, and see if we are still happy with those.

@lindsayplatt lindsayplatt added the manuscript Something to do for the manuscript label Jun 12, 2024
@lindsayplatt
Owner Author

For the latest model, here are those time series for both the train and test data.

image

image

Code for finding those sites and time series
library(targets)
library(tidyverse)
library(sf)

source('5_DefineCharacteristics/src/visualize_attribute_distributions.R')
source('5_DefineCharacteristics/src/prep_attr_randomforest.R')

# Load the new model and its test-set predictions
rf_model <- tar_read(p5_rf_model_optimized)
rf_model_test_pred_results <- tar_read(p5_rf_testpreds)

tar_load(p5_site_attr_rf_optimal)
tar_load(p5_site_attr)

# Training data: the names of `predicted` are treated as row numbers
# of p5_site_attr_rf_optimal
rows_used <- as.numeric(names(rf_model$predicted))
incorrect <- which(rf_model$predicted != rf_model$y)
sites_wrong_rf <- left_join(p5_site_attr_rf_optimal[rows_used[incorrect],], p5_site_attr) %>% 
  select(site_no, site_category_fact)

# Test data: the row names of the prediction results are treated the same way
rows_used_test <- as.numeric(rownames(rf_model_test_pred_results))
incorrect_test <- rf_model_test_pred_results %>% 
  mutate(row_num = rows_used_test) %>% 
  filter(site_category_fact != site_category_predicted) %>% 
  pull(row_num)
sites_wrong_rf_test <- left_join(p5_site_attr_rf_optimal[incorrect_test,], p5_site_attr) %>% 
  select(site_no, site_category_fact)

# Keep only the time series for misclassified sites and label each as Train or Test
# year() comes from lubridate, which is attached by tidyverse >= 2.0.0
tar_load(p3_ts_sc_qualified)
ts_sc_incorrect_sites <- p3_ts_sc_qualified %>% 
  filter(site_no %in% c(sites_wrong_rf$site_no, sites_wrong_rf_test$site_no)) %>% 
  mutate(rf_type = ifelse(site_no %in% sites_wrong_rf_test$site_no, 'Test', 'Train')) %>% 
  left_join(bind_rows(sites_wrong_rf, sites_wrong_rf_test)) %>% 
  mutate(year = year(dateTime))

ts_sc_incorrect_sites %>% 
  filter(rf_type == 'Test') %>% 
  ggplot(aes(x = dateTime, y = SpecCond, color = site_category_fact, group=year)) +
  geom_line() + 
  facet_wrap(vars(site_no), scales='free', ncol=2) +
  ggtitle('Incorrectly predicted sites, test data') +
  theme_bw()

ts_sc_incorrect_sites %>% 
  filter(rf_type == 'Train') %>% 
  ggplot(aes(x = dateTime, y = SpecCond, color = site_category_fact, group=year)) +
  geom_line() + 
  facet_wrap(vars(site_no), scales='free', ncol=4) +
  ggtitle('Incorrectly predicted sites, train data') +
  theme_bw()
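
The row-matching step in the code above relies on the names of the prediction vector carrying the original row numbers. A minimal base-R sketch of that idea follows, using made-up data: the site IDs, the categories ("high"/"low"), and the `site_attr`, `observed`, and `predicted` objects are all hypothetical stand-ins and not values from this project.

```r
# Hypothetical attribute table standing in for p5_site_attr_rf_optimal
site_attr <- data.frame(
  site_no = c("A01", "A02", "A03", "A04", "A05"),
  site_category_fact = factor(c("high", "low", "high", "low", "high")),
  stringsAsFactors = FALSE
)

# Pretend the model used rows 1, 3, 4 and 5 only. The names of the
# prediction vector hold those row numbers as character strings.
observed  <- site_attr$site_category_fact[c(1, 3, 4, 5)]
predicted <- factor(c("high", "low", "low", "high"), levels = levels(observed))
names(predicted) <- c("1", "3", "4", "5")

# Convert the names back to row numbers, find the mismatches,
# and look up the corresponding sites
rows_used <- as.numeric(names(predicted))
incorrect <- which(predicted != observed)
sites_wrong <- site_attr[rows_used[incorrect], c("site_no", "site_category_fact")]

print(sites_wrong)
```

Indexing with `rows_used[incorrect]` rather than `incorrect` matters here: `incorrect` is a position within the prediction vector, while `rows_used[incorrect]` is the row in the attribute table.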
