Attribute aggregation/transformation + plotting & evaluation analyses #34

glitt13 · 2024-12-19T22:35:02Z

An approach to aggregate and transform existing attribute data into new attribute data.
Additionally, creates and saves plots that visualize results and aid in evaluating algorithm performance.

Additions

xssa_attrs_tform.yaml: example configuration file describing how variables are aggregated and transformed.
fs_tfrm_attrs.py: the main processing script, which calls functions inside tfrm_attr.py.
tfrm_attr.py: contains new functions for the fs_proc package.
test_tfrm_attr.py: unit tests for the attribute transformation functions.
fs_attrs_miss.R: reads the missing comid-attributes file that fs_tfrm_attrs.py sometimes generates and attempts to find the missing attribute data. When attribute data are missing, fs_tfrm_attrs.py calls this Rscript to try to acquire the data and use it for attribute transformation.
Principal component analysis (PCA) based on the predictor dataset and response variable, including generation of PCA plots: pca_stdscaled_tfrm, plot_pca_stdscaled_tfrm, plot_pca_stdscaled_cumulative_var, and std_pca_plot_path, all wrapped by plot_pca_save_wrap.
Random forest feature importance, with all relevant functions wrapped by save_feat_imp_fig_wrap.
Algorithm evaluation using learning-curve analysis, including the AlgoEvalPlotLC class, with functions wrapped by plot_learning_curve_save_wrap.
Predicted vs. observed regression plot, created and saved via the plot_pred_vs_obs_wrap wrapper.
Map of predicted response variables, with all relevant functions wrapped by plot_map_pred_wrap.
Map of the best prediction across multiple datasets, with all relevant functions wrapped by plot_best_algo_wrap.
AGU 2024 analysis scripts: the ealstm analyses, such as /scripts/analysis/fs_proc_viz_best_ealstm.py and the more formal /scripts/eval_ingest/ealstm/proc_ealstm_agu24.py, plus associated config files in the ealstm/ directory.
fs_proc_algo_viz.py: an updated algorithm training/testing evaluation script that extends fs_proc_algo.py with new evaluation and plotting features.
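
For readers unfamiliar with the aggregation/transformation step, the following is a minimal, standard-library sketch of the general idea, not the implementation in tfrm_attr.py. The config layout, the attribute names (var_a, var_b, var_mean, var_max), the comids, and the function transform_attrs are all hypothetical and chosen only for illustration. The actual structure of xssa_attrs_tform.yaml is not shown in this PR. The sketch also collects missing comid-attribute pairings, loosely mirroring the file that fs_attrs_miss.R later reads.

```python
from statistics import mean

# Hypothetical stand-in for a transformation config: each new attribute
# names its source variables and the aggregation function to apply.
# (Assumed layout; the real YAML schema may differ.)
TFORM_CONFIG = {
    "var_mean": {"vars": ["var_a", "var_b"], "func": "mean"},
    "var_max": {"vars": ["var_a", "var_b"], "func": "max"},
}

# Supported aggregation functions, keyed by the name used in the config.
AGG_FUNCS = {"mean": mean, "max": max, "min": min, "sum": sum}

# Hypothetical existing attribute data, keyed by comid.
ATTRS_BY_COMID = {
    "1001": {"var_a": 10.0, "var_b": 30.0},
    "1002": {"var_a": 5.0},  # var_b is missing for this comid
}


def transform_attrs(attrs_by_comid, config):
    """Create new attributes per comid and report missing comid-attribute pairs."""
    new_attrs = {}
    missing = set()
    for comid, attrs in attrs_by_comid.items():
        new_attrs[comid] = {}
        for new_name, spec in config.items():
            absent = [v for v in spec["vars"] if attrs.get(v) is None]
            if absent:
                # Record what is needed instead of computing a partial value.
                missing.update((comid, v) for v in absent)
                continue
            values = [attrs[v] for v in spec["vars"]]
            new_attrs[comid][new_name] = AGG_FUNCS[spec["func"]](values)
    return new_attrs, sorted(missing)


if __name__ == "__main__":
    new_attrs, missing = transform_attrs(ATTRS_BY_COMID, TFORM_CONFIG)
    print(new_attrs)
    print(missing)
```

Skipping a transformation when any source variable is absent, rather than aggregating over whatever is available, keeps the new attribute comparable across comids. Whether the PR's code makes the same choice is not stated here.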

Removals

Changes

Converted scripts/config/attr_gen_camels.R from hard-coded values to a generalizable form that reads the config file scripts/config/attr_gen_camels_config.yaml

Testing

Unit testing with test_tfrm_attr.py has been challenging to implement with the standard unittest approach owing to an unexplained error involving dask.dataframe as dd. Implemented a workaround that partially tests this package by removing most uses of classes.

Screenshots

Notes

Todos

Checklist

Testing checklist

Target Environment support

Windows
Linux
Browser

Accessibility

Keyboard friendly
Screen reader friendly

Other

…yle theme (#33) * Create custom matplotlib stylesheet for RaFTS plots * Flip axes on scatter; change perf to pred for clarity * Change perf to pred for clarity * Read in mplstyle file directly from fs_algo * incorporate plotting functions into fs_perf_viz.py * Use functions for creating file output paths * Change perf_map to pred_map --------- Co-authored-by: glitt13 <[email protected]>

glitt13 added 30 commits November 1, 2024 12:49

Add alternate comid retrieval via sf geometry in case nwissite return…

7c0a582

…s comid of NA

merge upstream/main

44e1182

fix: add gage_id inside each loc_attrs df; fix: set fill=TRUE for rbi…

8a937ce

…ndlist

fix: add usgs_vars sublist to Retr_Params

3951059

feat: add a format checker on Retr_Params

8c7ed2c

feat: add attribute variable name checker, incorporate check_attr_sel…

339279c

…ection() into standard processing

feat: developing approach to transform attributes

df57202

feat: add cmd/config file capability to retrieving camels attributes …

e7d5e0f

…script

fix: update script to work with return of a data.table rather than a …

f5b01b7

…list

fix: address path/glue format issues

34efc2e

refactor: negligible change

96b01a9

feat: add parquet file read option based on check for comid in filename

3a7d4c1

doc: update fs_read_attr_comid documentation based on read_type

218eba0

doc: update yaml config files to jive with latest developments in att…

08b867b

…ribute transforms

feat: core functionality that aggregates & transforms attributes

a9b215e

refactor: move config file read out of for-loop; fix: ensure str form…

d61c9c2

…at of comid

fix: add error if Null vals returned following aggregation/transforma…

2ec53f1

…tion

feat: create file listing needed comid-attributes pairings

e2d79c1

doc: describe steps in creating transformed attributes; feat: update …

6cf7a7b

…package version.

feat: add attribute generation script for camels catchments

5d288c3

fix: remove deprecated wrapper function from tfrm_attr

c9bdad6

fix: resolve merge conflicts

35f0663

fix: change dask dataframe to eager evaluation

0ea2a92

feat: partially-created unit tests corresponding to attribute transfo…

329a8e1

…rmation functions

feat: convert missing comid/attrs scripts into functions; doc: augmen…

3e0f378

…t transformation script's documentation

fix: add in home_dir as optional part of attr config's directory form…

67398a3

…at strings just-in-case user doesn't use f'{dir_base}'

fix: add logic on whether a warning prints after first checking if mi…

b14c63a

…ssing comids or variables have been identified, else write message that there could be an issue in the logic

feat: add attribute config file parser function to R package proc.att…

60b776c

…r.hydfab

fix: address undefined objects in attr_cfig_parse

02de2b0

fix: remove duplicated attr_cfig_parse from package file

33d11ec

bolotinl and others added 27 commits November 27, 2024 11:14

Use existing functions for pulling info from attr config

3c0cc87

feat: adding attributes of interest file for ealstm analysis

ae0a05d

feat: adding PCA to agu script

d63d800

feat: add analysis dir to save directory structure

6ec6ba9

feat: create correlation analyses

170036a

fix: simplify attribute filtering in dask dfs

91def5d

feat: add principal component analysis to dataset characterization

9b93ead

feat: add figure importance plotting; feat: developing learning curve…

f436080

… functions

feat: add feature importance plot wrapper functional call

e65d539

feat: create the learning curve plotting for each trained algorithm

b400d39

merge Lauren's data viz for further usage

10a568e

feat: integrate bolotinl's geospatial & regression plotting; refactor…

bcbe77c

…: return a gdf of comids and coords rather than just comids when querying nhdplus; Co-authored-by: Lauren Bolotin <[email protected]> Co-authored-by: Guy Litt <[email protected]>

fix: update dataset preprocessing

f69fbba

refactor: Adapt to updated comid/geometry retrieval

56984fe

feat: Integrate visualization plotting for each dataset into the stan…

b90bdcb

…dard train/test/evaluation processing

feat: create a cross-comparison 'best' predictor analysis; refactor: …

8f3a738

…adapt the comid retrieval to also return the geometry; feat: add train/test split alternative using specific comids for testing

fix: modify best map plotting for AGU 2024 poster

a37faec

fix: non-multi param training should not access params from algo_conf…

de2bfe2

…ig_grid, but rather algo_config

fix: update function name change

c60bc4e

feat: all set for AGU24

ddbc2e9

fix: explicitly define arg names in AlgoTrainEval; fix: update new re…

5ff2f04

…turn format from test_fs_retr_nhdp_comids

feat: add a new 'metric' mapping for xSSA sobol' sensitivities

697eef5

fix: remove print message looking for objects that don't exist

56098da

fix: rename accidental base path inside std_eval_metrs_path()

ac3067a

doc: add documentation to fs_algo functions

374c8e5

fix: remove scratch analysis

f269094

glitt13 mentioned this pull request Dec 19, 2024

Attribute aggregation and transformation #31

Closed

21 tasks

glitt13 added 2 commits December 19, 2024 15:45

fix: remove hydroatlas vars from config file

59b25cf

fix: move printout confirming write after write happens

b4c87e7
