Commit
* feat: developing algorithm training and evaluation module
* fix: minor bug fixes with paths and np array values retrieval
* feat: create initial fs_algo package
* feat: contain all training/eval into a single class
* feat: simplify evaluation file write and module import
* feat: add basic unit testing for AlgoTrainEval class
* feat: convert save dir structure creation and fsds dataset reader into modular functions inside package
* Update README.md describe the unique dependencies
* feat: simplify aspects of attribute organization and combining with metrics
* feat: beginning to convert attribute wrangling into a class
* feat: add algorithm configuration file
* feat: established class for attribute configuration file, scripts functioning
* feat: add verbose option
* fix: update to warnings.warn()
* feat: building out additional unit tests for AttrConfigAndVars class
* chore: remove spaces
* feat: add unit test for fs_read_attr_comid
* feat: add UserWarnings and associated unit test
* feat: add unit tests for _find_feat_srce_id, fs_retr_nhdp_comids and fix associated functions when behavior didn't follow expected behavior
* feat: add unit test for fs_save_algo_dir_struct
* feat: a basic unit test for _open_response_data_fsds
* chore: simplify algo script based on functionality moved into fs_algo_train_eval module
* doc: add sphinx documentation to _read_attr_config and fs_read_attr_comid
* doc: add sphinx-formatted documentation to the functions in the fs_algo_train_eval module; feat: move some hard-coded variables into the algorithm config file
* fix: changes vars to attrs in AlgoTrainEval arg
* fix: added the new parameters that were hard-coded (test_size & seed)
* fix: swapped the train/test fractions to appropriate printout order
* feat: make sphinx documentation
* fix: reinstall sphinx docs for fsds_proc
* fix: remove unused path_camels
* fix: remove unused references to path_camels
* fix: update standard fsds_proc config files to create netcdf rather than csv; rename these files schema to config
* doc: update config file documentation on preferred save_type
* doc: update description of yaml file's dataset
* fix: update config files with featureID and featureSource entries
* fix: change vars to attrs based on package's object name change
* fix: change logic to ensure config file read if dataset attribute read failed
* feat: add a raw data input checker/corrector for cases when nwissite gage ids are missing the leading 0
* fix: changed path_data to represent the raw input files containing corrected nwissite USGS gage ids (leading zeros)
* fix: added appropriate fillna for nwissite gage ids not needed to be corrected
* fix: adjust path check for attributes instead of algo
* doc: add descriptive notes on algo pre-processing and suggest future improvements for datasets not processed with fsds_proc with TODO
* doc: simplify attr_config, change dir_attrs to dir_db_attrs
* chore: add some additional hydroatlas and USGS NHD variables for consideration
* chore: add updated attribute variables to config files, based on top 5 variables considered by Bolotin et al 2022 SI work
* fix: add error handling when hydrofabric could not be downloaded for a given comid
* fix: avoid index error generated from attr_ddf_sub.shape[0].compute() by simply performing attr_ddf_sub.compute() first, which is needed anyway
* fix: change fs_read_attr_comid to return pd.DataFrame instead of dask df, and add checks ensuring 'value' data column being float type, check for no NA values present
* feat: add NA drop prior to train/test split
* feat: create a separate function that standardizes the algorithm file save path
* doc: add documentation to the std_algo_path func
* feat: create script to generate algo prediction data for testing
* feat: generating predictions from trained algos under dev
* feat: add processing of xssa locations, randomly selecting a subset to use for algo prediction
* feat: develop algo prediction's config ingest, and determine paths to prediction locations and trained algos
* feat: add config file path builder
* feat: create metric prediction and write results to file
* feat: build unit test for build_cfig_path()
* feat: build unit test for build_cfig_path()
* feat: add unit tests for std_pred_path and _read_pred_comid; test coverage now at 92%
* feat: add oob = True as default for RandomForestRegressor
* feat: add hyperparameterization capability using grid search and associated unit tests
* feat: add unit testing for train_eval()
* chore: change algo config for testing out hyperparameterization
* chore: add UserWarning category specification to warnings.warn
* fix: algo config assignment accidentally only looked at first line of params
* fix: make sure that hyperparameter key:value pairings contained inside dict, not list
* fix: adjust unit test's algo_config formats to represent the issue of a dict of a list, which the list_to_dict() function then converts
* fix: _check_attributes_exist now appropriately reports missing attributes and comids
* fix: ensure algo and pipeline keys contain algo and pipeline object types in the grid search case
* Update pkg/fs_algo/fs_algo/fs_algo_train_eval.py Co-authored-by: LaurenBolotin-NOAA <[email protected]>
* Update pkg/fs_algo/fs_algo/fs_algo_train_eval.py Co-authored-by: LaurenBolotin-NOAA <[email protected]>
* chore: Update README.md Rename proc_fsds to fsds_proc
* fix: remove network hardcoding for lyrs in proc_attr_wrap call
* fix: rename ext to fileext since ext is a pre-defined object
* fix: change unit test use of ext to fileext
* feat: experimenting with attribute grabbing
* doc: revise function documentation for clarity
* chore: rename fsds to fs in all python-related files and config files
* chore: rename fsds_proc directory to fs_proc
* chore: rename additional fsds to fs
* chore: rename remaining fsds to fs
* doc: minor change to install instructions of fs_proc
* feat: add requirements for fs_algo package
* feat: add requirements.yml for conda environment of fs_algo/fs_proc python packages
* doc: add details on func for creating col_schema_df
* feat: add nwissite gage id leading zero checker as automated step
* fix: new line continuation in f-string messages related to nwis checker
* fix: update local config path and example in script
* doc: change install description for this package
* fix: modify logical test on elif featureSource == nwissite
* feat: update and add new unit testing that accommodates the check_fix_nwissite_gageids function
* fix: update temp directory assignment to work with non-Unix systems
* doc: minor adjustment for instructional example on running unit tests
* Make the change match the exact repo name
* Make changes match exact repo name
* doc: minor changes that will be removed: comid loc lookup
* fix: rename fsds to fs in files corresponding to proc.attr.hydfab R package
* feat: update R package with name change of fsds to fs
* chore: update fsds to fs in config files and R unit tests
* doc: update README from fsds to fs in non-url instances
* doc: Update README.md Update hyperlinks and descriptions with latest fsds to fs change, and OWP repo location.
* Update README.md doc: minor path fix
* chore: rename fsds_attrs_grab.R to fs_attrs_grab.R and add updated Rd documentation using fs instead of fsds
* doc: remove commented out code and create delineations on code sections
* doc: correct mis-spellings

---------

Co-authored-by: LaurenBolotin-NOAA <[email protected]>