Ase dataset updates #622
Looks good! Minor changes.
Codecov Report

Attention: Patch coverage is

Additional details and impacted files

Coverage diff, main vs. #622:

| Metric | main | #622 | +/- |
| --- | --- | --- | --- |
| Coverage | 56.98% | 57.23% | +0.25% |
| Files | 108 | 109 | +1 |
| Lines | 10262 | 10287 | +25 |
| Hits | 5848 | 5888 | +40 |
| Misses | 4414 | 4399 | -15 |
Minor comments
* minor cleanup of lmbddatabase
* ase dataset compat for unified trainer and cleanup
* typo in docstring
* key_mapping docstring
* add stress to atoms_to_graphs.py and test
* allow adding target properties in atoms.info
* test using generic tensor property in ase_datasets
* minor docstring/comments
* handle stress in voigt notation in metadata guesser
* handle scalar generic values in a2g
* clean up ase dataset unit tests
* allow .aselmdb extensions
* fix minor bugs in lmdb database and update tests
* make connect_db staticmethod
* remove redundant methods and make some private
* allow a list of paths in AseDBdataset
* remove sprinkled print statement
* remove deprecated transform kwarg
* fix doctring typo
* rename keys function
* fix missing comma in tests
* set default r_edges in a2g in AseDatasets to false
* simple unit-test for good measure
* call _get_row directly
* [wip] allow string sids
* raise a helpful error if AseAtomsAdaptor not available
* remove db extension in filepaths
* set logger to info level when trying to read non db files, remove print
* set logging.debug to avoid saturating logs
* Update documentation for dataset config changes. This PR is intended to address #629
* Update atoms_to_graphs.py
* Update test_ase_datasets.py
* Update test_ase_datasets.py
* Update test_atoms_to_graphs.py
* Update test_atoms_to_graphs.py
* case for explicit a2g_args None values
* Update update_config()
* Update utils.py
* Update utils.py
* Update ocp_trainer.py: More helpful warning for debug mode
* Update ocp_trainer.py
* Update ocp_trainer.py
* Update TRAIN.md
* fix concatenating predictions
* check if keys exist in atoms.info
* Update test_ase_datasets.py
* use list() to cast all batch.sid/fid
* correctly stack predictions
* raise error on empty datasets
* raise ValueError instead of exception
* code cleanup
* rename get_atoms object -> get_atoms for brevity
* revert to raise keyerror when data_keys are missing
* cast tensors to list using tolist and vstack relaxation pos
* remove r_energy, r_forces, r_stress and r_data_keys from test_dataset w use_train_settings
* fix test_dataset key
* fix test_dataset key!
* revert to not setting a2g_args dataset keys
* fix debug predict logic
* support numpy 1.26
* fix numpy version
* revert write_pos
* no list casting on batch lists
* pretty logging

---------

Co-authored-by: Ethan Sunshine <[email protected]>
Co-authored-by: Muhammed Shuaibi <[email protected]>
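The commit list above mentions handling stress in Voigt notation in the metadata guesser. As a rough illustration of what that distinction involves (not the code from this PR), here is a plain-Python sketch. The helper names `full_to_voigt`, `voigt_to_full`, and `guess_stress_layout` are hypothetical, and the component order is assumed to follow the ASE convention (xx, yy, zz, yz, xz, xy).

```python
# Hypothetical helpers for illustration only; they are NOT functions from this PR.
# Assumed Voigt order (the ASE convention): xx, yy, zz, yz, xz, xy.
VOIGT_INDICES = [(0, 0), (1, 1), (2, 2), (1, 2), (0, 2), (0, 1)]


def full_to_voigt(stress):
    """Collapse a symmetric 3x3 stress tensor into a 6-component list."""
    return [stress[i][j] for i, j in VOIGT_INDICES]


def voigt_to_full(voigt):
    """Expand a 6-component Voigt list back into a symmetric 3x3 tensor."""
    full = [[0.0] * 3 for _ in range(3)]
    for value, (i, j) in zip(voigt, VOIGT_INDICES):
        full[i][j] = value
        full[j][i] = value  # mirror across the diagonal
    return full


def guess_stress_layout(value):
    """Classify a stress-like value as 'voigt' (6 scalars) or 'full' (3x3)."""
    if len(value) == 6 and not hasattr(value[0], "__len__"):
        return "voigt"
    if len(value) == 3 and all(
        hasattr(row, "__len__") and len(row) == 3 for row in value
    ):
        return "full"
    raise ValueError("expected 6 Voigt components or a 3x3 tensor")
```

A dataset that stores stress as six numbers and one that stores a 3x3 tensor carry the same information, so a guesser of this kind only needs to look at the shape to decide which layout it has been given.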
This PR includes updates in #630
Updates and fixes in `AtomsToGraphs`, `AseAtomsDataset` objects, and `LMDBDatabase` that make it easier to load and use `Atoms` with arbitrary properties as ASE-style DBs as datasets with the unified OCP trainer.

AtomsToGraphs
* `r_stress` as config option to include stress in data
* `r_data_keys` as config option to pass a list of target properties saved in `Atoms.info` to be included in data (see example below)

AseAtomsDataset
* `key_mapping` and `transforms` attribute according to changes in Unified OCP Trainer #520

LMDBDatabase

OCPTrainer
* `list()` to allow string ids

Here's a minimal example for using: