Filter based on additional metadata fields #948
add in info fields and limits
@@ -34,8 +33,11 @@
T_co = TypeVar("T_co", covariant=True)


class DatasetMetadata(NamedTuple):
    natoms: ArrayLike | None = None
class DatasetMetadata:
Why even have this class if we can just dump arbitrary things into it? Might as well make it a dict. Not sure if this will break a lot of things, lol.
I think it's better to keep it as a NamedTuple or a dataclass with only the fields that we decide are needed. That is a much more disciplined interface, so we are not just dumping arbitrary things into it.
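To illustrate the "declared fields only" idea, here is a minimal sketch using a frozen dataclass. It is not the code from this PR: `some_quantity` and the helper `metadata_from_arrays` are hypothetical names chosen for the example, and `np.ndarray` stands in for the `ArrayLike` annotation in the diff.

```python
from __future__ import annotations

from dataclasses import dataclass, fields

import numpy as np


@dataclass(frozen=True)
class DatasetMetadata:
    # Only fields declared here are accepted; any other keyword raises TypeError.
    natoms: np.ndarray | None = None
    some_quantity: np.ndarray | None = None  # hypothetical extra field


def metadata_from_arrays(arrays: dict) -> DatasetMetadata:
    """Keep the declared fields and ignore any other keys in `arrays` (hypothetical helper)."""
    known = {f.name for f in fields(DatasetMetadata)}
    return DatasetMetadata(**{k: v for k, v in arrays.items() if k in known})


meta = metadata_from_arrays({"natoms": np.array([3, 12]), "unused": np.array([0, 1])})
```

With this shape, adding a new filterable quantity means adding a field to the class, which keeps the interface explicit at the cost of a code change per field.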
We discussed this offline and agreed to follow up after this PR to figure out a better way to include metadata inside the dataset object and make it fast enough to filter on.
Is there a way to keep this info self-contained within the dataset files themselves? Otherwise we have yet another file that needs to be kept consistent. This functionality is pretty close to what the ASE datasets already allow (filtering by natoms, or other fields in the object).
works for me
Extend the metadata with the additional fields present in the NPZ metadata file. Instead of taking just 'natoms', take everything that is present.
Allow the user to specify a filter on the dataset in the format of
Take all indices that have an absolute value of 'some_quantity' less than or equal to 5
Take all indices that have a value of 'some_quantity' in [x, y, z]
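The two filters described above can be sketched with plain NumPy. This is an illustration under stated assumptions, not the implementation or config format from this PR: the function names `load_metadata` and `filter_indices`, the keyword arguments `max_abs` and `allowed`, and the sample values are all hypothetical, and each metadata field is assumed to be a 1-D array with one entry per dataset index.

```python
import io

import numpy as np


def load_metadata(npz_file):
    """Load every array in the NPZ metadata file, not just 'natoms'."""
    with np.load(npz_file) as data:
        return {key: data[key] for key in data.files}


def filter_indices(metadata, field, *, max_abs=None, allowed=None):
    """Return the dataset indices whose `field` value passes all given filters."""
    values = np.asarray(metadata[field])
    mask = np.ones(values.shape[0], dtype=bool)
    if max_abs is not None:
        # Keep entries with |value| <= max_abs.
        mask &= np.abs(values) <= max_abs
    if allowed is not None:
        # Keep entries whose value is one of the allowed values.
        mask &= np.isin(values, allowed)
    return np.flatnonzero(mask)


# Build a small in-memory metadata file for illustration (made-up values).
buffer = io.BytesIO()
np.savez(
    buffer,
    natoms=np.array([3, 12, 7, 40]),
    some_quantity=np.array([-6.0, 2.0, 5.0, -1.0]),
)
buffer.seek(0)

metadata = load_metadata(buffer)
small = filter_indices(metadata, "some_quantity", max_abs=5)
chosen = filter_indices(metadata, "some_quantity", allowed=[2.0, -1.0])
```

Here `small` holds the indices with an absolute value of 'some_quantity' of at most 5, and `chosen` holds the indices whose value is in the given list. Because the filters only build boolean masks over the metadata arrays, the dataset entries themselves never need to be read.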