VectorData Refactor Expandable #1158

Merged
merged 22 commits into staging_expand from exp on Aug 27, 2024

Conversation

mavaylon1
Contributor

@mavaylon1 mavaylon1 commented Jul 31, 2024

Motivation

What was the reasoning behind this change? Please explain the changes briefly.
Fix #1153

How to test the behavior?

Show how to reproduce the new behavior (can be a bug fix or a new feature)

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Does the PR clearly describe the problem and the solution?
  • Have you reviewed our Contributing Guide?
  • Does the PR use "Fix #XXX" notation to tell GitHub to close the relevant issue numbered XXX when the PR is merged?


codecov bot commented Jul 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.90%. Comparing base (49a60df) to head (5567f04).
Report is 1 commit behind head on staging_expand.

Additional details and impacted files
@@                Coverage Diff                 @@
##           staging_expand    #1158      +/-   ##
==================================================
+ Coverage           88.89%   88.90%   +0.01%     
==================================================
  Files                  45       45              
  Lines                9834     9855      +21     
  Branches             2794     2803       +9     
==================================================
+ Hits                 8742     8762      +20     
  Misses                776      776              
- Partials              316      317       +1     


@mavaylon1
Contributor Author

Note: Dataset of references will be handled with HDMF proxy append (PR tbd)

@mavaylon1
Contributor Author

@rly I plan on wrapping this up today; however, I wanted to point out #1163. I noticed that both my implementation and the dimension-labels logic omit the case of untyped, named datasets. They are related closely enough in terms of the code required that we can tackle them in a single issue/PR.

That being said, to prevent scope creep and because I believe this PR is solid enough to stand as a merge, let's aim to get this merged this week.

@mavaylon1
Contributor Author

mavaylon1 commented Aug 14, 2024

Review Notes:
We already synced about the PyNWB tests that are failing with regard to DecompositionSeries. There is a ticket: NeurodataWithoutBorders/pynwb#1947

What is the logic of this PR:
The goal is for this to be phase 1 of making datasets expandable by default for the HDF5 backend. The behavior can be toggled off within the user-facing write method, as shown below.
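A minimal usage sketch of that toggle, assuming the parameter is exposed as an `expandable` keyword on `HDF5IO.write` (the exact placement may differ in the final implementation); `manager` and `container` are placeholders for a BuildManager and a populated Container built elsewhere:

```python
# Sketch only: "expandable" is the new write() parameter described above.
from hdmf.backends.hdf5 import HDF5IO

# Default: datasets are written as expandable (chunked, maxshape taken from the spec).
with HDF5IO("example_expandable.h5", mode="w", manager=manager) as io:
    io.write(container)

# Opt out: restore the previous behavior of fixed-shape datasets.
with HDF5IO("example_fixed.h5", mode="w", manager=manager) as io:
    io.write(container, expandable=False)
```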

  1. Within h5tools.py
  • Feed the new expandable parameter from write into write_builder, write_group, write_dataset, and list_fill. We adjust list_fill because that is what we use to write array data.
  • Within list_fill, the update adds a boolean conditional on expandable. I did not want to override a user-defined maxshape from H5DataIO, so if maxshape is not provided, maxshape is set to the matched spec shape (see the sketch after this list). The logic of matching the spec shape is handled in the objectmapper, but the result is stored in the builder.
  2. Within builders.py
    The idea relies on getting the shape from the schema, which is in the spec. The builder needs to have the shape from the spec so that we can use it for maxshape. Why? Say the shape in the schema is [None, 2]. That means the second dimension must always be 2, while the first dimension can be any size. If we do not use the shape from the spec, maxshape would be [None, None], which would fail validation in the validator and not match what the schema or the user defines/expects.

The update is to store that shape in the DatasetBuilder.

  3. Within objectmapper.py
    This is, I would say, the most important part of the PR. I will go into it, but I will also make a few comments about what is NOT in here.

Recall what the objectmapper is for: mapping a container to a builder (and vice versa). Within the build method, we need to make sure that the DatasetBuilders we create have the shape from the spec. This is done by __get_spec_info, a modified version of the method for getting the dimension labels. I then added spec_shapes to the DatasetBuilder (see note).

__get_spec_info modifies __get_matched_dimension in a pretty small way. The logic of the function already uses the shape; I now just return the corresponding shape along with the dimension labels. It also now supports the case where the shape is defined but the dimension labels are not.

Note: There is a spot where I do not update the DatasetBuilder. That is described in #1163. It will be phase 2.
4. Updated and new tests
I created a new test class, Qux, to easily have a type with variable shapes.
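To make the list_fill change in item 1 concrete, here is a condensed, hypothetical sketch of the maxshape decision. The real code lives in list_fill in h5tools.py and reads the spec shape that the objectmapper stored on the DatasetBuilder; the helper name and the no-spec fallback below are assumptions for illustration only:

```python
import numpy as np

def list_fill_sketch(parent, name, data, io_settings=None, expandable=True, spec_shape=None):
    """Simplified stand-in for list_fill: write array data into an HDF5 group,
    optionally making the dataset expandable along the spec-defined axes."""
    io_settings = dict(io_settings or {})
    data = np.asarray(data)

    if expandable and "maxshape" not in io_settings:
        # Never override a user-defined maxshape from H5DataIO.
        if spec_shape is not None:
            # e.g. spec shape (None, 2): first dimension unlimited,
            # second dimension fixed at 2, matching the schema.
            io_settings["maxshape"] = tuple(spec_shape)
        else:
            # Assumed fallback (not spelled out in the notes above):
            # with no spec shape, make every dimension unlimited.
            io_settings["maxshape"] = tuple(None for _ in data.shape)

    # h5py requires chunked storage for any dataset with an unlimited dimension.
    if any(dim is None for dim in io_settings.get("maxshape", ())) and "chunks" not in io_settings:
        io_settings["chunks"] = True

    dset = parent.create_dataset(name, shape=data.shape, dtype=data.dtype, **io_settings)
    dset[...] = data
    return dset
```

With the [None, 2] example from item 2, a dataset built against that spec shape would be created with maxshape=(None, 2): it can later be resized along the first axis while always keeping exactly 2 columns, which is what the schema and the validator expect.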

@mavaylon1
Contributor Author

mavaylon1 commented Aug 14, 2024

Todo items:

  • Code Coverage
  • Your assumption on the spec in the object mapper is incorrect.
  • Create a staging branch and have this PR merge into that.

@mavaylon1 mavaylon1 changed the base branch from dev to staging_expand August 15, 2024 00:35
@mavaylon1 mavaylon1 marked this pull request as ready for review August 15, 2024 00:59
@mavaylon1 mavaylon1 merged commit ddd433a into staging_expand Aug 27, 2024
28 of 29 checks passed
@mavaylon1 mavaylon1 deleted the exp branch August 27, 2024 23:30
Successfully merging this pull request may close these issues.

[Feature]: Expandable datasets (HDF5)