Improve safe chunk validation #9527
for more information, see https://pre-commit.ci
Co-authored-by: Maximilian Roos <[email protected]>
…unk and raise a proper error based on the mode selected, it is also possible to use the auto region detection with the mode "a"
…nto improve-safe-chunk-validation
This looks really good @josephnowak ! The tests look great.
I left a couple of small comments and will have another read through of the code later, but overall 👍 . Thank you.
…ameter in order on the extract_zarr_variable_encoding method, raise the correct error if the border size is smaller than the zchunk on mode equal to r+
Couple of typing errors, then we can merge!
Hi @max-sixty, sorry for being insistent with the review, but I would like to avoid any misunderstanding of the validation logic that could affect the usability of Xarray. The following case is going to raise an error with mode "r+":

```python
arr = xr.DataArray(
    list(range(10)), dims=["a"], coords={"a": list(range(10))}, name="foo"
).chunk(a=3)
arr.to_zarr(store, mode="w")

with pytest.raises(ValueError):
    # It is not possible to write to a chunk that is not full, even when it is
    # the last one, which can be of a smaller size
    arr.isel(a=slice(9, 10)).to_zarr(store, region="auto", mode="r+")
```

Is this the expected behavior? If it is, it would probably be good to make "a" the default mode, because it allows partial writes on the first and last chunks inside the region, which I think is more convenient in more scenarios.
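To make the chunk layout in this example concrete, here is a minimal sketch in plain Python (no Xarray or Zarr involved). The helper name `chunk_bounds` is hypothetical and exists only for illustration; it lists the index range each chunk covers along one dimension.

```python
def chunk_bounds(length, chunk_size):
    """Return the [start, stop) index range of every chunk along one dimension."""
    return [
        (start, min(start + chunk_size, length))
        for start in range(0, length, chunk_size)
    ]


# An array of length 10 stored with chunks of size 3, as in the example above.
bounds = chunk_bounds(10, 3)
print(bounds)  # [(0, 3), (3, 6), (6, 9), (9, 10)]
```

The last chunk covers only index 9, so the region `slice(9, 10)` spans that chunk in full even though it is shorter than the nominal chunk size of 3.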
(tbc, thank you very much for being insistent, this is a complicated and important problem which we haven't spent enough time on, so need exactly this sort of help!)
OK, that's a very good point — that writing to the full final chunk will raise an error with the proposed code. The rule should be that partial chunk writes are not allowed, but that is a full chunk write, so should be allowed. Otherwise people can't write to the full array! Could we adjust the logic so writing to the final chunk is allowed?
I weigh the cost of unsafe writes very highly, so my initial impulse is that writing to partial chunks should fail by default. Possibly making this strict could break some existing code; possibly we should explain ourselves better. For users who want to write to a single partial chunk from a single process, it would be a fairly small change for people to pass

But open to feedback from others; CC @pydata/xarray. One compromise would be to have a new mode, like
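The rule described here (partial chunk writes are rejected, but a write covering the whole final chunk is accepted even when that chunk is short) can be sketched for a single dimension as follows. This is an illustrative sketch only, not the code in this PR: the function name `is_full_chunk_write` and its parameters are assumptions made for the example.

```python
def is_full_chunk_write(start, stop, chunk_size, array_len):
    """Check whether the region [start, stop) covers only whole chunks.

    Hypothetical helper: `array_len` is the length of the stored array along
    this dimension, which is needed to recognise the (possibly short) last chunk.
    """
    if not 0 <= start < stop <= array_len:
        raise ValueError("region must be a non-empty range inside the array")
    starts_on_boundary = start % chunk_size == 0
    # The end is acceptable if it falls on a chunk boundary or at the array end.
    ends_on_boundary = stop % chunk_size == 0 or stop == array_len
    return starts_on_boundary and ends_on_boundary


print(is_full_chunk_write(9, 10, chunk_size=3, array_len=10))  # True
print(is_full_chunk_write(4, 6, chunk_size=3, array_len=10))   # False
```

Without `array_len`, the region `[9, 10)` would look like a partial write, because 10 is not a multiple of the chunk size.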
```python
if not safe_chunks:
    continue
```
nit: we could do an early return outside this loop
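A sketch of what that early return could look like. The function below is hypothetical, and so is the check inside the loop; neither is taken from this PR. It only shows the guard moved ahead of the loop, so that nothing is iterated when validation is disabled.

```python
def validate_chunks(var_chunks, zarr_chunks, safe_chunks=True):
    """Hypothetical validator, used only to illustrate the early return."""
    # Early return: skip all work when validation is turned off.
    if not safe_chunks:
        return

    for axis, (dim_chunks, zchunk) in enumerate(zip(var_chunks, zarr_chunks)):
        # Illustrative check (an assumption, not the PR's logic): every chunk
        # except the last must be a whole multiple of the Zarr chunk size.
        for size in dim_chunks[:-1]:
            if size % zchunk != 0:
                raise ValueError(
                    f"chunk of size {size} on axis {axis} is not a multiple "
                    f"of the Zarr chunk size {zchunk}"
                )


validate_chunks(((3, 3, 3, 1),), (3,))               # passes silently
validate_chunks(((2, 3),), (3,), safe_chunks=False)  # skipped entirely
```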
OK, very nice! (I haven't re-reviewed all the code, just the most recent commit). There are still a couple of type errors; lmk if you need a hand with those.
…nto improve-safe-chunk-validation # Conflicts: # xarray/tests/test_backends.py
Yes, I can change the logic to avoid raising an error. I will have to pass the shape of the Zarr array to validate whether the region covers the last chunk.
I have already pushed the change.
I will check the type error; I thought it was related to someone else's code.
Great! I'll merge, and if anyone has feedback we can make any adjustments later. Thank you very much @josephnowak !
Thanks Max and Joseph! These checks will save Xarray users a lot of grief
On Sat, Sep 21, 2024 at 6:30 PM Maximilian Roos wrote:
Merged #9527 into main.
* fix safe chunks validation
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci
* fix safe chunks validation
* Update xarray/tests/test_backends.py

  Co-authored-by: Maximilian Roos <[email protected]>
* The validation of the chunks now is able to detect full or partial chunk and raise a proper error based on the mode selected, it is also possible to use the auto region detection with the mode "a"
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci
* The test_extract_zarr_variable_encoding does not need to use the region parameter
* Inline the code of the allow_partial_chunks and end, document the parameter in order on the extract_zarr_variable_encoding method, raise the correct error if the border size is smaller than the zchunk on mode equal to r+
* Inline the code of the allow_partial_chunks and end, document the parameter in order on the extract_zarr_variable_encoding method, raise the correct error if the border size is smaller than the zchunk on mode equal to r+
* Now the mode r+ is able to update the last chunk of Zarr even if it is not "complete"
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci
* Now the mode r+ is able to update the last chunk of Zarr even if it is not "complete"
* Add a typehint to the modes to avoid issues with mypy

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Maximilian Roos <[email protected]>
Unfortunately this error seems to be triggered incorrectly in some cases: #9557. Please take a look if you have the time.
@max-sixty Here is the PR with all the features requested in #9513 (comment). I also added more tests and improved the logic of the validation algorithm. I think it is now much simpler and covers certain cases that were not validated before; for example, the "r+" mode only allows full-chunk writes, even on the last chunk.
whats-new.rst
api.rst