Use patch_size instead of chunk_size as base shape for sampling #4

Merged 9 commits on May 7, 2024

@fercer fercer commented Mar 29, 2024

This change allows zarrdataset to extract samples using patch_size, rather than the input's chunk size, as the base shape for sampling.
Chunks are now treated as multiples of patch_size, so adjacent patches can be extracted with no gaps between them.
This is helpful when using zarrdataset for inference on larger-than-memory inputs.

No changes are needed in the ImageBase class, since it can already handle loading adjacent chunks. Memory usage will increase if the patch size is not a multiple of the chunk size, because a single patch can then span several chunks that must all be loaded.
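To illustrate the memory point, here is a minimal sketch of how many chunks a single patch can touch. The `chunks_touched` helper is hypothetical and is not part of zarrdataset; it only assumes a regular chunk grid that starts at the image origin.

```python
from itertools import product


def chunks_touched(patch_origin, patch_size, chunk_size):
    """Return the grid indices of every chunk that a patch overlaps.

    Hypothetical helper for illustration only; not part of zarrdataset.
    """
    ranges = []
    for start, size, chunk in zip(patch_origin, patch_size, chunk_size):
        first = start // chunk               # chunk holding the patch's first pixel
        last = (start + size - 1) // chunk   # chunk holding the patch's last pixel
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


# A 128x128 patch on a grid of 256x256 chunks stays inside one chunk.
aligned = chunks_touched((256, 256), (128, 128), (256, 256))

# A 192x192 patch at the second grid position straddles four chunks,
# so all four have to be loaded to serve that single patch.
straddling = chunks_touched((192, 192), (192, 192), (256, 256))

print(len(aligned), len(straddling))
```

In the second case the patch grid and the chunk grid do not line up, which is where the extra chunk loads come from.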

This change does not affect zarrdataset's ability to use multiple workers.
Each worker has its own handle to the zarr file, so chunks can be read safely without collisions.

@fercer fercer self-assigned this Mar 29, 2024

fercer commented Apr 30, 2024

I'm resuming the conversation from PR #6 here @ClementCaporal.

As the next step, I'll work on solving #3 and #5 by adding a way to extract overlapping patches.
My first attempt will be to add a stride parameter to the PatchSampler class, which would allow extracting overlapping patches when stride < patch size.
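As a rough sketch of the idea along one axis (the `patch_starts` helper below is hypothetical and not the PatchSampler implementation), a stride smaller than the patch size yields overlapping patch positions, while a stride equal to the patch size gives the current non-overlapping behavior:

```python
def patch_starts(image_len, patch_size, stride=None):
    """Start positions of complete patches along one axis.

    Hypothetical helper for illustration; `stride` defaults to
    `patch_size`, which gives non-overlapping patches.
    """
    if stride is None:
        stride = patch_size
    if stride <= 0:
        raise ValueError("stride must be a positive integer")
    last_start = image_len - patch_size  # last position where a full patch fits
    return list(range(0, last_start + 1, stride))


no_overlap = patch_starts(image_len=10, patch_size=4)
overlapped = patch_starts(image_len=10, patch_size=4, stride=2)

print(no_overlap)
print(overlapped)
```

This sketch only returns complete patches, so pixels past the last full patch are skipped unless the edge patches are padded.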

The step after that will be to allow ImageBase objects to pad patches retrieved from the cache when the requested slice extends beyond the actual image size. This is the case for edge chunks, which are commonly smaller than the rest of the chunks in the image.

Padding is necessary because torch's DataLoader expects all samples to be of the same shape to collate them.
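A minimal sketch of why padding matters, using NumPy only. The `pad_to_shape` helper is hypothetical and not zarrdataset code; `np.stack` stands in here for what a default collate step needs, namely samples of identical shape.

```python
import numpy as np


def pad_to_shape(patch, target_shape, fill_value=0):
    """Pad a patch at the end of each axis until it matches target_shape.

    Hypothetical helper for illustration; not part of zarrdataset.
    """
    pad_width = [(0, target - size) for size, target in zip(patch.shape, target_shape)]
    if any(after < 0 for _, after in pad_width):
        raise ValueError("patch is larger than target_shape")
    return np.pad(patch, pad_width, mode="constant", constant_values=fill_value)


full_patch = np.ones((4, 4), dtype=np.uint8)
edge_patch = np.ones((4, 2), dtype=np.uint8)  # truncated at the image border

# Stacking the raw patches would fail because their shapes differ.
# After padding, both are 4x4 and can be batched together.
batch = np.stack([pad_to_shape(p, (4, 4)) for p in (full_patch, edge_patch)])
print(batch.shape)
```

Zero fill is just one choice; a different fill value or padding mode may suit some models better.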

@ClementCaporal

Hello @fercer.

Thank you for the explanations. I will try this new implementation as soon as the overlapped sampling is ready. (I have my own small patch in the meantime.)

Have a good day,

Clément

fercer commented May 2, 2024

Thanks for your contribution to improving ZarrDataset @ClementCaporal!
The fixes to allow sampling with padding for inference have been implemented in this PR.
I'll add this functionality to the documentation and create a notebook example before merging into the main branch.

@fercer fercer merged commit 4afb0bd into main May 7, 2024
2 checks passed
@fercer fercer deleted the overlap_sampling branch May 7, 2024 21:11
```python
patch_sampler = zds.PatchSampler(patch_size=patch_size, pad=pad, allow_incomplete_patches=True)
```

Create a dataset from the list of filenames. All those files should be stored within their respective group "0".

I think there is a typo here: should group "0" be "4" in this example?

Thanks for noticing this @ClementCaporal! I've included this fix in a recent PR, #8, which addresses incorrect sampling of masked regions.

Oh, nice!
I started using masked regions on Friday and began noticing strange behavior, so I just have to pull now, thanks to you!

Have a good week,

Clément
