Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature]: Add optional lazy look-up of extractor properties #247

Open
2 tasks done
pauladkisson opened this issue Sep 20, 2023 · 10 comments
Open
2 tasks done

[Feature]: Add optional lazy look-up of extractor properties #247

pauladkisson opened this issue Sep 20, 2023 · 10 comments
Labels
enhancement New feature or request

Comments

@pauladkisson
Copy link
Member

pauladkisson commented Sep 20, 2023

Current Behavior

Every ImagingExtractor (IE) looks up all its properties (num_channels, num_frames, etc.) during __init__ and stores them as private attributes (self._num_channels, self._num_frames, etc.).

Issue

This is efficient if you want to look up the properties many times for one file, but inefficient for MultiImagingExtractor that only needs to look up properties for 1 of its many IE's.

Proposed Change

Add an option to the base ImagingExtractor class for precomputed_properties=True and then if you pre-compute the properties, I/O happens during __init__ and they get stored as attributes. But if precomputed_properties=False then I/O happens lazily and you can skip the consistency checks for MultiImagingExtractors.

See Also

See discussion here

Do you have any interest in helping implement the feature?

Yes.

Code of Conduct

@h-mayorquin
Copy link
Collaborator

h-mayorquin commented Sep 20, 2023

This also can be counted as evidence than assembling multi-file formats using MultiImagingExtractor is a bad idea. We should consider whether the bad idea is this intead of adding more logic / complexity to the general abstractor class (which touches everything) to fix a wrong decsion.

Not saying that this makes this a bad idea but we should consider this angle.

@CodyCBakerPhD
Copy link
Member

CodyCBakerPhD commented Sep 20, 2023

This also can be counted as evidence than assembling multi-file formats using MultiImagingExtractor is a bad idea.

How else would we go about loading a single plane or channel of a multi-channel multi-plane format that splits its frames into X number of frames per file? (where it was very slow when X=1, and perhaps not ridiculously slow when the total number of all files is less than ~100)

@h-mayorquin
Copy link
Collaborator

The multi extractor is working as a wrapper for convenient IO. You will still have the IO without the wrapper.

@h-mayorquin
Copy link
Collaborator

h-mayorquin commented Sep 20, 2023

Like _BrukerTiffSinglePlaneImagingExtractor is a good example.

https://github.com/catalystneuro/roiextractors/blob/da3ee2851fdad8ba495eb1436ef02d66de5ac1e1/src/roiextractors/extractors/tiffimagingextractors/brukertiffimagingextractor.py#L401-L470

This class is just an IO function disguissed as a class (and IO wrapper). You probably eliminated the attribute access because it was inefficient. You could just had used the logic of get_video inside of BrukerTiffSinglePlaneImagingExtractor get_video instead of using the MultiImagingExtractor approach.

That would probably be harder to mantain, a cost. But if we are propagating a lazy look up feature to ALL extractors well that is also a cost.

I think you approach you guys used there is fine, keep this IO wrapper class as a private / utility object. That way you get the benefits but do not need to propagate the costs anywhere else. I am less keen on propagating this behavior to the whole library because some extractor might be used the way you used _BrukerTiffSinglePlaneImagingExtractor.

@h-mayorquin
Copy link
Collaborator

I think it would be better to implement a general object that behaves similar to _BrukerTiffSinglePlaneImagingExtractor for classes that are meant to be composed / concatenated. The only method that it should have is basically get_video and num_frames as a property and they will be meant to be used as convenience IO wrappers to be composed by MultiImaging. That way, you don't leak this complexity anywhere else than where it is needed.

@pauladkisson
Copy link
Member Author

I don't really get the pushback here.

MultiImagingExtractor is a very general purpose object used to concatenate individual IE's by frames regardless of file format usu. bc the files are split into chunks in time. If lazy property look-up is better for MultiImagingExtractor then that is a relevant use case for all of the IE's since they all can (and many do) use MultiImagingExtractor.

If you think MultiImagingExtractor is a bad idea in general, then we should discuss that instead. And tbh it seemed pretty intuitive to me.

@pauladkisson
Copy link
Member Author

I think it would be better to implement a general object that behaves similar to _BrukerTiffSinglePlaneImagingExtractor for classes that are meant to be composed / concatenated. The only method that it should have is basically get_video and num_frames as a property and they will be meant to be used as convenience IO wrappers to be composed by MultiImaging. That way, you don't leak this complexity anywhere else than where it is needed.

That's an interesting idea. But it doesn't work as well for formats like ScanImage that can have multi-file or single-file structure. If MultiImagingExtractor can't call its IEs' property methods then it won't be able to stand alone, and you would have to inherit it separately for every format that wants to support the multi-file case. Not the end of the world, but definitely a cost.

@h-mayorquin
Copy link
Collaborator

if lazy property look-up is better for MultiImagingExtractor then that is a relevant use case for all of the IE's since they all can (and many do) use MultiImagingExtractor.

I disagree with this. The use case of lazy look up is when you use MultiImagingExtractor to build an extractor from many files to avoid inefficiencies as it was done in an ad-hoc way for _BrukerTiffSinglePlaneImagingExtractor above. Not all extractors are meant to be composed in that way and therefore the feature does not need to be as general. If we can avoid complexity in our most general object, we should. The more general the object the more wary we should be of adding extra functionality.

@h-mayorquin
Copy link
Collaborator

Do we have data for volumetric ScanImage in google drive for some conversion project?

That's an interesting idea. But it doesn't work as well for formats like ScanImage that can have multi-file or single-file structure.

Seems simple to me. Use the simple IO wrapper class for the multi-file case, expand the simple IO wrapper class to a full one for the single file case.

If MultiImagingExtractor can't call its IEs' property methods then it won't be able to stand alone, and you would have to inherit it separately for every format that wants to support the multi-file case. Not the end of the world, but definitely a cost.

MultiImagingExtractor should be able to call IE properties. The simple classes IO wrapper classes that you compose with MultiImagingExtractor are the ones that don't need those methods.

simple_private_wrapper_io_class -> add_the_rest_of_the_methods -> fully fleshed `ImagingExtractor` for the single file case
simple_private_wrapper_io_classs -> compose many with `MultiImagingExtractor` -> fully fleshed `ImaginExtractor` for the mlutiple file case.

This is confined and ad-hoc for every format, you don't propagate complexity outsides of where it is needed. You can adjust what's best for performance depending on the layout of the files (if there are many the io_wrapper can be more or less light) and can chose in general what's best per format instead of thinking on a general interface for the abstract class in advance. Once you have some examples, then you can generalize and propagate to the general class.

@pauladkisson
Copy link
Member Author

Do we have data for volumetric ScanImage in google drive for some conversion project?

Yes, it's under mouseV1-to-nwb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants