Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up mesoscope TIFF splitting #473

Open
3 tasks
danielsf opened this issue Mar 23, 2022 · 1 comment
Open
3 tasks

Speed up mesoscope TIFF splitting #473

danielsf opened this issue Mar 23, 2022 · 1 comment

Comments

@danielsf
Copy link
Contributor

The new mesoscope TIFF splitting code merged in #464 (ticket #460) appears to be a factor of ~ 2 slower than the legacy code. This is probably not a show-stopper, but it would be nice to speed it up.

The likeliest way to speed it up would be to refactor the TimeSeriesSplitter class defined here

https://github.com/AllenInstitute/ophys_etl_pipelines/blob/main/src/ophys_etl/modules/mesoscope_splitting/tiff_splitter.py#L372

so that, instead of generating the HDF5 file for one (ROI, z) pair at a time, it can just loop through all of the (ROI, z) pairs associated with the ophys_session, writing all of the expected videos at once. The advantage is that, instead of having to loop through the time series TIFF file once for each video (i.e. for each call of write_output_file), it could loop through the TIFF file once, writing frames to the appropriate video files.

Tasks

  • Replace TimeSeriesSplitter.write_output_file with a method that accepts a dict mapping (i_roi, z) pairs to output file paths and then loops through the TIFF file once, writing frames from the TIFF to the appropriate output paths).

Validation

  • Make sure unit tests still pass (the unittests in tests/modules/mesoscope_splitting_cli simulate different real world use cases and check the actual output against expected output, so if those tests pass, everything should still work).
  • Verify that refactoring TimeSeriesSplitter.write_output_file to loop through the time series TIFF once for the entire ophys_session would, in fact, be faster than the current implementation. @danielsf has some real datasets that can be used for this test. If the code is not faster, it is probably not worth merging this ticket.
@danielsf
Copy link
Contributor Author

this might also be a good opportunity to experiment with the HDF5 compression options mentioned in #389

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant