Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Second fix for NWB GUIDE nested converter stubbing #979

Closed
wants to merge 12 commits into from
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@

### Bug fixes
* Fixed the default naming of multiple electrical series in the `SpikeGLXConverterPipe`. [PR #957](https://github.com/catalystneuro/neuroconv/pull/957)
* Fixed an issue when passing conversion options to a sub-converter (like the popular `SpikeGLXConverterPipe`) nested inside another `NWBConverter`. [PR #979](https://github.com/catalystneuro/neuroconv/pull/979)

### Improvements
* The `OpenEphysBinaryRecordingInterface` now uses `lxml` for extracting the session start time from the settings.xml file and does not depend on `pyopenephys` anymore. [PR #971](https://github.com/catalystneuro/neuroconv/pull/971)
Expand Down
23 changes: 19 additions & 4 deletions src/neuroconv/nwbconverter.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
import json
import warnings
from collections import Counter
from inspect import signature
from pathlib import Path
from typing import Dict, List, Literal, Optional, Tuple, Union

Expand Down Expand Up @@ -158,10 +159,24 @@ def create_nwbfile(self, metadata: Optional[dict] = None, conversion_options: Op

def add_to_nwbfile(self, nwbfile: NWBFile, metadata, conversion_options: Optional[dict] = None) -> None:
conversion_options = conversion_options or dict()
for interface_name, data_interface in self.data_interface_objects.items():
data_interface.add_to_nwbfile(
nwbfile=nwbfile, metadata=metadata, **conversion_options.get(interface_name, dict())
)
for interface_key, data_interface in self.data_interface_objects.items():
h-mayorquin marked this conversation as resolved.
Show resolved Hide resolved
if isinstance(data_interface, NWBConverter):
subconverter_kwargs = dict(nwbfile=nwbfile, metadata=metadata)

# Certain subconverters fully expose control over their interfaces conversion options
h-mayorquin marked this conversation as resolved.
Show resolved Hide resolved
# (such as iterator options, including progress bar details)
subconverter_keyword_arguments = list(signature(data_interface.add_to_nwbfile).parameters.keys())
if "conversion_options" in subconverter_keyword_arguments:
subconverter_kwargs["conversion_options"] = conversion_options.get(interface_key, None)
# Others do not, and instead expose simplified global keywords similar to a classic interface
else:
subconverter_kwargs.update(conversion_options.get(interface_key, dict()))

data_interface.add_to_nwbfile(**subconverter_kwargs)
else:
data_interface.add_to_nwbfile(
nwbfile=nwbfile, metadata=metadata, **conversion_options.get(interface_key, dict())
)

def run_conversion(
self,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,30 @@ class TestConverter(NWBConverter):
expected_session_start_time = datetime(2020, 11, 3, 10, 35, 10).astimezone()
self.assertNWBFileStructure(nwbfile_path=nwbfile_path, expected_session_start_time=expected_session_start_time)

def test_single_probe_spikeglx_converter_in_converter_preview(self):
class TestConverter(NWBConverter):
data_interface_classes = dict(SpikeGLX=SpikeGLXConverterPipe)

source_data = dict(SpikeGLX=dict(folder_path=str(SPIKEGLX_PATH / "Noise4Sam_g0")))
converter = TestConverter(source_data=source_data)

conversion_options = dict(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.
Let me think a bit on this.

One question:
Do you expect the user to know that the conversion options of each of the sub interfaces are set with their stream names? if so, how how would they? Or are they supposed to be handled by automatic tools?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All information for specifying the conversion_options dictionary is available in the conversion_options_schema

So a user has several options:

a) Read the JSON schema returned by the highest-level converter.get_conversion_options_schema(). Not the easiest thing to do but this is the intended route (principle applies to source data schema and metadata schema too)

b) If they defined a custom NWBConverter to be nested inside another, they already know their own key references by definition

c) If they used a ConverterPipe or core-supported subconverter based on a ConverterPipe, then they could find the automatically generated key references via minimal syntax converter[subconverter_key].data_interface_objects.keys(). We could even expose a convenience retrieval for it.

The GUIDE of course simplifies all this by automating construction based on the schema structure

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks a lot fo bearing with me so far.

Second question:
Can we just pass conversion_options to all the interfaces os a pipe? Here in this case, they will be the same, right? Is there a case where they will differ?

I know that they can differ in practice, like, one could theoretically want a different writing buffer or progress bar for one interface of a sub-converter but my question is: is there a place where it would be critical that they are separated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a look at bruker/miniscope for that style. There, one common option specifies stubbing. Which is mildly convenient for that but therefore the GUIDE does not support in-file progress updates for those converters, whereas it does for SpikeGLX (because we then have control over each TQDM bar for each data stream in the converter)

This is exactly the kind of question I think is fine for you to consider in the long-term; but for now we really just need this bug fixed and released for one of the more popular formats and uses on the GUIDE.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, then there is no need to generalize yet.

Move the modified add_to_nwb in this branch from to the SpikeGLX converter. This will fix your bug and will avoid us to modify one of the most central functions in the libraryfor a specific case. If we find a better way in the future then we only need to break the SipkeGLX converter.

Copy link
Collaborator

@h-mayorquin h-mayorquin Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not a minimal modification if it modifies add_to_nwb of the NWBConverter. Modifying the converter that triggered the bug directly (SpikeGLX) is the minimal fix as I suggested. What's wrong with that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What minimal fix to SpikeGLXConverter are you suggesting that satisfies all the other concerns about its broader use (in particular progress updates)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function that you added here add_to_nwb.

Add it the converters that need it. That's a localized change.

I thought that it was only spikeglx but it sems that ther is more of them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main converter class that needs it.

As demonstrated in the test, this is for an NWBConverter that contains a SpikeGLXConverter.

There is no fix to the SpikeGLXConverter specifically outside of the general standardization of how to style subconverter conversion options, which is a broader question if it should be standardized at all, and if so which standard should be chosen.

I have added the change directly to the GUIDE in NeurodataWithoutBorders/nwb-guide#908 so I suppose this PR is not necessary if you would prefer to close it - just be aware that programmatic usage of NeuroConv without this fix will have the same issue going forward

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Thanks. That gives me more time to think.

Yeah, I believe that the add_to_nwb and the run_conversion signatures of the converters should not different for common arguments (in this case, conversion options) so I would rather fix that.

I think we can add helper functions to specify common conversions options for all the interfaces of a converter but I want to push complexity outwards and not to NWBConverter.

SpikeGLX={"imec0.ap": {"stub_test": True}, "imec0.lf": {"stub_test": True}, "nidq": {"stub_test": True}}
)

nwbfile_path = self.tmpdir / "test_single_probe_spikeglx_converter_preview.nwb"
converter.run_conversion(nwbfile_path=nwbfile_path, conversion_options=conversion_options)

expected_session_start_time = datetime(2020, 11, 3, 10, 35, 10).astimezone()
self.assertNWBFileStructure(nwbfile_path=nwbfile_path, expected_session_start_time=expected_session_start_time)

with NWBHDF5IO(path=nwbfile_path) as io:
nwbfile = io.read()

assert nwbfile.acquisition["ElectricalSeriesAPImec0"].data.shape == (100, 384)
assert nwbfile.acquisition["ElectricalSeriesLFImec0"].data.shape == (100, 384)
assert nwbfile.acquisition["ElectricalSeriesNIDQ"].data.shape == (100, 9)


class TestMultiProbeSpikeGLXConverter(TestCase):
maxDiff = None
Expand Down
Loading