Simplify structure #17

Merged (11 commits) on Sep 3, 2024
README.md (95 changes: 44 additions & 51 deletions)
```python
from datetime import datetime
from zoneinfo import ZoneInfo
from pynwb import NWBHDF5IO, NWBFile

session_description = "A session of data where a PSTH structure was produced"
session_start_time = datetime.now(ZoneInfo("Asia/Ulaanbaatar"))
identifier = "a_session_identifier"
nwbfile = NWBFile(
    session_description=session_description,
    session_start_time=session_start_time,
    identifier=identifier,
)
```

Note that in the diagram above, the `milliseconds_from_event_to_first_bin` is negative.
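To make the sign convention concrete, here is a small sketch of where each bin starts relative to the event, assuming uniformly spaced bins and using the bin width and offset that appear later in this README:

```python
import numpy as np

bin_width_in_milliseconds = 100.0
milliseconds_from_event_to_first_bin = -50.0
number_of_bins = 4

# Start of each bin relative to the event timestamp, in milliseconds
bin_starts = milliseconds_from_event_to_first_bin + np.arange(number_of_bins) * bin_width_in_milliseconds
print(bin_starts)  # [-50.  50. 150. 250.]
```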
The `data` argument passed to the `BinnedAlignedSpikes` constructor stores counts across all the event timestamps for each of the units. The data is a 3D array where the first dimension indexes the units, the second dimension indexes the event timestamps, and the third dimension indexes the bins where the counts are stored. The shape of the data is `(number_of_units, number_of_events, number_of_bins)`.
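For instance, a toy `data` array with the expected layout can be sketched as follows (the sizes are illustrative, not taken from the original example):

```python
import numpy as np

number_of_units, number_of_events, number_of_bins = 2, 5, 4

# Non-negative integer spike counts with shape (number_of_units, number_of_events, number_of_bins)
rng = np.random.default_rng(seed=0)
data = rng.integers(low=0, high=10, size=(number_of_units, number_of_events, number_of_bins))

assert data.shape == (number_of_units, number_of_events, number_of_bins)
```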


The `event_timestamps` argument is used to store the timestamps of the events and should have the same length as the second dimension of `data`. Note that `event_timestamps` must be non-decreasing, i.e. the events are expected to be in ascending order in time.
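A quick way to verify that assumption before building the object is a NumPy check like the following (the example timestamps are illustrative):

```python
import numpy as np

event_timestamps = np.array([1.0, 5.0, 10.0, 15.0, 20.0])

# The timestamps must be non-decreasing, i.e. sorted in ascending order
assert np.all(np.diff(event_timestamps) >= 0), "event_timestamps must be in ascending order"
```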

The first dimension of `data` works almost like a dictionary. That is, you select a specific unit by indexing the first dimension. For example, `data[0]` would return the data of the first unit. For each of the units, the data is organized with the time on the first axis as this is the convention in the NWB format. As a consequence of this choice the data of each unit is contiguous in memory.
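As a small illustration of that layout (toy sizes, not from the original example):

```python
import numpy as np

number_of_units, number_of_events, number_of_bins = 2, 5, 4
data = np.zeros((number_of_units, number_of_events, number_of_bins), dtype=np.uint64)

# Indexing the first dimension selects one unit; the result has shape (number_of_events, number_of_bins)
data_unit_0 = data[0]
assert data_unit_0.shape == (number_of_events, number_of_bins)

# With units on the first axis of a C-ordered array, each unit's block is contiguous in memory
assert data_unit_0.flags["C_CONTIGUOUS"]
```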

The following diagram illustrates the structure of the data for a concrete example:


### Linking to units table
One way to make the information stored in the `BinnedAlignedSpikes` object more useful for future users is to indicate exactly which units or neurons the first dimension of the `data` attribute corresponds to. This is **optional but recommended** as it makes the data more meaningful and easier to interpret. In NWB the units are usually stored in a `Units` [table](https://pynwb.readthedocs.io/en/stable/pynwb.misc.html#pynwb.misc.Units). To illustrate how to create this link let's first create a toy `Units` table:

```python
import numpy as np
# ...
binned_aligned_spikes = BinnedAlignedSpikes(
    # ...
)
```
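The body of this example is collapsed in the diff view. As a rough sketch only (not the PR's exact code), a toy `Units` table and a region pointing into it could be created with standard pynwb/hdmf calls like these; a fresh toy `NWBFile` is created here to keep the sketch self-contained, whereas in practice you would reuse the session's `nwbfile`:

```python
import numpy as np
from datetime import datetime
from zoneinfo import ZoneInfo
from pynwb import NWBFile

nwbfile = NWBFile(
    session_description="toy session",
    identifier="toy_identifier",
    session_start_time=datetime.now(ZoneInfo("Asia/Ulaanbaatar")),
)

# Two toy units with a few spike times each
nwbfile.add_unit(spike_times=np.array([0.1, 0.2, 0.3]))
nwbfile.add_unit(spike_times=np.array([0.15, 0.25]))

# A DynamicTableRegion referencing the rows of the Units table that the binned data describes
units_region = nwbfile.units.create_region(
    name="units_region",
    region=[0, 1],
    description="The units that the binned data corresponds to.",
)
```

Passing the resulting `units_region` to the `BinnedAlignedSpikes` constructor (via a `units_region` argument, as the spec's optional dataset suggests) is presumably what the collapsed portion of the example does.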

As with the previous example, this can then be added to a processing module in an NWB file and written to disk using exactly the same code as before.

### Storing data from multiple conditions (i.e. multiple stimuli)

`BinnedAlignedSpikes` can also be used to store data that is aggregated across multiple conditions while keeping track of which condition each set of counts corresponds to. This is useful when you want to store the spike counts around multiple conditions (e.g., different stimuli, behavioral events, etc.) in a single structure. Since each condition may not occur the same number of times (e.g. different stimuli may not appear with the same frequency), a homogeneous data structure is not possible. Therefore an extra variable, `condition_indices`, is used to indicate which condition each set of counts corresponds to.


```python
from ndx_binned_spikes import BinnedAlignedSpikes

binned_aligned_spikes = BinnedAlignedSpikes(
    bin_width_in_milliseconds=bin_width_in_milliseconds,
    milliseconds_from_event_to_first_bin=milliseconds_from_event_to_first_bin,
    data=data,  # Shape (number_of_units, number_of_events, number_of_bins)
    event_timestamps=event_timestamps,  # Shape (number_of_events,)
    condition_indices=condition_indices,  # Shape (number_of_events,)
    condition_labels=condition_labels,  # Shape (number_of_conditions,) or np.unique(condition_indices).size
)
```

Note that `number_of_events` here represents the total number of repetitions for all the conditions being aggregated. For example, if data is being aggregated from two stimuli where the first stimulus appeared twice and the second appeared three times, the `number_of_events` would be 5.
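As a quick sanity check of that arithmetic (using the same 2 + 3 repetitions):

```python
import numpy as np

condition_indices = np.concatenate([np.zeros(2), np.ones(3)])  # first stimulus twice, second three times

number_of_events = condition_indices.size
assert number_of_events == 5
```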

The `condition_indices` is an indicator vector that should be constructed so that `data[:, condition_indices == condition_index, :]` corresponds to the binned spike counts for the condition with the specified `condition_index`. You can retrieve the same data using the convenience method `binned_aligned_spikes.get_data_for_condition(condition_index)`.
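The boolean-mask selection described above can also be done directly with NumPy; a self-contained sketch with toy values:

```python
import numpy as np

number_of_units, number_of_bins = 2, 4
condition_indices = np.array([0, 0, 1, 1, 1])
number_of_events = condition_indices.size

data = np.arange(number_of_units * number_of_events * number_of_bins).reshape(
    number_of_units, number_of_events, number_of_bins
)

# All repetitions of condition 1, for every unit
data_for_condition_1 = data[:, condition_indices == 1, :]
assert data_for_condition_1.shape == (number_of_units, 3, number_of_bins)
```

This is the same array that `get_data_for_condition(condition_index=1)` is described as returning.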

The `condition_labels` argument is optional and can be used to store the labels of the conditions. This is meant to help understand the nature of the conditions.

It's important to note that the timestamps must be in ascending order and must correspond positionally to the condition indices and the second dimension of the data. If they are not, a `ValueError` will be raised. To help organize the data correctly, you can use the convenience method `BinnedAlignedSpikes.sort_data_by_event_timestamps(data=data, event_timestamps=event_timestamps, condition_indices=condition_indices)`, which ensures the data is properly sorted. Here's how it can be used:

```python
sorted_data, sorted_event_timestamps, sorted_condition_indices = BinnedAlignedSpikes.sort_data_by_event_timestamps(data=data, event_timestamps=event_timestamps, condition_indices=condition_indices)

binned_aligned_spikes = BinnedAlignedSpikes(
    bin_width_in_milliseconds=bin_width_in_milliseconds,
    milliseconds_from_event_to_first_bin=milliseconds_from_event_to_first_bin,
    data=sorted_data,
    event_timestamps=sorted_event_timestamps,
    condition_indices=sorted_condition_indices,
    condition_labels=condition_labels,
)
```

The same can be achieved by using the following script:

```python
sorted_indices = np.argsort(event_timestamps)
sorted_data = data[:, sorted_indices, :]
sorted_event_timestamps = event_timestamps[sorted_indices]
sorted_condition_indices = condition_indices[sorted_indices]
```

#### Example of building a `BinnedAlignedSpikes` object for two conditions

To better understand how this object works, let's consider a specific example. Suppose we have data for two different stimuli and their associated timestamps:

```python
import numpy as np

# Two units and 4 bins
data_for_first_stimuli = np.array(
    [
        # Unit 1
        [
            # ...
        ],
        # Unit 2
        [
            # ...
        ],
    ]
)

# Also two units and 4 bins but this event appeared three times
data_for_second_stimuli = np.array(
    [
        # Unit 1
        [
            # ...
        ],
        # Unit 2
        [
            # ...
        ],
    ]
)

timestamps_first_stimuli = [5.0, 15.0]
timestamps_second_stimuli = [1.0, 10.0, 20.0]
```

The way that we would build the data for the `BinnedAlignedSpikes` object is as follows:

```python
from ndx_binned_spikes import BinnedAlignedSpikes

bin_width_in_milliseconds = 100.0
milliseconds_from_event_to_first_bin = -50.0

data = np.concatenate([data_for_first_stimuli, data_for_second_stimuli], axis=1)
event_timestamps = np.concatenate([timestamps_first_stimuli, timestamps_second_stimuli])
condition_indices = np.concatenate([np.zeros(2), np.ones(3)])
condition_labels = ["a", "b"]

sorted_data, sorted_event_timestamps, sorted_condition_indices = BinnedAlignedSpikes.sort_data_by_event_timestamps(data=data, event_timestamps=event_timestamps, condition_indices=condition_indices)

binned_aligned_spikes = BinnedAlignedSpikes(
    bin_width_in_milliseconds=bin_width_in_milliseconds,
    milliseconds_from_event_to_first_bin=milliseconds_from_event_to_first_bin,
    data=sorted_data,
    event_timestamps=sorted_event_timestamps,
    condition_indices=sorted_condition_indices,
)
```

Then we can recover the original data by calling the `get_data_for_condition` method:

```python
retrieved_data_for_first_stimuli = binned_aligned_spikes.get_data_for_condition(condition_index=0)
np.testing.assert_array_equal(retrieved_data_for_first_stimuli, data_for_first_stimuli)
```

The `BinnedAlignedSpikes` object can be added to a processing module in an NWB file and written to disk using the same code as before (a minimal sketch is shown below). Plus, a region of the `Units` table can be linked to the `BinnedAlignedSpikes` object in the same way as shown earlier.
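Since the code referred to as "the same code as before" is collapsed in this diff, here is a minimal sketch of what adding the object to a processing module and writing the file typically looks like with pynwb; the module name and file path are illustrative, and `nwbfile` and `binned_aligned_spikes` are assumed to come from the earlier snippets:

```python
from pynwb import NWBHDF5IO

ecephys_module = nwbfile.create_processing_module(
    name="ecephys", description="Intermediate data derived from extracellular electrophysiology."
)
ecephys_module.add(binned_aligned_spikes)

with NWBHDF5IO("binned_aligned_spikes.nwb", mode="w") as io:
    io.write(nwbfile)
```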

---
This extension was created using [ndx-template](https://github.com/nwb-extensions/ndx-template).
spec/ndx-binned-spikes.extensions.yaml (72 changes: 18 additions & 54 deletions)
groups:
- neurodata_type_def: BinnedAlignedSpikes
  neurodata_type_inc: NWBDataInterface
  default_name: BinnedAlignedSpikes
  doc: A data interface for binned spike data aligned to an event (e.g. a stimulus
    or the beginning of a trial).
  attributes:
  - name: name
    dtype: text
    value: BinnedAlignedSpikes
    doc: The name of this container
  - name: description
    dtype: text
    value: Spikes data binned and aligned to the event timestamps of one or multiple
      conditions.
    doc: A description of what the data represents
  - name: bin_width_in_milliseconds
    dtype: float64
    doc: The length in milliseconds of the bins
  - name: milliseconds_from_event_to_first_bin
    dtype: float64
    default_value: 0.0
    doc: The time in milliseconds from the event to the beginning of the first bin.
      A negative value indicates that the first bin is before the event whereas a
      positive value indicates that the first bin is after the event.
    required: false
  datasets:
  - name: data
    dtype: uint64
    dims:
    - num_units
    - number_of_events
    - number_of_bins
    shape:
    - null
    - null
    - null
    # (doc collapsed in the diff view)
  - name: event_timestamps
    dtype: float64
    dims:
    - number_of_events
    shape:
    - null
    doc: The timestamps at which the events occurred.
  - name: condition_indices
    dtype: uint64
    dims:
    - number_of_events
    shape:
    - null
    doc: The index of the condition that each timestamp corresponds to (e.g. a stimulus
      type, trial number, category, etc.). This is only used when the data is aligned
      to multiple conditions.
    quantity: '?'
  - name: condition_labels
    dtype: text
    dims:
    - number_of_conditions
    shape:
    - null
    doc: The labels of the conditions that the data is aligned to. The size of this
      array should match the number of conditions. This is only used when the data
      is aligned to multiple conditions. First condition is index 0, second is index
      1, etc.
    quantity: '?'
  - name: units_region
    neurodata_type_inc: DynamicTableRegion
    doc: A reference to the Units table region that contains the units of the data.