Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add StatisticsAggregator API #2554

Merged
merged 61 commits into from
Aug 2, 2024
Merged
Show file tree
Hide file tree
Changes from 58 commits
Commits
Show all changes
61 commits
Select commit Hold shift + click to select a range
e4585d8
added stats extractor parent component
Apr 29, 2024
1e1afca
added stats extractor based on sigma clipping
Apr 29, 2024
d54aeda
added cut of outliers
Apr 30, 2024
1ab0f1a
update docs
May 2, 2024
074bd45
formatted with black
May 2, 2024
d1611ec
added pass for __call__ function
May 2, 2024
fdbb796
Small commit for prototyping
May 7, 2024
76bb812
Removed unneeded functions
May 10, 2024
34902c4
added unit tests
May 3, 2024
d25f85d
added changelog
May 3, 2024
b5847f7
fix lint
May 3, 2024
620ada6
I altered the class variables to th evariance statistics extractor
May 23, 2024
4e275e7
added a container for mean variance images and fixed docustring
May 24, 2024
9e68234
I changed the container type for the StarVarianceExtractor
May 27, 2024
cda269e
fix pylint
May 31, 2024
23c0536
change __call__() to _extract()
May 31, 2024
5bd7cd9
minor renaming
May 31, 2024
ea3331b
use pytest.fixture for Extractors
May 31, 2024
9f61c56
reduce duplicated code of the call function
May 31, 2024
36ffc88
edit description of StatisticsContainer
TjarkMiener Jun 12, 2024
a5caed4
added feature to shift the extraction sequence
TjarkMiener Jun 12, 2024
51d98f9
fix boundary case for the last chunk
TjarkMiener Jun 12, 2024
5b15397
fix tests
TjarkMiener Jun 12, 2024
cb2a6fb
fix ruff
TjarkMiener Jun 12, 2024
a396d13
edit docstring
TjarkMiener Jun 17, 2024
a08b831
added check for length of dl1 table is greater than chunk size
TjarkMiener Jun 17, 2024
f20f1e3
fix typo
TjarkMiener Jun 17, 2024
1be72cc
edit docstring
TjarkMiener Jun 17, 2024
a072386
removed lstchain relic
TjarkMiener Jun 18, 2024
9c8090e
resolved remaining issues
TjarkMiener Jun 27, 2024
7ef82a5
added test for the input data
TjarkMiener Jun 27, 2024
09ed8dc
fill std_outliers also for the PlainExtractor
TjarkMiener Jun 27, 2024
3052f69
renamed the traitlets for outliers
TjarkMiener Jun 27, 2024
dbf6322
set field for extraction start and stop to NAN_TIME
TjarkMiener Jun 28, 2024
643da4c
bug fix for chunks creation
TjarkMiener Jun 28, 2024
6d7a9e5
bug fix treatment of last chunk and fix tests
TjarkMiener Jun 28, 2024
3087ff3
edit comments
TjarkMiener Jun 28, 2024
6147c0e
updated docs and tests
TjarkMiener Jul 1, 2024
537b092
properly name the fixture
TjarkMiener Jul 4, 2024
362d336
update doc strings and variable names
TjarkMiener Jul 4, 2024
864a061
use only one chunk iterator on the whole table
TjarkMiener Jul 9, 2024
2d0d37d
reformulate error messages for chunk size greater than dl1 table length
TjarkMiener Jul 9, 2024
8f4f58c
refactor _get_chunks for a better readability
TjarkMiener Jul 9, 2024
d09dc85
remove outlier detection from the stats extraction
TjarkMiener Jul 16, 2024
71c051e
simplify _get_chunks
TjarkMiener Jul 17, 2024
44ed0dc
generalized extract()
TjarkMiener Jul 17, 2024
f2541f6
fix tests
TjarkMiener Jul 17, 2024
b772dff
add test with inserting outliers
TjarkMiener Jul 17, 2024
16914b6
update docstring
TjarkMiener Jul 18, 2024
2a62ab3
return Table instead of dict
TjarkMiener Jul 18, 2024
0ef25c6
attach units (if any) to the stats table
TjarkMiener Jul 22, 2024
051e66d
fix random seed in tests
TjarkMiener Jul 25, 2024
3b6b76b
use np.testing.assert_allclose for tests
TjarkMiener Jul 25, 2024
bc4156b
added number of events for the stats extraction
TjarkMiener Jul 31, 2024
5a143dd
moved extractor to monitoring module
TjarkMiener Jul 31, 2024
cde6857
update docs for monitoring module
TjarkMiener Aug 1, 2024
8fe3507
fix docs
TjarkMiener Aug 1, 2024
5275708
fix docstring with double back-ticks
TjarkMiener Aug 1, 2024
fbab2d1
update docs and rename extractor to aggregator
TjarkMiener Aug 2, 2024
98daeed
fix typo
TjarkMiener Aug 2, 2024
ea058a0
removed last slash for intersphinx mapping
TjarkMiener Aug 2, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions docs/api-reference/monitoring/extractor.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
.. _stats_extractor:

********************
Statistics Extractor
********************


Reference/API
=============

.. automodapi:: ctapipe.monitoring.extractor
28 changes: 28 additions & 0 deletions docs/api-reference/monitoring/index.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
.. _monitoring:

**********************************
Monitoring (`~ctapipe.monitoring`)
**********************************

.. currentmodule:: ctapipe.monitoring

This module include all the functions and classes needed for the Monitoring of CTA data.
TjarkMiener marked this conversation as resolved.
Show resolved Hide resolved

Currently, only code related to :ref:`stats_extractor` is implemented here.


Submodules
==========

.. toctree::
:maxdepth: 1
:glob:

extractor


Reference/API
=============

.. automodapi:: ctapipe.monitoring
:no-inheritance-diagram:
1 change: 1 addition & 0 deletions docs/changes/2554.feature.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Add API to extract the statistics from a sequence of images.
28 changes: 25 additions & 3 deletions src/ctapipe/containers.py
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,7 @@
"TriggerContainer",
"WaveformCalibrationContainer",
"StatisticsContainer",
"ImageStatisticsContainer",
"IntensityStatisticsContainer",
"PeakTimeStatisticsContainer",
"SchedulingBlockContainer",
Expand Down Expand Up @@ -412,7 +413,28 @@ class MorphologyContainer(Container):


class StatisticsContainer(Container):
"""Store descriptive statistics"""
"""Store descriptive statistics of a chunk of images"""

n_events = Field(-1, "number of events used for the extraction of the statistics")
mean = Field(
None,
"mean of a pixel-wise quantity for each channel"
"Type: float; Shape: (n_channels, n_pixel)",
)
median = Field(
None,
"median of a pixel-wise quantity for each channel"
"Type: float; Shape: (n_channels, n_pixel)",
)
std = Field(
None,
"standard deviation of a pixel-wise quantity for each channel"
"Type: float; Shape: (n_channels, n_pixel)",
)


class ImageStatisticsContainer(Container):
"""Store descriptive image statistics"""

max = Field(np.float32(nan), "value of pixel with maximum intensity")
min = Field(np.float32(nan), "value of pixel with minimum intensity")
Expand All @@ -422,11 +444,11 @@ class StatisticsContainer(Container):
kurtosis = Field(nan, "kurtosis of intensity")


class IntensityStatisticsContainer(StatisticsContainer):
class IntensityStatisticsContainer(ImageStatisticsContainer):
default_prefix = "intensity"


class PeakTimeStatisticsContainer(StatisticsContainer):
class PeakTimeStatisticsContainer(ImageStatisticsContainer):
default_prefix = "peak_time"


Expand Down
6 changes: 3 additions & 3 deletions src/ctapipe/image/statistics.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
import numpy as np
from numba import float32, float64, guvectorize, int64, njit

from ..containers import StatisticsContainer
from ..containers import ImageStatisticsContainer

__all__ = [
"arg_n_largest",
Expand Down Expand Up @@ -88,8 +88,8 @@ def kurtosis(data, mean=None, std=None, fisher=True):


def descriptive_statistics(
values, container_class=StatisticsContainer
) -> StatisticsContainer:
values, container_class=ImageStatisticsContainer
) -> ImageStatisticsContainer:
"""compute intensity statistics of an image"""
mean = values.mean()
std = values.std()
Expand Down
4 changes: 2 additions & 2 deletions src/ctapipe/image/tests/test_statistics.py
Original file line number Diff line number Diff line change
Expand Up @@ -49,14 +49,14 @@ def test_kurtosis():


def test_return_type():
from ctapipe.containers import PeakTimeStatisticsContainer, StatisticsContainer
from ctapipe.containers import ImageStatisticsContainer, PeakTimeStatisticsContainer
from ctapipe.image import descriptive_statistics

rng = np.random.default_rng(0)
data = rng.normal(5, 2, 1000)

stats = descriptive_statistics(data)
assert isinstance(stats, StatisticsContainer)
assert isinstance(stats, ImageStatisticsContainer)

stats = descriptive_statistics(data, container_class=PeakTimeStatisticsContainer)
assert isinstance(stats, PeakTimeStatisticsContainer)
Expand Down
171 changes: 171 additions & 0 deletions src/ctapipe/monitoring/extractor.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,171 @@
"""
Extraction algorithms to compute the statistics from a chunk of images
TjarkMiener marked this conversation as resolved.
Show resolved Hide resolved
"""

__all__ = [
"StatisticsExtractor",
"PlainExtractor",
"SigmaClippingExtractor",
]

from abc import abstractmethod
from collections import defaultdict

import numpy as np
from astropy.stats import sigma_clipped_stats
from astropy.table import Table

from ctapipe.containers import StatisticsContainer
from ctapipe.core import TelescopeComponent
from ctapipe.core.traits import Int


class StatisticsExtractor(TelescopeComponent):
"""
Base component to handle the extraction of the statistics from a table
containing e.g. charges, peak times and/or charge variances (images).
"""

chunk_size = Int(
2500,
help="Size of the chunk used for the calculation of the statistical values",
).tag(config=True)

def __call__(
self,
table,
masked_pixels_of_sample=None,
chunk_shift=None,
col_name="image",
) -> Table:
"""
Divide table into chunks and extract the statistical values.

This function divides the input table into overlapping or non-overlapping chunks of size ``chunk_size``
and call the relevant function of the particular extractor to extract the statistical values.
The chunks are generated in a way that ensures they do not overflow the bounds of the table.
- If ``chunk_shift`` is None, extraction chunks will not overlap, but the last chunk is ensured to be
of size `chunk_size`, even if it means the last two chunks will overlap.
- If ``chunk_shift`` is provided, it will determine the number of samples to shift between the start
of consecutive chunks resulting in an overlap of extraction chunks. Chunks that overflows the bounds
of the table are not considered.

Parameters
----------
table : astropy.table.Table
table with images of shape (n_images, n_channels, n_pix)
and timestamps of shape (n_images, ) stored in an astropy Table
masked_pixels_of_sample : ndarray, optional
boolean array of masked pixels of shape (n_pix, ) that are not available for processing
chunk_shift : int, optional
number of samples to shift between the start of consecutive extraction chunks
col_name : string
column name in the table

Returns
-------
astropy.table.Table
table containing the start and end values as timestamps and event IDs
as well as the extracted statistics (mean, median, std) for each chunk
"""

# Check if the statistics of the table is sufficient to extract at least one chunk.
if len(table) < self.chunk_size:
raise ValueError(
f"The length of the provided table ({len(table)}) is insufficient to meet the required statistics for a single extraction chunk of size ({self.chunk_size})."
)
# Check if the chunk_shift is smaller than the chunk_size
if chunk_shift is not None and chunk_shift > self.chunk_size:
raise ValueError(
f"The chunk_shift ({chunk_shift}) must be smaller than the chunk_size ({self.chunk_size})."
)

# Function to split the table into appropriated chunks
def _get_chunks(table, chunk_shift):
# Calculate the range step: Use chunk_shift if provided, otherwise use chunk_size
step = chunk_shift or self.chunk_size

# Generate chunks that do not overflow
for i in range(0, len(table) - self.chunk_size + 1, step):
yield table[i : i + self.chunk_size]

# If chunk_shift is None, ensure the last chunk is of size chunk_size, if needed
if chunk_shift is None and len(table) % self.chunk_size != 0:
yield table[-self.chunk_size :]

# Calculate the statistics for each chunk of images
units = {col: table[col_name].unit for col in ("mean", "median", "std")}
data = defaultdict(list)
for chunk in _get_chunks(table, chunk_shift):
stats = self.extract(chunk[col_name].data, masked_pixels_of_sample)
data["time_start"].append(chunk["time_mono"][0])
data["time_end"].append(chunk["time_mono"][-1])
data["event_id_start"].append(chunk["event_id"][0])
data["event_id_end"].append(chunk["event_id"][-1])
data["n_events"].append(stats.n_events)
data["mean"].append(stats.mean)
data["median"].append(stats.median)
data["std"].append(stats.std)

return Table(data, units=units)

@abstractmethod
def extract(self, images, masked_pixels_of_sample) -> StatisticsContainer:
pass


class PlainExtractor(StatisticsExtractor):
"""
Extract the statistics from a chunk of images using numpy functions
"""

def extract(self, images, masked_pixels_of_sample) -> StatisticsContainer:
# Mask broken pixels
masked_images = np.ma.array(images, mask=masked_pixels_of_sample)

# Calculate the mean, median, and std over the chunk per pixel
pixel_mean = np.ma.mean(masked_images, axis=0).filled(np.nan)
pixel_median = np.ma.median(masked_images, axis=0).filled(np.nan)
pixel_std = np.ma.std(masked_images, axis=0).filled(np.nan)

return StatisticsContainer(
n_events=masked_images.shape[0],
mean=pixel_mean,
median=pixel_median,
std=pixel_std,
)


class SigmaClippingExtractor(StatisticsExtractor):
"""
Extract the statistics from a chunk of images using astropy's sigma clipping functions
"""

max_sigma = Int(
default_value=4,
help="Maximal value for the sigma clipping outlier removal",
).tag(config=True)
iterations = Int(
default_value=5,
help="Number of iterations for the sigma clipping outlier removal",
).tag(config=True)

def extract(self, images, masked_pixels_of_sample) -> StatisticsContainer:
# Mask broken pixels
masked_images = np.ma.array(images, mask=masked_pixels_of_sample)

# Calculate the mean, median, and std over the chunk per pixel
pixel_mean, pixel_median, pixel_std = sigma_clipped_stats(
masked_images,
sigma=self.max_sigma,
maxiters=self.iterations,
cenfunc="mean",
axis=0,
)

return StatisticsContainer(
n_events=masked_images.shape[0],
mean=pixel_mean,
median=pixel_median,
std=pixel_std,
)
Empty file.
Loading