Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moving quantile #426

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 15 additions & 0 deletions benchmark/benchmark_time.py
Original file line number Diff line number Diff line change
Expand Up @@ -98,6 +98,20 @@ def benchmark_simple_moving_average(runner):
)


def benchmark_moving_quantile(runner):
runner.add_separator()
for n in [100, 10_000, 1_000_000]:
ds = _build_toy_dataset(n)

node = ds.node()
output = node.moving_quantile(window_length=10.0, quantile=0.5)

runner.benchmark(
f"moving_quantile (0.5):{n:_}",
lambda: tp.run(output, input={node: ds}),
)


def benchmark_moving_minimum(runner):
runner.add_separator()
for n in [1_000_000, 10_000_000]:
Expand Down Expand Up @@ -457,6 +471,7 @@ def main():
"add_index_v2",
"from_pandas_with_objects",
"moving_minimum",
"moving_quantile",
]
if args.functions is not None:
benchmarks_to_run = args.functions
Expand Down
2 changes: 1 addition & 1 deletion docs/src/reference/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -96,7 +96,7 @@ Check the index on the left for a more detailed description of any symbol.

| Symbols | Description |
| ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------- |
| [`EventSet.simple_moving_average()`][temporian.EventSet.simple_moving_average] [`EventSet.moving_standard_deviation()`][temporian.EventSet.moving_standard_deviation] [`EventSet.cumsum()`][temporian.EventSet.cumsum] [`EventSet.moving_sum()`][temporian.EventSet.moving_sum] [`EventSet.moving_count()`][temporian.EventSet.moving_count] [`EventSet.moving_min()`][temporian.EventSet.moving_min] [`EventSet.moving_max()`][temporian.EventSet.moving_max] [`EventSet.cumprod()`][temporian.EventSet.cumprod] [`EventSet.moving_product()`][temporian.EventSet.moving_product] | Compute an operation on the values in a sliding window over an EventSet's timestamps. |
| [`EventSet.simple_moving_average()`][temporian.EventSet.simple_moving_average] [`EventSet.moving_standard_deviation()`][temporian.EventSet.moving_standard_deviation] [`EventSet.cumsum()`][temporian.EventSet.cumsum] [`EventSet.moving_sum()`][temporian.EventSet.moving_sum] [`EventSet.moving_count()`][temporian.EventSet.moving_count] [`EventSet.moving_min()`][temporian.EventSet.moving_min] [`EventSet.moving_max()`][temporian.EventSet.moving_max] [`EventSet.cumprod()`][temporian.EventSet.cumprod] [`EventSet.moving_product()`][temporian.EventSet.moving_product] [`EventSet.moving_quantile()`][temporian.EventSet.moving_quantile] | Compute an operation on the values in a sliding window over an EventSet's timestamps. |

### Python operators

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
::: temporian.EventSet.moving_quantile
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome.
Can you also edit: docs/src/reference/index.md

Copy link
Collaborator Author

@javiber javiber Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*added to the list of windows operations.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, can you create an entry in benchmark/benchmark_time.py to facilitate the benchmarking.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*added it

73 changes: 73 additions & 0 deletions temporian/core/event_set_ops.py
Original file line number Diff line number Diff line change
Expand Up @@ -3218,6 +3218,79 @@ def moving_min(

return moving_min(self, window_length=window_length, sampling=sampling)

def moving_quantile(
self: EventSetOrNode,
window_length: WindowLength,
quantile: float,
sampling: Optional[EventSetOrNode] = None,
) -> EventSetOrNode:
"""Computes the quantile in a sliding window over an
[`EventSet`][temporian.EventSet].

For each t in sampling, and for each feature independently, returns at
time t the appropiated quantile for the feature in the window
(t - window_length, t].

`sampling` can't be specified if a variable `window_length` is
specified (i.e. if `window_length` is an EventSet).

If `sampling` is specified or `window_length` is an EventSet, the moving
window is sampled at each timestamp in them, else it is sampled on the
input's.

Missing values (such as NaNs) are ignored.

If the window does not contain any values (e.g., all the values are
missing, or the window does not contain any sampling), outputs missing
values.

The quantile calculated in each window is equivalent to numpy's
`"averaged_inverted_cdf"` method.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comments that the op only work on floating point features (or make some implicit conversion)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently, I also support int as a valid input, however the output is converted to float

float -> float
double -> double
int -> float

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*Added a comment on the doc explaining the point above


This operation only accepts numeric dtypes in the input.
For `float64` the output will be `float64` but for
`float32`, `int64`, and `int32` output will be `float32`.

Example:
```python
>>> a = tp.event_set(
... timestamps=[0, 1, 2, 5, 6, 7],
... features={"value": [np.nan, 1, 5, 10, 15, 20]},
... )

>>> a.moving_quantile(4, quantile=0.5)
indexes: ...
(6 events):
timestamps: [0. 1. 2. 5. 6. 7.]
'value': [ nan 1. 3. 7.5 12.5 15. ]
...

```

See [`EventSet.moving_count()`][temporian.EventSet.moving_count] for
examples of moving window operations with external sampling and indices.

Args:
window_length: Sliding window's length.
quantile: the desired quantile defined in the range (0, 1).
sampling: Timestamps to sample the sliding window's value at. If not
provided, timestamps in the input are used.

Returns:
EventSet containing the moving standard deviation of each feature in
the input.
"""
from temporian.core.operators.window.moving_quantile import (
moving_quantile,
)

return moving_quantile(
self,
window_length=window_length,
quantile=quantile,
sampling=sampling,
)

def moving_standard_deviation(
self: EventSetOrNode,
window_length: WindowLength,
Expand Down
15 changes: 15 additions & 0 deletions temporian/core/operators/window/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -142,3 +142,18 @@ py_library(
"//temporian/core/data:schema",
],
)

py_library(
name = "moving_quantile",
srcs = ["moving_quantile.py"],
srcs_version = "PY3",
deps = [
":base",
"//temporian/core:compilation",
"//temporian/core:operator_lib",
"//temporian/core:typing",
"//temporian/core/data:dtype",
"//temporian/core/data:node",
"//temporian/core/data:schema",
],
)
1 change: 1 addition & 0 deletions temporian/core/operators/window/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,3 +27,4 @@
from temporian.core.operators.window.moving_max import moving_max
from temporian.core.operators.window.moving_product import cumprod
from temporian.core.operators.window.moving_product import moving_product
from temporian.core.operators.window.moving_quantile import moving_quantile
15 changes: 14 additions & 1 deletion temporian/core/operators/window/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@
"""Base calendar operator class definition."""

from abc import ABC, abstractmethod
from typing import Optional
from typing import Any, List, Optional, Mapping
from temporian.core.data.duration_utils import normalize_duration


Expand Down Expand Up @@ -93,6 +93,7 @@ def __init__(
creator=self,
),
)
self.add_extra_attributes()

self.check()

Expand All @@ -119,8 +120,19 @@ def has_sampling(self) -> bool:
def has_variable_winlen(self) -> bool:
return self._has_variable_winlen

def add_extra_attributes(self):
pass

@classmethod
def extra_attribute_def(cls) -> List[Mapping[str, Any]]:
return []

@classmethod
def build_op_definition(cls) -> pb.OperatorDef:
extra_attr_def = [
pb.OperatorDef.Attribute(**attr)
for attr in cls.extra_attribute_def()
]
return pb.OperatorDef(
key=cls.operator_def_key(),
attributes=[
Expand All @@ -129,6 +141,7 @@ def build_op_definition(cls) -> pb.OperatorDef:
type=pb.OperatorDef.Attribute.Type.FLOAT_64,
is_optional=True,
),
*extra_attr_def,
],
inputs=[
pb.OperatorDef.Input(key="input"),
Expand Down
99 changes: 99 additions & 0 deletions temporian/core/operators/window/moving_quantile.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,99 @@
# Copyright 2021 Google LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Moving count operator class and public API function definition."""

from typing import List, Mapping, Optional, Any

from temporian.core import operator_lib
from temporian.core.compilation import compile
from temporian.core.data.dtype import DType
from temporian.core.data.node import EventSetNode
from temporian.core.data.schema import FeatureSchema
from temporian.core.operators.window.base import BaseWindowOperator
from temporian.core.typing import EventSetOrNode, WindowLength
from temporian.proto import core_pb2 as pb


class MovingQuantileOperator(BaseWindowOperator):
def __init__(
self,
input: EventSetNode,
window_length: WindowLength,
quantile: float,
sampling: Optional[EventSetNode],
):
if quantile < 0 or quantile > 1:
raise ValueError(
"`quantile` must be a float between 0 and 1. "
f"Received {quantile}"
)
self._quantile = quantile
# This line should be at the top but `BaseWindowOperator.__init__` calls
# `self.check` which fails if `this._quantile` is not set
super().__init__(input, window_length, sampling)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to have it at the top? If so, can you add a comment. If not, I would move it at the top.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there was a reason but I had so many backs and forth with this one that I forgot why. Let me try to put at the top as well as change quantile to _quantile and see if anything breaks

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't work, the reason is that the init of the base class runs a self.check() at the end and this needs all the attributes to be defined including the quantile

javiber marked this conversation as resolved.
Show resolved Hide resolved

@property
def quantile(self) -> float:
return self._quantile

def add_extra_attributes(self):
self.add_attribute("quantile", self.quantile)

@classmethod
def operator_def_key(cls) -> str:
return "MOVING_QUANTILE"

def get_feature_dtype(self, feature: FeatureSchema) -> DType:
if not feature.dtype.is_numerical:
raise ValueError(
"moving_quantile requires the input EventSet to contain"
" numerical features only, but received feature"
f" {feature.name!r} with type {feature.dtype}"
)
if feature.dtype.is_integer:
return DType.FLOAT32
return feature.dtype

@classmethod
def extra_attribute_def(cls) -> List[Mapping[str, Any]]:
return [
{
"key": "quantile",
"is_optional": True,
"type": pb.OperatorDef.Attribute.Type.FLOAT_64,
}
]


operator_lib.register_operator(MovingQuantileOperator)


@compile
def moving_quantile(
input: EventSetOrNode,
window_length: WindowLength,
quantile: float,
sampling: Optional[EventSetOrNode] = None,
) -> EventSetOrNode:
assert isinstance(input, EventSetNode)
if sampling is not None:
assert isinstance(sampling, EventSetNode)

return MovingQuantileOperator(
input=input,
window_length=window_length,
quantile=quantile,
sampling=sampling,
).outputs["output"]
13 changes: 13 additions & 0 deletions temporian/core/operators/window/test/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -134,3 +134,16 @@ py_test(
"//temporian/test:utils",
],
)

py_test(
name = "test_moving_quantile",
srcs = ["test_moving_quantile.py"],
srcs_version = "PY3",
deps = [
# already_there/absl/testing:absltest
# already_there/absl/testing:parameterized
"//temporian/implementation/numpy/data:io",
"//temporian/core/data:duration",
"//temporian/test:utils",
],
)
Loading
Loading