[processor/lsminterval] Behaviour of timestamp for aggregated metrics #158

lahsivjar · 2024-10-03T13:20:38Z

LSM interval creates aggregated metrics without utilizing the timestamp of the incoming events. The timestamp of the aggregated event is set to the latest timestamp of the aggregated datapoints. This behaviour is based on the upstream intervalprocessor on which the lsmintervalprocessor is based on.

OTOH, apm-aggregation uses truncated event timestamp (based on aggregation interval) for bucketing data. Due to this, old metrics are grouped in a separate bucket. This would mean that, aggregations for old spans would have a new timestamp in OTel.

felixbarny · 2024-10-04T08:30:09Z

Maybe this is something that should be configurable (event timestamp vs arrival timestamp).

If we truncate the event timestamp, we'll need to add another dimension for late arrivals. Otherwise, we could create another data point for the same timestamp for the same time series, leading to duplicate rejections.

This was one of the reasons that made it hard for use to adopt TSDB for APM.

lahsivjar · 2024-10-04T08:55:05Z

If we truncate the event timestamp, we'll need to add another dimension for late arrivals. Otherwise, we could create another data point for the same timestamp for the same time series, leading to duplicate rejections.

Good point! It seems duplicate rejection would always be the case with APM logic for late arrivals which didn't make their realtime aggregation bucket. The current lsmintervalprocessor doesn't consider timestamp as an aggregation dimension but if we want to do it, we would have to truncate the event timestamp making data rejection a lot more probable. We could add another dimension, like a current truncated aggregation processing window which should make it unique but this would mean a new timeseries for every aggregation period... not sure if this is a good idea.

Alternatively, if we decide to leave things as they are i.e. aggregating with arrival timestamp then the resulting data might be a bit skewed. We could decide to only aggregate late arrivals up to a limit and minimize the deviation of the skew but not sure if that is any better.

felixbarny · 2024-10-04T09:11:23Z

What I'm thinking of is to add a numeric dimension (for example named offset) that indicates how "late" the data is. For example, if we aggregate spans for the current time, the offset would be 0. For the 1m bucket, if we aggregate a span from the past minute, the offset is 1, and so on. So we "only" create a new time series for actual late arriving data.

While that would work well with delta temporality, I don't think it would really work with cumulative temporality.

lahsivjar · 2024-10-04T21:09:11Z

Hmm, that's a neat trick which gives me another idea. We could use the concept of offset but instead of encoding it as a separate attribute, we could encode it in the timestamp. If we were to emit event timestamp for aggregated metrics, we would truncate the UNIX timestamp as per a given interval. This would give us exact points in timestamp for a 0 offset aggregated metric. For any late arrivals, we can calculate its offset and add that many milliseconds to the timestamp. Even if an aggregation interval is 1 minute in length, we would be able to accommodate up to 1000 hours worth of late arrivals.

felixbarny · 2024-10-07T10:33:04Z

That won't allow us to accept timestamps that are arbitrarily late. But TSDB already has limitations on late-arriving data, so that's probably fine. Also, if we use nanosecond precision timestamps, it will be even less of an issue.

Maybe we should have a configurable set of strategies as the ideal strategy probably depends on the capabilities of the backend (nanosecond support, support for late arrivals, etc.)

lahsivjar added the processor/lsminterval label Nov 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[processor/lsminterval] Behaviour of timestamp for aggregated metrics #158

[processor/lsminterval] Behaviour of timestamp for aggregated metrics #158

lahsivjar commented Oct 3, 2024 •

edited

Loading

felixbarny commented Oct 4, 2024

lahsivjar commented Oct 4, 2024

felixbarny commented Oct 4, 2024

lahsivjar commented Oct 4, 2024 •

edited

Loading

felixbarny commented Oct 7, 2024

[processor/lsminterval] Behaviour of timestamp for aggregated metrics #158

[processor/lsminterval] Behaviour of timestamp for aggregated metrics #158

Comments

lahsivjar commented Oct 3, 2024 • edited Loading

felixbarny commented Oct 4, 2024

lahsivjar commented Oct 4, 2024

felixbarny commented Oct 4, 2024

lahsivjar commented Oct 4, 2024 • edited Loading

felixbarny commented Oct 7, 2024

lahsivjar commented Oct 3, 2024 •

edited

Loading

lahsivjar commented Oct 4, 2024 •

edited

Loading