
Incorrect historic average values #2920

Open · DennisLevl opened this issue Dec 11, 2024 · 9 comments

@DennisLevl commented Dec 11, 2024

Description

We have identified a bug when querying the historic values of _sum/EssActivePower for one of our simulated edges.
The query incorrectly returns the average of all stored entries within the requested resolution (e.g., 15 minutes), without considering that OpenEMS persists a value only when it changes (or at least every five minutes).

Sample
09:30:00: 5 kW
09:30:01: 10 kW
09:30:05: 5 kW
Querying the average between 09:30:00 and 09:30:05 yields:
(5+10+5)/3 = 6.67 kW.

However, the correct average should account for the persistence of values over time: (5+10+10+10+10+5)/6 = 8.33 kW.
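
For illustration, here is a minimal Java sketch of the two calculations from the sample above; the stored points are the ones from this issue, while the class and variable names are purely illustrative:

import java.util.TreeMap;

public class GapFilledAverage {

    public static void main(String[] args) {
        // Stored points (seconds after 09:30:00 -> kW), as in the sample above
        var stored = new TreeMap<Long, Double>();
        stored.put(0L, 5.0);  // 09:30:00
        stored.put(1L, 10.0); // 09:30:01
        stored.put(5L, 5.0);  // 09:30:05

        // Naive mean over stored points only: (5 + 10 + 5) / 3 = 6.67 kW
        double naive = stored.values().stream()
                .mapToDouble(Double::doubleValue).average().orElse(0);

        // Gap-filled mean: resample every second, carrying the last stored value
        // forward, then average the samples: (5 + 10 + 10 + 10 + 10 + 5) / 6 = 8.33 kW
        double sum = 0;
        int samples = 0;
        for (long second = 0; second <= 5; second++) {
            sum += stored.floorEntry(second).getValue();
            samples++;
        }
        double gapFilled = sum / samples;

        System.out.printf("naive: %.2f kW, gap-filled: %.2f kW%n", naive, gapFilled);
    }
}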


Setup
Branch: Current dev-branch
Environment: OpenEMS backend with simulated edge
Database: InfluxDB v2.7.10


Explanation
The field queried is _sum/EssActivePower.

The issue is illustrated in the influx graph below:

  • EssActivePower remains constant near 0 kW until 09:30 am (point 1).
  • At 09:30 am, the power increases due to the PID filter, reaching 800,000 kW within 35 seconds (point 2).
  • After stabilizing at 800,000 kW, it gradually decreases back to near 0 kW by ~09:33 am (point 3), also using the PID filter.

[Screenshot: InfluxDB graph of _sum/EssActivePower showing points 1–3]

During this time range, 74 entries exist in InfluxDB:

2 entries before the power increase, with a 5-minute gap between them.
35 entries while the power increases to 800,000 kW.
35 entries while the power decreases back to near 0 kW.
2 entries after the decrease, again with a 5-minute gap.

Requesting historic data via Postman for this range returns a mean value of 377,539 kW. This matches the result obtained using the mean function in InfluxDB.

[Screenshot: query result showing a mean of 377,539 kW]

Influx Query:

from(bucket: "openems")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "data")
  |> filter(fn: (r) => r["_field"] == "_sum/EssActivePower")
  |> filter(fn: (r) => r["edge"] == "1")

Postman Query:

{
  "method": "edgeRpc",
  "params": {
      "edgeId": "edge1",
      "payload": {
          "method": "queryHistoricTimeseriesData",
          "params": {
              "timezone": 0,
              "fromDate": "2024-11-15",
              "toDate": "2024-11-15",
              "channels": [
                  "_sum/EssActivePower"
              ],
              "resolution": {"value": 15,"unit": "Minutes"}
          }
      }
  }
}

Expected behaviour

When accounting for gaps using the last existing value to fill missing timestamps, the correct average would be 121,724 kW.

[Screenshot: gap-filled query result showing an average of 121,724 kW]

Adjusted Influx Query:

from(bucket: "openems")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "data")
  |> filter(fn: (r) => r["_field"] == "_sum/EssActivePower")
  |> filter(fn: (r) => r["edge"] == "1")
  |> aggregateWindow(every: 1s, fn: last, createEmpty: true)
  |> fill(usePrevious: true)
  |> aggregateWindow(every: 15m, fn: mean)

This approach correctly considers the persistence of values over time, resulting in the expected average.
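
For reference, a minimal sketch of running such a gap-filling query from Java with the official influxdb-client-java library; the URL, token, org and the fixed time window below are placeholders, not values from this setup:

import java.util.List;

import com.influxdb.client.InfluxDBClient;
import com.influxdb.client.InfluxDBClientFactory;
import com.influxdb.query.FluxRecord;
import com.influxdb.query.FluxTable;

public class GapFilledQueryExample {

    public static void main(String[] args) {
        // Placeholder connection settings; adjust to the actual InfluxDB instance
        try (InfluxDBClient client = InfluxDBClientFactory.create(
                "http://localhost:8086", "my-token".toCharArray(), "my-org")) {

            // Same query as above, with a fixed 15-minute window as an example
            String flux = """
                    from(bucket: "openems")
                      |> range(start: 2024-11-15T09:30:00Z, stop: 2024-11-15T09:45:00Z)
                      |> filter(fn: (r) => r["_measurement"] == "data")
                      |> filter(fn: (r) => r["_field"] == "_sum/EssActivePower")
                      |> filter(fn: (r) => r["edge"] == "1")
                      |> aggregateWindow(every: 1s, fn: last, createEmpty: true)
                      |> fill(usePrevious: true)
                      |> aggregateWindow(every: 15m, fn: mean)
                    """;

            List<FluxTable> tables = client.getQueryApi().query(flux);
            for (FluxTable table : tables) {
                for (FluxRecord rec : table.getRecords()) {
                    System.out.println(rec.getTime() + " -> " + rec.getValue());
                }
            }
        }
    }
}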


How to reproduce the Error?

  1. Set up an OpenEMS instance with InfluxDB as the time-series database

  2. Produce sample data for the field _sum/EssActivePower with constant values
    Option 1: Use a self-consumption optimization controller together with a simulated production/consumption data source, producing entries like 5, 10, 10, 10, 10, 5.
    Option 2: Use the "Fix Active Power" controller to produce a constant power for a short time. Note that this controller does not appear to use the PID filter.

  3. Query the configured time range with the history method queryHistoricTimeseriesData via Postman. You can also check the number of entries in InfluxDB for this time range (see the count sketch below).
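
To check the number of stored entries in step 3, a count() aggregation is enough. A rough sketch reusing the InfluxDB client setup from the sketch further above; bucket, edge id and time range are again placeholders:

// Counts the raw points stored for _sum/EssActivePower in the tested range;
// "client" is the InfluxDBClient from the earlier sketch.
String countFlux = """
        from(bucket: "openems")
          |> range(start: 2024-11-15T09:30:00Z, stop: 2024-11-15T09:45:00Z)
          |> filter(fn: (r) => r["_measurement"] == "data")
          |> filter(fn: (r) => r["_field"] == "_sum/EssActivePower")
          |> filter(fn: (r) => r["edge"] == "1")
          |> count()
        """;
client.getQueryApi().query(countFlux).forEach(table -> table.getRecords()
        .forEach(r -> System.out.println("stored points: " + r.getValue())));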

@Sn0w3y (Contributor) commented Dec 12, 2024

This issue isn't inherently a bug but rather a limitation of the current query logic and how the data is stored and interpreted. Here's why:

1. Expected Behavior Based on Storage Model

OpenEMS is designed to store time-series data efficiently by persisting values only when they change or periodically (e.g., every five minutes). This model reduces storage overhead and aligns with typical industrial monitoring requirements. The query logic reflects this design, so when calculating the average, only the recorded points are considered, without inferring the duration a value persisted.

  • Implication: The current behavior correctly reflects the raw data as stored. However, it does not account for the persistence of values between stored entries.

2. Gap Filling Is a Value-Added Operation

Filling gaps with the last known value is not a universal requirement and depends on the specific analysis context. For example:

  • Some applications prioritize raw, unmodified data.
  • Others, like energy calculations, might need gap filling to infer the persistence of values.

The query as implemented prioritizes simplicity and raw data fidelity, which is valid for many use cases.

3. InfluxDB Default Behavior

InfluxDB’s mean function does not automatically account for the duration of values; it calculates the mean of stored points. This is standard behavior for most time-series databases and does not imply a bug in OpenEMS.

  • Alternative Approaches: Users needing time-weighted averages should explicitly configure queries to account for value persistence using tools like fill and granular aggregation.

4. Trade-Offs of Gap Filling

Gap filling introduces complexity and computational overhead, especially with large datasets or high-frequency data. Not all users or scenarios require this level of precision, so implementing it as a default could:

  • Reduce performance.
  • Lead to unexpected results in contexts where gap filling is unnecessary or undesired.

5. Documentation and User Education

The current behavior is consistent with OpenEMS's data storage and querying principles. To address the confusion:

  • The documentation could explain that averages are calculated from stored points only.
  • Guidance on performing time-weighted averages with gap filling (via InfluxDB or application-level adjustments) can be provided for users with specific needs.

Why This Matters

  • Not a Bug: The current implementation works as designed, focusing on raw data accuracy and storage efficiency.
  • Feature Request: If time-weighted averages are frequently needed, adding an optional fillGaps feature could enhance usability without disrupting existing behavior.

In getParams you could add:

            .addProperty("fillGaps", true);

so the behaviour matches what you expect.
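
A rough sketch of how such an optional flag could look on both sides, using plain Gson; the fillGaps property and both helper methods are hypothetical, not existing OpenEMS API:

import com.google.gson.JsonObject;

public class FillGapsSketch {

    // Builds queryHistoricTimeseriesData params with the proposed, hypothetical flag
    // (channels and resolution omitted for brevity).
    static JsonObject buildParams() {
        var params = new JsonObject();
        params.addProperty("timezone", 0);
        params.addProperty("fromDate", "2024-11-15");
        params.addProperty("toDate", "2024-11-15");
        params.addProperty("fillGaps", true); // request a gap-filled (time-weighted) average
        return params;
    }

    // On the handling side: default to the current behaviour when the flag is absent.
    static boolean isFillGaps(JsonObject params) {
        return params.has("fillGaps") && params.get("fillGaps").getAsBoolean();
    }
}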

@DennisLevl (Author)

I see, thank you for your explanation. I labelled it as a bug because in multiple cases we need time-weighted averages in OpenEMS to check the history (e.g. the energy monitor in the UI). But I also understand your arguments.
Your suggestion to add a property to the params sounds good. From my point of view this would be a valuable feature to increase the accuracy of the data.

@Sn0w3y (Contributor) commented Dec 12, 2024

Just out of curiosity: what happens if the data is empty for two days and you use fill() for a time frame that lies within this missing data?

@DennisLevl (Author)

You're right. In this case, the gaps would still be filled with the previous value, which is also not a perfect solution.

Let me give you some background information on why we encountered this issue. In early November, we transitioned from release 2024.9.0 to the current dev version (as of November 6, 2024). Prior to this update, values like _sum/EssActivePower, _sum/GridActivePower, and _sum/EssSoc were stored in Influx every second, regardless of changes.

[Screenshot: InfluxDB entries stored every second before the update]

With the update, however, this behavior changed, and data is now only stored when there are changes, or at least every 5 minutes.

[Screenshot: InfluxDB entries stored only on change, or every 5 minutes, after the update]

As we need to analyze the power usage of our batteries at various resolutions for accurate forecasting, it is essential to have technically precise data.

Ultimately, the solution with fill() was just a workaround for the underlying issue of missing data points.

Unfortunately, I haven't yet determined what caused this change. Since time-series databases like Influx are optimized for storing large amounts of data and use various compression algorithms, the advantage of consistent data for _sum-channels should outweigh the additional data storage.

From my perspective, at least for the past month, the data should be available in the corresponding resolution. There are a lot of use cases where high-resolution data plays an important role.

@Sn0w3y (Contributor) commented Dec 16, 2024

From my perspective, at least for the past month, the data should be available in the corresponding resolution. There are a lot of use cases where high-resolution data plays an important role.

That is true, but in a production environment, it's not feasible to have second-by-second data available—especially with more than 10,000 devices. The scalability and storage requirements make this impractical. Perhaps we could consider alternative approaches like adaptive sampling rates or data aggregation to ensure the needed accuracy without impacting system performance.

@DennisLevl (Author)

Thank you for your message.
As can be seen in my first message, we are using the ‘edgeRpc’ method to query the historic timeseries data. As I understand it, this should forward the request directly to the edge (see the docs: Internal Component Communication :: Open Energy Management System).

I have set up an InfluxDB on the edges, which shows that the data is saved at a one-second resolution.
Information is only lost during synchronisation with the backend.

However, I still get the data from the backend. If you look at https://github.com/OpenEMS/openems/blob/develop/io.openems.backend.core/src/io/openems/backend/core/jsonrpcrequesthandler/EdgeRpcRequestHandler.java lines 77 and 126, you can see that an edgeRpc call first attempts to load the data from the backend and only loads it from the edge if this is not possible.
My expectation would have been the other way round?

So I have the following questions and maybe someone can help me answer one of them:

  • Is this prioritisation (backend, then edge) intentional? If so, why? It seems unintuitive to me because of the naming. Also, the data in the edge is available in a higher resolution. It also places an unnecessary burden on the backend if it has to deal with the request itself instead of passing it on. Especially with a high number of devices.
  • Let's assume I get the data from the Edge. Is there a regulation on how long the data is retained by the edge?
  • In which commit was the resolution adjusted? I used to be able to retrieve the data from the backend in a higher resolution (we didn't even have a DB defined for the edges before).

Ultimately, I need some solution to retrieve the data in the correct resolution. Otherwise we can't calculate our predictions reliably.

@Sn0w3y (Contributor) commented Dec 19, 2024

Well then I guess you need to develop some new lines of code to achieve this :)

More in detail maybe later on.

Greetings

@Sn0w3y (Contributor) commented Dec 20, 2024

  1. Is this prioritisation (backend, then edge) intentional? If so, why? It seems unintuitive to me because of the naming. Also, the data in the edge is available in a higher resolution. It also places an unnecessary burden on the backend if it has to deal with the request itself instead of passing it on. Especially with a high number of devices.

    • IMHO: Yes, this prioritization is intentional. The backend is designed to be the central point of data aggregation and storage, ensuring that all data is stored reliably and consistently. Since the backend is always online and can handle large volumes of requests, it makes sense to fetch data from there first. The edge can sometimes be offline or unavailable, which is why it is the fallback in case the backend cannot fulfill the request. Additionally, the backend provides a centralized point for data management, which is crucial for large-scale deployments, ensuring that everything is synchronized properly.

    • In terms of scalability, the backend should be able to handle these requests even with a large number of devices, especially if the backend is configured properly. If FENECON can handle it, so can your system, provided it is well-optimized.

  2. Let's assume I get the data from the Edge. Is there a regulation on how long the data is retained by the edge?

    • The data is retained as long as the edge is running. Since the edge stores the data locally in its database, the retention is tied to the uptime of the edge. If the edge device is running, the data will persist. If the device is restarted or shuts down, there may be scenarios where some data could be lost unless it is synchronized with the backend, depending on the system's configuration.
  3. In which commit was the resolution adjusted? I used to be able to retrieve the data from the backend in a higher resolution (we didn't even have a DB defined for the edges before).

But with a little bit of search you could have found it ;)

@Sn0w3y (Contributor) commented Dec 20, 2024

If, in the future, a comprehensive solution addressing all of these issues is identified, we can certainly consider creating a Pull Request to implement it. For the time being, however, there doesn’t appear to be any fundamental defect or urgent need for changes within the OpenEMS UI, the Edge, or the Backend as currently implemented. Therefore, this “non-issue” can be closed for now. Should circumstances change or additional requirements arise, we can revisit this matter and proceed with a targeted PR at that time. @sfeilmeier
