
Incorrect historic average values #2920

Open · DennisLevl opened this issue Dec 11, 2024 · 9 comments

@DennisLevl commented Dec 11, 2024

Description

We have identified a bug when querying the historic values of _sum/EssActivePower for one of our simulated edges.
The query incorrectly returns the average of all stored entries within the requested resolution (e.g., 15 minutes), without considering that OpenEMS persists a value only when it changes (or at least every five minutes).

Sample
09:30:00: 5 kW
09:30:01: 10 kW
09:30:05: 5 kW
Querying the average between 09:30:00 and 09:30:05 yields:
(5+10+5)/3 = 6.67 kW.

However, the correct average should account for the persistence of values over time: (5+10+10+10+10+5)/6 = 8.33 kW.
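
For illustration, here is a minimal Java sketch of the two calculations from the sample above; the stored points are the ones from this issue, while the class and variable names are purely illustrative:

import java.util.TreeMap;

public class GapFilledAverage {

    public static void main(String[] args) {
        // Stored points (seconds after 09:30:00 -> kW), as in the sample above
        var stored = new TreeMap<Long, Double>();
        stored.put(0L, 5.0);  // 09:30:00
        stored.put(1L, 10.0); // 09:30:01
        stored.put(5L, 5.0);  // 09:30:05

        // Naive mean over stored points only: (5 + 10 + 5) / 3 = 6.67 kW
        double naive = stored.values().stream()
                .mapToDouble(Double::doubleValue).average().orElse(0);

        // Gap-filled mean: resample every second, carrying the last stored value
        // forward, then average the samples: (5 + 10 + 10 + 10 + 10 + 5) / 6 = 8.33 kW
        double sum = 0;
        int samples = 0;
        for (long second = 0; second <= 5; second++) {
            sum += stored.floorEntry(second).getValue();
            samples++;
        }
        double gapFilled = sum / samples;

        System.out.printf("naive: %.2f kW, gap-filled: %.2f kW%n", naive, gapFilled);
    }
}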


Setup
Branch: Current dev-branch
Environment: OpenEMS backend with simulated edge
Database: InfluxDB v2.7.10


Explanation
The field queried is _sum/EssActivePower.

The issue is illustrated in the influx graph below:

  • EssActivePower remains constant near 0 kW until 09:30 am (point 1).
  • At 09:30 am, the power increases due to the PID filter, reaching 800,000 kW within 35 seconds (point 2).
  • After stabilizing at 800,000 kW, it gradually decreases back to near 0 kW by ~09:33 am (point 3), also using the PID filter.

[Screenshot: InfluxDB graph of _sum/EssActivePower showing points 1–3]

During this time range, 74 entries exist in InfluxDB:

2 entries before the power increase, with a 5-minute gap between them.
35 entries while the power increases to 800,000 kW.
35 entries while the power decreases back to near 0 kW.
2 entries after the decrease, again with a 5-minute gap.

Requesting historic data via Postman for this range returns a mean value of 377,539 kW. This matches the result obtained using the mean function in InfluxDB.

[Screenshot: query result showing a mean of 377,539 kW]

Influx Query:

from(bucket: "openems")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "data")
  |> filter(fn: (r) => r["_field"] == "_sum/EssActivePower")
  |> filter(fn: (r) => r["edge"] == "1")

Postman Query:

{
  "method": "edgeRpc",
  "params": {
      "edgeId": "edge1",
      "payload": {
          "method": "queryHistoricTimeseriesData",
          "params": {
              "timezone": 0,
              "fromDate": "2024-11-15",
              "toDate": "2024-11-15",
              "channels": [
                  "_sum/EssActivePower"
              ],
              "resolution": {"value": 15,"unit": "Minutes"}
          }
      }
  }
}

Expected behaviour

When accounting for gaps using the last existing value to fill missing timestamps, the correct average would be 121,724 kW.

[Screenshot: gap-filled query result showing an average of 121,724 kW]

Adjusted Influx Query:

from(bucket: "openems")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "data")
  |> filter(fn: (r) => r["_field"] == "_sum/EssActivePower")
  |> filter(fn: (r) => r["edge"] == "1")
  |> aggregateWindow(every: 1s, fn: last, createEmpty: true)
  |> fill(usePrevious: true)
  |> aggregateWindow(every: 15m, fn: mean)

This approach correctly considers the persistence of values over time, resulting in the expected average.
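
For reference, a minimal sketch of running such a gap-filling query from Java with the official influxdb-client-java library; the URL, token, org and the fixed time window below are placeholders, not values from this setup:

import java.util.List;

import com.influxdb.client.InfluxDBClient;
import com.influxdb.client.InfluxDBClientFactory;
import com.influxdb.query.FluxRecord;
import com.influxdb.query.FluxTable;

public class GapFilledQueryExample {

    public static void main(String[] args) {
        // Placeholder connection settings; adjust to the actual InfluxDB instance
        try (InfluxDBClient client = InfluxDBClientFactory.create(
                "http://localhost:8086", "my-token".toCharArray(), "my-org")) {

            // Same query as above, with a fixed 15-minute window as an example
            String flux = """
                    from(bucket: "openems")
                      |> range(start: 2024-11-15T09:30:00Z, stop: 2024-11-15T09:45:00Z)
                      |> filter(fn: (r) => r["_measurement"] == "data")
                      |> filter(fn: (r) => r["_field"] == "_sum/EssActivePower")
                      |> filter(fn: (r) => r["edge"] == "1")
                      |> aggregateWindow(every: 1s, fn: last, createEmpty: true)
                      |> fill(usePrevious: true)
                      |> aggregateWindow(every: 15m, fn: mean)
                    """;

            List<FluxTable> tables = client.getQueryApi().query(flux);
            for (FluxTable table : tables) {
                for (FluxRecord rec : table.getRecords()) {
                    System.out.println(rec.getTime() + " -> " + rec.getValue());
                }
            }
        }
    }
}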


How to reproduce the Error?

  1. Set up an OpenEMS instance with InfluxDB as the time-series database

  2. Produce sample data for the field _sum/EssActivePower with constant values
    Option 1: Use a self-consumption optimization controller together with a simulated production/consumption data source, producing entries like 5, 10, 10, 10, 10, 5.
    Option 2: Use the "Fix Active Power" controller to produce a constant power for a short time. Note that this controller does not appear to use the PID filter.

  3. Query the configured time range with the history method queryHistoricTimeseriesData via Postman. You can also check the number of entries in InfluxDB for this time range (see the count sketch below).
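
To check the number of stored entries in step 3, a count() aggregation is enough. A rough sketch reusing the InfluxDB client setup from the sketch further above; bucket, edge id and time range are again placeholders:

// Counts the raw points stored for _sum/EssActivePower in the tested range;
// "client" is the InfluxDBClient from the earlier sketch.
String countFlux = """
        from(bucket: "openems")
          |> range(start: 2024-11-15T09:30:00Z, stop: 2024-11-15T09:45:00Z)
          |> filter(fn: (r) => r["_measurement"] == "data")
          |> filter(fn: (r) => r["_field"] == "_sum/EssActivePower")
          |> filter(fn: (r) => r["edge"] == "1")
          |> count()
        """;
client.getQueryApi().query(countFlux).forEach(table -> table.getRecords()
        .forEach(r -> System.out.println("stored points: " + r.getValue())));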

@Sn0w3y (Contributor) commented Dec 12, 2024

This issue isn't inherently a bug but rather a limitation of the current query logic and how the data is stored and interpreted. Here's why:

1. Expected Behavior Based on Storage Model

OpenEMS is designed to store time-series data efficiently by persisting values only when they change or periodically (e.g., every five minutes). This model reduces storage overhead and aligns with typical industrial monitoring requirements. The query logic reflects this design, so when calculating the average, only the recorded points are considered, without inferring the duration a value persisted.

  • Implication: The current behavior correctly reflects the raw data as stored. However, it does not account for the persistence of values between stored entries.

2. Gap Filling Is a Value-Added Operation

Filling gaps with the last known value is not a universal requirement and depends on the specific analysis context. For example:

  • Some applications prioritize raw, unmodified data.
  • Others, like energy calculations, might need gap filling to infer the persistence of values.

The query as implemented prioritizes simplicity and raw data fidelity, which is valid for many use cases.

3. InfluxDB Default Behavior

InfluxDB’s mean function does not automatically account for the duration of values; it calculates the mean of stored points. This is standard behavior for most time-series databases and does not imply a bug in OpenEMS.

  • Alternative Approaches: Users needing time-weighted averages should explicitly configure queries to account for value persistence using tools like fill and granular aggregation.

4. Trade-Offs of Gap Filling

Gap filling introduces complexity and computational overhead, especially with large datasets or high-frequency data. Not all users or scenarios require this level of precision, so implementing it as a default could:

  • Reduce performance.
  • Lead to unexpected results in contexts where gap filling is unnecessary or undesired.

5. Documentation and User Education

The current behavior is consistent with OpenEMS's data storage and querying principles. To address the confusion:

  • The documentation could explain that averages are calculated from stored points only.
  • Guidance on performing time-weighted averages with gap filling (via InfluxDB or application-level adjustments) can be provided for users with specific needs.

Why This Matters

  • Not a Bug: The current implementation works as designed, focusing on raw data accuracy and storage efficiency.
  • Feature Request: If time-weighted averages are frequently needed, adding an optional fillGaps feature could enhance usability without disrupting existing behavior.

In getParams you could add:

            .addProperty("fillGaps", true);

so the behaviour matches what you expect.
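
A rough sketch of how such an optional flag could look on both sides, using plain Gson; the fillGaps property and both helper methods are hypothetical, not existing OpenEMS API:

import com.google.gson.JsonObject;

public class FillGapsSketch {

    // Builds queryHistoricTimeseriesData params with the proposed, hypothetical flag
    // (channels and resolution omitted for brevity).
    static JsonObject buildParams() {
        var params = new JsonObject();
        params.addProperty("timezone", 0);
        params.addProperty("fromDate", "2024-11-15");
        params.addProperty("toDate", "2024-11-15");
        params.addProperty("fillGaps", true); // request a gap-filled (time-weighted) average
        return params;
    }

    // On the handling side: default to the current behaviour when the flag is absent.
    static boolean isFillGaps(JsonObject params) {
        return params.has("fillGaps") && params.get("fillGaps").getAsBoolean();
    }
}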

@DennisLevl (Author)

I see, thank you for your explanation. I labelled it as a bug because in multiple cases we need time-weighted averages in OpenEMS to check the history (e.g. the energy monitor in the UI). But I also understand your arguments.
Your suggestion to add a property to the params sounds good. From my point of view this would be a valuable feature to increase the accuracy of the data.

@Sn0w3y (Contributor) commented Dec 12, 2024

Just out of curiosity: what happens if the data is empty for two days and you use fill() for a time frame that lies within this missing data?

@DennisLevl (Author)

You're right. In this case, the gaps would still be filled with the previous value, which is also not a perfect solution.

Let me give you some background information on why we encountered this issue. In early November, we transitioned from release 2024.9.0 to the current dev version (as of November 6, 2024). Prior to this update, values like _sum/EssActivePower, _sum/GridActivePower, and _sum/EssSoc were stored in Influx every second, regardless of changes.

[Screenshot: InfluxDB entries stored every second before the update]

With the update, however, this behavior changed, and data is now only stored when there are changes, or at least every 5 minutes.

[Screenshot: InfluxDB entries stored only on change, or every 5 minutes, after the update]

As we need to analyze the power usage of our batteries at various resolutions for accurate forecasting, it is essential to have technically precise data.

Ultimately, the solution with fill() was just a workaround for the underlying issue of missing data points.

Unfortunately, I haven't yet determined what caused this change. Since time-series databases like Influx are optimized for storing large amounts of data and use various compression algorithms, the advantage of consistent data for _sum-channels should outweigh the additional data storage.

From my perspective, at least for the past month, the data should be available in the corresponding resolution. There are a lot of use cases where high-resolution data plays an important role.

@Sn0w3y (Contributor) commented Dec 16, 2024

From my perspective, at least for the past month, the data should be available in the corresponding resolution. There are a lot of use cases where high-resolution data plays an important role.

That is true, but in a production environment, it's not feasible to have second-by-second data available—especially with more than 10,000 devices. The scalability and storage requirements make this impractical. Perhaps we could consider alternative approaches like adaptive sampling rates or data aggregation to ensure the needed accuracy without impacting system performance.

@DennisLevl (Author)

Thank you for your message.
As can be seen in my first message, we are using the ‘edgeRpc’ method to query the historic timeseries data. As I understand it, this should forward the request directly to the edge (see the docs: Internal Component Communication :: Open Energy Management System).

I have set up an InfluxDB on the edges, which shows that the data is saved at a one-second resolution.
Information is only lost during synchronisation with the backend.

However, I still get the data from the backend. If you look at https://github.com/OpenEMS/openems/blob/develop/io.openems.backend.core/src/io/openems/backend/core/jsonrpcrequesthandler/EdgeRpcRequestHandler.java lines 77 and 126, you can see that an edgeRpc call first attempts to load the data from the backend and only loads it from the edge if this is not possible.
My expectation would have been the other way round?

So I have the following questions and maybe someone can help me answer one of them:

  • Is this prioritisation (backend, then edge) intentional? If so, why? It seems unintuitive to me because of the naming. Also, the data in the edge is available in a higher resolution. It also places an unnecessary burden on the backend if it has to deal with the request itself instead of passing it on. Especially with a high number of devices.
  • Let's assume I get the data from the Edge. Is there a regulation on how long the data is retained by the edge?
  • In which commit was the resolution adjusted? I used to be able to retrieve the data from the backend in a higher resolution (we didn't even have a DB defined for the edges before).

Ultimately, I need some solution to retrieve the data in the correct resolution. Otherwise we can't calculate our predictions reliably.

@Sn0w3y (Contributor) commented Dec 19, 2024

Well then I guess you need to develop some new lines of code to achieve this :)

More in detail maybe later on.

Greetings

@Sn0w3y (Contributor) commented Dec 20, 2024

  1. Is this prioritisation (backend, then edge) intentional? If so, why? It seems unintuitive to me because of the naming. Also, the data in the edge is available in a higher resolution. It also places an unnecessary burden on the backend if it has to deal with the request itself instead of passing it on. Especially with a high number of devices.

    • IMHO: Yes, this prioritization is intentional. The backend is designed to be the central point of data aggregation and storage, ensuring that all data is stored reliably and consistently. Since the backend is always online and can handle large volumes of requests, it makes sense to fetch data from there first. The edge can sometimes be offline or unavailable, which is why it is the fallback in case the backend cannot fulfill the request. Additionally, the backend provides a centralized point for data management, which is crucial for large-scale deployments, ensuring that everything is synchronized properly.

    • In terms of scalability, the backend should be able to handle these requests even with a large number of devices, especially if the backend is configured properly. If FENECON can handle it, so can your system, provided it is well-optimized.

  2. Let's assume I get the data from the Edge. Is there a regulation on how long the data is retained by the edge?

    • The data is retained as long as the edge is running. Since the edge stores the data locally in its database, the retention is tied to the uptime of the edge. If the edge device is running, the data will persist. If the device is restarted or shuts down, there may be scenarios where some data could be lost unless it is synchronized with the backend, depending on the system's configuration.
  3. In which commit was the resolution adjusted? I used to be able to retrieve the data from the backend in a higher resolution (we didn't even have a DB defined for the edges before).

But with a little bit of search you could have found it ;)

@Sn0w3y (Contributor) commented Dec 20, 2024

If, in the future, a comprehensive solution addressing all of these issues is identified, we can certainly consider creating a Pull Request to implement it. For the time being, however, there doesn’t appear to be any fundamental defect or urgent need for changes within the OpenEMS UI, the Edge, or the Backend as currently implemented. Therefore, this “non-issue” can be closed for now. Should circumstances change or additional requirements arise, we can revisit this matter and proceed with a targeted PR at that time. @sfeilmeier
