Struggling with unmanaged memory #361

Open
EmanueleAlbero opened this issue Aug 27, 2024 · 10 comments

EmanueleAlbero commented Aug 27, 2024

Description

Hi, I'll start by saying that I don't know whether this is an actual issue or just a misconfiguration problem.
I have a topology with 2 KStreams and an outer join between them.
KStream1 receives a record every 100 ms.
KStream2 receives a record roughly every 3-4 s.
The join uses a RocksDB store with default settings.
Everything works fine, but I can see the unmanaged memory growing indefinitely.

(screenshot attached)

This is a dotMemory snapshot of the application after several hours of work (on the very same partition).
To add more context, I'm also applying a 20-minute grace period, a 10-minute retention time, and 10 minutes for WindowStoreChangelogAdditionalRetentionMs.

Running the application on either Linux or Windows shows similar behavior.

Is there something I can check/verify in the configuration to avoid this issue?

@LGouellec (Owner) commented

Hey @EmanueleAlbero,

By default, Streamiz (and Kafka Streams in Java as well) uses one RocksDB instance per store per partition; for a windowed store it's at least 3 RocksDB instances per store per partition.

For each RocksDB instance, we have:

  • Read cache = 50 MB (unmanaged memory for read operations)
  • Write cache = 48 MB (unmanaged memory for write operations)
  • Index and filter blocks = this is the tricky point: index and filter blocks are not included in the read cache, so this amount of memory is not bounded.

So the more partitions you have, the more unmanaged memory you need, especially if you have stream-stream join operations.
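
For illustration, here is a rough back-of-the-envelope estimate using the per-instance numbers above (the partition count and the two join stores are assumptions, purely to show how the figure scales):

```
Hypothetical: a stream-stream join backed by 2 windowed stores, 10 input
partitions, 3 RocksDB instances per windowed store per partition

  2 stores x 3 instances x 10 partitions = 60 RocksDB instances
  60 x (50 MB read cache + 48 MB write cache) ≈ 5.9 GB of bounded cache
  + index/filter blocks for every instance (unbounded by default)
```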

In Java, you can configure a RocksDB config setter to override the default behavior:
https://docs.confluent.io/platform/current/streams/developer-guide/memory-mgmt.html#rocksdb
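
For reference, the bounded-memory setup described in that guide looks roughly like this (a Java sketch; the cache and memtable sizes are placeholder values, not recommendations):

```java
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.WriteBufferManager;

public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {

    // A single cache and write buffer manager shared by every store instance,
    // so the total off-heap memory is bounded by these two sizes (placeholders).
    private static final long TOTAL_OFF_HEAP_MEMORY = 256 * 1024 * 1024L;
    private static final long TOTAL_MEMTABLE_MEMORY = 64 * 1024 * 1024L;
    private static final Cache CACHE =
        new LRUCache(TOTAL_OFF_HEAP_MEMORY, -1, false, 0.1 /* high-priority (index/filter) ratio */);
    private static final WriteBufferManager WRITE_BUFFER_MANAGER =
        new WriteBufferManager(TOTAL_MEMTABLE_MEMORY, CACHE);

    @Override
    public void setConfig(final String storeName, final Options options,
                          final Map<String, Object> configs) {
        final BlockBasedTableConfig tableConfig =
            (BlockBasedTableConfig) options.tableFormatConfig();

        // Put index and filter blocks inside the block cache so they count
        // against the bounded cache instead of growing without limit.
        tableConfig.setBlockCache(CACHE);
        tableConfig.setCacheIndexAndFilterBlocks(true);
        tableConfig.setCacheIndexAndFilterBlocksWithHighPriority(true);
        tableConfig.setPinTopLevelIndexAndFilter(true);

        options.setWriteBufferManager(WRITE_BUFFER_MANAGER);
        options.setTableFormatConfig(tableConfig);
    }

    @Override
    public void close(final String storeName, final Options options) {
        // The cache and write buffer manager are shared across stores, so do not close them here.
    }
}
```

The setter is registered through the rocksdb.config.setter property (StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG). The key point for this issue is setCacheIndexAndFilterBlocks(true): it makes the index and filter blocks count against the bounded block cache.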

In Streamiz, you can do more or less the same thing, except for putting the index and filter blocks inside the block cache to avoid a lot of unmanaged memory consumption. It would be a good enhancement to bound the unmanaged memory.

Let me fill the gap in the next release.

@EmanueleAlbero (Author) commented

Hey @LGouellec thanks, very informative!
Can I also ask you for some more information about what I've experienced?

  1. I saw that even if retention kicks in, or there is no data on the KStreams for a long time (hours), the memory doesn't get freed (or, when it does, the freed memory is marginal).
    Here is an example from my last run (screenshot attached).

  2. I see the unmanaged memory growing even when there is nothing on the feed (more slowly than when the joins are active, but still...). In this case most of it is due to the metrics, which by the way also had another issue: if I check my app's /metrics endpoint, I can only see the Streamiz metrics (where it also exposes the topology) for a few minutes, then it stops exposing them.
    I don't have a picture for this, but in my last test I ran the application in an isolated environment with no data on the feed; it started at 300 MB and after 5 days the memory was at 1.8 GB, mostly unmanaged (~1.5 GB).

@LGouellec (Owner) commented

Hey @EmanueleAlbero ,

1- RocksDB uses index and filter blocks to look up data quickly. These indexes and filters are stored in memory, but let me try to reproduce this to rule out a memory leak.

2- Which metrics package do you use: Streamiz.Kafka.Net.Metrics.Prometheus or Streamiz.Kafka.Net.Metrics.OpenTelemetry?

Btw, I'm currently conducting a satisfaction survey to understand how I can serve you better, and I would love to get your feedback on the product.
Your insights are invaluable and will help shape the future of the product to better meet your needs. The survey will only take a few minutes, and your responses will be completely confidential.

Survey

Thank you for your time and feedback!
Best regards,

@EmanueleAlbero (Author) commented

Hi @LGouellec,
I'm using OpenTelemetry.

Here is the result of a test with the metrics disabled entirely (screenshot attached).

I've participated in the survey and I want to thank you once again for the amazing job you are doing.
Let me know if I can be any more helpful.

@LGouellec (Owner) commented

Hey @EmanueleAlbero ,

So it seems that the OpenTelemetry exporter has a memory leak. I'll fix it.
Can you test the Prometheus exporter and tell me whether the problem is still there?

hedmavx commented Sep 26, 2024

Hi @LGouellec, we are experiencing a similar memory leak when using the Prometheus exporter instead of OpenTelemetry. We are using Streamiz 1.6.

@LGouellec (Owner) commented

Hey @hedmavx ,

You mean that if you disable the Prometheus exporter, you no longer have a memory leak?

LGouellec added this to the 1.7.0 milestone on Oct 1, 2024
@LGouellec (Owner) commented

@hedmavx ,

Can you reproduce the memory leak with the Prometheus exporter and provide a thread dump, please?

Best regards,

@LGouellec (Owner) commented

@EmanueleAlbero

I have found the memory leak in the OpenTelemetry reporter. I'll try to fix it ASAP.

Best regards,

hedmavx commented Nov 12, 2024

> @hedmavx,
>
> Can you reproduce the memory leak with the Prometheus exporter and provide a thread dump, please?

Hi, sadly we can't get a thread dump in the environment where we are running the application.

Best regards
