Struggling with unmanaged memory #361

Open
EmanueleAlbero opened this issue Aug 27, 2024 · 10 comments

EmanueleAlbero commented Aug 27, 2024

Description

Hi, I'll start by saying that I don't know whether this is an actual issue or just a misconfiguration problem.
I have a topology with 2 KStreams and an outer join between them.
KStream1 receives a record every 100 ms.
KStream2 receives a record roughly every 3-4 s.
The join uses a RocksDB store with default settings.
Everything works fine, but I can see the unmanaged memory growing indefinitely.

(screenshot attached)

This is a dotMemory snapshot of the application after several hours of work (on the very same partition).
To add more context, I'm also applying a 20-minute grace period, a 10-minute retention time, and 10 minutes for WindowStoreChangelogAdditionalRetentionMs.

Running the application on either Linux or Windows shows similar behavior.

Is there something I can check/verify in the configuration to avoid this issue?

@LGouellec (Owner) commented

Hey @EmanueleAlbero,

By default, Streamiz (and Kafka Streams in Java as well) uses one RocksDB instance per store per partition; for a windowed store it's at least 3 RocksDB instances per store per partition.

For each RocksDB instance, we have:

  • Read cache = 50 MB (unmanaged memory for read operations)
  • Write cache = 48 MB (unmanaged memory for write operations)
  • Index and filter blocks = this is the tricky point: index and filter blocks are not included in the read cache, so this amount of memory is not bounded.

So the more partitions you have, the more unmanaged memory you need, especially if you have stream-stream join operations.
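
For illustration, here is a rough back-of-the-envelope estimate using the per-instance numbers above (the partition count and the two join stores are assumptions, purely to show how the figure scales):

```
Hypothetical: a stream-stream join backed by 2 windowed stores, 10 input
partitions, 3 RocksDB instances per windowed store per partition

  2 stores x 3 instances x 10 partitions = 60 RocksDB instances
  60 x (50 MB read cache + 48 MB write cache) ≈ 5.9 GB of bounded cache
  + index/filter blocks for every instance (unbounded by default)
```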

In Java, you can configure a RocksDB config setter to override the default behavior:
https://docs.confluent.io/platform/current/streams/developer-guide/memory-mgmt.html#rocksdb
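
For reference, the bounded-memory setup described in that guide looks roughly like this (a Java sketch; the cache and memtable sizes are placeholder values, not recommendations):

```java
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.WriteBufferManager;

public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {

    // A single cache and write buffer manager shared by every store instance,
    // so the total off-heap memory is bounded by these two sizes (placeholders).
    private static final long TOTAL_OFF_HEAP_MEMORY = 256 * 1024 * 1024L;
    private static final long TOTAL_MEMTABLE_MEMORY = 64 * 1024 * 1024L;
    private static final Cache CACHE =
        new LRUCache(TOTAL_OFF_HEAP_MEMORY, -1, false, 0.1 /* high-priority (index/filter) ratio */);
    private static final WriteBufferManager WRITE_BUFFER_MANAGER =
        new WriteBufferManager(TOTAL_MEMTABLE_MEMORY, CACHE);

    @Override
    public void setConfig(final String storeName, final Options options,
                          final Map<String, Object> configs) {
        final BlockBasedTableConfig tableConfig =
            (BlockBasedTableConfig) options.tableFormatConfig();

        // Put index and filter blocks inside the block cache so they count
        // against the bounded cache instead of growing without limit.
        tableConfig.setBlockCache(CACHE);
        tableConfig.setCacheIndexAndFilterBlocks(true);
        tableConfig.setCacheIndexAndFilterBlocksWithHighPriority(true);
        tableConfig.setPinTopLevelIndexAndFilter(true);

        options.setWriteBufferManager(WRITE_BUFFER_MANAGER);
        options.setTableFormatConfig(tableConfig);
    }

    @Override
    public void close(final String storeName, final Options options) {
        // The cache and write buffer manager are shared across stores, so do not close them here.
    }
}
```

The setter is registered through the rocksdb.config.setter property (StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG). The key point for this issue is setCacheIndexAndFilterBlocks(true): it makes the index and filter blocks count against the bounded block cache.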

In Streamiz, you can do more or less the same thing, except for putting the index and filter blocks inside the block cache to avoid a lot of unmanaged memory consumption. It would be a good enhancement to bound the unmanaged memory.

Let me fill the gap in the next release.

@EmanueleAlbero (Author) commented

Hey @LGouellec thanks, very informative!
Can I also ask you for some more information about what I've experienced?

  1. I saw that even if retention kicks in, or there is no data on the KStreams for a long time (hours), the memory doesn't get freed (or, when it does, the freed memory is marginal).
    Here is an example from my last run (screenshot attached).

  2. I see the unmanaged memory growing even when there is nothing on the feed (more slowly than when the joins are active, but still...). In this case most of it is due to the metrics, which by the way also had another issue: if I check my app's /metrics endpoint, I can only see the Streamiz metrics (where it also exposes the topology) for a few minutes, then it stops exposing them.
    I don't have a picture for this, but in my last test I ran the application in an isolated environment with no data on the feed; it started at 300 MB and after 5 days the memory was at 1.8 GB, mostly unmanaged (~1.5 GB).

@LGouellec (Owner) commented

Hey @EmanueleAlbero ,

1- RocksDB uses index and filter blocks to look up data quickly. These indexes and filters are stored in memory, but let me try to reproduce this to rule out a memory leak.

2- Which metrics package do you use: Streamiz.Kafka.Net.Metrics.Prometheus or Streamiz.Kafka.Net.Metrics.OpenTelemetry?

Btw, I'm currently conducting a satisfaction survey to understand how I can serve you better, and I would love to get your feedback on the product.
Your insights are invaluable and will help shape the future of the product to better meet your needs. The survey will only take a few minutes, and your responses will be completely confidential.

Survey

Thank you for your time and feedback!
Best regards,

@EmanueleAlbero (Author) commented

Hi @LGouellec,
I'm using OpenTelemetry.

Here is the result of a test with the metrics disabled entirely (screenshot attached).

I've participated in the survey and I want to thank you once again for the amazing job you are doing.
Let me know if I can be any more helpful.

@LGouellec (Owner) commented

Hey @EmanueleAlbero ,

So it seems that the OpenTelemetry exporter has a memory leak. I'll fix it.
Can you test the Prometheus exporter and tell me whether the problem is still there?

hedmavx commented Sep 26, 2024

Hi @LGouellec, we are experiencing a similar memory leak when using the Prometheus exporter instead of OpenTelemetry. We are using Streamiz 1.6.

@LGouellec (Owner) commented

Hey @hedmavx ,

You mean that if you disable the Prometheus exporter, you no longer have a memory leak?

LGouellec added this to the 1.7.0 milestone on Oct 1, 2024
@LGouellec (Owner) commented

@hedmavx ,

Can you reproduce the memory leak with the Prometheus exporter and provide a thread dump, please?

Best regards,

@LGouellec (Owner) commented

@EmanueleAlbero

I have found the memory leak in the OpenTelemetry reporter. I'll try to fix it ASAP.

Best regards,

hedmavx commented Nov 12, 2024

> @hedmavx,
>
> Can you reproduce the memory leak with the Prometheus exporter and provide a thread dump, please?

Hi, sadly we can't get a thread dump in the environment where we are running the application.

Best regards
