Otel Java Agent Causing Heap Memory Leak Issue #12303
Comments
Hello, I ran into a similar issue and have been troubleshooting it myself. I've written up how I resolved it in the hope that it helps. Like you, I saw memory being continuously allocated to the Old Gen during batch processing and never released.
Situation
In my case, in addition to the Spans instrumented automatically by the Java Agent, I collected extra Spans and Attributes through an Agent Extension. As data collection went on, the leak grew gradually, and once the server was in production most of the allocated heap was occupied by the Old Gen, until the server crashed with an OutOfMemoryError (OOM). Garbage collection (GC) seemed to relieve the problem temporarily, but newly generated Spans were immediately allocated to the Old Gen again.
Cause
The cause and solution I identified are as follows (the issue was resolved in my case, but I cannot guarantee the analysis is accurate, so please verify it yourself). If you observe a continuous increase in the Old Gen in the metrics you are collecting, your case may be similar to mine.
I hope this helps you resolve your issue quickly.
@vanilla-sundae thanks for reporting. Unfortunately, the information provided is not enough to understand and fix the issue. You should examine the heap dump and try to answer the following.
Describe the bug
Context
My service uses the Otel Java agent published by https://github.com/aws-observability/aws-otel-java-instrumentation, with the annotations @WithSpan and @SpanAttribute (https://opentelemetry.io/docs/zero-code/java/agent/annotations/) in the code to get traces for our requests.
Problem Statement
The Otel Java agent was set up correctly, and there was no memory issue with the initial setup. However, after we added the annotations @WithSpan and @SpanAttribute to the service code, we started to see a periodic memory increase (the JVM metric HeapMemoryAfterGCUse rose to almost 100%) with a lot of Otel objects created on the heap, and we have to bounce our hosts to mitigate it.
The Otel objects we saw are mainly io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.AbstractWeakConcurrentMap$WeakKey and io.opentelemetry.javaagent.bootstrap.executors.PropagatedContext, as well as the Java objects java.util.concurrent.ConcurrentHashMap$Node and java.lang.ref.WeakReference.
We added @WithSpan to methods executed by child threads and virtual threads. We are not sure whether that is a concern, but we are able to view traces for these methods correctly.
Here's our heap dump result:
Histogram:
Memory Leak Suspect Report:
Ask
Can anyone help with this issue and let us know what the root cause could be?
Steps to reproduce
We set up the Java agent in our service's Docker image file:
And we added @WithSpan to methods and @SpanAttribute to one of the arguments.
Expected behavior
No or minimal impact on heap memory usage.
Actual behavior
Heap memory usage after GC increases to 100% if we don't bounce the hosts.
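For anyone who wants to check the same symptom from inside the JVM, here is a minimal sketch that is not part of the original report. It assumes that summing `MemoryPoolMXBean.getCollectionUsage()` over the heap pools is a reasonable stand-in for the HeapMemoryAfterGCUse metric mentioned above; how that metric is actually computed is not stated in this issue. The class and method names (`HeapAfterGc`, `usedAfterLastGc`, `percentOfMax`) are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper: approximates "heap in use after the last GC" using
// only standard java.lang.management APIs.
final class HeapAfterGc {

    // Sums the bytes still in use after the most recent collection,
    // across all heap memory pools that report collection usage.
    static long usedAfterLastGc() {
        long used = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache, etc.
            }
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if the pool does not support it
            if (afterGc != null) {
                used += afterGc.getUsed();
            }
        }
        return used;
    }

    // Returns used as a percentage of max, or -1 when max is undefined (<= 0).
    static double percentOfMax(long used, long max) {
        return max > 0 ? 100.0 * used / max : -1;
    }

    public static void main(String[] args) {
        // getMax() may return -1 when the maximum heap size is undefined.
        long max = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
        long used = usedAfterLastGc();
        System.out.println("Heap used after last GC (bytes): " + used);
        System.out.println("Percent of max heap: " + percentOfMax(used, max));
    }
}
```

If this value keeps climbing across collections, that would be consistent with the behavior described here, though it does not by itself identify which objects are being retained.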
Javaagent or library instrumentation version
v1.32.3
Environment
JDK: JDK21
OS: Linux x86_64
Additional context
No response