Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MemoryAllocationExports Not Collected By Garbage Collector And Causing Metaspace OutOfMemoryException #809

Open
Drophoff opened this issue Sep 11, 2022 · 8 comments · May be fixed by #811

Comments

@Drophoff
Copy link

Actual Situation

The class MemoryAllocationExports creates a memory leak caused by AllocationCountingNotificationListener that is added but never removed, when this libary is included within a WAR application that gets deployed on a Tomcat server.

The AllocationCountingNotificationListener is added to the class GarbageCollectorMXBean (see class GarbageCollectorExtIml at the attached screenshot) that is loaded by java.lang.ClassLoader, which is different from the org.apache.catalina.loader.ParallelWebappClassLoader.

  public MemoryAllocationExports() {
    AllocationCountingNotificationListener listener = new AllocationCountingNotificationListener(allocatedCounter);
    for (GarbageCollectorMXBean garbageCollectorMXBean : getGarbageCollectorMXBeans()) {
      if (garbageCollectorMXBean instanceof NotificationEmitter) {
        ((NotificationEmitter) garbageCollectorMXBean).addNotificationListener(listener, null, null);
      }
    }
  }

Because of this the AllocationCountingNotificationListener has a reference to a class with a different classloader, which prevents that this class gets collected by garbage collector.

ClassLoaderReferences

Possible solution (outlook)

At the moment the class MemoryAllocationExports provides no function to retrieve a instance of AllocationCountingNotificationListener, which is needed to call NotificationEmitter#removeNotificationListener:

  for (GarbageCollectorMXBean garbageCollectorMXBean : getGarbageCollectorMXBeans()) {
	  if (garbageCollectorMXBean instanceof NotificationEmitter) {
		((NotificationEmitter) garbageCollectorMXBean).removeNotificationListener(listener);
	  }
  }

When the above code gets called the class is collected and clean up by the garbage collector.

The class DefaultExports contains already a function for initialization. Possibly this class would also be suitable to provide a corresponding function for the clean shutdown, which could be called during ServletContextListener#contextDestroyed.

Environment

  • Server: Apache Tomcat 9.0.65
  • JVM: 17.0.4.1+1 Eclipse Adoptium
  • OS: Linux 4.19.0-13-amd64
  • Simpleclient Version: 0.16.0
@fstab
Copy link
Member

fstab commented Sep 11, 2022

Thanks a lot for the detailed analysis and for finding this bug! I agree we should add a DefaultExports.destroy() method that can be called from a ServletContextListener.

Some thoughts on how this could be implemented:

  • We'll need to add a MemoryAllocationExports.destroy() method with your code above. DefaultExports.destroy() should call MemoryAllocationExports.destroy().
  • If MemoryAllocationExports.collect() is called after MemoryAllocationExports.destroy() it should throw an IllegalStateException.
  • The current implementation of DefaultExports might create multiple MemoryAllocationExports instances if DefaultExports.register(registry) is called multiple times. This must be changed such that there's only a single instance. To be consistent, there should only be a single instance of all other collectors in DefaultExports as well.

Anyway, as you have already figured out the issue implementation seems pretty straightforward. Do you want to open a PR?

Note: I renamed master to main today, so in case you have a local copy you might want to update the branch name before creating a feature branch.

@fstab
Copy link
Member

fstab commented Sep 12, 2022

Just a shower thought: Maybe there is a way to allow clean-up without an explicit ServletContextListener.

If we had a wrapper around the listener like this:

  static class Listener implements NotificationListener {

    private final WeakReference<NotificationListener> delegate;

    private Listener(NotificationListener listener) {
      this.delegate = new WeakReference<>(listener);
    }

    @Override
    public void handleNotification(Notification notification, Object handback) {
      NotificationListener listener = delegate.get();
      if (listener == null) {
        for (GarbageCollectorMXBean garbageCollectorMXBean : ManagementFactory.getGarbageCollectorMXBeans()) {
          if (garbageCollectorMXBean instanceof NotificationEmitter) {
            try {
              ((NotificationEmitter) garbageCollectorMXBean).removeNotificationListener(this);
            } catch (ListenerNotFoundException ignored) {
            }
          }
        }
      } else {
        listener.handleNotification(notification, handback);
      }
    }
  }

If we load this class in a new class loader with the system class loader as its parent, then the Web application can be collected. Then delegate.get() return null and the listener removes itself.

I didn't try it, maybe it doesn't work at all. What do you think?

@Drophoff
Copy link
Author

Thank you very much for the quick and detail feedback.

I think the point about multiple calls to DefaultExport#register is independent of the mentioned issue.

Besides, the Java Doc of the class explicitly points out that the initialisation method should be used for multiple calls.

Also, from my point of view, multiple calls would be a wrong usage of the public interface. And yes, the library could restrict the incorrect usage, but in this case the approach should be considered consistently for the whole library and not just for a single method. But yes, this additional logic could be added.

The idea with the ClassLoader could work, but in my view it is a too deep intervention in the system. The GarbageCollectorMXBean class is part of the JVM and is therefore loaded by the system's ClassLoader. This happens before the Web Application ClassLoader loads the library. Therefore, the class must be removed from the system ClassLoader and then explicitly loaded by a new ClassLoader of the current Thread.

An alternative solution, which is also platform-dependent and which I also refrain from, would be the way via a finalize-method. This solution depends on the garbage collector implementation and would also fail in case the finalize-method throws an exception.

Due to that I would stick to DefaultExports.destroy(), which is consistent to the DefaultExports#initialize.

If you agree with the destroy proposal, I can try to create a corresponding pull request.

@fstab
Copy link
Member

fstab commented Sep 13, 2022

Yes, let's go ahead with the destroy() method. I'm looking forward to your PR!

I thought about the classloader idea again, but it's harder than I thought. It's easy to write a custom classloader for loading the Listener, and then listener.getClass().getClassloader() would point to the custom classloader instead of the web app classloader. However, this just adds a level of indirection. listener.getClass().getClassloader().getClass().getClassloader() will still be the web app classloader. Breaking the chain completely can only be achieved by injecting the Listener class into the system class loader, which might be possible with Bytebuddy, but the complexity is not worth it. Anyway, let's go with an explicit destroy() method.

@fstab
Copy link
Member

fstab commented Sep 14, 2022

I wrote above

If MemoryAllocationExports.collect() is called after MemoryAllocationExports.destroy() it should throw an IllegalStateException

I'm not sure if that's really a good idea. I guess it makes sense if you can prevent collect() from being called after destroy(), but if there is a race condition between undeploying an application and the Prometheus server scrape it might be better just to return an empty result.

@Drophoff
Copy link
Author

Hi @fstab,

are there still open points from my side?
I'm asking because some time has passed since the last activity.

@fstab
Copy link
Member

fstab commented Sep 27, 2023

Hi @Drophoff , sorry, I dropped the ball here. We just released the 1.0.0 version, and the MemoryAllocationExports moved to https://github.com/prometheus/client_java/blob/main/prometheus-metrics-instrumentation-jvm/src/main/java/io/prometheus/metrics/instrumentation/jvm/JvmMemoryPoolAllocationMetrics.java. However, the code is still the same, so your PR should still be relevant.

I'm heading to PromCon for the rest of the week. I'll put this on top of my todo list for beginning of next week. Sorry again for dropping the ball and thanks a lot for pinging me again.

@zorba128
Copy link

zorba128 commented Dec 12, 2023

This problem with releasing metrics is one of issues that prevents http4s adapter from being upgraded easily.

If object that groups metrics into bigger units existed - one for this particular group of metrics should simply implement AutoCloseable. And its to caller to instantiate it, and its to caller to release it when not needed.
It could be wrapped in async resource, it could be bound to server lifecycle. And problem now is there is no object to attach this - just bunch or unrelated counters/gauges.

Even the collector registration to the registry can be wrapped as resource (closing "registration" resource should unregister underlying collector) - but to make it possible it is needed to operate on composite Collectors rather than unrelated metrics built by some common code.

var collector = JvmMemoryPoolAllocationMetrics.build()
Server.run(...); // say it blocks until shutdown signal
collector.close(); // releases mbeans and all related stuff

Just reconsider #904 and #905. Maybe I'll add sample how this can work to PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants