Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Deadlock w/ Timer and Lwjgl3Application #7548

Open
3 of 6 tasks
0xKZ opened this issue Dec 20, 2024 · 5 comments
Open
3 of 6 tasks

[Bug] Deadlock w/ Timer and Lwjgl3Application #7548

0xKZ opened this issue Dec 20, 2024 · 5 comments

Comments

@0xKZ
Copy link

0xKZ commented Dec 20, 2024

Issue details

The situation is as follows:

  1. Create a Lwjgl3Application, and from that application:
  2. Kick off a different thread that creates a Timer, which causes the timer instance to be set and the gdx files instance to be cached.
  3. Swap instances to a different Lwjgl3Application. (Timer implies it supports this by checking if it's local version of files matches that of the static Gdx.files, which will be important later).

Now, the deadlock can happen like so:

  1. [Main Thread] Lwjgl3Application.java, loop() is called, which is synchronized on lifecycleListeners. This calls render().
  2. [Thread 2] This thread now tries to create a new timer (for some application-logic-specific reason). This thread synchronizes on threadLock in the instance() call, and so that lock is now held. When it calls the static thread() method, because GDX files has changed, it calls dispose(). While in dispose(), it tries to call removeLifecycleListener on Lwjgl3Application. We need the lifecycleListeners lock to do so, but that is held by Main Thread. So we wait.
  3. [Main Thread] Within that render() call, something tries to create a Timer. We are stuck waiting for the threadLock lock, which is held by Thread 2.

Now, each thread is waiting for a lock that the other holds, and is deadlocked.
I was able to get this deadlock to happen in the intellij debugger and used a thread dump to confirm all of this.

Version of libGDX and/or relevant dependencies

1.13.0

Stacktrace

Thread 2:

"Test worker@1" tid=0x1 nid=NA waiting for monitor entry
  java.lang.Thread.State: BLOCKED
	 blocks Timer@26630
	 blocks Timer@26863
	 blocks pool-52-thread-2@26876
	 blocks Timer@27042
	 blocks Timer@27372
	 waiting for pool-52-thread-2@26876 to release lock on <0x6b93> (a java.lang.Object)
	  at com.badlogic.gdx.utils.Timer.instance(Timer.java:38)
	  at com.badlogic.gdx.utils.Timer.schedule(Timer.java:187)

Thread 1:

"pool-52-thread-2@26876" tid=0x93 nid=NA waiting for monitor entry
  java.lang.Thread.State: BLOCKED
	 blocks Test worker@1
	 waiting for Test worker@1 to release lock on <0x7a0e> (a com.badlogic.gdx.utils.Array)
	  at com.badlogic.gdx.backends.lwjgl3.Lwjgl3Application.removeLifecycleListener(Lwjgl3Application.java:402)
	  at com.badlogic.gdx.utils.Timer$TimerThread.dispose(Timer.java:367)
	  at com.badlogic.gdx.utils.Timer.thread(Timer.java:48)
	  - locked <0x6b93> (a java.lang.Object)
	  at com.badlogic.gdx.utils.Timer.instance(Timer.java:39)
	  at com.badlogic.gdx.utils.Timer.schedule(Timer.java:187)

Please select the affected platforms

  • Android
  • iOS
  • HTML/GWT
  • Windows
  • Linux
  • macOS
@0xKZ
Copy link
Author

0xKZ commented Dec 20, 2024

One possible solution is to just make Timer throw an exception if the Gdx.files instance changes instead of trying to support that. I am not sure if this is a usecase that y'all are actually intending to support.

Edit: Actually, I am not sure why Timer tries to look at gdx.files in the first place. Can it simply not do that? (edit again: Ah, because it wants to be compatible with the lifecycle events and uses the files as a proxy for that.)

@0xKZ 0xKZ changed the title [Bug] Deadlock w/ Timer and Lwjgl3Application.java [Bug] Deadlock w/ Timer and Lwjgl3Application Dec 20, 2024
@tommyettinger
Copy link
Member

I don't do much threading after a months-long bad experience, but I provide you with this link https://libgdx.com/wiki/app/threading#which-libgdx-classes-are-thread-safe and its relevant contents below:

Which libGDX classes are Thread-safe?
No class in libGDX is thread-safe unless explicitly marked as thread-safe in the class documentation!

I'm not sure how many more places other than the wiki need this disclaimer put in, prominently; I couldn't find it in the sources. I was also under the impression that any static state, anywhere in Java, is inherently unsafe to mutate on multiple threads simultaneously. I'm not sure if there's a way to do whatever you're doing in a thread-safe way.

@0xKZ
Copy link
Author

0xKZ commented Dec 25, 2024

Yes I am familiar with that link.

The Timer class looks like it is intended to be called from other threads because it has mechanisms to synchronize and run the provided runnables on the main thread. It is trying to be thread safe, because it kicks off a thread of its own. And it works almost all of the time, just not in the rare case where you swap the GDX instance.

If we want to say it is not thread safe, the implementation could be a lot simpler (just store tasks on the main thread, count down the time until they are ready). The documentation should also be updated to not imply it works across threads - it really seems like it is trying to be thread safe...

(For my personal project, I have already replaced Timer with a thread-safe solution, so I already have a solution for my needs, this is about preventing other people from shooting themselves in the foot)

@obigu
Copy link
Contributor

obigu commented Dec 26, 2024

I agree Timer class is designed (at least it looks like) to be thread safe. I'll ping @NathanSweet who will probably be able to give better insights.

@NathanSweet
Copy link
Member

Timer is indeed intended to be thread safe. Repro code wasn't provided, so I have to guess at it from the description. Maybe:

static public void main (String[] args) {
	new Lwjgl3Application(new ApplicationAdapter() {
		public void create () {
			Timer timer = new Timer();
			new Thread() {
				public void run () {
					timer.scheduleTask(new Task() {
						public void run () {
							System.out.println("1");
						}
					}, 1);
				}
			}.start();
			new Lwjgl3Application(new ApplicationAdapter() {
				public void create () {
					new Thread() {
						public void run () {
							timer.scheduleTask(new Task() {
								public void run () {
									System.out.println("2");
								}
							}, 1);
						}
					}.start();
				}
			});
		}
	});
}

I doubt this is really a sensible thing to do, but here we goooo. The first app locks lifecycleListeners to call window.update(). That calls create which calls new Lwjgl3Application to start the second app. That constructor probably doesn't return quickly, if ever. The first app's longlasting lifecycleListeners lock means that if some other thread calls removeLifecycleListener (which should be thread safe) then it hangs. In this case it's Timer.TimerThread calling removeLifecycleListener, but any thread would hang. Either 1) this code isn't a reasonable thing to do (and should probably throw), or 2) the way lifecycleListeners is locked should be changed.

If my repro is wrong then we need an executable repro (like we always do).

Related, I see unscynchronized read and write of lifecycleListeners in Lwjgl3Application:

for (int i = lifecycleListeners.size - 1; i >= 0; i--) {
LifecycleListener l = lifecycleListeners.get(i);
l.pause();
l.dispose();
}
lifecycleListeners.clear();

To be thread safe, all read and write access needs to be synchronized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants