AMQ-9448 Fix persistent scheduler deadlock #1177

Open
wants to merge 1 commit into
base: main

thezbyg
Contributor

@thezbyg thezbyg commented Mar 16, 2024

Do not fire or schedule jobs while holding read lock on store.
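A minimal sketch of the general pattern this description points at: collect the due jobs while holding the read lock, release it, and only then fire and reschedule them, since rescheduling needs the write lock. This is not the ActiveMQ implementation. The class `SnapshotThenScheduleSketch`, its fields, and its methods are hypothetical names chosen for illustration, and the sketch is single-threaded for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; not the JobSchedulerImpl code.
public class SnapshotThenScheduleSketch {
    private final ReentrantReadWriteLock storeLock = new ReentrantReadWriteLock();
    // Job ids keyed by the time at which they become due.
    private final TreeMap<Long, List<String>> jobsByTime = new TreeMap<>();
    // Ids of fired jobs, in firing order (only touched by the caller's thread here).
    final List<String> fired = new ArrayList<>();

    // Writing a job requires the write lock.
    void schedule(String jobId, long time) {
        storeLock.writeLock().lock();
        try {
            jobsByTime.computeIfAbsent(time, t -> new ArrayList<>()).add(jobId);
        } finally {
            storeLock.writeLock().unlock();
        }
    }

    // One pass of a scheduler loop: returns the number of jobs fired.
    int runDueJobs(long now, long repeatDelay) {
        // Step 1: snapshot the due jobs under the read lock, then release it.
        List<String> due = new ArrayList<>();
        storeLock.readLock().lock();
        try {
            for (List<String> ids : jobsByTime.headMap(now, true).values()) {
                due.addAll(ids);
            }
        } finally {
            storeLock.readLock().unlock();
        }

        // Step 2: with no read lock held, taking the write lock is safe.
        storeLock.writeLock().lock();
        try {
            jobsByTime.headMap(now, true).clear();
        } finally {
            storeLock.writeLock().unlock();
        }

        // Step 3: fire and reschedule outside of any read lock.
        for (String id : due) {
            fired.add(id);
            schedule(id, now + repeatDelay);
        }
        return due.size();
    }

    int pendingCount() {
        storeLock.readLock().lock();
        try {
            int count = 0;
            for (List<String> ids : jobsByTime.values()) {
                count += ids.size();
            }
            return count;
        } finally {
            storeLock.readLock().unlock();
        }
    }
}
```

Calling `schedule(...)` from inside the read-locked loop in step 1 would instead block the calling thread on the write lock it can never obtain.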

@thezbyg
Contributor Author

thezbyg commented Mar 16, 2024

Please use the existing org.apache.activemq.broker.scheduler.JmsSchedulerTest#testCron test to confirm that the persistent scheduler currently deadlocks on CRON jobs. This is not the same deadlock as the one reported in AMQ-9448, but it has the same cause.

@jbonofre jbonofre self-requested a review March 17, 2024 10:48
Member

@jbonofre jbonofre left a comment

It would be great to add a unit test that illustrates how the deadlock can happen and shows that this change actually fixes it.

Do not fire or schedule jobs while holding read lock on store.
@thezbyg thezbyg force-pushed the fix-persistent-scheduler-deadlock branch from bd33b42 to 0448cf2 Compare March 20, 2024 06:15
@thezbyg
Contributor Author

thezbyg commented Mar 20, 2024

I have now added a new unit test. Does the existing JmsSchedulerTest#testCron unit test run successfully for you before this change?

@mattrpav
Contributor

Do you have a thread dump of the deadlock occurring?

@thezbyg
Contributor Author

thezbyg commented Jun 1, 2024

Yes. This is the stack trace of the "JobScheduler:JMS" thread, which blocks itself: it acquires the read lock on the store while iterating over scheduled jobs in the mainLoop() method, and then attempts to acquire the write lock to persist a new scheduled job:
"JobScheduler:JMS" #28 daemon prio=5 os_prio=0 cpu=11.62ms elapsed=117.73s tid=0x00007f43d1344b50 nid=0x185b waiting on condition [0x00007f4356cfe000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base/Native Method)
    - parking to wait for <0x000000008bd25150> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(java.base/LockSupport.java:211)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base/AbstractQueuedSynchronizer.java:715)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base/AbstractQueuedSynchronizer.java:938)
    at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(java.base/ReentrantReadWriteLock.java:959)
    at org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl$8.visit(JobSchedulerStoreImpl.java:684)
    at org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand.visit(KahaAddScheduledJobCommand.java:283)
    at org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl.process(JobSchedulerStoreImpl.java:679)
    at org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:495)
    at org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:403)
    at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.doSchedule(JobSchedulerImpl.java:252)
    at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.schedule(JobSchedulerImpl.java:100)
    at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.mainLoop(JobSchedulerImpl.java:782)
    at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.run(JobSchedulerImpl.java:699)
    at java.lang.Thread.run(java.base/Thread.java:833)

Full dump:
dump.txt
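
The self-blocking behaviour in the trace can be reproduced in isolation. `java.util.concurrent.locks.ReentrantReadWriteLock` does not support upgrading from the read lock to the write lock, so a thread that holds the read lock and then calls `writeLock().lock()` parks forever. The sketch below is a standalone demonstration, not ActiveMQ code: the class and method names are hypothetical, and it uses a timed `tryLock` instead of `lock()` so that it terminates.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical standalone demo; names are not taken from ActiveMQ.
public class ReadToWriteUpgradeDemo {

    // Tries to take the write lock while the same thread still holds the read lock.
    static boolean canUpgradeWhileHoldingReadLock() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        try {
            return tryWrite(lock);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Releases the read lock first, then takes the write lock.
    static boolean canWriteAfterReleasingReadLock() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        lock.readLock().unlock();
        return tryWrite(lock);
    }

    // Timed attempt so the demo returns instead of parking forever like lock() would.
    private static boolean tryWrite(ReentrantReadWriteLock lock) {
        try {
            boolean acquired = lock.writeLock().tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // prints "upgrade while holding read lock: false"
        System.out.println("upgrade while holding read lock: " + canUpgradeWhileHoldingReadLock());
        // prints "write after releasing read lock: true"
        System.out.println("write after releasing read lock: " + canWriteAfterReleasingReadLock());
    }
}
```

With a plain `lock()` call in place of the timed `tryLock`, the first case would leave the thread in `WAITING (parking)`, which matches the state shown in the trace.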
