Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use_counting_ptr, a threadsave callback solution #1713

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

daschuer
Copy link
Member

@daschuer daschuer commented Jun 16, 2018

This is a wrapper class to prevent to delete an object that is still in use
in a multithread environment. It allows to use the object from mutible
threads without locking. The final clear is however locking if the object is still
in use.
This can be used to implement callbacks to an object with shorter livetime. It is
a replacement for Qt's direct connections, which are not save across threads
in this case.

Usage:
call set(po) and clear() to store and remove a pinter from the object.
call the callback from a scope like this

{
    auto obj = m_obj.get();
    if (obj) {
        obj->callback();
    }
}

@daschuer daschuer changed the title Added use_counting_ptr a threadsave calback solution use_counting_ptr, a threadsave calback solution Jun 16, 2018
@Be-ing
Copy link
Contributor

Be-ing commented Jun 16, 2018

Could you explain what specific problem motivated you to start working on this? Does it solve a problem that came up in #1675?

@daschuer
Copy link
Member Author

I am experience rar crashes when shutting down Mixxx.I was never able to reproduce it under gdb, so I have never reported a bug. One idea is that this is caused by explicit direct connections across threads. Qt does not prevent to call a callback from an already deleted object, if there is a race between the callback call from one thread and the delete call from an other thread.
If we use this new use_counting_ptr, the deleting thread is locked until the callback has returned.

@daschuer
Copy link
Member Author

We use these direct connection for the engine thread COs

@daschuer
Copy link
Member Author

I cannot use this currently, because of pending #1700

@Be-ing Be-ing added the engine label Jun 19, 2018
@Be-ing
Copy link
Contributor

Be-ing commented Jun 19, 2018

Can we come back to this in a few weeks after the 2.2 feature freeze?

@daschuer
Copy link
Member Author

I like to have this included.

@Be-ing
Copy link
Contributor

Be-ing commented Jun 19, 2018

Why?

@daschuer
Copy link
Member Author

I actually waiting for a review of #1700 more urgently. Our control objects where becoming a bit messy, and I think it is a good idea to clean them up in small steps instead of one giant PR.
This one so should hopefully solve the pending shut down crashes, finally.

@Be-ing
Copy link
Contributor

Be-ing commented Jun 19, 2018

I'm not questioning that we should clean the code, but I don't think < 2 weeks until a feature freeze is a good time to be focusing on it unless it fixes an important bug.

@Be-ing Be-ing added this to the 2.3.0 milestone Jun 23, 2018
@daschuer
Copy link
Member Author

@rryan Do you want to have a look here?
I like to have this new pointer to wrap the m_pCreatorCO here:
https://github.com/mixxxdj/mixxx/blob/master/src/control/control.h#L164

@daschuer
Copy link
Member Author

@uklotzde do you have interest to wrap your head around this? I would like it to fix the a probably cause of the crash on close fix I experience some times.

@uklotzde
Copy link
Contributor

Honestly I still don't get the point by reading the code??

Couldn't this be implemented in a simpler and more comprehensible way? I'm not able to decide if this works correctly in all possible cases. My feeling is that we might try to cover a design flaw by mitigating it with just another half-baked and fairly complex solution.

I noticed that the implementation has some technical flaws, e.g. the constructors are inconsistent and behave unexpectedly when invoked with a nullptr.

I came to the conclusion that we should use asynchronous signal/slot messaging whenever possible. Why plain callbacks? Trying to work around Qt's foundation and patterns is one of the main problems in our code base. I'm just criticizing myself, because this is what I would've done when rewriting the multi-threaded analysis code now. But I won't rewrite it a second time in C++ ;)

The global track cache is a slightly different story. It's also far too complex, but since track objects appear as shared mutable state everywhere we didn't have a choice without rewriting large parts of the application.

@daschuer
Copy link
Member Author

Thank you for looking into it.

This pointer was my idea for solving a crash on shut down when this https://github.com/mixxxdj/mixxx/blob/master/src/control/control.h#L164 object is deleted while an other thread is just about to call callback.

I would love to use a Qt solution for it, but it has not. Direct connections have no safety net in QT and suffer basically the same issue.

My idea was to use this pointer type to workaround this limitation, in an almost look free way.
The idea was to use this new pointer, for a maybe unusual but working solution of minimal scope.

Do you have alternative ideas? Is there a chance to tun this into a simpler and more comprehensible solution? That would be great.

@uklotzde
Copy link
Contributor

Would you please include an example were this technique is applied? I need to see it in action. We need this anyway. Adding unused and untested code to the repository is discouraged ;)

@daschuer
Copy link
Member Author

daschuer commented Dec 23, 2018

I have started to use the new pointer here: https://github.com/mixxxdj/mixxx/blob/master/src/control/control.h#L164
But it is becoming a great mess, because it is copied and stored all over the Mixxx code base.
We need to fix that first.

Many instances can be replaced by ControlProxyLt #1717

@github-actions
Copy link

This PR is marked as stale because it has been open 90 days with no activity.

@github-actions github-actions bot added the stale Stale issues that haven't been updated for a long time. label Oct 20, 2020
@Be-ing Be-ing changed the base branch from master to main October 23, 2020 23:50
@ronso0 ronso0 marked this pull request as draft November 20, 2020 10:21
@ronso0 ronso0 changed the title use_counting_ptr, a threadsave calback solution use_counting_ptr, a threadsave callback solution Dec 12, 2021
@daschuer daschuer removed this from the 2.4.0 milestone Jul 5, 2022
@Swiftb0y
Copy link
Member

Swiftb0y commented Nov 6, 2024

this can probably be closed, right?

@daschuer
Copy link
Member Author

daschuer commented Nov 7, 2024

Yes, the underlying issue still exists though.

@daschuer daschuer closed this Nov 7, 2024
@Swiftb0y
Copy link
Member

Swiftb0y commented Nov 7, 2024

I don't really understand it though. Maybe you can explain it again?

@daschuer
Copy link
Member Author

daschuer commented Nov 7, 2024

When you call a member function form one thread and in the mean time an other thread deleted the object, we experence a crash. This happens if the caller does not hold a strong reference to the object until the call ends. That obvious so far.

Normally you would use a combination of a std::shared_ptr and a std::weak_ptr for it. This however, due to the shared ownership it is not predictable which threads destroys the object finally. We need protection from other multithreading issues in the destructor.

QPointer also does not help either, because you can only check whether the pointer is not null before the callbeack call, the object can become dengling during the call which cases still the crash.

This pointer here allows to borrow the pointer without that the original owner looses the ownership. It is guranteed that the desructor called in the creator thread is delayed unitl there is no user anymore for instance a callback has returned, than finally the detruction the object and itself can continue.

This issue happens during shutown of Mixxx when we hav a dangling pointer in the QT event queue which might happen in ControlDoublePrivate::valueChanged(double value, QObject* pSender); https://github.com/mixxxdj/mixxx/blob/master/src/control/control.h#L164

@daschuer daschuer reopened this Nov 7, 2024
@Swiftb0y
Copy link
Member

Swiftb0y commented Nov 7, 2024

but why does it matter from which thread the destructor is called? This is only relevant in realtime safe-code where we IMO shouldn't use these patterns anyway.

@Swiftb0y
Copy link
Member

Swiftb0y commented Nov 7, 2024

This issue happens during shutown of Mixxx when we hav a dangling pointer in the QT event queue which might happen in ControlDoublePrivate::valueChanged(double value, QObject* pSender); https://github.com/mixxxdj/mixxx/blob/master/src/control/control.h#L164

Right, thats what QObject::deleteLater() is for though.

@daschuer
Copy link
Member Author

daschuer commented Nov 8, 2024

deleteLater() along with a std::shared_ptr and a std::weak_ptr does not work here unfortunatly, because the deleteLater() places an event in the no longer existing event queue which also introduces a crash. And and even if crash is fixed in one Qt version the memory will be leaked because there is no thread to delete the object. This will trigger our leaked CO check at Mixxx shutdown. It is also cumbersome to use deleteLater() for objects that are owned by the QT object tree.

It is required that the creator thread is suspended, until all object that it has to delete are deleted.
The solution presented here exactly wraps that requirement in a way that is "just works".

I would be happy to discuss with you alternative better to understand ideas if you have interest.

Functional requirements are:

  • delete all objects in the creator thread
  • Suspend the creator thread destructor until all objects are no longer used.
  • implement a use count.

@Swiftb0y
Copy link
Member

Swiftb0y commented Nov 8, 2024

I'm getting confused now. Is this about the thread in which the event loop is placed is being shut down or about the object being deleted too early?

@daschuer
Copy link
Member Author

daschuer commented Nov 8, 2024

Every Qt object lives in a dedicated thread. When we shut down Mixxx, both is disposed. The solution here works by suspending the object destructor, before the thread is disposed as well.
The critical callbacks are direct connected slots, called from another thread. We need to wait until all these callbacks have been finished.

@Swiftb0y
Copy link
Member

Swiftb0y commented Nov 8, 2024

When you call a member function form one thread and in the mean time an other thread deleted the object, we experence a crash.

Right. This essentially an architecturally rooted use-after-free issue. Why does this happen in ControlDoublePrivate?

@daschuer
Copy link
Member Author

daschuer commented Nov 8, 2024

Because we use direct connections into the engine thread. The engine thread has no Qt event loop. Qt has probably not considered direct connections in that case.

@Swiftb0y
Copy link
Member

Swiftb0y commented Nov 8, 2024

well, direct connections are just plain function calls afai understand. So its an architectural issue if we still have access to these objects after they're dead. Wouldn't QPointer fix this since it nulls out all the references once the QObject goes out of scope?

@daschuer
Copy link
Member Author

daschuer commented Nov 9, 2024

The issue with QPointer is that it does guarantees ownership. You can check it for null before calling the callback, but the crash still happens if the object is deleted after the check during the runtime of the callback itself.
With this solution here, the new pointer has the "used" or borrowed state and deleting the object is suspended until the object is no longer in use.

@Swiftb0y
Copy link
Member

Swiftb0y commented Nov 9, 2024

Ah I slowly see how this works. I still don't think its a good design though. Its a bandaid fix on top of the larger architectural issue that is directly calling member functions of objects living in other threads.

@daschuer
Copy link
Member Author

daschuer commented Nov 9, 2024

You have always the situation that an object that provides shared memory for concurrent access needs to outlive both threads. The architectural issue here is the Qt object tree where a shared ownership is not possible and the nature of Qts direct connections that do not have a safety net for such situations. Even without Qt, we don't want to pass the ownership of an object to the engine thread, because that it has to dispose it as well and free() is not lock free.
Luckily the issue happens only during Mixxx shutdown, but bites us whenever we consider introducing smart pointers which do not cover this situation. The idea here is to introduce a semantic for that which can be used for all of thoughts "borrow" situations.

Our discussion proves, that the solution implemented here is not self explaining. I still think that the building blocks are valid. So maybe you can recommend a better design?

@Swiftb0y
Copy link
Member

Swiftb0y commented Nov 9, 2024

Even without Qt, we don't want to pass the ownership of an object to the engine thread, because that it has to dispose it as well and free() is not lock free.

Well, on shutdown that would be okay IMO. We don't need the (I assume audio?) engine to remain realtime safe then. But I don't think thats a solution we should go for either.

I'm not too experienced with concurrency unfortunately so take this with a grain of salt but IMO the way C++ models this would be essentially akin to "structured concurrency" where a thread is a resource and we use RAII to manage it. That implies that the child thread must outlive its parent thread and the child thread must outlive any objects created within it. The thread can only shut down once all the objects created within it have also been destroyed (and any references to them have been invalidated of course, otherwise it would be a use-after-free). These constraints lend themselves nicely to RAII management which is one of C++'s strengths.

int main() {
    auto someMainObj = ...;
    auto engine = std::jthread([]{
        someEngineObject = ...;
        // communicating with someMainObj is easy (inside-out)
        // the other way is intenionally hard because we need
        // to workaround the nice automatic storage duration.
        // But since we have now transformed this into a regular
        // lifetime problem, our usual lifetime tools would suffice.
        // in this case, the solution would be to created a `shared_ptr`
        // in this thread and hand out a `weak_ptr` to the parent thread.
    });
}

@daschuer
Copy link
Member Author

daschuer commented Nov 9, 2024

I do not understand. How does this translate into a solution for Mixxx? The issue we experience here is real and to fix it before doing a major refactoring. Do you habe a plan for this? What is the issue with this solution regarding the picked primitives and naming?

@Swiftb0y
Copy link
Member

Swiftb0y commented Nov 9, 2024

I don't understand the particular problem in mixxx well enough to offer an alternative right now. I was just posting on how this is generally supposed to be solved. If you can link me to a backtrace of the issue you're describing I may be able to understand it more.

The problem I have with use_counting_ptr is that it seems to be a bandaid fix to tape over a larger architectural issue we should probably tackle instead.

Regardless, I wasn't trying to revive this PR, I simply went through our PR backlog by oldest PR and tried to find any that have gotten stale enough to the point where they're worth closing. This seemed like a candidate but apparently isn't. I don't have interested to revive this right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
engine stale Stale issues that haven't been updated for a long time.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants