Design multi-threading support #97

Open
danielocfb opened this issue Mar 22, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@danielocfb
Collaborator

We should come up with a way to support multi-threading where beneficial, to speed things up. See #93 for when we removed earlier support for that in DWARF parsing code, which wasn't accessible by users.

danielocfb added the enhancement label on Jun 7, 2023
@danielocfb
Collaborator Author

danielocfb commented Jul 25, 2024

The more I think about it and look at the original approach, the less I like it and the more I think it is not suitable given our architecture. Our DWARF parsing, as well as various other bits, is lazy by design: we only parse data if it is being asked for. Ideally that wouldn't be the only mode of operation (#433), but it is how things work at this point, and anything else is orthogonal. Laziness just makes sense in many contexts this library is likely being used in: you generally don't want to parse a multi-GiB file eagerly if all you need is function information from a single compilation unit.

In my opinion the best way to multi-thread this kind of code is by (conceptually) replacing all `OnceCell` constructs with promises that resolve asynchronously. That resolution can, but does not have to, happen on a separate thread. Only once the result is needed would code actually wait for completion. I think this mostly fits into the existing async Rust ecosystem, but there may be some differences. In the end we may not really want `async` (usage of the keyword) itself, because we are not interested in super fine-grained work distribution: effectively nothing blocks inside a "parse" sub-operation, so once you have scheduled a promise there isn't really a need to cooperatively schedule sub-tasks. Rather, it can just run to completion. Not using `async` (the keyword) would mean we don't run into "function coloring" issues either. And we certainly don't have any intention of asyncifying our API surface at this point.
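As an illustration of the idea (not code from blazesym), here is a minimal sketch of a promise that could stand in for a `OnceCell`. The `Promise` type and its `schedule` and `get` methods are hypothetical names, and for simplicity the work always runs on a freshly spawned thread; a real scheduler could just as well run it elsewhere.

```rust
use std::sync::mpsc::{self, Receiver};
use std::sync::{Mutex, OnceLock};
use std::thread;

/// Hypothetical stand-in for a lazily initialized cell: the work is started
/// when the promise is created, and `get` blocks only if the result is not
/// available yet.
pub struct Promise<T> {
    /// Caches the result once it has arrived.
    value: OnceLock<T>,
    /// Receiving end for the result; the `Mutex` makes `Promise` shareable.
    rx: Mutex<Receiver<T>>,
}

impl<T: Send + 'static> Promise<T> {
    /// Schedule `f`. This sketch always uses a new thread (an assumption).
    pub fn schedule<F>(f: F) -> Self
    where
        F: FnOnce() -> T + Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        thread::spawn(move || {
            // A send error only means the promise was dropped; ignore it.
            let _ = tx.send(f());
        });
        Self {
            value: OnceLock::new(),
            rx: Mutex::new(rx),
        }
    }

    /// Wait for completion if necessary, then return the cached result.
    pub fn get(&self) -> &T {
        self.value.get_or_init(|| {
            self.rx
                .lock()
                .unwrap()
                .recv()
                .expect("producer thread panicked")
        })
    }
}
```

The task runs to completion without any cooperative scheduling, and callers use plain blocking calls, so no `async` keyword appears in the API. Note the `'static` bounds on `T` and `F`, which `std::thread::spawn` requires.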

Such an approach would be much more flexible to work with than hard-coding thread usage in certain locations. Because we do not parse everything eagerly, it's hard to tell how best to distribute work to threads. But that distribution is precisely what would be dictated by program structure ("run this function on this thread") if we used threads directly.

It would also mean that users could be given some control over threading properties: they could provide a spawn or schedule function that would know how many threads to use, when to spawn a new one or clean up existing ones, and even plug in an existing thread pool.
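A rough sketch of what such a user-provided spawn function could look like follows. `Scheduler`, `Task`, `submit`, and the two constructors are hypothetical names chosen for illustration and are not part of any existing API.

```rust
use std::sync::mpsc;
use std::thread;

/// A unit of work handed to the user-supplied spawn function.
pub type Task = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical scheduler that leaves the "where does this run" decision
/// to whoever constructs it.
pub struct Scheduler {
    spawn: Box<dyn Fn(Task) + Send + Sync>,
}

impl Scheduler {
    /// Build a scheduler from any spawn function, e.g. one that forwards
    /// tasks to an existing thread pool.
    pub fn new<S>(spawn: S) -> Self
    where
        S: Fn(Task) + Send + Sync + 'static,
    {
        Self {
            spawn: Box::new(spawn),
        }
    }

    /// Policy 1: one new thread per task.
    pub fn thread_per_task() -> Self {
        Self::new(|task| {
            thread::spawn(task);
        })
    }

    /// Policy 2: no threads at all; run the task on the calling thread.
    pub fn inline() -> Self {
        Self::new(|task| task())
    }

    /// Submit `f` and return a receiver the caller can block on later.
    pub fn submit<T, F>(&self, f: F) -> mpsc::Receiver<T>
    where
        T: Send + 'static,
        F: FnOnce() -> T + Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        (self.spawn)(Box::new(move || {
            // A send error only means the receiver was dropped; ignore it.
            let _ = tx.send(f());
        }));
        rx
    }
}
```

With this shape, `Scheduler::inline().submit(|| 2 + 2).recv()` and `Scheduler::thread_per_task().submit(|| 2 + 2).recv()` yield the same value; only the place where the work runs differs.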

@danielocfb
Collaborator Author

The main problem, I suspect, will be that the moment anything looking like a thread is involved, there is a requirement for data to be `'static`. For us, I believe, this may be a major issue: we effectively use zero-copy parsing where possible, so the to-be-parsed data is what would be transferred between threads, and that data is unlikely to be `'static`.
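To make the constraint concrete, here is a small sketch. `first_record` is a made-up zero-copy "parser", not a blazesym function. Handing its borrowed input to `std::thread::spawn` is rejected by the compiler, while shared ownership through `Arc<[u8]>` is one possible (assumed, not proposed in this thread) way to satisfy the `'static` bound.

```rust
use std::sync::Arc;
use std::thread;

/// Made-up zero-copy "parser": returns the bytes up to the first NUL as a
/// sub-slice that borrows from `data` instead of copying it.
pub fn first_record(data: &[u8]) -> &[u8] {
    let end = data.iter().position(|b| *b == 0).unwrap_or(data.len());
    &data[..end]
}

// This version does not compile if uncommented: `thread::spawn` requires a
// `'static` closure, but the closure captures the non-`'static` borrow `data`.
//
// pub fn record_len_borrowed(data: &[u8]) -> usize {
//     thread::spawn(move || first_record(data).len()).join().unwrap()
// }

/// One workaround (an assumption of this sketch): share ownership of the
/// buffer so that the closure owns everything it touches.
pub fn record_len_shared(data: Arc<[u8]>) -> usize {
    thread::spawn(move || first_record(&data).len())
        .join()
        .unwrap()
}
```

The workaround only helps if the buffer can be owned in the first place; data that is itself borrowed from the caller would have to be copied.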

@danielocfb
Collaborator Author

> The main problem, I suspect, will be that the moment anything looking like a thread is involved there is a requirement for data to be 'static.

Well, that's not quite true of course. One can use scoped threads to alleviate this constraint, but I think it would presumably be impossible to marry them with the design outlined above.
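For reference, a minimal sketch of the scoped-thread variant using `std::thread::scope`. The `records` and `record_lens` helpers are hypothetical and stand in for real parsing work.

```rust
use std::thread;

/// Split on NUL bytes without copying: every record borrows from `data`.
pub fn records(data: &[u8]) -> Vec<&[u8]> {
    data.split(|b| *b == 0).filter(|r| !r.is_empty()).collect()
}

/// "Parse" each record on its own scoped thread. The borrowed records can be
/// handed to the threads because the scope guarantees that every thread has
/// finished before `thread::scope` returns.
pub fn record_lens(data: &[u8]) -> Vec<usize> {
    let recs = records(data);
    thread::scope(|s| {
        let handles: Vec<_> = recs
            .iter()
            .map(|rec| s.spawn(move || rec.len()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .collect::<Vec<usize>>()
    })
}
```

One plausible reading of the difficulty mentioned above is that all scoped work must finish inside the call to `thread::scope`, whereas a promise is meant to be stored and resolved at some later point, after the function that scheduled it has returned.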

@danielocfb
Collaborator Author

For what it's worth, I got some basic "runtime"/scheduler scaffolding coded up over in https://github.com/d-e-s-o/blazesym/tree/topic/threading, so in principle we could start using it. The "problem" is that right now I don't actually see anywhere for it to be used: because we load everything lazily, on demand, there really isn't much coarse-grained work that could be done concurrently, I think. That would change once we add support for pre-populating caches (#433), for example.

1 participant