Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP, doesn't even compile #4201

Draft
wants to merge 30 commits into
base: series/3.x
Choose a base branch
from

Conversation

durban
Copy link
Contributor

@durban durban commented Dec 15, 2024

This is on top of @djspiewak's wip/multithreaded-wstp branch. I did nothing so far, except default to SleepSystem in 489fbb9. This makes it possible to run IOSpec on scala-native; it (sometimes) passes on my machine ;-). It's probably too early to have this as a PR, but I'm doing it anyway to avoid duplicating work, and maybe to have a discussion about EpollSystem (see below). (@djspiewak feel free to close this PR if you have other plans with your branch.)

@durban
Copy link
Contributor Author

durban commented Dec 15, 2024

So, I'm running this on Linux, and without 489fbb9 it tries to use EpollSystem. But starting the tests (e.g., testsNative/testOnly cats.effect.IOSpec) only leads to a hanging process. It seems to me, that the WSTP threads are waiting in epoll_wait, and a (the?) GC thread seems to wait for mutator threads to reach a safepoint(?). (Details: the GC is in thread_yield, called by Synchronizer_acquire.) I'm speculating here, but maybe Thread.sleep() does "something", which EpollSystem doesn't, before epoll_waiting?

EDIT: what I wrote above is probably completely wrong... 🤷‍♂️

@djspiewak
Copy link
Member

djspiewak commented Dec 15, 2024

This is great! Thank you for pushing this forward to the next obstacle.

One of the things that occurred to me as I poked at my branch originally is SN's tooling for introspecting thread state is really really limited so far as I understand it. Maybe this is just my ignorance and LLVM has some magic we could turn on, but I strongly suspect we're going to need better introspection to run down some of these problems.

What I'm thinking is we're probably going to end up building that, or at least leaning in heavily to do so, and that's probably a large part of what we'll need to do to get this off the ground. We should chat with the SN folks.

@armanbilge
Copy link
Member

armanbilge commented Dec 15, 2024

The reason it's hanging is because we haven't implemented interruption yet for the Native I/O-polling systems. This wasn't necessary when it was single-threaded, but now it's critical :)

def interrupt(targetThread: Thread, targetPoller: Poller): Unit = ()

Compare with:

def interrupt(targetThread: Thread, targetPoller: Poller): Unit = {
targetPoller.selector.wakeup()
()
}

@armanbilge
Copy link
Member

Oh, the other reason it may be hanging is indeed related to GC. On Scala Native, blocking native calls need to be annotated explicitly with the @blocking annotation, so that it does the necessary book-keeping so it's possible to GC while a thread is stuck in that blocking call.

https://github.com/scala-native/scala-native/blob/c7b54a18e3ff11d8b2792f16fbb6e97780314014/nativelib/src/main/scala/scala/scalanative/unsafe/package.scala#L103-L106

For now it's fine to just mark it @blocking, but b/c this comes at a performance cost, we should actually make two separate epoll_wait bindings. One will use @blocking for when the timeout is > 0, and the other will not, for when the timeout == 0.

@djspiewak
Copy link
Member

The reason it's hanging is because we haven't implemented interruption yet for the Native I/O-polling systems. This wasn't necessary when it was single-threaded, but now it's critical :)

You know, I didn't even think about this. Makes loads of sense though. Pipes time!

@durban
Copy link
Contributor Author

durban commented Dec 20, 2024

@armanbilge Thanks, I've tried to do the 2 things you mentioned. In 62b8141 I turned on the EpollSystem again, and tried implementing interrupt, and added the scala-native blocking annotation. This way testsNative/testOnly cats.effect.IOSpec passes on my machine. It obviously needs more work (e.g., I think interrupt is not threadsafe), but at least it's a step in the right direction.

Comment on lines +257 to +258
// TODO: this is not threadsafe, we're reading `interruptFd` without synchronization:
if (unistd.write(this.interruptFd, buf, 8.toCSize) == -1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't need to be, it will be synchronized by workerThreadPublisher at its callsites.

workerThreadPublisher.get()
val worker = workerThreads(index)
system.interrupt(worker, pollers(index))

@djspiewak
Copy link
Member

djspiewak commented Dec 26, 2024

Interesting. So I merged your branch with series/3.x and now I'm getting the following:

[error] Unknown DWARF abbrev code: 26
[error] 
[error] STACKTRACE
[error] 
[error] java.lang.RuntimeException: Unknown DWARF abbrev code: 26
[error] 
[error] 
[error] This looks like a specs2 exception...
[error] Please report it with the preceding stacktrace at http://github.com/etorreborre/specs2/issues
[error]  
[error] Error: Total 1, Failed 0, Errors 1, Passed 0
[error] Error during tests:
[error] 	cats.effect.IOSpec

Edit: Appears to be a macOS only thing. Compiles and runs on Linux. Lovely.

@djspiewak
Copy link
Member

Okay got around the issue with Lorenzo's help. It's fixed in SN main, so I updated to a local snapshot (lol) on my branch and made progress. I'll dig into interruption for kqueue

@djspiewak
Copy link
Member

Update:

  1. We no longer need the snapshot. I've pushed the magic compiler settings incantation on my branch
  2. Also pushed is an initial stab at using EVFILT_USER to handle kqueue interrupts. I implemented it by basically putting that event permanently into the front of the changes array and then triggering it on the kqfd on interrupt(). In principle, this makes sense, but it doesn't actually work. The threads wake up but things spin forever. I'm probably being dumb. Have fun.

Will get back to this later.

@djspiewak
Copy link
Member

I pushed more. Kqueue is pretty close to working I think.

@djspiewak
Copy link
Member

Update: I've got testsNative/test almost entirely green on macOS.

@durban
Copy link
Contributor Author

durban commented Dec 28, 2024

@djspiewak Yeah, I don't know anything about macos; I can't even help testing it (I have no access to such a system). Anyway, I've merged your branch into this PR, so that your progress is visible here too.

@djspiewak
Copy link
Member

I think this PR will become the PR, more than likely

@djspiewak
Copy link
Member

Oh, if you want to just focus on Linux + epoll, that works perfectly. There's also more stuff to go through and pull out of jvm-specific areas and into jvm-native areas (e.g. the IOPlatformSpecs, which are all intermixed and weird, or interruptibleImpl, etc). Lots of stuff strewn around.

@durban
Copy link
Contributor Author

durban commented Dec 28, 2024

Okay, I'll try to look at one of those when I'll have some time. (Btw, I think you can push directly to this PR; at least I've checked that github checkbox.)

@djspiewak
Copy link
Member

Alrighty, I'm going to deprecate my fork's branch then and we can centralize on yours.

@djspiewak
Copy link
Member

Oh btw as a planning process note… Once we can get something which is mergeable, we should do that, close out this PR, and use normal PR process from that point forward

@djspiewak
Copy link
Member

Btw I can't actually push to your branch. Pushing to mine for now.

@djspiewak
Copy link
Member

djspiewak commented Dec 28, 2024

Status report:

I think I've got all (but one) of the tests working and all the functionality is shifted over aside from interruptible-related things and IOAppSpec (this is worth checking though just in case I missed something). The latter is dependent on the sbt yak shave (the BuildInfo stuff). The interruptible stuff just needs to be stubbed around basically since we don't have the right exception types in SN; fairly trivial. The one failing test is "handle lots of simultaneous events" in FileDescriptorPollerSpec, which is annoying since it probably actually means something. Everything else is passing.

I took the time to factor out kevent64 into blocking and non-blocking variants. The pattern works pretty well so we should probably do something similar for epoll.

This still doesn't compile on Scala 3 because of the epoll_event tag shenanigans. That's going to take some more compiler sweet talking.

One thing that has come out as part of this is our conditional source matrix is getting pretty hairy. It might be worth taking a pass over things and updating our encodings to be what we want them to be (e.g. we have Platform traits, but some of them are cross-platform, and we also have the isJVM/isNative/etc checks, and sometimes we mix them all together for maximal confusion).

Anyway, apart from the FD weirdness, I think kqueue is pretty much ready to go. It could be optimized more but I'm not worried about performance yet. We're very close on this branch as a whole, tbh.

Edit: I've confirmed that Linux and macOS are now basically in the same state, which is reassuring. I actually wonder if something might be odd about that particular test rather than the underlying polling system or WSTP?

@durban
Copy link
Contributor Author

durban commented Dec 29, 2024

Btw I can't actually push to your branch

Huh, that's strange. I double checked, and the "allow edits by maintainers" is selected. I don't know what's going on. Anyway, I'm sure we'll be able to solve it with some combination of merge and push...

@armanbilge
Copy link
Member

I've confirmed that Linux and macOS are now basically in the same state,

This is not my observation.

  • On Linux, the entire FileDescriptorPollerSpec is broken. There seems to be a segfault when registering the pipes with the epoll.
  • On macOS, only the "handle lots of simultaneous events" is failing. It can segfault, or for smaller numbers simply hang and timeout.

@djspiewak
Copy link
Member

There seems to be a segfault when registering the pipes with the epoll

Do segfaults manifest as fatal errors in sbt? If so then that's exactly what I'm experiencing, not a failing test.

@djspiewak
Copy link
Member

Oh I see what you mean. Hmm. I must have just gotten a segfault in both and assumed they were the same issue. Unfortunate.

@durban
Copy link
Contributor Author

durban commented Dec 31, 2024

I think I've fixed the segfault in FileDescriptorPollerSpec for EpollSystem (see f4bc66a). I think the stackallocs were somehow wrong: type epoll_event <: AnyRef makes scala-native think it's 16 bytes. While there is a Tag[epoll_event] with the correct size, stackalloc doesn't seem to use it(?). If I use the correct size manually, the segfault disappears. (I'm sure there is a nicer solution...)

@durban
Copy link
Contributor Author

durban commented Dec 31, 2024

Okay, so it seems that in SN 0.4 stackalloc takes an implicit Tag; probably that's why this used to work. In 0.5 it doesn't; but then how does it know the size? 🤷‍♂️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants