Large enough messages block the main thread for a long time, which can lead to socket timeouts on blocked clients #339
Awesome, @kid-icarus. Note that there's a little sleight of hand in here for file size -- we're deflating at some point. I see you're on 2.1.10 in the package.json. There was recently a performance fix for documents with lots of small objects. I think it went into 2.1.13 (which I think I shipped in the most recent automerge-repo a few days ago). Can you see if/how much that helps? That said, there's an eventual physical limit to how much we can optimize the message receipt for a very large message: you can imagine a 1GB text field, for example. Our long-term strategy here has a few components:
Architecturally, there are two parts to Automerge: a CRDT that has to decide what the state of the world is/should be, and a materialized view that represents the full document and is updated when changes occur. The latter has to live in the render thread (it's the state of your program, you need it there), but theoretically we could house the former elsewhere and communicate over some messaging channel. Historically, Automerge worked this way (way back in the early, very slow days pre 1.0), but marshalling messages over thread boundaries is prone to performance and latency problems as well (not to mention architectural complexity for implementers), so in the 2.0 / Rust line we put an emphasis on "just be fast enough".

I think most of the work we've described above is "shovel ready" in the sense that we could move on it at any time if we had the development bandwidth to do so... but the project is not funded at a scale where we can do that. We are not a VC-backed startup and our funding is earmarked to support our research goals. So if you want us to do something, you can:
In the third case, I'll note that even sending things upstream via the DIY approach is very welcome, but it does impose costs on us: we have to review code, write tests, cut releases, and otherwise maintain the software going forward. If you build a thing we don't want or like... we won't merge it (so if you want us to merge it, keep us in the loop during the design). That said, we have a bunch of folks in the community who have just become committers on the project and we'd love to have more. At the moment our big projects are to rebuild the sync system to better support many documents (funded by NLNet), deliver rich text support (funded by sponsors), and work on memory use (needed for in-house work). I think, left to my own devices, my next big push will be around auth stuff. Maybe some version control APIs. Hard to say. Anyway, every bit of research like this is extremely welcome and helpful! Working on these problems is very much "on the list" and we'd love to get to them just as soon as we're able. Thanks for reading my rant :)
Thanks for the timely response @pvh, I really appreciate it 🙏🏻 So first of all, apologies for not using the latest automerge(-repo) versions. I'm seeing a drastic improvement with automerge 2.1.13 and automerge-repo 1.1.15.
The 46 second wait is now roughly 3-4 seconds. I think this is certainly fast enough :) I'll go ahead and close this issue out. Here are CPU profiles, both without any changes and when using only `RawString` for strings in the document: (profile attachments not captured here)

I'll take some time to respond to the rest of your comment, but in short I would love to contribute back in a way that isn't a burden on everyone else. I understand the challenges of being resource-constrained, and wouldn't open any PRs without gauging interest and keeping folks in the loop beforehand. Feel free to reach out to me in the #devs channel in Discord (or you can always DM me if any of these issues become frustrating). I know everyone is doing their best given project priorities and constraints.
I'm going to reopen this because 2.5s is still quite long for a document this small. Thanks for the trace, though. The `save()` seems like about what I'd expect at 180ms (of which 40% is compression). Looking at the rest, we've got several big processes running in this trace. In order:
Looking at this trace I think we could plausibly get ~1s improvement with a little bit of plumbing work (reuse `save()` content, use the last diff cache). The `progressDocument` stuff feels like there should be room to improve (though I haven't looked closely enough to have a fix), and the `receiveSyncMessage` stuff itself should be helped by Orion's current memory work (though we'll have to wait and see). @kid-icarus one useful reference would be a performance comparison to raw JSON.
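A minimal sketch of such a baseline, assuming the payload lives at a placeholder path `payload.json` and using the 2.x `next` API; the timings it prints are only meaningful relative to each other on the same machine:

```ts
// Rough baseline sketch: time plain JSON parsing against building, saving,
// and reloading the equivalent Automerge document. "payload.json" is a
// placeholder for the ~10MB file discussed in this thread.
import { readFileSync } from "node:fs"
import * as Automerge from "@automerge/automerge/next"

const raw = readFileSync("payload.json", "utf8")

console.time("JSON.parse")
const data = JSON.parse(raw)
console.timeEnd("JSON.parse")

console.time("Automerge.from")
const doc = Automerge.from(data)
console.timeEnd("Automerge.from")

const bytes = Automerge.save(doc)
console.time("Automerge.load")
Automerge.load(bytes)
console.timeEnd("Automerge.load")
```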
Hello again :)
I wanted to post some performance findings:
I have a roughly 10MB JSON file I'm parsing and creating a new automerge document from. Here's the size of the automerge document on disk:
- Using next's `Automerge.from`: 1.6MB
- Using next + `Automerge.RawString` for all strings: 744KB
- Using stable `Automerge.from` (which from my understanding is using `Automerge.Text` for strings): 744KB

This is quite impressive!
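For anyone who wants to reproduce the comparison, here is a minimal sketch of the first two variants, assuming the 2.x `next` API and a small stand-in payload rather than the actual 10MB file:

```ts
import * as next from "@automerge/automerge/next"

// Stand-in payload; the real test used a ~10MB JSON file of nested maps.
const payload = { title: "hello", body: "some longer string value" }

// next's Automerge.from: plain strings become collaborative text sequences.
const textDoc = next.from(payload)

// next + RawString: strings are stored as plain scalar values instead,
// trading per-character merging for a smaller document on disk.
const rawDoc = next.from({
  title: new next.RawString(payload.title),
  body: new next.RawString(payload.body),
})

console.log("next:            ", next.save(textDoc).byteLength, "bytes")
console.log("next + RawString:", next.save(rawDoc).byteLength, "bytes")
```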
Now, on to automerge-repo. Along with @georgewsu, I've been trying to stress test automerge-repo via the sync server.
What @georgewsu and I are trying to measure is the time it takes for the sync server to receive such a large document, and how long it takes another client to fetch it immediately after. I've put together a reproduction of the findings (link not captured here). I've shared an example JSON payload with @alexjg, but I can't post it publicly. It's a ton of nested maps with a fair amount of strings in the maps. If there's anything I can do to improve the signal we're getting from the tests, or if I've made any mistakes in the tests that would lead to abnormally poor performance, I'd love to hear any ways I could improve them. Additionally, I'd be happy to contribute these types of tests/benchmarks back in a structured fashion so that folks can improve on existing limitations.
That said, with the payload used, the sync server took roughly 46 seconds to receive the document. While receiving it, the server blocked the main thread, severely limiting concurrency (other clients encountered socket timeouts). I pinpointed it to a call to `A.receiveSyncMessage` in the doc synchronizer. A deeper analysis shows calls into wasm; notably, calls to `applyPatches` comprise most of the time spent. Here's a CPU profile you can load into Chrome dev tools if you'd like to look at my findings: CPU-20240411T110711.cpuprofile

The fact that the main thread is blocked is problematic for concurrency. I'm just brainstorming ideas here, but would it make sense to throw that call to `A.receiveSyncMessage` onto a separate thread? I know that automerge-repo is intended to work in both the browser and Node, and that isomorphic threading can be a pain (web workers vs. worker threads). Also, the overhead of sending the data to/from the worker without a SharedArrayBuffer might impact potential performance gains. Still, I would see the benefit being that, so long as all CPU cores aren't busy receiving huge docs (I imagine an infrequent use case), at least a few huge docs could be received while other cores happily synchronize smaller edits.
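To make that brainstorm concrete, here is a rough single-file sketch using Node's `worker_threads`. This is not how automerge-repo works today: the websocket wiring is elided and `onSocketMessage` is a hypothetical hook standing in for the server's message handler.

```ts
import { Worker, isMainThread, parentPort } from "node:worker_threads"
import * as A from "@automerge/automerge"

if (isMainThread) {
  // Main thread: forward incoming sync messages to the worker and relay
  // any reply back to the peer (websocket plumbing omitted).
  const worker = new Worker(new URL(import.meta.url))
  worker.on("message", (reply: Uint8Array) => {
    // send `reply` over the websocket to the peer
  })
  // Hypothetical hook called by the server's websocket handler.
  const onSocketMessage = (bytes: Uint8Array) => worker.postMessage(bytes)
} else {
  // Worker thread: owns the document and sync state, so the expensive
  // receiveSyncMessage/applyPatches work never blocks the main thread.
  let doc = A.init()
  let syncState = A.initSyncState()
  parentPort!.on("message", (bytes: Uint8Array) => {
    ;[doc, syncState] = A.receiveSyncMessage(doc, syncState, bytes)
    const [nextState, reply] = A.generateSyncMessage(doc, syncState)
    syncState = nextState
    if (reply) parentPort!.postMessage(reply)
  })
}
```

Whether the copy across the thread boundary (absent a SharedArrayBuffer) eats the win is exactly the open question raised above.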