Ares Weekly Meeting Notes #233

Open
4 tasks done
jalehman opened this issue May 29, 2024 · 12 comments

@jalehman

jalehman commented May 29, 2024

~2024.5.29

Agenda

Additional Notes

  • Release
  • Instructions for running Ares can be found in DEVELOPERS.md
  • SKA
    • In context of memoization, we're emitting incorrect labels
    • 120 call sites with incorrect/missing labels in analysis of hoon.hoon out of ~20k total call sites
    • we have extensive debug information
    • appears we're matching properly, but not normalizing/propagating constraints properly
    • @joemfb has some questions for @eamsden that they need to run through together
  • Vere Compatibility
    • Making modifications to vere PR that allows replay in a subprocess. This will allow vere to call whichever serf is linked to perform replay.
    • Started on reading events, i.e. reading a vere pier from Ares. WIP
  • PMA
    • Memory corruption bug is believed to be solved
    • Chaos monkey testing is being started soon
    • @barter-simsum is clearing out some smaller known PMA bugs
    • Discussion on garbage collection between @barter-simsum @eamsden @joemfb happening later today
    • Garbage collection & testing can run in parallel
  • Interpreter optimizations
    • Blocked on completion of SKA
    • Possible to optimize the treewalking interpreter some, gains unclear w/ SKA
    • Initial line of work: modify the linearizer so it emits noun code whose output is contiguously numerically indexed, so it can be efficiently referenced from arrays
      • Not blocked on SKA
  • @matthew-levan has anyone spoken with @ashelkovnykov about picking up work on this again?
    • @belisarius222 talked to Alex this morning, currently has no availability to work on ares
    • He seems interested in doing some work on this again when he's back from Japan, earliest June 4th
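
The "contiguously numerically indexed" idea from the interpreter-optimization item above can be sketched as follows. This is a minimal illustration only: the labels, the `(body, successors)` tuples, and the `densify` helper are invented for the example and do not reflect the linearizer's actual output format.

```python
# Hypothetical illustration: labels and block contents are made up.

def densify(blocks):
    """Renumber sparse block labels to 0..n-1 and return a list indexed by them.

    `blocks` maps an arbitrary label to (body, successor_labels).
    """
    # Assign each label a contiguous index in a stable (sorted) order.
    index_of = {label: i for i, label in enumerate(sorted(blocks))}
    table = [None] * len(blocks)
    for label, (body, succs) in blocks.items():
        # Rewrite successor references so they are array indices too.
        table[index_of[label]] = (body, [index_of[s] for s in succs])
    return table, index_of


sparse = {
    1000: ("entry", [42, 7]),
    42: ("then", [7]),
    7: ("exit", []),
}
table, index_of = densify(sparse)
```

Once every reference is a small integer, following an edge is a plain array index rather than a map lookup.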
@jalehman jalehman self-assigned this May 29, 2024
@eamsden
Collaborator

eamsden commented May 30, 2024

@jalehman requests:

  • close previous week's notes-issue when opening new notes issue
  • link to previous week's notes-issue in current week's
  • label meeting notes issues with "meeting notes" label as I have done for this one

@eamsden eamsden added the meeting notes Notes from project meetings label May 30, 2024
@jalehman
Author

@eamsden My plan was to pin this issue and take all meeting notes here as comments — significantly less meeting note administration that way, and less clicking needed to establish context from past meetings. If you'd prefer it as outlined above, happy to do it that way.

@eamsden
Collaborator

eamsden commented May 31, 2024

@jalehman that works too, I just don't want open meeting notes issues to proliferate or old notes to be hard to find

@eamsden eamsden pinned this issue May 31, 2024
@jalehman
Author

jalehman commented Jun 5, 2024

~2024.6.5

Agenda

Notes

  • Every discussion is the overlay namespace discussion
  • Status of SKA
    • PR is up! Tricky cases abound in mutual recursion memoization
    • Has improved debug tooling, can get performance traces out of it, output is cleaned up
    • How far do we chase these last cases of recursive analysis? What do we decide to punt on?
    • The cases:
      • Punt all of them: analysis gets faster (details missing)
      • If you find that prior memoization of heuristic recursive analysis turns out to be wrong, turn that and everything above it indirect
      • Tangent: was not previously logging indirect calls in debug output, but this is what you care about as a consumer more than everything else
        • The paradigmatic case of an indirect call: (trip (?:(s tod:po tos:po) q))
      • Current state is subtly incorrect in small number of cases. It's not ok to be incorrect at all.
  • Status of Vere Compatibility
    • PR is up! It doesn't work yet.
    • You can boot a fresh fakezod with vere 3.0 and an ares serf on a slim pill into the dojo, exit gracefully, and reboot
    • Currently trying to make it work without a graceful exit, so that it restarts and replays the missing events in the log
    • Focusing on replay, making use of code that @ashelkovnykov wrote last year
    • Slapping stuff together to get it working ASAP; once it works, the design will need to be revisited to make it right
    • Debating on whether to load snapshot before or after staging mars/urth split
    • Goal: proper replay by EOW
  • Status of PMA smaller bugs
    • Delay due to unexpected move
    • Smaller bugs should be closed out tomorrow
      • what are the smaller bugs?
  • Status of PMA chaos monkey testing
    • Moving on to this once the smaller bugs are fixed; should result in more bugs being found
    • Will need to rent some space in the cloud with lots of storage to test at capacity
    • We could also add measurements of performance under real usage during this stage
  • Status of PMA garbage collection (important discussion happened here)
    • will happen in parallel with chaos monkey
    • met with joe and edward on this last week
      • came away with multiple approaches
      • "stop the world" garbage collector
      • manual invocation of pack and meld routines
      • purpose isn't to free unneeded objects, as in traditional GC, but to free up disk space
      • @belisarius222 most allocations are single page allocations
        • @eamsden that is not correct. when done with an event, we sum up necessary size of noun structure not already in PMA, and allocate in a single chunk.
        • initial implementation doesn't correctly allocate multi-page atoms
        • so, you have big page size allocations
      • @belisarius222 back to the original point: if most allocations are single-page and the only big allocations (in number of pages) are large indirect atoms, you'll end up with mostly 1-2 page holes of free space in the middle of the PMA. New events will use single pages that fill those holes, so allocation should be fairly good, with the exception of very large atoms/nouns spanning multiple pages
        • for example, movie file won't fit in the middle and PMA will almost certainly grow
        • big perf problem with GC in general is doing lots of disk copying between different parts of disk
        • if you don't do that, no matter how small your nouns get and how much you trim, your big media file will be a big anchor at the end of the pma keeping it very large
        • implications: running GC on the PMA probably won't do much if we don't move the files around, and if we do, we're doing lots of write amplification
        • solutions:
          • use a sparse file representation, so OS knows if middle pages are 0s not to represent on disk
          • move large allocations into their own files, but is more complex and has synchronization problems
          • @eamsden there's a secret third thing: pma itself is designed to allow for noncontiguous mapping of memory to disk, so this is more complex allocation logic, but that's it
            • heuristics: start grabbing extents from the top of the disk freelist, from earlier to later, until their sum is the size of the alloc. Then make btree entries that are contiguous in virtual memory but point to different places on disk. As long as GC can understand that those are one object (more sophistication required), that's fine
              • it's like a block-based FS
              • sparse files are out — don't do this, it's unusable
            • @belisarius222 if you do it this way, would you still do multi-page allocations? @eamsden no, you'd treat them as blocks
          • @belisarius222 do you actually need GC? @eamsden yes, actually
          • @eamsden nouns and pages are semi-coupled in this design
            • when there are no more nouns in a page, you free it
            • you need GC at the noun level
          • It's still extent-based, but you want the ability to express multiple ranges for a given object
          • Takeaway: a single allocation can be multiple noncontiguous regions; you can break up large allocations into multiple regions. A very naive copying GC is basically fine. It's fairly straightforward to make it less naive and probably sufficient for a while. An updated cell representation for cells in the PMA that allows you to move cells a page at a time is good.
          • @belisarius222 it would be great to have a writeup of this at some point, since it's hard to hold in the head.
  • Status of interpreter optimizations
    • No further update from last week.
    • This could be done in a huddle-up at lake summit with the devs on this call.
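
The "secret third thing" from the GC discussion above (one allocation backed by several noncontiguous disk regions) can be sketched roughly as below. This is a toy model under stated assumptions: the `(disk_page, length)` extent tuples, the `alloc_noncontiguous` name, and the flat list standing in for btree entries are all hypothetical and are not the PMA's real data structures.

```python
def alloc_noncontiguous(freelist, npages, vbase):
    """Take free disk extents, earliest first, until `npages` are covered.

    freelist: list of (disk_page, length) extents, sorted by disk_page.
    Returns (mappings, remaining_freelist), where each mapping is
    (virtual_page, disk_page, length) and the virtual pages are contiguous
    starting at `vbase`.
    """
    if sum(length for _, length in freelist) < npages:
        raise MemoryError("not enough free pages; the backing file must grow")
    mappings, remaining, need, vpage = [], [], npages, vbase
    for disk_page, length in freelist:
        if need == 0:
            remaining.append((disk_page, length))
            continue
        take = min(length, need)
        mappings.append((vpage, disk_page, take))
        vpage += take
        need -= take
        if take < length:
            # Return the unused tail of a partially consumed extent.
            remaining.append((disk_page + take, length - take))
    return mappings, remaining


free = [(3, 1), (10, 2), (20, 4)]
mappings, free_after = alloc_noncontiguous(free, 5, vbase=100)
```

Here a 5-page object occupies virtual pages 100 through 104 while living in three separate holes on disk, which is the block-based-filesystem analogy from the notes.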

@jalehman
Author

~2024.6.12

Agenda

Notes

  • SKA
    • It's done (holy shit)
    • Analysis is done, has no known issues, takes 1.5 minutes on hoon.hoon
    • No known errors, completes in reasonable amount of time
    • Two big wins
      1. since the beginning, memoization was at the wrong layer: context was not flowing back upstream to impact subsequent checks ("am I the same as this other thing"). The fix looks like it should be much slower (?); the biggest perf win is not entirely understood
      • @eamsden would a jet for hub still speed things up significantly? unknown
    • What's next?
      • put pretty printing compute behind a flag and see how much further that speeds things up
      • either or both of: boot a ship from the slim pill; try to analyze more things like arvo, vanes, userspace stuff, the analysis itself
      • no known bugs, memoization may be blocking future analysis, memoization is an unbounded space leak, overall lifecycle of analyzed code is incomplete
      • analyzing other vanes would be to find more bugs
      • @eamsden since we're still in alpha, merge condition for PR: successful boot of ship using slim pill. we should attempt this.
        • it's not difficult to do, nothing is preventing us from just doing this.
      • @matthew-levan to do this, just needs the jamfile, let's have @joemfb work with matt on it so he can learn too
      • unblocks the noun code — the interpreter optimizations
        • tree-walking interpreter is separate from the codegen work
        • there's an interpreter that takes nouns in format that linearizer produces, and executes those directly. those nouns are lists of instructions, so the optimization is to introduce a level of indirection in those nouns and use that to allow the rust to optimize those nouns by turning one level of them into arrays when it sees them. that turns a hamt lookup into an array lookup, and makes direct jet matching more real. @matthew-levan wrote most of the original codegen interpreter and is probably the right person to work on this.
      • lifecycle stuff that joe mentioned: needs to have a resolution for beta. doesn't block work that lets us show wins. it may blow up memory, but that's worth it for showing nock going fast.
        • you want subsequent rounds of analysis to further optimize
      • we can do this today
    • @eamsden @joemfb to commit the jamfile as an artifact into the repo
      • it is pushed
  • Vere compatibility
    • got replay working last week with nasty, stupid code (bad code! bad!)
    • @matthew-levan needs to spend more time on it this afternoon before pushing
      • needs advice on whether or not to restage mars/urth split or vere snapshot loads in ares
      • @eamsden a libvere crate for rust sounds cool
        • two approaches:
          1. link c code from vere into rust, and in rust just write methods that marshal vere nouns to ares nouns
          2. write pure rust code that can read the snapshot bins directly by mmapping them and doing pointer offsets. upside: no extra c dependencies required in this case. downside is that it has to match what vere expects exactly in read-only form.
        • option two is favored
          • write rust code that's able to mmap north.bin and south.bin and do pointer offsets into them, and migrate cold state
          • these are new lines of work in addition to basic replay
          • let's push this until after slim pill boot and linearizer optimizations (specifically: basic block coalescing, register minimization, some other things related to recursion that @eamsden will later enumerate)
  • @joemfb we're nearing the point of being able to begin the bytecode interpreter
    • @frodwith isn't interested in beginning this until he knows that the ground won't shift
    • noun code is a prototype of the bytecode
    • if you have the noun code, you're as near as you're going to get to a spec for the bytecode
    • current IR leaves too much open
    • we need a separate time set aside for discussing lifecycle stuff in detail
    • this is a good time for @matthew-levan to reengage with codegen interpreter
    • depending on what's found during slimpill boot, codegen may be ready for merge into status branch
  • happy path plan:
    • replay wrapped up today
    • slimpill boot works, SKA gets merged to status
    • do some benchmarks
    • spend week of lake summit writing interpreter optimizations as a group
  • Status of PMA smaller bugs
    • cleared out most of the ones being worked on
    • new one popped up about ephemeral structure restoration
      • cause seems to be marking pages as dirty that shouldn't be marked dirty
      • believed to be fixed
    • should be complete by EOD today
    • includes testing and validating ephemeral structure restoration, fixes known issues with free that enable GC, fixes restore of PMA, these should be mergeable into alpha
    • there'll be several PRs forthcoming later today
    • @matthew-levan wants to get a rundown on @barter-simsum 's code next week
  • Status of PMA chaos monkey testing
    • this starts next (tomorrow)
    • should surface more bugs and/or metrics
  • Status of PMA garbage collection
    • no update over last week
  • Status of interpreter optimizations
    • see above discussion in and following SKA
  • Other discussion
    • if slim pill boots, solid pill should as well
    • if it doesn't, we'll need a slimmer pill
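
The favored option two above (map a snapshot file read-only and follow offsets into it) has roughly the shape below. This is only a sketch of the technique: the file layout is invented for the example and is NOT vere's snapshot format, and `write_demo_image` / `read_demo_image` are hypothetical helper names.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout, invented for this sketch (not vere's format):
# an 8-byte little-endian offset at byte 0 points at a record made of a
# 4-byte length followed by that many payload bytes.

def write_demo_image(path, payload):
    offset = 16  # arbitrary location for the payload record
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", offset))
        f.write(b"\0" * (offset - 8))  # padding up to the record
        f.write(struct.pack("<I", len(payload)))
        f.write(payload)


def read_demo_image(path):
    with open(path, "rb") as f:
        # Map the file read-only and decode fields at computed offsets.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            (offset,) = struct.unpack_from("<Q", m, 0)
            (length,) = struct.unpack_from("<I", m, offset)
            start = offset + 4
            return bytes(m[start:start + length])


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "demo.bin")
    write_demo_image(path, b"hello noun")
    recovered = read_demo_image(path)
```

The downside noted in the meeting applies directly: the reader only works while its idea of the layout matches the writer's exactly.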

@jalehman
Author

~2024.6.26

Agenda

Notes

  • Nouncode status
    • Has not advanced past lake summit
    • Started speccing what needed to be done
    • Pushed WIP commit
  • Vere compat status
    • Vere side is done and merged
    • Ares side needs review but is working
      • Design review for correctness is needed beyond simple code review @eamsden
      • @joemfb best bet may be to go a step further towards mars/urth style replay
      • Need to be more explicit about where read/write responsibilities live between vere/ares
  • How can we advance the nouncode work while matt is out?
    • We need to be able to print the linearized output
      • This would help us troubleshoot faster
    • @eamsden has an unpushed WIP on nouncode that he hacked on that will be helpful, but probably cannot take it all the way to completion
  • Is Paul still lined up to do bytecode?
    • We need to talk to him and ensure that he's actually still on deck to do this work
    • @joemfb to talk to @frodwith this week to determine if/when he can begin on this
      • What more would you want to see to get started?
    • Prerequisites:
      • Lifecycle stuff
      • Working Nouncode
      • Optimized Nouncode
  • @barter-simsum was sick last two days
    • solved an issue with dirty bitmaps being accidentally persisted ahead of his talk
    • matt found a bug with fourth GB allocation? 3 gigs works, 4 gigs fails
      • what's the expression to reproduce? @belisarius222 to provide, it's a bex one-liner
        • Write a C test that does the allocation
  • one failing test (probably bad test logic) needs to be fixed
  • what do tests prove?
    • restoration
    • malloc of contiguous space
    • structural equality of ephemeral data structures upon restore
    • freeing data
    • freelists are being properly coalesced
    • close and reopen b tree, partitions are where they're expected to be
  • need a demo video that shows loading of 1 TB of data into an Urbit
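
One of the properties listed above, that freelists are properly coalesced, can be stated as a small reference model that a test could compare against. This is a sketch, assuming free space is tracked as `(start, length)` extents in pages; the `coalesce` function is illustrative and not taken from the PMA code.

```python
def coalesce(extents):
    """Merge adjacent or overlapping free extents given as (start, length)."""
    merged = []
    for start, length in sorted(extents):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            # This extent touches or overlaps the previous one: extend it.
            prev_start, prev_len = merged[-1]
            end = max(prev_start + prev_len, start + length)
            merged[-1] = (prev_start, end - prev_start)
        else:
            merged.append((start, length))
    return merged


freed = [(8, 2), (0, 4), (4, 4), (20, 1)]
coalesced = coalesce(freed)
```

A test can free pages in a scrambled order and check that the allocator's freelist matches this model's output.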

@jalehman
Author

jalehman commented Jul 3, 2024

~2024.7.3

Agenda

Notes

  • Status of PMA
    • What was thought to be a bug with the Nockstack is actually a bug with the bex jet. It is allocating 8x as much memory as it should.
    • Working demo video of PMA storing 5GB of data and reading it out correctly
    • Next steps:
      • Fix bug and allocate via Ares
      • Stress test the PMA by allocating 1TB of 1GB blobs, checksum them all, verify correct storage of data and that nothing falls over along the way
    • Testing:
      • Need more allocations and randomization of the allocation pattern
      • Thinks ephemeral state restoration is working pretty correctly
      • Many fixes need to be pushed and PR'd
  • Status of Nouncode
    • @eamsden is pushing on it when he has time, it's getting there, will see how much progress he can make before matt gets back
    • @joemfb has a relatively short punchlist of polish to apply to SKA
      • Refactoring and sanity checks
      • There's work to do on debug output in order to make the linearizer output something that looks like disassembler. We can't do that right now for dumb reasons. Then we can get line number labels and other debugging goodies.
  • We should build up a benchmark suite of analysis scenarios
    • @joemfb Ran cursory benchmarks on historic veres and arvos
      • Results: modern vere and arvo is 2x faster in compiling than old vere. This does not match lived experience.
      • @belisarius222 We should have bench.c that does a series of benchmarks and could then be put into CI
        • We could then run those across the different platforms we support
        • @midden-fabler would be a good candidate for this
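
The stress test described under "Next steps" above (store many blobs, checksum them, verify nothing was lost) has the following shape at toy scale. This is a sketch under assumptions: `store` is any dict-like object standing in for the allocator under test, and the sizes are tiny compared with the 1GB blobs mentioned in the notes.

```python
import hashlib
import os


def store_and_verify(store, count, size):
    """Write `count` random blobs into `store`, then re-read and verify each.

    Returns the number of blobs whose SHA-256 digest no longer matches.
    """
    digests = {}
    for i in range(count):
        blob = os.urandom(size)
        # Record the digest before handing the blob to the store.
        digests[i] = hashlib.sha256(blob).hexdigest()
        store[i] = blob
    return sum(
        1 for i, want in digests.items()
        if hashlib.sha256(store[i]).hexdigest() != want
    )


bad = store_and_verify({}, count=8, size=1024)
```

Randomizing `size` and interleaving frees with allocations would cover the "randomization of the allocation pattern" item as well.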

@jalehman
Author

~2024.7.10

Agenda

Notes

  • Status of PMA
    • Sick today
    • Tomorrow and Friday, expand the demo to include lots more data
    • Issues with the PMA closing when Ares does, reading from event log
    • Demo should include many ~1G datums stored, verifying SHA-256 of each
    • Discussion of using the PMA with another language like Lua as a demonstration, shouldn't take too long
      • Fork PHK Malloc to swap out Lua's memory allocator with our persistent one
      • @eamsden this is probably a distraction and should be done post-Ares
  • Status of Nouncode
    • @eamsden nothing on my end due to sick children and july 4th
    • No other progress made
    • @matthew-levan and @eamsden to meet on Friday to pick this back up
  • Status of SKA punchlist

@jalehman
Author

jalehman commented Jul 24, 2024

~2024.7.24

Agenda

Notes

  • Status of nouncode
    • @eamsden dealing with a family issue
    • Rust side is waiting for architectural design decisions driven by the hoon side
    • Reviewed Matt's Rust yesterday and discussed
    • @joemfb is rewriting the linearizer to make nouncode go faster by using basic block arrays
      • many of the helper functions from prior linearizer need to be rewritten for new instruction output format, joe did this
    • @matthew-levan thinks nouncode should be executable by mid-august, in testing
    • @belisarius222 what's the hard part about this? I thought it was fairly straightforward, sounds like it's much more involved for some reason
      • @joemfb reasons
      • all the code is written and ported, wiring it all up is difficult and requires some help from @eamsden
  • Status of PMA @barter-simsum
    • Demo code written for alloc, barter-simsum/pma-demo branch
    • Found a bug doing a bunch of syncing
    • Needs to be fixed before demonstrating fault tolerance
    • Demo allocs 100k-sized chunks of random data, uses writeahead log to validate that chunks are valid
    • Then: allocate 1TB of random data and a BR rip of Shrek 2, and demonstrate that both can be read back and validated
      • Tried this, found bug and fixed it, can hopefully record demo of large storage
    • Will conduct a review with @matthew-levan of the bugfixes
    • Also giving a tour of the codebase to @pkova later today
    • Also writing a USTJ article for @sigilante due by early August
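
The notes above say the demo uses a write-ahead log to check that chunks are valid, without giving details. One plausible shape, offered purely as an assumption, is to log each chunk's offset, length, and digest before writing it, then re-check every logged chunk after a restart. The `append_chunk` and `validate` names and the in-memory `bytearray` standing in for the data file are hypothetical.

```python
import hashlib


def append_chunk(log, data_file, chunk):
    """Log (offset, length, digest) first, then append the chunk itself."""
    offset = len(data_file)
    log.append((offset, len(chunk), hashlib.sha256(chunk).digest()))
    data_file.extend(chunk)


def validate(log, data_file):
    """Return offsets of logged chunks that are missing or corrupted."""
    bad = []
    for offset, length, digest in log:
        chunk = bytes(data_file[offset:offset + length])
        if len(chunk) != length or hashlib.sha256(chunk).digest() != digest:
            bad.append(offset)
    return bad


log, data = [], bytearray()
for payload in (b"a" * 100, b"b" * 100, b"c" * 100):
    append_chunk(log, data, payload)

intact = validate(log, data)
del data[250:]  # simulate a crash that tore the last chunk
torn = validate(log, data)
```

Because the log entry is written first, a torn chunk shows up as a digest or length mismatch rather than going unnoticed.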

@jalehman
Author

~2024.7.31

Agenda

Notes

  • Some low-level discussion between @joemfb and @eamsden on nouncode approaches
  • @matthew-levan working on IR interpreter
    • more work needed on pretty printing or test harness for debuggability
  • PMA
    • David to update the draft PR with specifics about what's being fixed
    • The need to zero when allocating is "weird" and should be looked into
    • <live review of David's draft PMA PR>

@jalehman
Author

jalehman commented Aug 7, 2024

~2024.8.7

Agenda

Notes

  • Nouncode
    • @joemfb: Got basic-block coalescing and some constant propagation working last night. Will help Matt debug the hoon interpreter this afternoon. He’s adding stack traces and jets
    • @matthew-levan lots of crashes in Rust interpreter, hangs on booting slim pill
      • Implementing nouncode interpreter on the Hoon side, which lets us see if it works against Vere
      • @joemfb wrote C code that outputs instructions into a file for inspection, has identified instructions that are "flipped"
    • @belisarius222 joe building up a suite of expressions to run without having to boot the entire thing, which can be hand-inspected for correctness
    • @matthew-levan we should use the repo that Neal set up for benchmarks
  • PMA
    • @barter-simsum working on integrating w/ another memory allocator
      • @belisarius222 rationale:
        • if you've got an interpreted language, it's common to spin up a database to handle persistence, which introduces impedance mismatches
        • for example, mongodb — low friction since you can just stuff things in it, but still inefficient
        • what you want is to just take a snapshot of program state
        • with little work you get a persistent execution environment for mainstream languages. it's a cool thing to show off about what we're doing. can actually be useful for webservers, ipython notebooks, etc.
        • @eamsden is there a plan at this juncture to use this interface in Ares?
          • @belisarius222 No, it's a demo. Shows what you can do with the PMA in other environments.
        • @barter-simsum integrating PHK Malloc
          • @eamsden find an allocator that uses mmap instead of sbrk()
          • @barter-simsum to look for a different allocator, doesn't need to be the fastest
    • @eamsden status of PR? @barter-simsum still needs to update it (zero when allocating), has made progress on the issue
    • @barter-simsum also continuing work on USTJ Vol. 2 piece on PMA, will reach out for review & feedback
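
Basic-block coalescing, mentioned in the Nouncode update above, is the standard control-flow-graph cleanup sketched below. This is a generic illustration, not the linearizer's implementation: the `label -> (instructions, successors)` dictionary and the string instructions are invented for the example.

```python
def coalesce_blocks(blocks, entry):
    """Merge a block into its predecessor when the edge between them is the
    predecessor's only exit and the successor's only entrance.

    blocks: dict label -> (instructions, successor_labels).
    """
    # Work on a copy so the caller's graph is left untouched.
    blocks = {k: (list(ins), list(succ)) for k, (ins, succ) in blocks.items()}
    changed = True
    while changed:
        changed = False
        # Count predecessors of every block.
        preds = {label: 0 for label in blocks}
        for _, succs in blocks.values():
            for s in succs:
                preds[s] += 1
        for label, (ins, succs) in blocks.items():
            if len(succs) != 1:
                continue
            nxt = succs[0]
            if nxt == label or nxt == entry or preds[nxt] != 1:
                continue
            nxt_ins, nxt_succs = blocks.pop(nxt)
            blocks[label] = (ins + nxt_ins, nxt_succs)
            changed = True
            break  # the graph changed; recompute predecessors
    return blocks


cfg = {
    "a": (["x = 1"], ["b"]),
    "b": (["y = x"], ["c", "d"]),
    "c": (["z = 1"], ["e"]),
    "d": (["z = 2"], ["e"]),
    "e": (["ret z"], []),
}
merged = coalesce_blocks(cfg, entry="a")
```

Only `a` and `b` merge here: `c` and `d` both jump to `e`, so `e` has two predecessors and must stay a separate block.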

@belisarius222
Contributor

~2024.8.14

Agenda

Notes

  • Nouncode
    • @matthew-levan is reading and understanding the SKA code. There are two main files, one for SKA and one for linearization and registerization. He is working on gaining mastery over the SKA file, specifically the lowering to the NOMM representation. He knows the opcodes but wants to understand the sock, provenance tracking, etc. @matthew-levan will start reading the linearizer code today.
    • Last week they were able to boot a baby pill, but with a toddler pill and slim pills, they were seeing some "used before initialized" errors about registers. This was happening because "need"s, i.e. data dependencies, were not propagating properly. @joemfb has modified the need data structure so it doesn't need to put every layer of a tree into a register.
    • Joe: there are multiple problems that have intersected here that are difficult to disentangle:
      • at some level, this whole effort is an optimization problem, but doing that while preserving correctness and stack traces is hard
      • stack traces don't work at all right now. In Nock, anything can crash anywhere, and we want a stack trace that localizes the crash. This is at odds with registerization, which decomposes the subject into smaller pieces. The decomposition can crash, which needs to be reconciled somehow with the rest of the system.
      • Yesterday Joe was going through and rewriting the various operations on needs now that he changed the data structures. This includes splitting, joining, and intersecting need data structures (exactly what the requirements are depends on the details of the stack trace system). He wrote most of them but wasn't able to finish because without answers to the stack trace questions, it was too hard to come up with sane answers to what the specification should be. Joe will meet with @eamsden this evening to discuss crash handling. He also wrote up an outline of a proposal, which he will publish.
      • For the subset of operations he was able to rewrite, he found bugs in the old implementation. He fixed the specific error he and @matthew-levan had encountered using the old need representation, and he was able to run the toddler pill in the dojo in the Hoon nouncode interpreter.
      • Joe has held off work on the new representation until the stack trace questions are answered. If we limit ourselves to code that we don't expect to crash, we can continue limping along with this work, but the crash handling semantics are not yet clear enough because the poison register values can propagate arbitrarily far.
      • A good intuition pump for this challenge is to imagine trying to get jets to report accurate stack traces. @joemfb and others have decided in the past that that would be absurdly hard and not worth it. One option would be to try out some other stack trace discipline, but if people can't attribute a crash to a specific spot in their source code, the system would be too hard to debug.
      • Joe had originally proposed the register poisoning system, where each NockStack frame stores a bitmap of the registers indicating which are poisoned. This is how it worked originally.
      • We can reduce the set of possible crashes that can cause poisoning in a couple ways. You can always crash immediately if you increment a cell or !!, which is the [0 0] formula. All other things that crash can be turned into one of those two errors, so all other validity constraints get turned into those. This leaves us, in this representation, with 'head' and 'tail' instructions (whose crashes need to be relocatable) and non-loobean conditionals (a Nock 6 where the first formula produces something other than 0 or 1). The reason this affects lots of other code is that this system tries hard to optimize branches to remove explicit Nock 6's.
      • We emit poison checks carefully at different places, giving us semantic soup where any register could be poisoned, effectively putting the whole execution into the Maybe monad, where any register could be poisoned and we'd have to emit a stack trace. Another analogy would be Vere but where every noun is a u3_weak. This also makes it difficult to reason about which piece of code should be checking for poisoning.
      • The problem is about observability: locating the crash is important if the location can be observed -- @joemfb would locate poison checking, or crash relocation, under operations that push onto the %mean stack: %spot hints, %mean, %hand, and %lose hints. This would report an error as occurring under some specific hint, rather than more granularly.
      • The next thing that follows from that is that the register needs need to track whether some axis is already checked in subsequent code. This would involve merging needs and checks from chronologically later computations into earlier computations. Each of those checks would be localized under one stack-trace hint, so that if it crashes, the crash will be reported using that hint. If there are no stack-trace hints around a piece of code, then crashes can be relocated arbitrarily.
  • PMA
    • @barter-simsum has been working on backing the musl libc malloc() with the PMA. He is using their more performant next-gen malloc implementation, which they have merged into their codebase. It mostly avoids calling sbrk() and brk(), and instead resolves into mmap() calls by maintaining its own brk()-like state. It does call brk() on initialization though, to get the current program break, and stores that in an internal context object. It uses that to offset its own internal break, which uses mmap(). This design might be for safety reasons, to avoid perceived bugginess in some sbrk() implementations.
    • He is planning to modify the musl libc allocator to store its meta-pages inside the PMA instead of inside an sbrk()-allocated region. The first version will hard-code the size of the PMA allocation for those meta-pages, but if that doesn't end up being flexible enough, something will need to be done to emulate brk() and sbrk() with the PMA, or to translate the allocator's meta-page data structures into something that fits more readily into the PMA - it ideally wouldn't have to keep all the meta-pages in a single contiguous block, for example. Upon further inspection during the meeting, it looks likely that modifying the tracking of the meta-pages to not require contiguous allocation will be doable.
    • Since the PMA maintains contiguous virtual memory, it wouldn't be too hard to implement a brk() / sbrk() API on top of it. @barter-simsum hopes to avoid this complexity though.
    • @barter-simsum is working on finishing the USTJ article about the PMA first, which he hopes to finish today, before going back to this.
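
The register-poisoning scheme from the Nouncode discussion above can be made concrete with a toy model. This is a sketch under assumptions, not the Ares implementation: cells are modeled as Python 2-tuples and atoms as ints, and the `Frame` class, its method names, and the hint strings are hypothetical. It shows a per-frame poison bitmap, `head`/`tail` operations that poison instead of crashing, and a check that reports the crash under a stack-trace hint.

```python
class Crash(Exception):
    """Raised when a poisoned register is observed."""


class Frame:
    """Hypothetical stack frame: register values plus a poison bitmap."""

    def __init__(self, nregs):
        self.regs = [None] * nregs
        self.poison = 0  # bit i set means register i holds no valid noun

    def put(self, dst, value):
        self.regs[dst] = value
        self.poison &= ~(1 << dst)

    def _axis(self, dst, src, index):
        # Taking the head or tail of an atom (or of a poisoned register)
        # does not crash here; it only marks the destination as poisoned.
        value = self.regs[src]
        if (self.poison >> src) & 1 or not isinstance(value, tuple):
            self.regs[dst] = None
            self.poison |= 1 << dst
        else:
            self.put(dst, value[index])

    def head(self, dst, src):
        self._axis(dst, src, 0)

    def tail(self, dst, src):
        self._axis(dst, src, 1)

    def check(self, reg, hint):
        # The crash is reported at the enclosing stack-trace hint.
        if (self.poison >> reg) & 1:
            raise Crash(hint)
        return self.regs[reg]


f = Frame(4)
f.put(0, (1, (2, 3)))  # subject [1 2 3]
f.tail(1, 0)           # r1 = [2 3]
f.head(2, 1)           # r2 = 2
f.head(3, 2)           # head of an atom: r3 becomes poisoned, no crash yet
```

The difficulty raised in the meeting is visible even here: nothing in the frame says which code is responsible for calling `check`, so poison can travel arbitrarily far before anyone observes it.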
