Types, control-flow graph, and IR compiler #14

Jakobeha · 2024-06-06T22:05:49Z


- @wip init package, CFG, BB, Node, Value, ValueType, Instr
- @wip refine
- @wip add instructions list; initial value-type writeup
- @wip CFG, instrs, ValueType
- @wip draft RType implementation
- @wip drafting Stmts, Jumps, ...
- @wip more work...
- @wip minor fixes to the types and replace function
- @wip more BB and CFG work
- @wip try putting inheritance in types. It's cleaner for the user (less parallel representation), but a lot uglier to implement... I probably need to see if there's a better way to implement it without as much duplication, or whether the lack of parallel representation is worth it
- @wip new `RType` design: `RType` is a union of `RSexpType`s, which are for particular types and special values (currently functions, primitive vectors, the missing value, and everything else)
- @wip redo the type system
- @wip make it compile + remove most GNU-R bytecodes
- @wip type system improvements
- @wip arbitrary providers + bugfixes. TODO: add providers for function, primitive vector, and generic value types which aren't exact, and fix the tests
- @wip bugfixes
- @wip bugfixes. The main issue is that jqwik gives stack overflows trying to shrink the generated results. I still need to test non-trivial cases and would ideally like shrinking, but if I can't figure out why it doesn't work I'll have to disable it.
- @wip fixed jqwik generation; the issue now seems to be with function types
- @wip
- @wip dominator tree
- @wip simplify RType arbitrary
- @wip update notes
- @wip redo function types much better (untested); includes re-implementing `Rf_matchArgs_NR` (`match.args`?)
- @wip add `desc`s to instructions and fix to build comment code + test fixes to get this to build
- @wip fix function types and tests
- @wip really fix function types and tests. Property tests run in OK time (<1min) and haven't gotten a failure yet.
- @wip document `RType` in notes: how it works, and explanations for some decisions. It may change a lot from here though.
- @wip various fixups
- @wip simplify RType based on feedback
  - `RType` is no longer a union; it just has one `RValueType`. `RValueType`'s name is more accurate.
  - The missing type is better represented: it's orthogonal like `RPromiseType`, except that `isMissing = YES` implies `value = null`. This way, we can still represent "known type OR missing".
  - No more numeric primitive vector. I kept "numeric or logical" because binary operators support all of them, but not the non-numeric string and raw (comparison operators don't care).
  - Potential bugfixes
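A minimal sketch of the simplified `RType` shape described above: one value type plus an orthogonal missing flag, where `isMissing = YES` forces `value = null`. The three-valued `Troolean` enum and the `RValueTypeSketch`/`RTypeSketch` names are hypothetical stand-ins, not the project's actual classes, and promise information is left out.

```java
import java.util.Objects;

/** Hypothetical three-valued flag; the notes write `isMissing = YES`. */
enum Troolean { YES, NO, MAYBE }

/** Hypothetical stand-in for `RValueType`; here it only carries a name. */
record RValueTypeSketch(String name) {}

/**
 * Hypothetical sketch of `RType`: a single, nullable value type plus an orthogonal
 * missing flag, with the invariant "isMissing == YES implies value == null".
 */
record RTypeSketch(RValueTypeSketch value, Troolean isMissing) {
  RTypeSketch {
    Objects.requireNonNull(isMissing, "isMissing");
    if (isMissing == Troolean.YES && value != null) {
      throw new IllegalArgumentException("a definitely-missing type cannot have a value type");
    }
  }

  /** The missing value itself. */
  static RTypeSketch missing() {
    return new RTypeSketch(null, Troolean.YES);
  }

  /** A value of the given type that is known not to be missing. */
  static RTypeSketch of(RValueTypeSketch value) {
    return new RTypeSketch(Objects.requireNonNull(value, "value"), Troolean.NO);
  }

  /** "Known type OR missing": keep the value type and mark missingness as MAYBE. */
  static RTypeSketch orMissing(RValueTypeSketch value) {
    return new RTypeSketch(Objects.requireNonNull(value, "value"), Troolean.MAYBE);
  }
}
```

Keeping missingness as a separate flag, rather than a union member, is what lets "known type OR missing" be expressed without losing the value type.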

- @wip CFG, BB, and Node...
- @wip CFG, BB, and Node (particularly BB)...
- @wip BB#inlineAt + bugfixes
- @wip starting CFGEdit
- @wip CFGEdit
- @wip remove useless CFGCommand and CFGAction
- @wip refactor
- @wip try to refactor into something sensible
- @wip `@TypeIs`, serialization and deserialization
- @wip serialize and deserialize somewhat from PIR
- @wip add more PIR instructions. A lot of TODO design decisions, because idk how similar this will be to PIR
- @wip add remaining PIR instructions. Still TODO: how similar this will be to PIR; also a lot of unimplemented computeType and computeEffects, and maybe some unresolved compile-time errors

- @wip IntelliJ decided to update its settings
- @wip update to Java 22, update dependencies

- @wip basic parser and printer
- @wip parser contexts, improve ParseMethod coherence, and add parser builtin. TODO: same for printer, then remove ClassGraph
- @wip parser and printer API
- @wip optimize imports

- @wip start CFG parser and printer with the new API
- @wip draft closure and closure version
- @wip use better terminology. idk if `Scanner` could be considered a lexer, but it's similar to `java.util.Scanner`.
- @wip more progress on parsing and printing CFGs
- @wip implement CFG parser and printer, enough to start writing tests
- @wip wrote a test and started fixing bugs
- @wip begin writing a parser and printer for CFGs and BBs which is not PIR
- @wip improve typeclass map, parse exceptions, and CFG tests
- @wip "default" CFG parser and printer + bugfixes
- @wip parser and printer bugfixes
- @wip symbol/language parsing and printing + call instruction printing
- @wip parsing and printing: prints something reasonable. The tests fail because right now I'm using IntelliJ's "click to see difference". There's a lot of lost information from the original, and stub data. There are also almost certainly a couple of things which are being deserialized from the original or serialized into the reprint incorrectly. But it's only testing the CFG/parser/printer (so some stubs are OK) and it "mostly" works. Instructions from the original map one-to-one to those in the reprint, and they line up too (BFS order is correct). Next I will start testing CFG recording, write more tests in general, and check `mvn verify`.
- @wip most successfully parse and print, and the rest are infeasible for various reasons. Now I need to figure out edits, also other tests and stuff
- @wip small further improvements
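The reprint can only line up instruction-for-instruction with the original if both sides visit basic blocks in the same order. A minimal sketch of breadth-first block ordering, assuming blocks are identified by name and a hypothetical `successors` map lists each block's jump targets in order (the project's real traversal API is not shown here):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class BfsOrder {
  /**
   * Returns block names in breadth-first order from `entry`, following successors in
   * the order they are listed; each reachable block appears exactly once.
   */
  static List<String> bfs(String entry, Map<String, List<String>> successors) {
    List<String> order = new ArrayList<>();
    Set<String> seen = new HashSet<>();
    Deque<String> queue = new ArrayDeque<>();
    seen.add(entry);
    queue.addLast(entry);
    while (!queue.isEmpty()) {
      String bb = queue.removeFirst();
      order.add(bb);
      for (String succ : successors.getOrDefault(bb, List.of())) {
        // Set.add returns false for already-seen blocks, so back edges are skipped.
        if (seen.add(succ)) {
          queue.addLast(succ);
        }
      }
    }
    return order;
  }
}
```

Because successor order is deterministic, two structurally identical CFGs produce the same block order, which is what makes a line-by-line comparison meaningful.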

- @wip fix `scanToEndOfLine` bug
- @wip fix CFGEdit aliased mutation bug
- @wip fix CFGEdit not storing NodeIds in InstrData and StmtData (+ another case). TODO: fix global node IDs
- @wip progress towards ensuring global nodes can be recovered from their IDs (for CFGEdit)
- @wip fixed global nodes; testPirObserverCanRecreate passes 100%, except something fails to parse (different problem)
- @wip fixed small scanner bug; testObserverCanRecreate now passes 100%
- @wip fix inverse. TODO: true idempotency
- @wip fix idempotency and tests; all CFG observer tests pass now
- @wip improve testPirIsParseableAndPrintableWithoutError. The goal is to minimize failures and then simply ignore them, so we have a regression test to check that currently parseable PIR data stays parseable
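The inverse and "can recreate" work on `CFGEdit` rests on each recorded edit carrying enough data to be undone and replayed. A minimal sketch of that idea on a toy block (a list of instruction names); `Edit`, `Insert`, `Remove`, and `EditLog` are hypothetical and much simpler than the real `CFGEdit`s, which also have to deal with node IDs and phis:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical invertible edit on a toy "block" modeled as a list of instruction names. */
interface Edit {
  void apply(List<String> block);

  Edit inverse();
}

/** Inserts an instruction; its inverse removes it again. */
record Insert(int index, String instr) implements Edit {
  public void apply(List<String> block) {
    block.add(index, instr);
  }

  public Edit inverse() {
    return new Remove(index, instr);
  }
}

/** Removes an instruction; it stores what it removes, so its inverse can re-insert it. */
record Remove(int index, String instr) implements Edit {
  public void apply(List<String> block) {
    String removed = block.remove(index);
    if (!removed.equals(instr)) {
      throw new IllegalStateException("expected " + instr + " at " + index + " but found " + removed);
    }
  }

  public Edit inverse() {
    return new Insert(index, instr);
  }
}

/** Records edits as they are applied, so they can be undone or replayed. */
final class EditLog {
  private final List<Edit> recorded = new ArrayList<>();

  void applyAndRecord(List<String> block, Edit edit) {
    edit.apply(block);
    recorded.add(edit);
  }

  /** Undo: apply each recorded edit's inverse, most recent first. */
  void undoAll(List<String> block) {
    for (int i = recorded.size() - 1; i >= 0; i--) {
      recorded.get(i).inverse().apply(block);
    }
  }

  /** Replay: apply the recorded edits to a copy of the original to recreate the result. */
  List<String> replay(List<String> original) {
    List<String> copy = new ArrayList<>(original);
    for (Edit edit : recorded) {
      edit.apply(copy);
    }
    return copy;
  }
}
```

Replaying the log onto a copy of the original is the toy analogue of the observer "can recreate" tests, and undoing it should restore the original exactly.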

…ntableWithoutError: RValue and Env were merged because sometimes they can't be statically told apart. Also explicitly add the `environment` `CallSafeBuiltin`. The PIR parser/printer is a mess, with many TODOs inserted throughout the code, so I will rewrite them and maybe remove some functionality (causing more PIR code to fail to parse) when working on the next step: the R bytecode to IR compiler

change bc package-info to mention that it includes the BC compiler

fixed bc-compiler rebase issues. TODO:
- Fix CFG-edit bijectivity by adding phi nodes to the `InsertJump` edit
- Fix other IR issues
- Get the R session to run on macOS and in the GitHub container
- Refactor?
- Start the bytecode compiler
- ...

# Summary

- Add `BatchSubst` and `DefUseAnalysis`.
- Allow mutating instructions and phis directly, not via a method on the basic block.
- Clarify method names and docs, add new helpers.
- Fix phi nodes, at least better than before. In particular, phis' inputs must exactly match the block's predecessors on creation, and are automatically added and removed when predecessors change; stubs are added for new predecessors, and one changes a phi input's node via `setInput`.
- Fix some edits not being recorded as `CFGEdit`s, or not being bijective.
- Change PIR parse/print tests and improve PIR parsing/printing, so that all of them are final PIR (valid CFGs that pass `verify`; `PrintPirAfterOpt` gives CFGs with single-input phi nodes).

All current tests pass except some `CompilerTest`s (of course not everything is tested).

# More details (specific commits)

- @wip cleanup `ir` TODOs (features, bugfixes, and refactoring)
- @wip `BatchSubst`, `DefUseAnalysis`, properly record `InstrOrPhi#replace`, and add labels to compound operations (part of "cleanup `ir` TODOs"; features, bugfixes, and refactoring)
- @wip refactor to store `BB` in `ReplaceInArgs` edit
- @wip fix (maybe) phis
- @wip refactor instr mutation and substitution so it doesn't require BB. This makes the API cleaner, since needing the BB seems "unnecessary" and I really doubt it helps time complexity. Also fix some bugs with predecessors/incoming BBs/jump targets not being updated properly. Need to fix DefUseAnalysis not catching all definitions...
- @wip phi fixes and add test to verify CFG: fixed phi node and verify issue + other small improvements; fixed verification and PIR parsing/printing; replaced PIR tests with ones that are all final PIR, so that we can check verification works. This also caused new PIR-parse/print failures: most were because of weird PIR prints that had to be special-cased (not useful), and a couple were actual bugs in the parsing.
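The phi rules in the summary can be illustrated with a toy model: a phi must be created with exactly one input per predecessor, gains a stub input when a predecessor is added, loses that input when a predecessor is removed, and is filled in with `setInput`. `Block`, `Phi`, and the string-valued nodes are hypothetical simplifications of the real `BB` and phi classes; `hasUnsetInputs` corresponds to the later "fail verification on unset phi inputs" check.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy basic block that keeps its phis in sync with its predecessors. */
final class Block {
  final String name;
  private final List<Block> predecessors = new ArrayList<>();
  private final List<Phi> phis = new ArrayList<>();

  Block(String name) { this.name = name; }

  /** Creating a phi requires exactly one input per current predecessor. */
  Phi addPhi(Map<Block, String> inputs) {
    if (!inputs.keySet().equals(new HashSet<>(predecessors))) {
      throw new IllegalArgumentException("phi inputs must exactly match the predecessors of " + name);
    }
    Phi phi = new Phi(inputs);
    phis.add(phi);
    return phi;
  }

  /** A new predecessor gives every phi a stub input, to be filled in via setInput. */
  void addPredecessor(Block pred) {
    predecessors.add(pred);
    for (Phi phi : phis) phi.inputs.put(pred, Phi.STUB);
  }

  /** Removing a predecessor drops the corresponding input from every phi. */
  void removePredecessor(Block pred) {
    predecessors.remove(pred);
    for (Phi phi : phis) phi.inputs.remove(pred);
  }
}

/** Hypothetical phi: maps each incoming block to the node flowing in from it. */
final class Phi {
  static final String STUB = "<stub>";
  final Map<Block, String> inputs;

  Phi(Map<Block, String> initial) { inputs = new LinkedHashMap<>(initial); }

  /** Changes the node for an existing incoming block; new blocks come only from the BB. */
  void setInput(Block pred, String node) {
    if (!inputs.containsKey(pred)) throw new IllegalArgumentException(pred.name + " is not an incoming block");
    inputs.put(pred, node);
  }

  /** A verifier would reject phis that still have stub inputs. */
  boolean hasUnsetInputs() { return inputs.containsValue(STUB); }

  Map<Block, String> inputs() { return Collections.unmodifiableMap(inputs); }
}
```

Routing predecessor changes through the block means a phi can never silently disagree with the CFG; the only way to get a wrong phi is to forget to fill in a stub, which verification then catches.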

re-enable CompilerTests which were previously failing + explain why the remaining 2 tests are still disabled

fix rebase: 18 compiler tests fail because they produce different bytecode; there are no crashes, and the other tests pass. Also fix rebase onto main.

- Refactored the CFG API a bit more, in particular `builder`
- @wip continue compiler work (phi stack) and refactor CFG API (always compute phi ID from input nodes, improve ID syntax and API)
  - `name` no longer includes the disambiguator (more consistent)
  - `RenameInstr` and `MutateInstrArgs` have been separated

  TODO: need to make phi IDs remain the same during forward/reverse edits, and debug more now-failing CFG tests.
- @wip make edits store IDs, cleanup edit API
- @wip fix and refactor node IDs, symbol parsing, and other things
- @wip finish fixing node IDs
  - change SEXP builtins to use `BuiltinId`s instead of `String` names
  - improve IR API and add some instructions
  - compile trivial and some less-trivial bytecode instructions
  - also figure out the issues and challenges in implementing the GNU-R BC->IR compiler
- @wip bc->IR compiler
  - improve bc->IR boilerplate API
  - compile a few more bytecode instructions, including for loops
  - fixes

  idk how to compile complex assignment and dispatch functions...
- @wip fix CFG tests (bc->IR compiler)

- add `VERBOSE` logging to tests, because I can't see the GitHub Actions output (it also makes tests slightly slower)
- *maybe* fix LatticeTest rare failure
- expose environment variable for GNU-R binary
- fast-fail compiler tests if the version is wrong
- revert GitHub Actions to use the correct GNU-R version for tests

- (frame states and promises) + initial tests
- @wip draft implementation of bc->PIR-IR. Some things are probably not correct though; also, a lot of specialized GNU-R bytecodes get converted into CallBuiltin because we don't have PIR instructions with the same specialization. Some things are also still not implemented (e.g. `MakeClosure`).
- @wip fix bugs and add documentation to the draft implementation
- @wip closure and closure version overhaul and begin their compiler; also brainstorm high-level (how compilation/evaluation will work)
- @wip closure and closure version compiler + created `Module`. WIP:
  - How promises will be compiled (and where they're needed).
  - Add new PIR instructions to implement missing functionality required to compile some CFG bytecodes.
  - (Maybe longer term) try to implement PushContext and PopContext, because unless I'm mistaken they are created by GNU-R wherever there's a `next` or `break`, whereas RIR only needs them for niche complex cases.
- @wip improve `CFGTests`. Report a test differently if we failed to parse, but in an acceptable way, so it will be apparent (unfortunately only to a human) if a large number of tests are failing this way (which is not acceptable).
- @wip further improve `CFGPirTests` and further fix `CFGEdit`
- @wip improve `CFGCompiler`: create a call stack instead of putting the function on the regular stack and having a call arguments stack (this may have fixed semantics); clean up `Compiler` warnings + other small refactors
- @wip finish draft closure compiler (frame-states and promises) + initial tests

Necessary to inspect the tests, and I suspect some are only failing because printing isn't supported (but necessary to inspect even the passing ones to see if the output resembles something actually successful).

- @wip parse and print closures properly: draft attempt to properly parse and print the inner closures and promises after.
- @wip parse and print closures properly: parse and print promises.
- @wip parse and print closures properly: parse and print the inner closures and promises by forwarding the context in CFGs.
- @wip parse and print SEXPs (draft impl)

fix IntelliJ warnings + RDS reader tests don't throw `Exception`

- mainly in parsing and printing
- also in BC->IR, implemented complex assignments

- @wip parse and print SEXP bugfixes
- @wip parse and print closure bugfixes
- @wip refactored/fixed parsing and printing inner code objects
- @wip fixed printing bytecode (code/const index formatting)
- @wip bugfixes both in the BC->IR compiler and in parsing/printing; fixed node IDs parsed and printed outside of the CFG they were defined in.
- @wip further bugfixes + implemented complex assignment; `inlineSlotAssign` prints something.
- @wip further bugfixes + improve parsing and printing

refactor parseprint:
- Resolve parse and print methods that take a superclass of the context class.
- Parse methods can be constructors.
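One way to resolve parse methods whose parameter is a superclass of the context class, with constructors also accepted, is plain Java reflection that picks the most specific applicable parameter type. This is a minimal sketch under assumptions: the real parse methods may take more parameters than just a context, and `ParseMethodResolver`, `BaseContext`, `ChildContext`, and `Token` are hypothetical names, not the project's API.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Executable;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical context classes and a parsable type, to exercise the resolver. */
class BaseContext {}

class ChildContext extends BaseContext {}

final class Token {
  final String origin;

  Token(BaseContext ctx) { origin = "constructor(BaseContext)"; }

  private Token(String origin) { this.origin = origin; }

  static Token parse(Object ctx) { return new Token("static parse(Object)"); }
}

final class ParseMethodResolver {
  /** Number of superclasses above c; larger means more specific. Interfaces count as 0. */
  private static int depth(Class<?> c) {
    int d = 0;
    for (Class<?> s = c.getSuperclass(); s != null; s = s.getSuperclass()) d++;
    return d;
  }

  /**
   * Candidates are static methods on `target` returning `target`, plus constructors of
   * `target`, whose single parameter is `contextClass` or one of its supertypes.
   */
  static Optional<Executable> resolve(Class<?> target, Class<?> contextClass) {
    List<Executable> candidates = new ArrayList<>();
    for (Method m : target.getDeclaredMethods()) {
      if (Modifier.isStatic(m.getModifiers()) && m.getReturnType() == target) candidates.add(m);
    }
    for (Constructor<?> c : target.getDeclaredConstructors()) candidates.add(c);
    return candidates.stream()
        .filter(e -> e.getParameterCount() == 1)
        .filter(e -> e.getParameterTypes()[0].isAssignableFrom(contextClass))
        .max(Comparator.comparingInt(e -> depth(e.getParameterTypes()[0])));
  }

  /** Calls a resolved static method or constructor with the context. */
  static Object invoke(Executable e, Object context) {
    try {
      e.setAccessible(true);
      return e instanceof Method m ? m.invoke(null, context) : ((Constructor<?>) e).newInstance(context);
    } catch (ReflectiveOperationException ex) {
      throw new IllegalStateException(ex);
    }
  }
}
```

With a `ChildContext`, the `Token(BaseContext)` constructor wins over `parse(Object)` because its parameter type is more specific; with any unrelated context, only `parse(Object)` applies.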

test re-parsing and re-printing closures + bugfixes + refactors

fix for-loop, fix/improve cleanup, and more (143 BC->IR test failures)

- Improve disambiguator assignment to be the lowest possible
- Improve invalid node disambiguator assignment to be the lowest possible
- Fix printing non-simple scalars
- Fix documentation: "dominates nothing" -> "dominates itself"
- Make closure and promise bytecode printing consistent
- Fix anonymous ID assignment
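The documentation fix ("dominates nothing" -> "dominates itself") follows from the definition: B dominates C if every path from the entry to C passes through B, and every path to C ends at C. A minimal sketch of the classic iterative data-flow computation, Dom(entry) = {entry} and Dom(n) = {n} ∪ ⋂ Dom(p) over predecessors p, with blocks numbered `0..n-1` and block `0` as the entry (a hypothetical encoding, not the project's `BB` API):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class Dominators {
  /**
   * Computes dominator sets; preds.get(b) lists b's predecessors and block 0 is the entry.
   * Unreachable blocks are not handled specially in this sketch.
   */
  static List<Set<Integer>> compute(List<List<Integer>> preds) {
    int n = preds.size();
    Set<Integer> all = new HashSet<>();
    for (int i = 0; i < n; i++) all.add(i);

    // Start with Dom(entry) = {entry} and every other block dominated by everything.
    List<Set<Integer>> dom = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      if (i == 0) {
        dom.add(new HashSet<>(Set.of(0)));
      } else {
        dom.add(new HashSet<>(all));
      }
    }

    // Shrink the sets until a fixed point is reached.
    boolean changed = true;
    while (changed) {
      changed = false;
      for (int b = 1; b < n; b++) {
        Set<Integer> next = new HashSet<>(all);
        for (int p : preds.get(b)) next.retainAll(dom.get(p));
        next.add(b); // every block dominates itself
        if (!next.equals(dom.get(b))) {
          dom.set(b, next);
          changed = true;
        }
      }
    }
    return dom;
  }
}
```

For a diamond (0 branches to 1 and 2, which both jump to 3), block 3's dominators come out as {0, 3}: neither branch dominates the join, but the join still dominates itself.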

compile for loop fixes:
- Wrong index in extract
- `break` in for
- `return` in for

fix replaceInArgs: allow a node which appears in an instruction's arguments multiple times to be replaced multiple times
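A minimal sketch of the fixed behavior: replace every occurrence of the old node in an argument list, not just the first one (as an `indexOf`-based replacement would). `ArgReplacement` is hypothetical, and comparing nodes by identity is an assumption of this sketch:

```java
import java.util.ArrayList;
import java.util.List;

final class ArgReplacement {
  /**
   * Returns a copy of `args` with every occurrence of `oldNode` replaced by `newNode`.
   * Nodes are compared by identity (==), so equal-but-distinct nodes are left alone.
   */
  static <T> List<T> replaceInArgs(List<T> args, T oldNode, T newNode) {
    List<T> result = new ArrayList<>(args.size());
    for (T arg : args) {
      result.add(arg == oldNode ? newNode : arg);
    }
    return result;
  }
}
```

For an instruction like `add(x, x)`, replacing `x` with `z` must yield `add(z, z)`; stopping after the first match would leave the instruction half-rewritten.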

fix small typo: `andThem` => `andThen`

fix parsing environments referenced in ancestors + allow setting static environment parents after initialization (necessary to implement this). Every BC->IR test passes except those with `switch` instructions. But is the IR produced correct?

implement switch + bugfixes

fix more bugs: all tests "pass" except there are unset phi inputs, so I have to figure out why...

Now all BC->IR tests pass, except that we don't support dots and haven't tested the code. Also, I need to fix `CallBuiltin` so that there are "safe" and "dispatch" variants, and the dispatch variants (at least in some cases) `eval` the AST instead of using the arguments.

dedup bc-compiler and closure-IR-compiler tests + refactor test class hierarchy in general

resolve all PMD violations:
- Should pass `mvn verify` and therefore CI.
- Had to slightly refactor the `PirId.GlobalLogical` constructor due to a bug in the parser.

suppress unnecessary CPD violation: `Map...View`s are practically all duplicates and very straightforward. We could technically refactor to save a few lines by converting field references to protected instance methods, but it seems unnecessary.

Jakobeha added 17 commits June 3, 2024 19:32

update to Java 22, update dependencies

7a0c027

add parseprint

a58c0dd

fix various IntelliJ/spotless warnings

af89a0d

add compiler package

c0a2105

change bc package-info

5884ca9

fixed bc-compiler rebase issues

ef06197

re-enable CompilerTests which were previously failing

b03161c

add CFG tests to git

7590d25

skip some CFG tests when FAST_TESTS is set

308e86a

fix commit hook staging entire partial commit

e390a3a

update Java and R versions on GitHub actions

4d19dd4

Jakobeha force-pushed the bc2ir branch 2 times, most recently from a296815 to 8cd7100 June 18, 2024 11:14

Jakobeha added 11 commits June 25, 2024 19:14

fix rebase

c6b4d8b

improve errors when starting GNU-R

55cf8a9

add raw SEXPs

fdb2298

fix IntelliJ warnings

d3ca1a5

refactor parseprint

5301615

test re-parsing and re-printing closures

18dc586

Jakobeha added 24 commits June 25, 2024 19:26

fixed parsing and printing null record elements

a8b2a81

fixed compiling instructions after returns

01f3fd1

don't report unsupported bytecode tests as failures

8aa0f2a

fix for-loop, fix/improve cleanup, and more

8165a6b

compile for loop fixes

cf8671f

fix parsing escaped unicode

1d36318

fix printing escaped unicode in names (R symbols)

7c4fb55

fix replaceInArgs

8ab6311

don't print base env parent, because it's always empty

6bec490

fix small typo

84549c4

fix parsing and printing null

86fa5a3

fix parsing environments referenced in ancestors

94a1db5

implement switch

0a2df2a

fix more bugs

92c0bb6

fail verification on unset phi inputs

34bba8c

fix CFGVerify incorrect use-before-def on auxiliary node

e0e6eb3

add missing IR, fix builtin calls, TryDispatchBuiltin

508f239

add ir2c package-info

1c52c70

fix CFGPirSerialize for changed LdVar

b61a270

dedup bc-compiler and closure-IR-compiler tests

4cc15e6

resolve all pmd violations

77e5c00

update pmd

ce4d572

Jakobeha force-pushed the bc2ir branch from eb390aa to ce4d572 June 25, 2024 23:35

Jakobeha marked this pull request as ready for review June 25, 2024 23:35

Jakobeha added 2 commits June 25, 2024 19:49

suppress unnecessary CPD violation

6c8c808

fix verify + add some methods

90de942

Jakobeha closed this Jun 28, 2024

Jakobeha deleted the bc2ir branch June 28, 2024 16:03
