Types, control-flow graph, and IR compiler #14

Closed
wants to merge 57 commits into from

Conversation

Jakobeha
Contributor

@Jakobeha Jakobeha commented Jun 6, 2024

No description provided.

@wip init package, CFG, BB, Node, Value, ValueType, Instr

@wip refine

@wip add instructions list

initial value-type writeup

@wip CFG, instrs, ValueType

@wip draft RType implementation

@wip drafting Stmts, Jumps, ...

@wip more work...

@wip minor fixes to the types and replace function

@wip more BB and CFG work

@wip try putting inheritance in types

it's cleaner for the user (less parallel representation), but a lot uglier to implement... I probably need to see if there's a better way to implement it without as much duplication, or if avoiding the parallel representation is worth it

@wip new `RType` design

`RType` is a union of `RSexpType`s, which are for particular types and special values (currently functions, primitive vectors, the missing value, and everything else)

@wip redo the type system

@wip make it compile + remove most GNU-R bytecodes

@wip type system improvements

@wip arbitrary providers + bugfixes

TODO add providers for function, primitive vector, and generic value types which aren't exact, and fix the tests

@wip bugfixes

@wip bugfixes

The main issue is that jqwik gives stack overflows when trying to shrink the generated results. I still need to test non-trivial cases and would ideally like shrinking, but if I can't figure out why it doesn't work I'll have to disable it.

@wip fixed jqwik generation, the issue now seems to be with function types

@wip

@wip dominator tree

@wip simplify RType arbitrary

@wip update notes

@wip redo function types much better (untested)

includes re-implementing `Rf_matchArgs_NR` (`match.args`?)

@wip add `desc`s to instructions and fix to build

comment code + test fixes to get this to build

@wip fix function types and tests

@wip really fix function types and tests

Property tests run in OK time (<1min) and haven't gotten a failure yet.

@wip document `RType` in notes

Explains how it works and some of the decisions.

It may change a lot from here though.

@wip various fixups

@wip simplify RType based on feedback

- `RType` is no longer a union, just has one `RValueType`. `RValueType`'s name is more accurate.
- The missing type is better represented; it's orthogonal like `RPromiseType`, except that `isMissing = YES` implies `value = null`. This way, we can still represent "known type OR missing".
- No more numeric primitive vector. I kept "numeric or logical" just in case, because binary operators support all of those, but not the non-numeric string and raw types (comparison operators don't care).
- Potential bugfixes
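
A minimal sketch of the "orthogonal missing" idea described above, not the actual `RType` code. `Troolean`, `SketchType`, and the use of a plain `String` in place of `RValueType` are assumptions made for illustration; only the invariant (`isMissing = YES` implies `value = null`) comes from the commit message.

```java
import java.util.Objects;

/** Hypothetical three-valued flag; the name is an assumption for this sketch. */
enum Troolean { YES, NO, MAYBE }

/**
 * Hypothetical stand-in for a type with an orthogonal "missing" component.
 * The value type is a String here only to keep the sketch self-contained.
 */
record SketchType(String value, Troolean isMissing) {
  SketchType {
    Objects.requireNonNull(isMissing);
    // Invariant from the commit message: definitely missing implies no value type.
    if (isMissing == Troolean.YES && value != null) {
      throw new IllegalArgumentException("a definitely-missing type cannot carry a value type");
    }
  }

  /** The type of the missing value itself. */
  static SketchType missing() {
    return new SketchType(null, Troolean.YES);
  }

  /** A known value type that is definitely not missing. */
  static SketchType known(String value) {
    return new SketchType(value, Troolean.NO);
  }

  /** "Known type OR missing": the case the orthogonal flag makes representable. */
  static SketchType knownOrMissing(String value) {
    return new SketchType(value, Troolean.MAYBE);
  }
}
```

Keeping the flag separate from the value type means "known type OR missing" needs no union: it is the same value type with the flag set to `MAYBE`.
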
@wip CFG, BB, and Node...

@wip CFG, BB, and Node (particularly BB)...

@wip BB#inlineAt + bugfixes

@wip starting CFGEdit

@wip CFGEdit

@wip remove useless CFGCommand and CFGAction

@wip refactor

@wip try to refactor into something sensible

@wip `@TypeIs`, serialization and deserialization

@wip serialize and deserialize somewhat from PIR

@wip add more PIR instructions

A lot of TODO design decisions, because idk how similar this will be to PIR

@wip add remaining PIR instructions

Still TODO how similar this will be to PIR, also a lot of unimplemented computeType and computeEffects, and maybe some unresolved compile-time errors
@wip IntelliJ decided to update its settings

@wip update to Java 22, update dependencies
@wip basic parser and printer

@wip parser contexts, improve ParseMethod coherence, and add parser builtin

TODO same for printer, then remove ClassGraph

@wip parser and printer API

@wip optimize imports
@wip start CFG parser and printer with the new API

@wip draft closure and closure version

@wip use better terminology

idk if `Scanner` could be considered a lexer, but it's similar to `java.util.Scanner`.

@wip more progress on parsing and printing CFGs

@wip implement CFG parser and printer, enough to start writing tests.

@wip wrote a test and started fixing bugs

@wip begin writing a parser and printer for CFGs and BBs which is not PIR

@wip improve typeclass map, parse exceptions, and CFG tests

@wip "default" CFG parser and printer + bugfixes

@wip parser and printer bugfixes

@wip parsing and printing

symbol/language parsing and printing + call instruction printing

@wip parsing and printing - prints something reasonable

The tests fail because right now I'm using IntelliJ's "click to see difference".

There's a lot of lost information from the original, and stub data. There are also almost definitely a couple things which are being deserialized from the original or serialized into the reprint incorrectly.

But it's only testing the CFG/parser/printer (so some stubs are OK) and it "mostly" works. Instructions from the original one-to-one map to those in the reprint, they line up too (BFS order is correct).
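
For reference, a breadth-first ordering of basic blocks can be sketched as below. This is a generic illustration, not the project's printer: `BfsOrderSketch`, `bfsOrder`, and the representation of blocks as `String` ids with a successor map are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class BfsOrderSketch {
  /**
   * Returns block ids in breadth-first order starting from {@code entry}.
   * Successors are visited in the order the map lists them, and each block
   * appears once even if it has several predecessors or sits on a loop.
   */
  static List<String> bfsOrder(String entry, Map<String, List<String>> successors) {
    List<String> order = new ArrayList<>();
    Set<String> seen = new HashSet<>();
    ArrayDeque<String> queue = new ArrayDeque<>();
    seen.add(entry);
    queue.add(entry);
    while (!queue.isEmpty()) {
      String block = queue.remove();
      order.add(block);
      for (String next : successors.getOrDefault(block, List.of())) {
        // Set#add returns false if the block was already discovered.
        if (seen.add(next)) {
          queue.add(next);
        }
      }
    }
    return order;
  }
}
```

If the original and the reprint both emit blocks in an order like this, their instructions can be compared position by position.
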

Next will start testing CFG recording, writing more tests in general, and checking `mvn verify`.

@wip most successfully parse and print, and the rest are infeasible for various reasons

Now need to figure out edits, also other tests and stuff

@wip small further improvements
@wip fix `scanToEndOfLine` bug

@wip fix CFGEdit aliased mutation bug

@wip fix CFGEdit not storing NodeIds in InstrData and StmtData (+ another case)

TODO fix global node IDs

@wip progress towards ensuring global nodes can be recovered from their IDs (for CFGEdit)

@wip fixed global nodes,

testPirObserverCanRecreate passes 100%, except something fails to parse (different problem)

@wip fixed small scanner bug

testObserverCanRecreate now passes 100%

@wip fix inverse, TODO true idempotency

@wip fix idempotency and tests

all CFG observer tests pass now
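
One way to picture an edit with an inverse is sketched below: applying an edit returns the edit that undoes it, so a recorded sequence can be replayed or reversed. This is an assumption about the general shape, not the real `CFGEdit` API; `EditSketch`, `Edit`, `Insert`, and `Remove` are hypothetical names, and the "CFG" is reduced to a list of instruction strings.

```java
import java.util.ArrayList;
import java.util.List;

final class EditSketch {
  /** An edit that mutates the instruction list and returns its own inverse. */
  interface Edit {
    Edit apply(List<String> instrs);
  }

  /** Inserts {@code instr} at {@code index}; the inverse removes it again. */
  record Insert(int index, String instr) implements Edit {
    @Override
    public Edit apply(List<String> instrs) {
      instrs.add(index, instr);
      return new Remove(index);
    }
  }

  /** Removes the instruction at {@code index}; the inverse re-inserts what was removed. */
  record Remove(int index) implements Edit {
    @Override
    public Edit apply(List<String> instrs) {
      String removed = instrs.remove(index);
      return new Insert(index, removed);
    }
  }

  /** Applies edits in order and returns the inverses in the order needed to undo them. */
  static List<Edit> applyAll(List<Edit> edits, List<String> instrs) {
    List<Edit> inverses = new ArrayList<>();
    for (Edit edit : edits) {
      // Prepend so that the last applied edit is undone first.
      inverses.add(0, edit.apply(instrs));
    }
    return inverses;
  }
}
```

In this sketch an observer that records every `Edit` can recreate the edited list on a copy of the original, and applying the returned inverses restores the original.
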

@wip improve testPirIsParseableAndPrintableWithoutError

the goal is to minimize failures and then simply ignore them, so we have a regression test to check that currently parseable PIR data stays parseable
…ntableWithoutError

RValue and Env merged because sometimes they can't be statically told apart

Also explicitly add `environment` `CallSafeBuiltin`

The PIR parser/printer is a mess with many TODOs inserted throughout the code, so will rewrite them and maybe remove some functionality (causing more PIR code to fail to parse) when working on the next step: the R bytecode to IR compiler
to include that it includes the BC compiler
TODO
- Fix CFG-edit bijectivity by adding phi nodes to the `InsertJump` edit
- Fix other IR issues
- Get R session to run on macOS and in the Github container
- Refactor?
- Start bytecode compiler
- ...
# Summary

- Add `BatchSubst` and `DefUseAnalysis`.
- Allow mutating instructions and phis directly, not via a method on the basic block.
- Clarify method names and docs, add new helpers.
- Fix phi nodes, at least better than before. In particular, phis' inputs must exactly match the block's predecessors on creation, and are automatically added and removed when predecessors change; stubs are added for new predecessors, and one changes the phi input's node via `setInput`.
- Fix some edits not being recorded as `CFGEdits`, or not being bijective.
- Change PIR parse/print tests and improve PIR parsing/printing, so that all of them are final PIR (valid CFGs that pass `verify`; `PrintPirAfterOpt` gives CFGs with single-input phi nodes).
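
The phi rule in the summary can be sketched roughly as follows. This is an illustration under assumptions, not the project's phi class: `PhiSketch`, `STUB`, `onPredecessorAdded`, and `onPredecessorRemoved` are hypothetical names, and blocks and nodes are plain `String` ids. Only `setInput` is a name taken from the description.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class PhiSketch {
  /** Hypothetical placeholder used for a predecessor whose input is not set yet. */
  static final String STUB = "<unset>";

  // Maps each predecessor block id to the node id flowing in from that block.
  private final Map<String, String> inputs = new LinkedHashMap<>();

  /** On creation, the inputs must cover exactly the block's predecessors. */
  PhiSketch(Set<String> predecessors, Map<String, String> initialInputs) {
    if (!initialInputs.keySet().equals(predecessors)) {
      throw new IllegalArgumentException("phi inputs must exactly match the predecessors");
    }
    inputs.putAll(initialInputs);
  }

  /** Called when the block gains a predecessor: a stub input is added for it. */
  void onPredecessorAdded(String pred) {
    inputs.putIfAbsent(pred, STUB);
  }

  /** Called when the block loses a predecessor: its input is dropped. */
  void onPredecessorRemoved(String pred) {
    inputs.remove(pred);
  }

  /** Changes the node for an existing predecessor, e.g. to replace a stub. */
  void setInput(String pred, String node) {
    if (!inputs.containsKey(pred)) {
      throw new IllegalArgumentException(pred + " is not a predecessor");
    }
    inputs.put(pred, node);
  }

  /** Read-only view of the current inputs. */
  Map<String, String> inputs() {
    return Collections.unmodifiableMap(inputs);
  }
}
```

Because inputs are only ever added or removed in response to predecessor changes, the key set stays equal to the predecessor set, which is what a verifier would check.
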

All current tests pass except some `CompilerTest`s (of course not everything is tested)

# More details (specific commits)

@wip cleanup `ir` TODOs (features, bugfixes, and refactoring)

@wip `BatchSubst`, `DefUseAnalysis`, properly record `InstrOrPhi#replace`, and add labels to compound operations (part of "cleanup `ir` TODOs"; features, bugfixes, and refactoring)

@wip refactor to store `BB` in `ReplaceInArgs` edit

@wip fix (maybe) phis

@wip refactor instr mutation and substitution so it doesn't require BB

This makes the API cleaner, since needing the BB seems "unnecessary" and I really doubt it helps time complexity.

Also fix some bugs with predecessors/incoming BBs/jump targets not being updated properly.

Need to fix DefUseAnalysis not catching all definitions...

@wip phi fixes and add test to verify CFG

fixed phi node and verify issue + other small improvements

fixed verification and PIR parsing/printing

replaced PIR tests with ones that are all final PIR, so that we can check verification works. This also caused new PIR-parse/print failures, most of which were because of weird PIR prints that had to be special-cased (not useful), and a couple of which were actual bugs in the parsing.
+ explain why the remaining 2 tests are still disabled
@Jakobeha Jakobeha force-pushed the bc2ir branch 2 times, most recently from a296815 to 8cd7100 Compare June 18, 2024 11:14
18 compiler tests fail because they produce different bytecode, no crashes and other tests pass

fix rebase onto main
- Refactored the CFG api a bit more, in particular `builder`

@wip continue compiler work (phi stack) and refactor CFG api (always compute phi id from input nodes, improve id syntax and API)

- `name` no longer includes disambiguator (more consistent)
- `RenameInstr` and `MutateInstrArgs` have been separated

TODO: need to make phi IDs remain the same doing forward/reverse edits, and debug more now-failing CFG tests.

@wip make edits store ids, cleanup edit API

@wip fix and refactor node IDs, symbol parsing, and other things

@wip finish fixing node IDs

change SEXP builtins to use `BuiltinId`s instead of `String` names

improve IR API and add some instructions

compile trivial and some less-trivial bytecode instructions

also figure out the issues and challenges in implementing the GNU-R BC->IR compiler

@wip bc->IR compiler

- improve bc->IR boilerplate API

- compile a few more bytecode instructions, including for loops

- fixes

idk how to compile complex assignment and dispatch functions...

@wip fix CFG tests (bc->IR compiler)
add `VERBOSE` logging to tests

because I can't see the GitHub actions output, it also makes tests slightly slower

*maybe* fix LatticeTest rare failure

expose environment variable for GNU-R binary

fast fail compiler tests if the version is wrong

revert GitHub actions to use the correct GNU-R version for tests
- (frame states and promises)
+ initial tests

@wip draft implementation of bc->PIR-IR

some things are probably not correct though, also a lot of specialized GNU-R bytecodes get converted into CallBuiltin because we don't have PIR instructions with the same specialization

some things are also still not implemented (e.g. `MakeClosure`)

@wip fix bugs and add documentation to the draft implementation

@wip closure and closure version overhaul and begin their compiler

also brainstorm high-level (how compilation/evaluation will work)

@wip closure and closure version compiler + created `Module`

WIP:

- How promises will be compiled (and where they're needed).
- Add new PIR instructions to implement missing functionality required to compile some CFG bytecodes.
- (Maybe longer term) try to implement PushContext and PopContext because unless I'm mistaken they are created by GNU-R wherever there's a `next` or `break`, whereas RIR only needs them for niche complex cases.

@wip improve `CFGTests`

Report a test differently if we failed to parse, but in an acceptable way, so it will be apparent (unfortunately only to the human) if a large number of tests are failing this way (which is not acceptable).

@wip further improve `CFGPirTests`

and further fix `CFGEdit`

@wip improve `CFGCompiler`

- create a call stack instead of putting the function on the regular stack and having a call arguments stack (this may have fixed semantics)

cleanup `Compiler` warnings + other small refactors

@wip finish draft closure compiler (frame-states and promises) + initial tests
 Necessary to inspect the tests, and I suspect some are only failing because printing isn't supported (but necessary to inspect even the passing ones to see if the output resembles something actually successful)

@wip parse and print closures properly

Draft attempt to properly parse and print the inner closures and promises after.

@wip parse and print closures properly

Parse and print promises.

@wip parse and print closures properly

Parse and print the inner closures and promises by forwarding the context in CFGs.

@wip parse and print SEXPs (draft impl)
+ RDS reader tests don't throw `Exception`
- mainly in parsing and printing

- also in BC->IR, implemented complex assignments

@wip parse and print SEXP bugfixes

@wip parse and print closure bugfixes

@wip refactored/fixed parsing and printing inner code objects

@wip fixed printing bytecode (code/const index formatting)

@wip bugfixes

both in the BC->IR compiler and in parsing/printing

fixed node IDs parsed and printed outside of the CFG they were defined in.

@wip further bugfixes + implemented complex assignment

`inlineSlotAssign` prints something.

@wip further bugfixes + improve parsing and printing
- Resolve parse and print methods that take a superclass of the context class.
- Parse methods can be constructors
- Improve disambiguator assignment to be the lowest possible
- Improve invalid node disambiguator assignment to be the lowest possible
- Fix printing non-simple scalars
- Fix documentation: "dominates nothing" -> "dominates itself"
- Make closure and promise bytecode printing consistent
- Fix anonymous ID assignment
- Wrong index in extract
- `break` in for
- `return` in for
- allow node which is in an instruction's arguments multiple times to be replaced multiple times
`andThem` => `andThen`
+ allow setting static environment parents after initialization (necessary to implement this)

Every BC->IR test passes except those with `switch` instructions. But is the IR produced correct?
+ bugfixes
All tests "pass" except there are unset phi inputs, so I have to figure out why...
now all BC->IR tests pass, except we don't support dots, and haven't tested the code.

Also, I need to fix `CallBuiltin` so that there are "safe" and "dispatch" variants, and the dispatch variants (at least in some cases) `eval` the AST instead of using arguments.
+ refactor test class hierarchy in general
- Should pass `mvn verify` and therefore CI.
- had to slightly refactor `PirId.GlobalLogical` constructor due to a bug in the parser.
@Jakobeha Jakobeha marked this pull request as ready for review June 25, 2024 23:35
`Map...View`s are practically all duplicates and very straightforward. We could technically refactor to save a few lines by converting field references to protected instance methods, but it seems unnecessary
@Jakobeha Jakobeha closed this Jun 28, 2024
@Jakobeha Jakobeha deleted the bc2ir branch June 28, 2024 16:03
1 participant