improve debug instructions (#25)
ZuseZ4 authored Aug 22, 2024
1 parent 1742861 commit e27523e
Showing 4 changed files with 167 additions and 83 deletions.
89 changes: 6 additions & 83 deletions src/Debugging.md
@@ -11,87 +11,10 @@ Please create an issue with such a reproducer, it will likely be easy to fix!
In the unexpected case that you produce an ICE in our frontend that
is harder to minimize, please consider using [icemelter](https://github.com/langston-barrett/icemelter).

### Backend crashes
If after a compilation failure you are greeted by a large amount of LLVM-IR code,
then our Enzyme backend likely failed to compile your code.
These cases are harder to debug, so your help is highly appreciated.
Please also keep in mind that release builds are usually much more likely to work at the moment.
### Backend crashes
If you see LLVM-IR (a language which might remind you of assembly), then our backend crashed.
You can find instructions on how to create an issue and help us to fix it [on the next page](debug_backend.md).

The final goal here is to reproduce your bug in the Enzyme [compiler explorer](https://enzyme.mit.edu/explorer/),
in order to create a bug report in the [Enzyme core](https://github.com/EnzymeAD/Enzyme/issues) repository.

We have an environment variable called `OPT` to help with this. It will print the whole LLVM-IR module,
along with dummy functions called `enzyme_opt_dbg_helper_<i>`. A potential workflow on Linux could look like:

`RUSTFLAGS="-Z autodiff=OPT" cargo +enzyme build --release &> out.ll`
This also captures a few warnings and info messages above and below your module.
Open out.ll and remove every line above `; ModuleID = <SomeHash>`. Now look at the end of the file and remove everything that's not part of LLVM-IR, i.e. remove errors and warnings. The last line of your LLVM-IR will start with `!<someNumber> = `, i.e.
`!40831 = !{i32 0, i32 1037508, i32 1037538, i32 1037559}` or `!43760 = !DILocation(line: 297, column: 5, scope: !43746)`.
The actual numbers will depend on your code.

`llvm-extract -S --func=f --recursive --rfunc="enzyme_opt_helper_*" out.ll -o mwe.ll`
Please also adjust the name passed with the `--func` flag if your function isn't called `f`. Either look up the correct
llvm-ir name for your function in out.ll, or use the `#[no_mangle]` attribute on the function which you differentiate, in which case
you can pass the original Rust function name to this flag.

Afterwards, you should be able to copy and paste your mwe example into our [compiler explorer](https://enzyme.mit.edu/explorer/) and
hopefully reproduce the same Enzyme error, which you got when you tried to compile your original Rust code.
Please select `LLVM IR` as a language and `opt 20` as your compiler and replace the LLVM-IR example with your final mwe.ll content.

You will quickly note that even a small Rust function can generate a large llvm-ir reproducer. Please try to get your llvm-ir function below
100 lines, by reducing the Rust function to be differentiated as far as possible. This will significantly speed up the bug fixing process.
Please also try to post both, the compiler-explorer link with your llvm-ir reproducer, as well as a self-contained Rust reproducer.

There are a few solutions to help you with minimizing the Rust reproducer.
This is probably the simplest automated approach:
[cargo-minimize](https://github.com/Nilstrieb/cargo-minimize)

Otherwise we have various alternatives, including
[treereduce](https://github.com/langston-barrett/treereduce),
[halfempty](https://github.com/googleprojectzero/halfempty), or
[picireny](https://github.com/renatahodovan/picireny)

Potentially also
[creduce](https://github.com/csmith-project/creduce)

### Supported RUSTFLAGS
To support you while debugging, we have added support for an experimental `-Z autodiff` flag to `RUSTFLAGS`,
which allows changing the behaviour of Enzyme without recompiling rustc.
We currently support the following values for `autodiff`:
```bash
PrintTA // Print TypeAnalysis information
PrintAA // Print ActivityAnalysis information
PrintPerf // Print AD related Performance warnings
Print // Print all of the above
PrintModBefore // Print the whole LLVM-IR module before running opts
PrintModAfterOpts // Print the whole LLVM-IR module after running opts, before AD
PrintModAfterEnzyme // Print the whole LLVM-IR module after running opts and AD
LooseTypes // Risk incorrect derivatives instead of aborting when missing Type Info
OPT // Most Important debug helper: Print a Module that can run with llvm-opt + enzyme
```

For performance experiments and benchmarking we also support
```
NoModOptAfter // We won't optimize the whole LLVM-IR Module after AD
EnableFncOpt // We will optimize each derivative function generated individually
NoVecUnroll // Disables vectorization and loop unrolling
NoSafetyChecks // Disables Enzyme specific safety checks
RuntimeActivity // Enables the runtime activity feature from Enzyme
Inline // Instructs Enzyme to apply additional inlining beyond LLVM's default
AltPipeline // Don't optimize IR before AD, but optimize the whole module twice after AD
```

You can combine multiple `autodiff` values using a comma as separator:
```bash
RUSTFLAGS="-Z autodiff=LooseTypes,NoVecUnroll" cargo +enzyme build
```


The normal compilation pipeline of Rust-Enzyme is
1) Run your selected compilation pipeline. If you selected a release build, we will disable vectorization and loop unrolling.
2) Differentiate your functions.
3) Run your selected compilation pipeline again on the whole module. This time we do not disable vectorization or loop unrolling.

The alt pipeline will not run opts before AD, but 2x after AD - the first time without vectorization or loop unrolling, the second time with.

The two flags above allow you to adjust this default behaviour.
### Debugging and Profiling
Rust-AD accepts an `autodiff` flag in `RUSTFLAGS`, which lets you change the behaviour of Enzyme in various ways.
Documentation is available [here](debug_flags.md).
2 changes: 2 additions & 0 deletions src/SUMMARY.md
@@ -12,6 +12,8 @@
- [Future Work](./future_work.md)
- [History and ecosystem](./ecosystem.md)
- [How to Debug](./Debugging.md)
- [Debug the backend](./debug_backend.md)
- [Debug and Profile flags](./debug_flags.md)
# Reference Guide
- [Other Enzyme frontends](./other_Frontends.md)
- [Forward Mode](./fwd.md)
104 changes: 104 additions & 0 deletions src/debug_backend.md
@@ -0,0 +1,104 @@
# Reporting backend crashes
If after a compilation failure you are greeted by a large amount of LLVM-IR code,
then our Enzyme backend likely failed to compile your code.
These cases are harder to debug, so your help is highly appreciated.
Please also keep in mind that release builds are usually much more likely to work at the moment.

The final goal here is to reproduce your bug in the Enzyme [compiler explorer](https://enzyme.mit.edu/explorer/),
in order to create a bug report in the [Enzyme core](https://github.com/EnzymeAD/Enzyme/issues) repository.

We have an `autodiff` flag which you can pass to `RUSTFLAGS` to help with this. With the value `OPT`, it will print the whole LLVM-IR module,
along with dummy functions called `enzyme_opt_helper_<i>`. A potential workflow on Linux could look like:

## 1) Generate an LLVM-IR reproducer
```sh
RUSTFLAGS="-Z autodiff=OPT" cargo +enzyme build --release &> out.ll
```
This also captures a few warnings and info messages above and below your module.
Open out.ll and remove every line above `; ModuleID = <SomeHash>`.
Now look at the end of the file and remove everything that's not part of LLVM-IR, i.e. remove errors and warnings.
The last line of your LLVM-IR should now start with `!<someNumber> = `, e.g.
`!40831 = !{i32 0, i32 1037508, i32 1037538, i32 1037559}` or `!43760 = !DILocation(line: 297, column: 5, scope: !43746)`.
The actual numbers will depend on your code.
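
The manual trimming described above can also be scripted. The following is a minimal sketch, not part of the Rust-AD tooling: `trim_module` is a hypothetical helper name, and it assumes that none of the messages printed after the module start with `!<number> = `. It keeps everything from the `; ModuleID` line up to the last metadata line and does not remove messages that ended up in the middle of the module.

```sh
# Hypothetical helper: keep only the LLVM-IR module from a captured build log.
# $1: the captured file, e.g. out.ll
trim_module() {
    awk '
        /^; ModuleID = / { keep = 1 }        # the module starts here
        keep { lines[++n] = $0 }             # buffer everything from there on
        keep && /^![0-9]+ = / { last = n }   # remember the last metadata line
        END { for (i = 1; i <= last; i++) print lines[i] }
    ' "$1"
}
```

Running `trim_module out.ll > out_trimmed.ll` writes the result to a new file, so you can inspect it before replacing `out.ll`.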

## 2) Check your LLVM-IR reproducer
To confirm that your previous step worked, we will use LLVM's `opt` tool.
Find the path to your `opt` binary, which will be similar to
`<some_dir>/rust/build/<x86/arm/...-target-triple>/build/bin/opt`.
Also find the path to `LLVMEnzyme-19.<so/dll/dylib>`, which will be similar to `/rust/build/target-triple/enzyme/build/Enzyme/LLVMEnzyme-19`.
Once you have both, run the following command:
```sh
path/to/your/opt out.ll -load-pass-plugin=/path/to/your/LLVMEnzyme-19.so -passes="enzyme" -S
```
If your previous step worked, you will now see the same error that you saw when compiling your Rust code with Cargo.
If you fail to get the same error, please open an issue in the Rust repository. If you succeed, congrats!
The file is still huge, so let's automatically minimize it.
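
If you are unsure where the binaries mentioned above live, a search below the build directory can help. This is only a sketch: `find_llvm_tool` is a hypothetical helper name, and it assumes that the tools sit in some `bin` directory below `<your rust checkout>/build`. It prints every match, so you may need to pick the right one yourself.

```sh
# Hypothetical helper: search a rustc build directory for an LLVM tool.
# $1: path to your rust checkout (assumption: tools live below $1/build)
# $2: tool name, e.g. opt, llvm-extract or llvm-reduce
find_llvm_tool() {
    find "$1/build" -type f -path "*/bin/$2" 2>/dev/null
}
```

For example, `find_llvm_tool ~/prog/rust opt` would list candidate `opt` binaries, where `~/prog/rust` stands in for your own checkout.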

## 3) Minimize your LLVM-IR reproducer
First find your `llvm-extract` binary; it is in the same folder as your `opt` binary. Then run:
```sh
path/to/your/llvm-extract -S --func=<your-llvm-fnc-name> --recursive --rfunc="enzyme_opt_helper_*" out.ll -o mwe.ll
```
Please adjust the name passed with the `--func` flag.
You can either apply the `#[no_mangle]` attribute to the function you differentiate,
in which case you can pass the Rust name of the function. Otherwise you will need to look up the mangled function name.
To do that, open out.ll and search for `__enzyme_fwddiff` or `__enzyme_autodiff`.
The first argument of that function call is the name of your function. Example:
```llvm-ir
define double @enzyme_opt_helper_0(ptr %0, i64 %1, double %2) {
%4 = call double (...) @__enzyme_fwddiff(ptr @_ZN2ad3_f217h3b3b1800bd39fde3E, metadata !"enzyme_const", ptr %0, metadata !"enzyme_const", i64 %1, metadata !"enzyme_dup", double %2, double %2)
ret double %4
}
```
Here, `_ZN2ad3_f217h3b3b1800bd39fde3E` is the correct name. Make sure not to copy the leading `@`.
Redo step 2), but now pass mwe.ll instead of out.ll to `opt`, to see if your minimized example reproduces your crash.
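
The name lookup described above can be sketched with `grep` and `sed`. `enzyme_target_names` is a hypothetical helper name, and the sketch assumes that the calls look like the example above, i.e. that the differentiated function is passed as the first argument in the form `ptr @<name>`. Names that LLVM prints in quotes are not handled specially.

```sh
# Hypothetical helper: list the functions passed to __enzyme_fwddiff / __enzyme_autodiff.
# $1: the LLVM-IR file, e.g. out.ll
enzyme_target_names() {
    # Cut out "@__enzyme_...(ptr @<name>", then strip everything up to the name.
    grep -oE '@__enzyme_(fwddiff|autodiff)\(ptr @[^,)]+' "$1" \
        | sed 's/.*ptr @//' \
        | sort -u
}
```

The output already omits the leading `@`, so each printed line can be passed to `--func` as it is.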

## 4) (Optional) Minimize your LLVM-IR reproducer further.
After the previous step you should have an `mwe.ll` file with ~5k LoC. Let's try to get it down to 50.
Find your `llvm-reduce` binary next to `opt` and `llvm-extract`.
Copy the first line of your error message, an example could be:
```sh
opt: /home/manuel/prog/rust/src/llvm-project/llvm/lib/IR/Instructions.cpp:686: void llvm::CallInst::init(llvm::FunctionType*, llvm::Value*, llvm::ArrayRef<llvm::Value*>, llvm::ArrayRef<llvm::OperandBundleDefT<llvm::Value*> >, const llvm::Twine&): Assertion `(Args.size() == FTy->getNumParams() || (FTy->isVarArg() && Args.size() > FTy->getNumParams())) && "Calling a function with bad signature!"' failed.
```
If you just get a segfault there is no sensible error message and not much to do automatically, so continue to 5).
Otherwise, create a script.sh file containing
```sh
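#!/bin/bash
# llvm-reduce passes the candidate file as $1; exit status 0 (grep found the error) marks it as interesting.
<path/to/your/opt> "$1" -load-pass-plugin=/path/to/your/LLVMEnzyme-19.so -passes="enzyme" \
|& grep "/some/path.cpp:686: void llvm::CallInst::init"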
```
Experiment a bit with which error message you pass to grep. It should be long enough to make sure that the error is unique.
However, for longer errors including `(` or `)` you will need to escape them correctly which can become annoying. Run
```sh
<path/to/llvm-reduce> --test=script.sh mwe.ll
```
If you see `Input isn't interesting! Verify interesting-ness test`, the error message in script.sh is wrong;
make sure that grep matches your actual error.
If all works out, you will see a lot of iterations, ending with a new `reduced.ll` file.
Verify with `opt` that you still get the same error.
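
One way to sidestep the escaping problem mentioned above is grep's fixed-string mode (`-F`), which treats the pattern as a literal string instead of a regular expression. The sketch below shows the idea in isolation; `NEEDLE` and `matches_error` are hypothetical names, and the needle is a fragment of the example assertion above that you would replace with a fragment of your own error.

```sh
# Hypothetical needle: a unique fragment copied verbatim from the error message.
NEEDLE='void llvm::CallInst::init(llvm::FunctionType*'

# Reads text on stdin and succeeds only if it contains NEEDLE.
matches_error() {
    # -F: literal match, so ( ) and * need no escaping; -q: only set the exit status.
    grep -qF -- "$NEEDLE"
}
```

In `script.sh`, the final `grep "..."` could then be written as `grep -qF -- "$NEEDLE"`, with `NEEDLE` defined at the top of the script.
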
## 5) Report your bug.
Afterwards, you should be able to copy and paste your `mwe.ll` (and `reduced.ll`) example into our [compiler explorer](https://enzyme.mit.edu/explorer/).
Select `LLVM IR` as language and `opt 20` as compiler. Set the field to the right of your compiler to `-passes="enzyme"`, if it is not already set.
Hopefully, you will once again see your now familiar error. Please use the share button to copy a link to each example.
Please create an issue on [GitHub](https://github.com/EnzymeAD/Enzyme/issues) and share `mwe.ll` and (if you have it) `reduced.ll`, as well as links to the compiler explorer. Please feel free to also add your Rust code or a link to it. With that, hopefully someone from the Enzyme core repository will be able to fix your bug. Once that has happened, I will update the Enzyme submodule inside the Rust compiler, which should allow you to differentiate your Rust code. Thanks for helping us to improve Rust-AD.
# Minimize Rust code
Beyond having a minimal LLVM-IR reproducer, it is also helpful to have a minimal Rust reproducer without dependencies,
because it allows us to add it as a testcase to CI, to avoid regressions even after fixing the bug.
There are a few tools that can help you minimize the Rust reproducer.
This is probably the simplest automated approach:
[cargo-minimize](https://github.com/Nilstrieb/cargo-minimize)
Otherwise we have various alternatives, including
[treereduce](https://github.com/langston-barrett/treereduce),
[halfempty](https://github.com/googleprojectzero/halfempty), or
[picireny](https://github.com/renatahodovan/picireny)
Potentially also
[creduce](https://github.com/csmith-project/creduce)
55 changes: 55 additions & 0 deletions src/debug_flags.md
@@ -0,0 +1,55 @@
# Supported RUSTFLAGS
To support you while debugging, we have added support for an experimental `-Z autodiff` flag to `RUSTFLAGS`,
which allows changing the behaviour of Enzyme without recompiling rustc.
We currently support the following values for `autodiff`:

### Debug Flags
```bash
PrintTA // Print TypeAnalysis information
PrintAA // Print ActivityAnalysis information
PrintPerf // Print AD related Performance warnings
Print // Print all of the above
PrintModBefore // Print the whole LLVM-IR module before running opts
PrintModAfterOpts // Print the whole LLVM-IR module after running opts, before AD
PrintModAfterEnzyme // Print the whole LLVM-IR module after running opts and AD
LooseTypes // Risk incorrect derivatives instead of aborting when missing Type Info
OPT // Most Important debug helper: Print a Module that can run with llvm-opt + enzyme
```

<div class="warning">

`LooseTypes` is often helpful to get rid of Enzyme errors stating
`Can not deduce type of <X>` and to be able to run some code. But please
keep in mind that this flag can cause incorrect gradients.
Even worse, the gradients might be correct for certain input values, but not for others.
So please create issues about such bugs and only use this flag temporarily while you wait for your
bug to be fixed.

</div>

### Benchmark flags
For performance experiments and benchmarking we also support
```
NoModOptAfter // We won't optimize the whole LLVM-IR Module after AD
EnableFncOpt // We will optimize each derivative function generated individually
NoVecUnroll // Disables vectorization and loop unrolling
NoSafetyChecks // Disables Enzyme specific safety checks
RuntimeActivity // Enables the runtime activity feature from Enzyme
Inline // Instructs Enzyme to apply additional inlining beyond LLVM's default
AltPipeline // Don't optimize IR before AD, but optimize the whole module twice after AD
```

You can combine multiple `autodiff` values using a comma as separator:
```bash
RUSTFLAGS="-Z autodiff=LooseTypes,NoVecUnroll" cargo +enzyme build
```
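
If you switch between flag combinations often, the comma-separated value can be assembled by a small shell function. This is only a convenience sketch; `autodiff_rustflags` is a hypothetical name and not something Rust-AD provides.

```sh
# Hypothetical helper: join all arguments with commas into a -Z autodiff value.
autodiff_rustflags() {
    # "$*" joins the arguments with the first character of IFS;
    # the subshell keeps the IFS change local.
    ( IFS=,; printf '%s%s\n' '-Z autodiff=' "$*" )
}
```

It could then be used as `RUSTFLAGS="$(autodiff_rustflags LooseTypes NoVecUnroll)" cargo +enzyme build`.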


The normal compilation pipeline of Rust-Enzyme is
1) Run your selected compilation pipeline. If you selected a release build, we will disable vectorization and loop unrolling.
2) Differentiate your functions.
3) Run your selected compilation pipeline again on the whole module. This time we do not disable vectorization or loop unrolling.

The alt pipeline will not run opts before AD, but 2x after AD - the first time without vectorization or loop unrolling, the second time with.

The two flags above allow you to adjust this default behaviour.
