diff --git a/src/Debugging.md b/src/Debugging.md
index 647d7c1..3ea19f7 100644
--- a/src/Debugging.md
+++ b/src/Debugging.md
@@ -11,87 +11,10 @@ Please create an issue with such a reproducer, it will likely be easy to fix!
 For the unexpected case, that you produce an ICE in our frontend that is harder to minimize,
 please consider using [icemelter](https://github.com/langston-barrett/icemelter).
 
-### Backend crashes
-If after a compilation failure you are greeted by a large amount of LLVM-IR code,
-then our Enzyme backend likely failed to compile your code.
-These cases are harder to debug, so your help is highly appreciated.
-Please also keep in mind, that release builds are usually much more likely to work at the moment.
+### Backend crashes
+If you see LLVM-IR (a language which might remind you of assembly), then our backend crashed.
+You can find instructions on how to create an issue and help us to fix it [on the next page](debug_backend.md).
 
-The final goal here is to reproduce your bug in the Enzyme [compiler explorer](https://enzyme.mit.edu/explorer/),
-in order to create a bug report in the [Enzyme core](https://github.com/EnzymeAD/Enzyme/issues) repository.
-
-We have an environment variable called `OPT` to help with this. It will print the whole LLVM-IR module,
-along with dummy functions called `enzyme_opt_dbg_helper_`. A potential workflow on Linux could look like:
-
-`RUSTFLAGS="-Z autodiff=OPT" cargo +enzyme build --release &> out.ll`
-This also captures a few warnings and info messages above and below your module.
-Open out.ll and remove every line above `; ModuleID = `. Now look at the end of the file and remove everything that's not part of LLVM-IR, i.e. remove errors and warnings. The last line of your LLVM-IR will start with `! = `, i.e.
-`!40831 = !{i32 0, i32 1037508, i32 1037538, i32 1037559}` or `!43760 = !DILocation(line: 297, column: 5, scope: !43746)`.
-The actual numbers will depend on your code.
-
-`llvm-extract -S --func=f --recursive --rfunc="enzyme_opt_helper_*" out.ll -o mwe.ll`
-Please also adjust the name passed with the `--func` flag if your function isn't called `f`. Either look up the correct
-llvm-ir name for your function in out.ll, or use the `#[no_mangle]` attribute on the function which you differentiate, in which case
-you can pass the original Rust function name to this flag.
-
-Afterwards, you should be able to copy and paste your mwe example into our [compiler explorer](https://enzyme.mit.edu/explorer/) and
-hopefully reproduce the same Enzyme error, which you got when you tried to compile your original Rust code.
-Please select `LLVM IR` as a language and `opt 20` as your compiler and replace the LLVM-IR example with your final mwe.ll content.
-
-You will quickly note that even small Rust function can generate large llvm-ir reproducer. Please try to get your llvm-ir function below
-100 lines, by reducing the Rust function to be differentiated as far as possible. This will significantly speed up the bug fixing process.
-Please also try to post both, the compiler-explorer link with your llvm-ir reproducer, as well as a self-contained Rust reproducer.
-
-There are a few solutions to help you with minimizing the Rust reproducer.
-This is probably the most simple automated approach:
-[cargo-minimize](https://github.com/Nilstrieb/cargo-minimize)
-
-Otherwise we have various alternatives, including
-[treereduce](https://github.com/langston-barrett/treereduce),
-[halfempty](https://github.com/googleprojectzero/halfempty), or
-[picireny](https://github.com/renatahodovan/picireny)
-
-Potentially also
-[creduce](https://github.com/csmith-project/creduce)
-
-### Supported RUSTFLAGS
-To support you while debugging, we have added support for an experimental `-Z autodiff` flag to `RUSTFLAGS`,
-which allow changing the behaviour of Enzyme, without recompiling rustc.
-We currently support the following values for `autodiff`:
-```bash
-PrintTA             // Print TypeAnalysis information
-PrintAA             // Print ActivityAnalysis information
-PrintPerf           // Print AD related Performance warnings
-Print               // Print all of the above
-PrintModBefore      // Print the whole LLVM-IR module before running opts
-PrintModAfterOpts   // Print the whole LLVM-IR module after running opts, before AD
-PrintModAfterEnzyme // Print the whole LLVM-IR module after running opts and AD
-LooseTypes          // Risk incorect derivatives instead of aborting when missing Type Info
-OPT                 // Most Important debug helper: Print a Module that can run with llvm-opt + enzyme
-```
-
-For performance experiments and benchmarking we also support
-```
-NoModOptAfter   // We won't optimize the whole LLVM-IR Module after AD
-EnableFncOpt    // We will optimize each derivative function generated individually
-NoVecUnroll     // Disables vectorization and loop unrolling
-NoSafetyChecks  // Disables Enzyme specific safety checks
-RuntimeActivity // Enables the runtime activity feature from Enzyme
-Inline          // Instructs Enzyme to apply additional inlining beyond LLVM's default
-AltPipeline     // Don't optimize IR before AD, but optimize the whole module twice after AD
-```
-
-You can combine multiple `autodiff` values using a comma as separator:
-```bash
-RUSTFLAGS="-Z autodiff=LooseTypes,NoVecUnroll" cargo +enzyme build
-```
-
-
-The normal compilation pipeline of Rust-Enzyme is
-1) Run your selected compilation pipeline. If you selected a release build, we will disable vectorization and loop unrolling.
-2) Differentiate your functions.
-3) Run your selected compilation pipeline again on the whole module. This time we do not disable vectorization or loop unrolling.
-
-The alt pipeline will not run opts before AD, but 2x after AD - the first time without vectorization or loop unrolling, the second time with.
-
-The two flags above allow you to adjust this default behaviour.
+### Debugging and Profiling
+Rust-AD supports passing an `autodiff` flag to `RUSTFLAGS`, which allows changing the behaviour of Enzyme in various ways.
+Documentation is available [here](debug_flags.md).
diff --git a/src/SUMMARY.md b/src/SUMMARY.md
index 35e2343..15239a1 100644
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -12,6 +12,8 @@
 - [Future Work](./future_work.md)
 - [History and ecosystem](./ecosystem.md)
 - [How to Debug](./Debugging.md)
+  - [Debug the backend](./debug_backend.md)
+  - [Debug and Profile flags](./debug_flags.md)
 # Reference Guide
 - [Other Enzyme frontends](./other_Frontends.md)
 - [Forward Mode](./fwd.md)
diff --git a/src/debug_backend.md b/src/debug_backend.md
new file mode 100644
index 0000000..eeb4b43
--- /dev/null
+++ b/src/debug_backend.md
@@ -0,0 +1,104 @@
+# Reporting backend crashes
+If after a compilation failure you are greeted by a large amount of LLVM-IR code,
+then our Enzyme backend likely failed to compile your code.
+These cases are harder to debug, so your help is highly appreciated.
+Please also keep in mind that release builds are usually much more likely to work at the moment.
+
+The final goal here is to reproduce your bug in the Enzyme [compiler explorer](https://enzyme.mit.edu/explorer/),
+in order to create a bug report in the [Enzyme core](https://github.com/EnzymeAD/Enzyme/issues) repository.
+
+We have an `autodiff` flag which you can pass to `RUSTFLAGS` to help with this. It will print the whole LLVM-IR module,
+along with dummy functions called `enzyme_opt_helper_`. A potential workflow on Linux could look like:
+
+## 1) Generate an LLVM-IR reproducer
+```sh
+RUSTFLAGS="-Z autodiff=OPT" cargo +enzyme build --release &> out.ll
+```
+This also captures a few warnings and info messages above and below your module.
+Open out.ll and remove every line above `; ModuleID = `.
+Now look at the end of the file and remove everything that's not part of LLVM-IR, i.e. remove errors and warnings.
+The last line of your LLVM-IR should now start with `! = `, e.g.
+`!40831 = !{i32 0, i32 1037508, i32 1037538, i32 1037559}` or `!43760 = !DILocation(line: 297, column: 5, scope: !43746)`.
+The actual numbers will depend on your code.
+
+## 2) Check your LLVM-IR reproducer
+To confirm that your previous step worked, let's use LLVM's opt tool.
+Find your path to the opt binary, with a path similar to
+`/rust/build//build/bin/opt`.
+Also find your `LLVMEnzyme-19.` path, similar to `/rust/build/target-triple/enzyme/build/Enzyme/LLVMEnzyme-19`.
+Once you have both, run the following command:
+```sh
+path/to/your/opt out.ll -load-pass-plugin=/path/to/your/LLVMEnzyme-19.so -passes="enzyme" -S
+```
+If your previous step worked, you will now see the same error as you saw when compiling your Rust code with Cargo.
+If you fail to get the same error, please open an issue in the Rust repository. If you succeed, congrats!
+The file is still huge, so let's automatically minimize it.
+
+## 3) Minimize your LLVM-IR reproducer
+First find your llvm-extract binary; it's in the same folder as your opt binary. Then run:
+```sh
+path/to/your/llvm-extract -S --func= --recursive --rfunc="enzyme_opt_helper_*" out.ll -o mwe.ll
+```
+Please adjust the name passed with the `--func` flag.
+You can either apply the `#[no_mangle]` attribute to the function you differentiate,
+in which case you can pass its Rust name. Otherwise you will need to look up the mangled function name.
+To do that, open out.ll and search for `__enzyme_fwddiff` or `__enzyme_autodiff`.
+The first argument in that function call is the name of your function. Example:
+```llvm-ir
+define double @enzyme_opt_helper_0(ptr %0, i64 %1, double %2) {
+  %4 = call double (...) @__enzyme_fwddiff(ptr @_ZN2ad3_f217h3b3b1800bd39fde3E, metadata !"enzyme_const", ptr %0, metadata !"enzyme_const", i64 %1, metadata !"enzyme_dup", double %2, double %2)
+  ret double %4
+}
+```
+Here, `_ZN2ad3_f217h3b3b1800bd39fde3E` is the correct name. Make sure to not copy the leading `@`.
+Redo step 2), but now pass mwe.ll instead of out.ll to opt, to see if your minimized example reproduces your crash.
+
+## 4) (Optional) Minimize your LLVM-IR reproducer further.
+After the previous step you should have an `mwe.ll` file with ~5k LoC. Let's try to get it down to 50.
+Find your `llvm-reduce` binary next to `opt` and `llvm-extract`.
+Copy the first line of your error message. An example could be:
+```sh
+opt: /home/manuel/prog/rust/src/llvm-project/llvm/lib/IR/Instructions.cpp:686: void llvm::CallInst::init(llvm::FunctionType*, llvm::Value*, llvm::ArrayRef, llvm::ArrayRef >, const llvm::Twine&): Assertion `(Args.size() == FTy->getNumParams() || (FTy->isVarArg() && Args.size() > FTy->getNumParams())) && "Calling a function with bad signature!"' failed.
+```
+If you just get a segfault there is no sensible error message and not much to do automatically, so continue to 5).
+Otherwise, create a script.sh file containing
+```sh
+#!/bin/bash
+path/to/your/opt $1 -load-pass-plugin=/path/to/your/LLVMEnzyme-19.so -passes="enzyme" \
+    |& grep "/some/path.cpp:686: void llvm::CallInst::init"
+```
+Experiment a bit with which error message you pass to grep. It should be long enough to make sure that the error is unique.
+However, for longer errors including `(` or `)` you will need to escape them correctly, which can become annoying. Run
+```sh
+path/to/your/llvm-reduce --test=script.sh mwe.ll
+```
+If you see `Input isn't interesting! Verify interesting-ness test`, you got the error message in script.sh wrong;
+you need to make sure that grep matches your actual error.
+If all works out, you will see a lot of iterations, ending with a new `reduced.ll` file.
+Verify with `opt` that you still get the same error.
+
+## 5) Report your bug.
+
+Afterwards, you should be able to copy and paste your `mwe.ll` (and `reduced.ll`) example into our [compiler explorer](https://enzyme.mit.edu/explorer/).
+Select `LLVM IR` as language and `opt 20` as compiler. Replace the field to the right of your compiler with `-passes="enzyme"`, if it is not already set.
+Hopefully, you will see once again your now familiar error. Please use the share button to copy links to them.
+
+Please create an issue on [GitHub](https://github.com/EnzymeAD/Enzyme/issues) and share `mwe.ll` and (if you have it) `reduced.ll`, as well as links to the compiler explorer. Please feel free to also add your Rust code or a link to it. With that, hopefully someone from the Enzyme core repository will be able to fix your bug. Once that has happened, I will update the Enzyme submodule inside the Rust compiler, which should allow you to now differentiate your Rust code. Thanks for helping us to improve Rust-AD.
+
+
+# Minimize Rust code
+Beyond having a minimal LLVM-IR reproducer, it is also helpful to have a minimal Rust reproducer without dependencies,
+because it allows us to add it as a testcase to CI, to avoid regressions even after fixing the bug.
+
+There are a few solutions to help you with minimizing the Rust reproducer.
+This is probably the simplest automated approach:
+[cargo-minimize](https://github.com/Nilstrieb/cargo-minimize)
+
+Otherwise we have various alternatives, including
+[treereduce](https://github.com/langston-barrett/treereduce),
+[halfempty](https://github.com/googleprojectzero/halfempty), or
+[picireny](https://github.com/renatahodovan/picireny)
+
+Potentially also
+[creduce](https://github.com/csmith-project/creduce)
+
diff --git a/src/debug_flags.md b/src/debug_flags.md
new file mode 100644
index 0000000..b5844b2
--- /dev/null
+++ b/src/debug_flags.md
@@ -0,0 +1,55 @@
+# Supported RUSTFLAGS
+To support you while debugging, we have added support for an experimental `-Z autodiff` flag to `RUSTFLAGS`,
+which allows changing the behaviour of Enzyme, without recompiling rustc.
+We currently support the following values for `autodiff`:
+
+### Debug Flags
+```bash
+PrintTA             // Print TypeAnalysis information
+PrintAA             // Print ActivityAnalysis information
+PrintPerf           // Print AD related Performance warnings
+Print               // Print all of the above
+PrintModBefore      // Print the whole LLVM-IR module before running opts
+PrintModAfterOpts   // Print the whole LLVM-IR module after running opts, before AD
+PrintModAfterEnzyme // Print the whole LLVM-IR module after running opts and AD
+LooseTypes          // Risk incorrect derivatives instead of aborting when missing Type Info
+OPT                 // Most Important debug helper: Print a Module that can run with llvm-opt + enzyme
+```
+
+
+`LooseTypes` is often helpful to get rid of Enzyme errors stating
+`Can not deduce type of ` and to be able to run some code. But please
+keep in mind that this flag absolutely has the chance to cause incorrect gradients.
+Even worse, the gradients might be correct for certain input values, but not for others.
+So please create issues about such bugs and only use this flag temporarily while you wait for your
+bug to be fixed.
+
+
+### Benchmark flags
+For performance experiments and benchmarking we also support
+```
+NoModOptAfter   // We won't optimize the whole LLVM-IR Module after AD
+EnableFncOpt    // We will optimize each derivative function generated individually
+NoVecUnroll     // Disables vectorization and loop unrolling
+NoSafetyChecks  // Disables Enzyme specific safety checks
+RuntimeActivity // Enables the runtime activity feature from Enzyme
+Inline          // Instructs Enzyme to apply additional inlining beyond LLVM's default
+AltPipeline     // Don't optimize IR before AD, but optimize the whole module twice after AD
+```
+
+You can combine multiple `autodiff` values using a comma as separator:
+```bash
+RUSTFLAGS="-Z autodiff=LooseTypes,NoVecUnroll" cargo +enzyme build
+```
+
+
+The normal compilation pipeline of Rust-Enzyme is
+1) Run your selected compilation pipeline. If you selected a release build, we will disable vectorization and loop unrolling.
+2) Differentiate your functions.
+3) Run your selected compilation pipeline again on the whole module. This time we do not disable vectorization or loop unrolling.
+
+The alt pipeline will not run opts before AD, but 2x after AD - the first time without vectorization or loop unrolling, the second time with.
+
+The two flags above allow you to adjust this default behaviour.
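
Step 1 of `debug_backend.md` asks you to trim `out.ll` by hand. The same trimming could be sketched as a small shell function. This is only a sketch, not part of Rust-AD: the names `trim_ir` and `trimmed.ll` are hypothetical, and it assumes that the module starts at the first line beginning with `; ModuleID = ` and ends at the last metadata line of the form `!40831 = ...`, as described in that step.

```sh
#!/bin/sh
# Hypothetical helper, not part of Rust-AD: print only the LLVM-IR module
# contained in a captured build log such as out.ll.
trim_ir() {
    # Line number of the first "; ModuleID = " line (assumed start of the module).
    start=$(grep -n '^; ModuleID = ' "$1" | head -n 1 | cut -d: -f1)
    # Line number of the last metadata line such as "!40831 = !{...}" (assumed end).
    end=$(grep -n -E '^![0-9]+ = ' "$1" | tail -n 1 | cut -d: -f1)
    # Refuse to guess when a marker is missing or the markers are out of order.
    if [ -z "$start" ] || [ -z "$end" ] || [ "$start" -gt "$end" ]; then
        echo "trim_ir: no LLVM-IR module found in $1" >&2
        return 1
    fi
    # Print everything from the module header to the last metadata line.
    sed -n "${start},${end}p" "$1"
}
```

A possible use is `trim_ir out.ll > trimmed.ll`, followed by the `opt` check from step 2 on `trimmed.ll`. It is still worth opening the result once to confirm that no warnings ended up inside the module.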