The purpose of this LLVM version is to generate code for the Capstone disassembler.
It refactors the TableGen emitter backends, so they can emit C code in addition to the C++ code they normally emit.
python3 -m venv .venv
source .venv/bin/activate
pip install Ninja cmake
mkdir build
cd build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug ../llvm
cmake --build . --target llvm-tblgen --config Debug
Please note that within LLVM we speak of a Target if we refer to an architecture.
The TableGen emitter backends are located in llvm/utils/TableGen/.
The target definition files (.td), which define the instructions, operands, features etc., can be found in llvm/lib/Target/<ARCH>/.
Generating code for a target has 6 steps:
                                                                         5                 6
                                                                    ┌──────────┐      ┌──────────┐
                                                                    │Printer   │      │CS .inc   │
    1               2                 3                4        ┌──►│Capstone  ├─────►│files     │
┌───────┐     ┌───────────┐     ┌───────────┐     ┌──────────┐  │   └──────────┘      └──────────┘
│ .td   │     │           │     │           │     │ Code-    │  │
│ files ├────►│ TableGen  ├────►│  CodeGen  ├────►│ Emitter  │◄─┤
└───────┘     └──────┬────┘     └───────────┘     └──────────┘  │
                     │                                 ▲        │   ┌──────────┐      ┌──────────┐
                     └─────────────────────────────────┘        └──►│Printer   ├─────►│LLVM .inc │
                                                                    │LLVM      │      │files     │
                                                                    └──────────┘      └──────────┘
1. LLVM targets are defined in .td files. They describe instructions, operands, features and other properties.
2. LLVM TableGen parses these files and converts them to an internal representation of Classes, Records, DAGs and other types.
3. Next, a TableGen component called CodeGen abstracts this even further. The result is a representation which is not specific to any target (e.g. the CodeGenInstruction class can represent a machine instruction of any target).
4. Different code emitter backends use the results of the former two components to generate code.
5. Whenever the emitter emits code, it calls a Printer: either the PrinterCapstone to emit C or the PrinterLLVM to emit C++. Which one is used is controlled by the --printerLang=[CCS,C++] option passed to llvm-tblgen.
6. After the emitter backend is done, the Printer writes the output_stream content into the .inc files.
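As a rough sketch of steps 5 and 6, the same emitter backend can be run once per printer language. Only the --printerLang option comes from this document; --gen-disassembler, -I and -o are the usual llvm-tblgen options. The binary location (build/bin/llvm-tblgen), the AArch64 example target, and the cs_inc/ and llvm_inc/ output directories are assumptions for illustration. The function only prints the command lines so they can be inspected first.

```shell
# Sketch: print the llvm-tblgen command line for each printer language.
# Assumptions: run from the repository root, llvm-tblgen was built in ./build,
# AArch64 is just an example target, and the output directories are hypothetical.
TBLGEN="build/bin/llvm-tblgen"
ARCH="AArch64"

# $1: printer language (CCS or C++), $2: output .inc file
tblgen_cmd() {
  printf '%s --printerLang=%s --gen-disassembler -I llvm/include -I llvm/lib/Target/%s -o %s llvm/lib/Target/%s/%s.td\n' \
    "$TBLGEN" "$1" "$ARCH" "$2" "$ARCH" "$ARCH"
}

tblgen_cmd CCS "cs_inc/${ARCH}GenDisassemblerTables.inc"    # C code for Capstone
tblgen_cmd C++ "llvm_inc/${ARCH}GenDisassemblerTables.inc"  # C++ code for LLVM
```

Piping a printed line into sh (for example `tblgen_cmd CCS out.inc | sh`) runs it.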
We use the following emitter backends:
Name | Generated Code | Note |
---|---|---|
AsmMatcherEmitter | Mapping tables for Capstone | |
AsmWriterEmitter | State machine to decode the asm-string for a MCInst | |
DecoderEmitter | State machine which decodes bytes to a MCInst. | |
InstrInfoEmitter | Tables with instruction information (instruction enum, instr. operand information...) | |
RegisterInfoEmitter | Tables with register information (register enum, register type info...) | |
SubtargetEmitter | Table about the target features. | |
SearchableTablesEmitter | Usually used to generate tables and decoding functions for system registers. | 1. Not all targets use this. 2. Backend can't access the target name. Wherever the target name is needed, __ARCH__ or ##ARCH## is printed and later replaced. |
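The placeholder replacement mentioned for SearchableTablesEmitter can be pictured as a plain text substitution. This is only a sketch: the document does not say which tool performs the replacement, so sed is an assumption here, and the two sample lines are invented stand-ins for generated code.

```shell
# Sketch: substitute both target-name placeholders in a generated file.
ARCH="AArch64"            # example target name (assumption)
inc_file="$(mktemp)"      # stand-in for a generated .inc file

# Hypothetical generated lines containing the two placeholder spellings.
printf '%s\n' 'static const __ARCH__SysReg SysRegsList[] = {};' \
              'const ##ARCH##SysReg *lookup(void);' > "$inc_file"

# Replace every occurrence of __ARCH__ and ##ARCH## with the target name.
replaced="$(sed -e "s/__ARCH__/${ARCH}/g" -e "s/##ARCH##/${ARCH}/g" "$inc_file")"
printf '%s\n' "$replaced"
rm -f "$inc_file"
```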
- If you find C++ code within the generated files, you need to extend PrinterCapstone::translateToC(). If this still doesn't fix the problem, the code snippet wasn't passed through translateToC() before emitting. So you need to figure out where this specific code snippet is printed and add translateToC() there.
- Template functions with default values for their arguments don't get replaced properly. See handleDefaultArg() in PrinterCapstone.cpp to add the default argument value.
- Some operand printers or decoders are not recognized. The compiler reports something like:

      .../AArch64GenAsmWriter.inc:18216:5: warning: implicit declaration of function ‘printMatrixIndex_1’; did you mean ‘printMatrix_0’? [-Wimplicit-function-declaration]
      18216 |     printMatrixIndex_1(MI, 2, O);
            |     ^~~~~~~~~~~~~~~~~~
            |     printMatrix_0

  The function declaration is probably missing in the header (e.g. <ARCH>InstPrinter.h). You can copy the DEFINE_printMatrix() function to the header and rewrite it as a declaration. Just check the other DECLARE_... macros in the header file.
- An ARCH_OP_GROUP_... is missing or not generated. The build fails with an error like:

      AArch64InstPrinter.c:2249:42: error: ‘AArch64_OP_GROUP_MatrixIndex_8’ undeclared (first use in this function); did you mean ‘AArch64_OP_GROUP_MatrixIndex’?
      2249 |   add_cs_detail(MI, CONCAT(AArch64_OP_GROUP_MatrixIndex, Scale), \

  Fix it by adding the postfix MatrixIndex_8 to one of the exception lists in PrinterCapstone::printOpPrintGroupEnum().
- If the mapping files miss operand types or access information, then the .td files are incomplete (happens surprisingly often). You need to search for the instructions or operands with missing or incorrect values and fix them.

  Wrong access attributes for:
  - Registers, Immediates: The instruction defines "out" and "in" operands incorrectly.
  - Memory: The "mayLoad" or "mayStore" variable is not set for the instruction.

  Operand type is invalid:
  - The "OperandType" variable is unset for this operand type.
- If certain target features (e.g. architecture extensions) were removed from LLVM or you want to add your own, check out DeprecatedFeatures.md.
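For the first item in the list above, leftover C++ in the generated files can be spotted with a quick text search before compiling. The following is a minimal sketch: the pattern list is a heuristic assumption, not an exhaustive check and not part of the tooling, and the two demo lines are made up.

```shell
# Sketch: flag lines in generated files that still look like C++.
# Heuristic patterns (assumption): scope operators, templates, casts, nullptr.
cxx_pattern='std::|[A-Za-z_][A-Za-z0-9_]*::[A-Za-z_]|template *<|static_cast *<|nullptr'

# Prints "file:line:text" for each suspicious line; returns 1 if none is found.
find_cxx_leftovers() {
  grep -nHE "$cxx_pattern" "$@"
}

# Demo on a throwaway file: one plain C line and one line with a C++ keyword.
demo="$(mktemp)"
printf '%s\n' 'MCOperand_getReg(Op);' 'if (Ptr == nullptr) return;' > "$demo"
leftovers="$(find_cxx_leftovers "$demo")"
printf '%s\n' "$leftovers"
rm -f "$demo"
```

Each reported line is a candidate for a new rule in PrinterCapstone::translateToC(), or for a print site that does not call it yet.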