
Defining mnemonics: #ruledef, #subruledef

Lorenzi edited this page Jan 20, 2024 · 7 revisions

You should use a #ruledef block to define mnemonics and their binary encodings. You can have as many #ruledef blocks as you need, so you can easily split up your declarations.

You can combine any number of letters, words, and punctuation for a given mnemonic. Mnemonics are also case-insensitive.

Following the mnemonic, separated by a heavy arrow =>, you must indicate the instruction's binary encoding.

For example, these are all valid patterns:

#ruledef
{
    nop => 0xff

    mov a, #b => 0x35

    sub x, [hl] => 0b11010001

    add.gt r0, r3, r4, LSL #6 => 0x46
}

For the binary encoding, the way you express numeric values matters. Their size is derived from the number of digits given.

So, for example, 0x0 is four bits long, since it's a single hexadecimal digit, and 0x001 is 12 bits long. In other words, leading zeroes do matter!

If you want to use decimal values or explicitly express the size of a value, you can use the slice `X operator, as in 255`8.
The size given to the slice operator is not restricted to multiples of 4 or 8: any number of bits works, like `3 or `19.
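As a small sketch of how literal sizes follow from the way a value is written, the block below uses hypothetical mnemonics (halt, wait, reset, pause) that exist only for illustration:

```asm
#ruledef
{
    ; hypothetical mnemonics, chosen only to show literal sizes
    halt  => 0xff    ; 2 hex digits = 8 bits
    wait  => 0x00ff  ; 4 hex digits = 16 bits (leading zeroes count)
    reset => 255`8   ; decimal 255 sliced to 8 bits, same bits as 0xff
    pause => 1`16    ; decimal 1 sliced to 16 bits, same bits as 0x0001
}
```

Note that halt and wait carry the same numeric value but produce outputs of different lengths.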

You can also use underscores _ to help with readability.

#ruledef
{
    ; this instruction outputs 8 bits
    mov a, b => 0x35

    ; these instructions output 16 bits each
    add a, b => 0x68_34
    sub a, b => 0x00_02
}

You can also split up these values by using the concatenation operator @:

#ruledef
{
    ; this instruction outputs 8 bits
    mov a, b => 0b101 @ 0b11 @ 0b001

    ; this instruction outputs 16 bits
    add a, b => 0x08 @ 0x3 @ 0b1001
}
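To make the size arithmetic concrete, here is how the two rules above should come out when invoked. The pieces are simply laid side by side, most significant first:

```asm
; using the above #ruledef
mov a, b ; 0b101 @ 0b11 @ 0b001 = 0b10111001 = 0xb9   (3 + 2 + 3 = 8 bits)
add a, b ; 0x08 @ 0x3 @ 0b1001  = 0x0839              (8 + 4 + 4 = 16 bits)
```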

Parameters

So far, we've only defined fixed mnemonics. If you want to receive numerical arguments (e.g. for a "load immediate" instruction), you can add as many parameters as you want with {}:

#ruledef
{
    load a, {value} => 0x55 @ value`8
}

This will allow the instruction to receive any kind of expression at the spot marked with braces {}. The received value can then be referenced by name on the binary encoding side.

It's recommended that the parameters of an instruction be separated by an unambiguous token, like a comma, especially when there are multiple {} expression parameters in sequence. The expression parser is greedy, and you might run into problems if it can't distinguish between [two different arguments] and [one argument with two expression terms]. For example, in load {a} {b}, the parser will recognize load 4 -7 as an invocation with a single expression argument 4 - 7 and will fail to match.
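A minimal sketch of the recommended form, assuming a hypothetical two-operand rule with a made-up opcode 0x10:

```asm
#ruledef
{
    ; hypothetical rule: the comma keeps the two expression slots apart
    load {a}, {b} => 0x10 @ a`8 @ b`8
}

load 4, -7 ; two arguments: 4 and -7
```

With the comma in place, the expression parser stops at the separator, so `-7` can no longer be absorbed into the first argument.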

Note that, when the argument is used on the binary encoding side, it has to be given a slice `8 to truncate it and explicitly indicate its size. This is because we aren't constraining what type of arguments we can receive, so the assembler will accept a value of any size, but it still needs the binary encoding to have an explicit size. To avoid writing the slice yourself, you can use typed parameters, as seen below.

The slice operator takes the lowest N bits of the value and discards the rest. So here, the instruction will truncate the argument to 8 bits, and the instruction will output 16 bits as a whole.

You can invoke this instruction with simple numerical values, or more complex calculations, like so:

; using the above #ruledef
load a, 0x33
load a, 2 + 3 * 4
load a, (0x100 - 5) * 8
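Working through those three invocations by hand shows the truncation in action. The outputs in the comments are derived from the `load a, {value}` rule above, which emits 0x55 followed by the lowest 8 bits of the argument:

```asm
; using the above #ruledef
load a, 0x33            ; value = 0x33              -> output 0x5533
load a, 2 + 3 * 4       ; value = 14 = 0x0e         -> output 0x550e
load a, (0x100 - 5) * 8 ; value = 2008 = 0x7d8,
                        ; sliced to 8 bits = 0xd8   -> output 0x55d8
```

The last line is accepted without complaint even though the value does not fit in 8 bits: the slice silently drops the upper bits.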

You may also "glue" your parameter slot to a fixed token on the left, by not placing any whitespace between them, which allows you to easily accept things like the ARM registers (r1, r2, r3, and so on).

#ruledef
{
    load r{reg_num}, {value} => 0x5 @ reg_num`4 @ value`8
}

; then you can use instructions like:
load r1, 0x12
load r2, 0x40 * 2

; you may even use more complex expressions,
; although the syntax might start to get confusing:
load r0xc, 0x40 * 2 ; same as if it used `r12`
load r3 + 3, 0x40 * 2 ; same as if it used `r6`
load r(4 + 4), 0x40 * 2 ; same as if it used `r8`

Typed parameters

You can give types to parameters, in order to automatically constrain their sizes.

#ruledef
{
    load.b a, {value: u8}  => 0x55 @ value ; outputs 16 bits
    load.w a, {value: s16} => 0x66 @ value ; outputs 24 bits
    load.d a, {value: i32} => 0x77 @ value ; outputs 40 bits
}

A typed parameter automatically slices the received argument, truncating the values to the given sizes, so you don't have to do it yourself. It will also throw an error if you supply a value that's outside of its valid range. The following are the valid types, and you can replace XX with any number:

| Type | Description | Example with 8 bits |
|------|-------------|---------------------|
| uXX | Unsigned values | u8 will accept values from 0x00 to 0xff |
| sXX | Signed values | s8 will accept values from -0x80 to 0x7f |
| iXX | Signed or unsigned values | i8 will accept values from -0x80 to 0xff |
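The ranges above can be tried out with a sketch like the following. The mnemonics setu, sets, and seti and their opcodes are hypothetical, and the rejected cases are left commented out so that the block still assembles:

```asm
#ruledef
{
    ; hypothetical mnemonics, one per parameter type
    setu {value: u8} => 0xa0 @ value
    sets {value: s8} => 0xa1 @ value
    seti {value: i8} => 0xa2 @ value
}

setu 0xff  ; accepted: largest u8 value
sets -0x80 ; accepted: smallest s8 value
seti -1    ; accepted: i8 takes negative values...
seti 0xff  ; ...and unsigned ones up to 0xff

; each of these should be rejected as out of range:
; setu -1
; setu 0x100
; sets 0x80
; seti 0x100
```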

Nested rule parameters

You can also use the name of another #ruledef block as the type for an instruction parameter. This can be useful for creating named arguments, complex operands and addressing modes, or simply to cut back on repeating yourself when the same pattern appears multiple times across different mnemonics.

#ruledef register
{
    a => 0x0
    b => 0x1
    c => 0x2
}

#ruledef
{
    ; here `r` is the parameter name, which is used on the binary encoding side,
    ; and `register` is the parameter type, referring to the #ruledef block above
    load {r: register}, {value: i8} => 0x5 @ r @ value
}

; then you can use instructions like:
load a, 0x12
load b, 100
load c, -1

Note that, with the previous example, you're also able to use a, b, or c directly as instructions themselves. This is usually undesirable, so you can declare the block as a #subruledef instead:

#subruledef register
{
    a => 0x0
    b => 0x1
    c => 0x2
}

#ruledef
{
    load {r: register}, {value: i8} => 0x5 @ r @ value
}

#subruledef has exactly the same syntax and semantics as the regular #ruledef, but it does not allow its mnemonics to be used as freestanding instructions.
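The difference can be sketched as follows, assuming a hypothetical inc instruction with opcode 0x7. The freestanding use is commented out because it should fail to match any instruction:

```asm
#subruledef register
{
    a => 0x0
    b => 0x1
}

#ruledef
{
    ; hypothetical instruction taking a register operand
    inc {r: register} => 0x7 @ r
}

inc a ; accepted: `register` is matched as an operand of `inc`
inc b

; not accepted as an instruction, since `register` is a #subruledef:
; a
```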

You can create complex and deep mnemonics with nested rule parameters:

#subruledef register
{
    a => 0x0
    b => 0x1
    c => 0x2
}

#subruledef source
{
    {immediate: i16} => 0xd @ immediate
    mem[{address: i16}] => 0xe @ address
    ptr[{r: register}] => 0xf @ r`16
}

#ruledef
{
    load {r: register}, {src: source} => 0x55 @ r @ src
    add  {r: register}, {src: source} => 0x66 @ r @ src
}

; then you can use instructions like:
load a, 0x12
load b, mem[0xff00]
add  c, ptr[b]
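Tracing the encodings by hand helps show how the nested pieces fit together. Each `source` alternative yields 20 bits (a 4-bit tag plus a 16-bit payload), so every instruction here should come out as 8 + 4 + 20 = 32 bits. The outputs in the comments are derived from the rules above, not captured from the assembler:

```asm
; using the above definitions
load a, 0x12        ; 0x55 @ 0x0 @ (0xd @ 0x0012) = 0x550d0012
load b, mem[0xff00] ; 0x55 @ 0x1 @ (0xe @ 0xff00) = 0x551eff00
add  c, ptr[b]      ; 0x66 @ 0x2 @ (0xf @ 0x0001) = 0x662f0001
```

In the last line, the 4-bit register value 0x1 is widened to 16 bits by the `16 slice in the `ptr[...]` rule, which keeps all three `source` alternatives the same size.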