Defining mnemonics: #ruledef, #subruledef

Lorenzi edited this page Dec 2, 2020 · 7 revisions

You should use a #ruledef block to define mnemonics and their binary representations. You can have as many #ruledef blocks as you need, so you can easily split up your declarations.

You can combine any number of letters, words, and punctuation for a given mnemonic. Mnemonics are also case-insensitive. For example, these are all valid patterns:

#ruledef
{
    nop => 0xff

    mov a, #b => 0x35

    sub x, [hl] => 0b11010001

    add.gt r0, r3, r4, LSL #6 => 0x46
}

For the binary representation, the way you write out values matters: the size of each value is derived from the number of digits given.

So, for example, 0x0 is four bits long, since it's a single hexadecimal digit, and 0x001 is 12 bits long (which is to say: leading zeroes do matter).

#ruledef
{
    ; this instruction outputs 8 bits
    mov a, b => 0x35

    ; these instructions output 16 bits each
    add a, b => 0x6834
    sub a, b => 0x0002
}

You can also split up these values for visual aid by using the concatenation operator @:

#ruledef
{
    ; this instruction outputs 8 bits
    mov a, b => 0b101 @ 0b11 @ 0b001

    ; this instruction outputs 16 bits
    add a, b => 0x08 @ 0x3 @ 0b1001
}
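The sizing and concatenation rules above can be checked by hand with a small model. The following Python sketch is only an illustration of the arithmetic, not part of customasm; the helper names `literal_bits` and `concat` are hypothetical, and only `0x` and `0b` literals are handled.

```python
from typing import Tuple


def literal_bits(lit: str) -> Tuple[int, int]:
    """Return (value, width in bits) for a 0x... or 0b... literal."""
    prefix, digits = lit[:2].lower(), lit[2:]
    if prefix == "0x":
        return int(digits, 16), 4 * len(digits)  # 4 bits per hex digit
    if prefix == "0b":
        return int(digits, 2), len(digits)  # 1 bit per binary digit
    raise ValueError(f"unsupported literal: {lit}")


def concat(*lits: str) -> Tuple[int, int]:
    """Model the @ operator: join literals left to right, summing widths."""
    value, width = 0, 0
    for lit in lits:
        v, w = literal_bits(lit)
        value = (value << w) | v
        width += w
    return value, width


print(literal_bits("0x0"))                # (0, 4)
print(literal_bits("0x001"))              # (1, 12)
print(concat("0b101", "0b11", "0b001"))   # (185, 8), i.e. 0xb9
print(concat("0x08", "0x3", "0b1001"))    # (2105, 16), i.e. 0x0839
```

The widths match the comments in the examples above: 3 + 2 + 3 = 8 bits for `mov`, and 8 + 4 + 4 = 16 bits for `add`.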

Parameters

So far, we've only defined fixed mnemonics. If you want to receive numerical arguments (e.g. for a "load immediate" instruction), you can add as many parameters as you want with {}:

#ruledef
{
    load a, {value} => 0x55 @ value`8
}

This will allow the instruction to receive any kind of expression at that spot. The received value will be available for you to reference on the binary representation side, with the same name as declared between the {} on the mnemonic side.

Note that, when the argument is used on the binary representation side, it is given a slice `8. This is because we aren't constraining what kinds of arguments we can receive, so the assembler can't guess a size for the value. The slice operator takes the lowest N bits of the value and discards the rest. So here, the argument is truncated to 8 bits, and the instruction outputs 16 bits as a whole.

You can invoke this instruction with simple numerical values, or more complex calculations, like so:

; using the above #ruledef
load a, 0x33
load a, 2 + 3 * 4
load a, (0x100 - 5) * 8
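To see what the slice does to each of these invocations, here is a rough Python model of `0x55 @ value`8`. It is a sketch of the described behavior (keep the lowest 8 bits, discard the rest), not customasm's own code; `slice_low` and `encode_load_a` are hypothetical names.

```python
def slice_low(value: int, bits: int) -> int:
    """Keep the lowest `bits` bits of value (models the slice value`bits)."""
    return value & ((1 << bits) - 1)


def encode_load_a(value: int) -> int:
    """Model: load a, {value} => 0x55 @ value`8 (16 bits in total)."""
    return (0x55 << 8) | slice_low(value, 8)


for arg in (0x33, 2 + 3 * 4, (0x100 - 5) * 8):
    print(f"{encode_load_a(arg):04x}")
# prints 5533, 550e, 55d8
```

The last invocation evaluates to 0x7d8, which does not fit in 8 bits, so only the low byte 0xd8 ends up in the output.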

You may also glue your parameter slot to a fixed token on the left, allowing you to easily accept things like the ARM registers (r1, r2, r3, and so on). By gluing your parameter slot, only unsigned decimal values can be accepted (so r0x4 is illegal), but you can still unglue your arguments at the invocation site, to use more complex expressions.

#ruledef
{
    load r{reg_num}, {value} => 0x5 @ reg_num`4 @ value`8
}

; then you can use instructions like:
load r1, 0x12
load r2, 0x40 * 2

; ungluing also works for more complex expressions:
load r 0xc, 1
load r (2 * 3), 0x99

Typed parameters

You can give types to parameters, in order to automatically constrain their sizes.

#ruledef
{
    load.b a, {value: u8}  => 0x55 @ value ; outputs 16 bits
    load.w a, {value: s16} => 0x66 @ value ; outputs 24 bits
    load.d a, {value: i32} => 0x77 @ value ; outputs 40 bits
}

A typed parameter automatically slices the received argument, so you don't have to do it yourself. The assembler will also report an error if you supply a value outside the type's valid range. The following are the valid types, where XX can be replaced with any number of bits:

| Type | Description | Example with 8 bits |
| --- | --- | --- |
| uXX | Unsigned values | u8 will accept values from 0x00 to 0xff |
| sXX | Signed values | s8 will accept values from -0x80 to 0x7f |
| iXX | Signed or unsigned values | i8 will accept values from -0x80 to 0xff |
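The ranges listed above generalize to any bit width. The following Python sketch models the range check followed by the automatic slice; it is an assumption-based illustration rather than the assembler's implementation, and `type_range` and `check_and_slice` are hypothetical helpers.

```python
from typing import Tuple


def type_range(kind: str, bits: int) -> Tuple[int, int]:
    """Return the inclusive (lowest, highest) value accepted by uXX, sXX or iXX."""
    if kind == "u":
        return 0, (1 << bits) - 1
    if kind == "s":
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if kind == "i":
        # Union of the signed and unsigned ranges.
        return -(1 << (bits - 1)), (1 << bits) - 1
    raise ValueError(f"unknown type kind: {kind}")


def check_and_slice(value: int, kind: str, bits: int) -> int:
    """Reject out-of-range values, then keep the lowest `bits` bits."""
    lo, hi = type_range(kind, bits)
    if not lo <= value <= hi:
        raise ValueError(f"{value} is out of range for {kind}{bits}")
    return value & ((1 << bits) - 1)


print(type_range("u", 8))            # (0, 255)
print(type_range("s", 8))            # (-128, 127)
print(type_range("i", 8))            # (-128, 255)
print(check_and_slice(-1, "i", 8))   # 255, i.e. 0xff
```

Under this model, -1 and 0xff both produce the same 8 output bits for an i8 parameter, while 0x100 is rejected by all three 8-bit types.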

Nested rule parameters

Nested rule parameters match a part of the mnemonic against another #ruledef block. This can be useful for creating named arguments, complex operands and addressing modes, or simply to cut back on repeating yourself when the same pattern appears multiple times across different mnemonics.

Give your #ruledef blocks a name to be able to reference them, then use the name as the parameter type:

#ruledef register
{
    a => 0x0
    b => 0x1
    c => 0x2
}

#ruledef
{
    ; here `r` is the parameter name, which is used on the binary representation side,
    ; and `register` is the parameter type, referring to the #ruledef block above
    load {r: register}, {value: i8} => 0x5 @ r @ value
}

; then you can use instructions like:
load a, 0x12
load b, 100
load c, -1

Note that, with the previous example, a, b, and c can also be used directly as instructions by themselves. This is usually undesirable, so you can declare the block as a #subruledef instead:

#subruledef register
{
    a => 0x0
    b => 0x1
    c => 0x2
}

#ruledef
{
    load {r: register}, {value: i8} => 0x5 @ r @ value
}

#subruledef has exactly the same syntax and semantics as the regular #ruledef, but does not allow its mnemonics to be used as freestanding instructions.

You can create complex and deep mnemonics with nested rule parameters:

#subruledef register
{
    a => 0x0
    b => 0x1
    c => 0x2
}

#subruledef source
{
    {immediate: i16} => 0xd @ immediate
    mem[{address: i16}] => 0xe @ address
    ptr[{r: register}] => 0xf @ r`16
}

#ruledef
{
    load {r: register}, {src: source} => 0x55 @ r @ src
    add  {r: register}, {src: source} => 0x66 @ r @ src
}

; then you can use instructions like:
load a, 0x12
load b, mem[0xff00]
add  c, ptr[b]
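To make the nesting concrete, the following Python sketch works out the bits that the three instructions above would produce, following the rules exactly as written in this example. It is a hand-built model under that assumption, not output captured from the assembler, and all of the function names are hypothetical.

```python
from typing import Tuple

Bits = Tuple[int, int]  # (value, width in bits)

# Models the `register` subruledef: each register yields a 4-bit value.
REGISTER = {"a": (0x0, 4), "b": (0x1, 4), "c": (0x2, 4)}


def cat(*parts: Bits) -> Bits:
    """Model the @ operator on (value, width) pairs."""
    value, width = 0, 0
    for v, w in parts:
        value = (value << w) | (v & ((1 << w) - 1))
        width += w
    return value, width


# Models the three patterns of the `source` subruledef (20 bits each).
def src_immediate(n: int) -> Bits:
    return cat((0xD, 4), (n, 16))


def src_mem(address: int) -> Bits:
    return cat((0xE, 4), (address, 16))


def src_ptr(reg: str) -> Bits:
    return cat((0xF, 4), (REGISTER[reg][0], 16))  # r`16 widens r to 16 bits


def load(reg: str, src: Bits) -> Bits:
    return cat((0x55, 8), REGISTER[reg], src)


def add(reg: str, src: Bits) -> Bits:
    return cat((0x66, 8), REGISTER[reg], src)


def show(bits: Bits) -> str:
    value, width = bits
    return f"{value:0{width // 4}x}"


print(show(load("a", src_immediate(0x12))))  # 550d0012
print(show(load("b", src_mem(0xFF00))))      # 551eff00
print(show(add("c", src_ptr("b"))))          # 662f0001
```

Each instruction comes out at 32 bits: 8 for the opcode, 4 for the register, and 20 for the nested source operand.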