jvo-asm

This is a toy x86 assembler written from scratch. It was written to gain a better understanding of how machine code and executable files work. Its syntax uses a lot of emojis because why not?

Usage

Using the print example:

$ cargo run -- examples/print.jas
hi!

Features

Constants

🖊LINUX_SYSCALL $128
# ...
❗ LINUX_SYSCALL

Comments

# I'm a comment
🦘= ✉exit

Addressing

Immediate addressing

⚫ ⬅ $8

Load 8 into ⚫.

Register addressing

🔴 ⬅ 🔵

Copies data from 🔵 into 🔴.

Direct addressing

📗my_number 3
# ...
🔴 ⬅ my_number

This loads 3 into 🔴.

Indirect addressing

🔴 ⬅ $0~🔵

This loads the value at the address contained in 🔵 into 🔴.

Base pointer addressing

🔴 ⬅ $4~🔵

Or alternatively with a constant:

🖊ST_ARG $8
# ...
🔴 ⬅ ST_ARG~🔵

This is similar to indirect addressing except that it adds a constant offset to the address in 🔵.

Labels

🦘 ✉exit
# ...
📪exit:
⚪ ⬅ $1
❗ LINUX_SYSCALL

Labels are defined by prefixing them with 📪 and ending them with a :. To refer to a label prefix it with ✉ instead.

Data sections

📗numbers 3, 67, 34, 222, 45
# ...
🔵 ⬅ numbers

Data sections start with 📗 and can be referred to later by just their name.

Implementation notes

The main high-level function which processes a file is process. First the code is broken up into separate lines. Each line is then tokenized into a vector of TokenType. ConstantReferences are replaced by their constants and the vector is compiled into a vector of IntermediateCode. Intermediate code consists of bytes and displacements. We need this intermediate step because e.g. a jump to an instruction further down the program can not be encoded, when we encounter a jump to a next instruction we don’t know yet how far to jump. After this we iterate through the IntermediateCode and replace the displacements with bytes. This is done by keeping track of the byte offset of each instruction in the program during the first step.

After this an ELF binary is built. Its layout is as follows (the multiple data sections example was used here):

$ readelf -a a.out
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x804b000
  Start of program headers:          52 (bytes into file)
  Start of section headers:          148 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         5
  Section header string table index: 4

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] pi                PROGBITS        08049000 001000 000014 00  WA  0   0  1
  [ 2] euler             PROGBITS        0804a000 002000 000014 00  WA  0   0  1
  [ 3] .code             PROGBITS        0804b000 003000 000019 00  AX  0   0  1
  [ 4] .shstrtab         STRTAB          00000000 000400 00001a 00      0   0  1

...

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x003000 0x0804b000 0x0804b000 0x00019 0x00019 R E 0x1000
  LOAD           0x001000 0x08049000 0x08049000 0x00014 0x00014 RW  0x1000
  LOAD           0x002000 0x0804a000 0x0804a000 0x00014 0x00014 RW  0x1000

 Section to Segment mapping:
  Segment Sections...
   00     .code
   01     pi
   02     euler

...

There’s a program header entry for each data section (📗) and for the executable code. Everything is padded to 4 KB (=virtual page size). To allow for linking a correct section header is also generated.

Instruction reference

Registers

Symbol	Name
⚪	`%eax`
🔴	`%ebx`
🔵	`%ecx`
⚫	`%edx`
◀	`%esp`
⬇	`%ebp`

Instructions

Symbol	Example	Description
↩	↩	Return from a function
📞	📞 fn	Call function
➕	⚪ ➕ ⚫	`⚪ += ⚫`
➖	⚪ ➖ ⚫	`⚪ -= ⚫`
✖	⚪ ✖ ⚫	`⚪ *= ⚫`
⬅	🔴 ⬅ $1	Move into register
❗	❗ $128	Interrupt
⚖	⚖ ⚫, ⚪	Compare ⚫ to ⚪
🦘=	🦘= ✉exit	Jump if equal
🦘≠	🦘≠ ✉exit	Jump if not equal
🦘<	🦘< ✉exit	Jump if less than
🦘≤	🦘≤ ✉exit	Jump if less or equal
🦘>	🦘> ✉exit	Jump if greater than
🦘≥	🦘≥ ✉exit	Jump if greater or equal
🦘	🦘 ✉exit	Unconditional jump
📥	📥 $8	Push onto stack
📤	📤 🔵	Pop from stack
🖊	🖊c $4	Define constant `c` to be 4
📪 (ends with :)	📪exit:	Define a label with name `exit`
📗	📗pi 3, 1, 4	Define a data section `pi` containing 3 integers
✉	✉exit	Refer to a previously defined (📪) exit label
$	$1	1 is a number
#	# hi!	`hi!` is a comment
[0-9]+	1	1 is a memory address
[aA-zZ]+	constant	`constant` is a previously defined (🖊, 📗) constant

Name		Name	Last commit message	Last commit date
Latest commit History 119 Commits
examples		examples
src		src
tests		tests
.gitignore		.gitignore
.travis.yml		.travis.yml
COPYING		COPYING
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
README.org		README.org

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

jvo-asm

Usage

Features

Constants

Comments

Addressing

Immediate addressing

Register addressing

Direct addressing

Indirect addressing

Base pointer addressing

Labels

Data sections

Implementation notes

Instruction reference

Registers

Instructions

About

Releases

Packages

Languages

License

jorenvo/jvo-asm

Folders and files

Latest commit

History

Repository files navigation

jvo-asm

Usage

Features

Constants

Comments

Addressing

Immediate addressing

Register addressing

Direct addressing

Indirect addressing

Base pointer addressing

Labels

Data sections

Implementation notes

Instruction reference

Registers

Instructions

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages