🐌
Hey, check out my blog posts on compilers in general and this one in particular.
The project is purely for fun and it's, quite frankly, just another toy compiler. That said, here's a couple things that might set it apart.
The language is concise but retains a degree of expressivity
A lot of mini-compilers have a language consisting of functions, primitive values and pretty much nothing else. Whereas this project's language Cool 2020 is a small Scala subset — it has classes, inheritance, virtual dispatch, and even a very simple form of pattern matching. Implementing all of these features yourself, making them function at the assembly level provides a lot of insights into how production-grade programming languages work.
The language's runtime library implements a simple generational garbage collector.
Then GNU as
and ld
come into play to build a native executable from the assembly. Many educational compilers emit MIPS assembly, and although it's possible to run it using an emulator, to me, running a native executable produced by your own compiler feels much more rewarding.
I also prefer emitting assembly instead of, for instance, C. It helps you understand the inner workings of various things that developers often take for granted:
- Managing a function's stack frame
- Calling conventions
- Finding the memory address of an identifier
- Converting expressions into assembly
- And so on, and so forth
High-level languages are a bit too... well, high-level for someone who wants to grasp how these things work under the hood. However, if you're more interested in designing a programming language than the nitty-gritty of compilers, targeting a high-level language like C might actually be a great idea.
The emitted assembly code contains a lot of hopefully useful comments. The idea here is to help understand the assembly and relate it back to the source code as much as possible.
Click to expand/collapse a sample assembly listing
.text
# ../CoolPrograms/Runtime/Fibonacci.cool(1,7): Fib
.global Fib..ctor
Fib..ctor:
pushq %rbp
movq %rsp, %rbp
subq $64, %rsp
# store actuals on the stack
movq %rdi, -8(%rbp)
# store callee-saved regs on the stack
movq %rbx, -24(%rbp)
movq %r12, -32(%rbp)
movq %r13, -40(%rbp)
movq %r14, -48(%rbp)
movq %r15, -56(%rbp)
# actual #0
movq -8(%rbp), %rbx
subq $8, %rsp
movq %rbx, 0(%rsp) # actual #0
# load up to 6 first actuals into regs
movq 0(%rsp), %rdi
# remove the register-loaded actuals from stack
addq $8, %rsp
call IO..ctor # super..ctor
movq %rax, %rbx # returned value
# ../CoolPrograms/Runtime/Fibonacci.cool(8,18): 0
movq $INT_0, %rbx
# ../CoolPrograms/Runtime/Fibonacci.cool(8,5): var i: Int = 0
movq %rbx, -16(%rbp) # i
.L6_WHILE_COND:
# ../CoolPrograms/Runtime/Fibonacci.cool(9,12): i
movq -16(%rbp), %r10 # i
# ../CoolPrograms/Runtime/Fibonacci.cool(9,17): 10
movq $INT_10, %r11
The two supported hosts are x86-64 Windows and Linux. And in general, the project is based on cross-platform technologies. The compiler itself runs on .NET. The GNU tools as
and ld
are available on Linux and Windows (and many, many other, of course). The language's runtime is implemented in assembly and is split up into three parts: common code, Windows-specific bits, and Linux-specific bits that are responsible for calling Windows API and Linux system functions respectively.
Approximately 288 automated tests make up the test suite. One nice consequence of generating native executables is an automated test can compile a program, run it, and compare the program's output to the expected values. Out of the 288 tests, there are 39 that do exactly that.
This page tries to give a bit of background on the project and describe it in more detail.
Cool 2020 is a subset of Scala with minor incompatibilities. Let's first have a taste of the language and then move on to discussing the language in more details.
class Fib() extends IO() {
def fib(x: Int): Int =
if (x == 0) 0
else if (x == 1) 1
else fib(x - 2) + fib(x - 1);
{
var i: Int = 0;
while (i <= 10) {
out_string("fib("); out_int(i); out_string(") = ");
out_int(fib(i));
out_nl();
i = i + 1
}
};
}
class Main() {
{ new Fib() };
}
Too see more of the language's features in action take a look at Life.cool, QuickSort.cool, and InsertionSort.cool.
... the Classroom Object-Oriented Language ... retains many of the features of modern programming languages including objects, static typing, and automatic memory management...
Cool programs are sets of classes. A class encapsulates the variables and procedures of a data type. Instances of a class are objects. In Cool, classes and types are identified; i.e., every class defines a type. Classes permit programmers to define new types and associated procedures (or methods) specific to those types. Inheritance allows new types to extend the behavior of existing types.
Cool is an expression language. Most Cool constructs are expressions, and every expression has a value and a type. Cool is type safe: procedures are guaranteed to be applied to data of the correct type. While static typing imposes a strong discipline on programming in Cool, it guarantees that no runtime type errors can arise in the execution of Cool programs.
Click to expand/collapse
(This grammar ignores the precedence of operations.)
grammar Cool_2020;
program
: classdecl+
;
classdecl
: 'class' ID varformals ('extends' ID actuals)? classbody
;
varformals
: '(' (varformal (',' varformal)*)? ')'
;
varformal
: 'var' ID ':' ID
;
classbody
: '{' (feature ';')* '}'
;
feature
: 'override'? 'def' ID formals ':' ID '=' expr
| 'var' ID ':' ID '=' expr
| '{' block? '}'
;
formals
: '(' (formal (',' formal)*)? ')'
;
formal
: ID ':' ID
;
actuals
: '(' (expr (',' expr)*)? ')'
;
block
: (('var' ID ':' ID '=')? expr ';')* expr
;
// The expression's syntax is split in `expr`, `assign_or_prefixop`, `primary`, and `infixop_rhs` to avoid left recursion.
expr
: prefix* primary infixop_rhs*
;
prefix
: ID '='
| '!'
| '-'
| 'if' '(' expr ')' expr 'else'
| 'while' '(' expr ')'
;
primary
: ('super' '.')? ID actuals
| 'new' ID actuals
| '{' block? '}'
| '(' expr ')'
| 'null'
| '(' ')'
| ID
| INTEGER
| STRING
| BOOLEAN
| 'this'
;
infixop_rhs
: ('<=' | '<' | '>=' | '>' | '==' | '!=' | '*' | '/' | '+' | '-') expr
| 'match' cases
| '.' ID actuals
;
cases
: '{' ('case' casepattern '=>' caseblock)+ '}'
;
casepattern
: ID ':' ID
| 'null'
;
caseblock
: block
| '{' block? '}'
;
ID
: [a-zA-Z$_][a-zA-Z0-9$_]*
;
INTEGER
: [0-9]+
;
STRING
: '"' (~["\\] | '\\' [0btnrf"\\])*? '"'
| '"""' .*? '"""'
;
BOOLEAN
: 'true'
| 'false'
;
BLOCK_COMMENT
: '/*' .*? '*/' -> skip
;
LINE_COMMENT
: '//' .*? ('\r\n' | '\r' | '\n') -> skip
;
WS
: [ \r\n\t]+ -> skip
;
Click to expand/collapse
The precedence of operations is given below from highest to lowest (where -num denotes unary minus).
Keep in mind, match
, if
, while
are expressions in Cool 2020.
.
! -num
* /
+ -
== !=
<= >=
< >
match
if while
=
The compiler can successfully build a Windows x64 or Linux x64 executable out of every sample program.
Introduce the concept of a companion object limited to objects predefined by the runtime. In particular, it will allow a Cool 2020 program to invoke GC related methods like this:
GC.collect(-1);
GC.print_state()
Instead of the current approach:
GC_collect(-1);
GC_print_state()
The compiler is written in F#. F# is a .NET language. In the light of these facts, it might not necessarily come as a surprise that a .NET SDK is a dependency.
Get it from the download page.
Download and run an SDK installer.
Keep in mind:
If you're using Visual Studio, look for the SDK that supports the version you're using.
If you're not using Visual Studio, install the first SDK listed.
Follow these instructions.
Just to give an example. On Ubuntu 22.04 installation looks something like this:
# Install the SDK.
# (If you install the .NET SDK, you don't need to install the corresponding runtime.)
sudo apt-get update && \
sudo apt-get install -y dotnet-sdk-8.0
On both Windows and Linux, to check your installation, do the following command. If everything is OK, the output will contain one or many 8.0.xyz versions.
dotnet --list-sdks
The compiler emits x86-64 assembly. It uses as
to assemble it. And ld
to link with the runtime.
You can get binutils as part of MSYS2, which provides up-to-date native builds of GNU Binutils, Mingw-w64, etc. Follow the steps outlined on this page starting from the step #3.
Click to expand/collapse the installation steps
- Get the latest version of Mingw-w64 via MSYS2. You can download the latest installer from the MSYS2 page.
- Follow the Installation instructions on the MSYS2 website to install Mingw-w64. Take care to run each required Start menu and pacman command.
- Install the Mingw-w64 toolchain (
pacman -S --needed base-devel mingw-w64-x86_64-toolchain
). Run the pacman command in a MSYS2 MINGW64 terminal (you'll find it in the Start menu). Accept the default to install all the members in the toolchain group. - Add the path to your Mingw-w64 bin folder to the Windows
PATH
environment variable by using the following steps:- In the Windows search bar, type 'settings' to open your Windows Settings.
- Search for Edit environment variables for your account.
- Choose the Path variable in your User variables and then select Edit.
- Select New and add the Mingw-w64 destination folder path to the system path.
The exact path depends on which version of Mingw-w64 you have installed and where you installed it.
If you used the settings above to install Mingw-w64, then add this to the path:
C:\msys64\mingw64\bin
. - Select OK to save the updated
PATH
. You will need to reopen any console windows for the new PATH location to be available
In the case you use MSYS2 to install their MinGW packages, please keep in mind the following. The compiler links an executable using a command similar to
ld -o a.exe -e main a.o ../../src/Runtime/rt_common.o ../../src/Runtime/rt_windows.o -L"C:/msys64/mingw64/x86_64-w64-mingw32/lib" -lkernel32
Starting at least from GNU Binutils 2.36.1 (and maybe an earlier version), the command above produces an error along these lines:
a.o:fake:(.text+0x3f): relocation truncated to fit: R_X86_64_32S against symbol `IO_proto_obj' defined in .data section in ../../src/Runtime/rt_common.o
a.o:fake:(.text+0x7c): relocation truncated to fit: R_X86_64_32S against `.data'
a.o:fake:(.text+0x9e): relocation truncated to fit: R_X86_64_32S against `.data'
a.o:fake:(.text+0xd2): relocation truncated to fit: R_X86_64_32S against `.data'
../../src/Runtime/rt_common.o:fake:(.text+0x8): relocation truncated to fit: R_X86_64_32S against `.data'
../../src/Runtime/rt_common.o:fake:(.text+0x36): relocation truncated to fit: R_X86_64_32S against `.data'
../../src/Runtime/rt_common.o:fake:(.text+0x6d): relocation truncated to fit: R_X86_64_32S against `.data'
../../src/Runtime/rt_common.o:fake:(.text+0x89): relocation truncated to fit: R_X86_64_32S against `.data'
../../src/Runtime/rt_common.o:fake:(.text+0xbb): relocation truncated to fit: R_X86_64_32S against `.data'
../../src/Runtime/rt_common.o:fake:(.text+0xdb): relocation truncated to fit: R_X86_64_32S against `.data'
../../src/Runtime/rt_common.o:fake:(.text+0xf8): additional relocation overflows omitted from the output
A news report on the MSYS2 page and an ld bug report seem to discuss a very similar issue though not exactly the same.
I don't really understand what causes the issue with linking object files produced from compiler-generated assembly. But a workaround suggested on the MSYS2 page solves it. The workaround is to pass --default-image-base-low
to ld.
The compiler's code includes this flag into the linker command line to work around the problem. If you still encounter similar error messages, you might try downgrading GNU Binutils to 2.34 to make the problem go away.
pacman -U http://repo.msys2.org/mingw/x86_64/mingw-w64-x86_64-binutils-2.34-3-any.pkg.tar.zst
Install your distribution's binutils package. For example, on Ubuntu it looks something like this:
sudo apt install binutils
On Windows, use Bash to perform these commands. As a git user, you likely have Git Bash installed and know how to use it. If not, take a look at this nice tutorial.
# Clone the repo.
git clone https://github.com/mykolav/coollang-2020-fs.git
cd coollang-2020-fs/
# Build F# code.
# And assemble the runtime's code.
./build.sh
A successful build should look similar to the following.
Building F#
Microsoft (R) Build Engine version 16.8.0+126527ff1 for .NET
Copyright (C) Microsoft Corporation. All rights reserved.
Determining projects to restore...
Restored /home/appveyor/projects/coollang-2020-fs/src/clc/clc.fsproj (in 5.41 sec).
Restored /home/appveyor/projects/coollang-2020-fs/src/LibCool/LibCool.fsproj (in 20 ms).
Restored /home/appveyor/projects/coollang-2020-fs/src/Tests/Tests.fsproj (in 10.66 sec).
LibCool -> /home/appveyor/projects/coollang-2020-fs/src/LibCool/bin/Debug/netstandard2.1/LibCool.dll
clc -> /home/appveyor/projects/coollang-2020-fs/src/clc/bin/Debug/net6.0/clc.dll
Tests -> /home/appveyor/projects/coollang-2020-fs/src/Tests/bin/Debug/net6.0/Tests.dll
Build succeeded.
0 Warning(s)
0 Error(s)
Time Elapsed 00:00:50.56
Assembling the runtime
Done
On Windows, use Bash to follow the examples.
One option is using Git Bash. To get more details, take a look at this nice tutorial.
Change directory to sandbox/clc
.
The folder contains a bash script clc
. It's a wrapper that invokes the compiler and passes through command line arguments.
clc file1.cool [file2.cool, ..., fileN.cool] [-o file.exe | -S [file.asm]]
To compile and run QuickSort.cool, follow these steps.
# Compile
$ ./clc ../../src/Tests/CoolPrograms/Runtime/QuickSort.cool -o qs
Build succeeded: Errors: 0. Warnings: 0
# Run
$ ./qs
30 20 50 40 10
10 20 30 40 50
To see the assembly the compiler emits for QuickSort.cool, perform the command below. Then open qs.s in your favorite editor.
$ ./clc ../../src/Tests/CoolPrograms/Runtime/QuickSort.cool -S qs.s
Build succeeded: Errors: 0. Warnings: 0
Take a look inside src/Tests/CoolPrograms/Runtime for more test programs. In particular, relatively bigger programs it contains are Life.cool, InsertionSort.cool, QuickSort.cool, and Fibonacci.cool.
The implementation language is F#, but the code is imperative/OO. That said, the code is, hopefully, consistent with the recommendations from "F# Code I Love" by Don Syme.
Initially it was my ambition to practice writing functional code while learning about compilers. But trying to do both at the same time was biting off more than I could chew.
Used as an imperative/OO language, F# has many and many nice features. More mainstream languages started to catch up only recently. E.g., no null values by default, records, discriminated unions, pattern matching, primary constructors, etc.
Plus, F# is a low ceremony, low syntactic noise language. In this respect, you can think of F# as Python but statically typed.
The code contains a notable deviation from F# formatting guidelines. The guidelines recommend camelCase naming for all let
bindings. But I use camelCase for function bindings and snake-case for value bindings. This way function and value binding are easier to tell apart at a glance.
- Douglas Thain, Introduction to Compilers and Language Design
- Bob Nystrom, Crafting Interpreters
- Alex Aiken, Compilers
- Eli Bendersky, Parsing expressions by precedence climbing
- Gabrijel Boduljak, Coursework project for the Stanford Compilers MOOC course
- Arek Holko, MIPS assembler files generated by my COOL compiler
- Kuang Han, Dan Deng, Ang Li, Compiler-SML An implementation of a compiler for the Tiger Language
- Guide to x86-64
- Siew Yi Liang, Understanding Windows x64 Assembly
- System V ABI
- Lots and lots of other blog posts and github repos
The original Cool language was designed by Alex Aiken.
The Cool 2020 version was designed by John Boyland.
QuickSort.cool and InsertionSort.cool came from a paper from LUME - the Digital Repository of the Universidade Federal do Rio Grande do Sul.
The web page uses GitHub Corners by Tim Holman.
This project is licensed under the MIT license. See LICENSE for details.
GitHub Corners is licensed under the MIT license. See LICENSE.github-corners for details.