Status update for all. Please populate your README with these:
- What are you working on?
- What’s your implementation language?
- What have you done so far? Did you meet your expectations for this week yet?
- What do you hope to accomplish by next week? How? (Who’s doing what, if this is a team?)
- What do you need to move forward?
Also: Looking for a target language? How about JVM?
Also: Get motivated
Don’t forget Hofstadter’s law!
If you’re parsing an arbitrary grammar, may I suggest using a parsing tool.
- How to optimize
- What to optimize
- When to optimize? Later, because: “Premature optimization is the root of all evil.” — Donald Knuth
Also: Needs more jpeg
Did you know Grace Hopper gave compilers their name? What she was referring to was what is now known as linkers and loaders.
Develop for Android? ProGuard is like GC for linking: it removes unreferenced objects from classfiles.
For the remaining labs, we’ll update each other on our progress.
During each lab period, each team should have answers to these questions.
- What did you do? ("Nothing" isn’t a good answer to that question.)
- Did you achieve what you set out to achieve from the prior week?
- What are you struggling with, or what obstacles do you face?
- What do you plan to achieve by next week?
- 9am: Starbucks, 283 Longwood Ave, Boston, MA 02115
- 11am: J.P. Licks Mission Hill, 1618 Tremont St, Boston, MA 02120
For best results, please don’t go it alone. Groups of 2-5 work best. Don’t be shy, but we’ll skip the icebreaker. :-)
See June 19 for some lame ideas.
Remember the goal: grammars, pointers and recursion, oh my! :-)
Discuss project ideas, and write them up in your README or in your issue tracker.
Yes, everybody does this, not just the team lead. If the project has one central repo, then every push forces a merge, which is a bad thing. Either have separate repos or separate branches to push to.
- Be sure to add lawrancej to your collaborators (go to Settings → Collaborators), so I can see your repo.
- Clone your repo (git clone the SSH URL).
- Go into your repo (cd your-repo).
- Add your friend’s remote (git remote add your-friends-name their-url).
- Fetch their stuff (git fetch --all).
For our project, we’ll begin after we’re done with our common labs (they’re done!). You’re welcome to work with as few or as many people as you wish either in this section or others. Start thinking about which of these you’d like to do, or suggest new ideas.
You’re welcome to pursue these traditional project ideas of implementing a small language:
- Implement some moderately simple language, like, say, Cool, LISP, Kaleidoscope, or PL/0.
- Something compiler-related that dovetails nicely with Senior Project (e.g., a textual command interface to some game, web, or mobile app).
These ideas are also welcome:
- A parser combinator library to target multiple parsing strategies (derivative, shift-reduce, or recursive descent parsing).
These projects build upon tools like clang (for C/C++), JDT (for Java), or Python’s ast module to do the work:
- An automated refactoring tool for existing languages to serve education and large projects.
- A search engine for identifiers and literals in code that makes good recommendations (e.g., using PageRank).
- Something like QuickCheck, but one that can generate characterization tests automatically and efficiently.
Here’s some starter code:
Also, esoteric languages are good for laughs:
- ArnoldC (it’s not a tumor!)
Perform static analysis of Java code.
- Fork and clone jdt-project.
- Next, add JAVA_HOME to your environment variables:
  Windows: Search for "environment variables" and click 'Edit the system environment variables'. Click 'Environment Variables…' → 'New…'
  Variable name: JAVA_HOME
  Variable value: C:\Program Files\Java\jdk1.8.0_05 (or whatever version you’re using)
- Click OK, OK, OK. Close and reopen Git Bash. If you get the same error, try turning it off and on again.
- Import the project into Eclipse:
  git clone git@github.com:lawrancej/jdt-project.git
  cd jdt-project
  ./gradlew eclipse
  'File' → 'Import' → 'General' → 'Existing projects into workspace'
- Read through the code. Open Main and run it. Nothing will happen: you’ll need to supply the root folder of a Java project to main.
  Go to 'Run Configurations' → 'Main' → 'Arguments' → 'Program arguments'. Enter the path to a Java project. Click 'Run'. If you have no other Java projects, you can supply the source of jdt-project to itself. Huzzah!
- Modify your ASTVisitor subclass (AstVisitor) to do something interesting. Pick one of the following, for example:
- Generate a UML class diagram for source code (show members of classes). See this for inspiration.
- Generate a graph of class dependencies (Type uses Types). See this for inspiration.
- Generate a graph of package dependencies (Package uses Packages).
- Generate a graph of method dependencies (Method uses Methods).
- Generate a graph of class inheritance / interface implementation.
- Suggest some other graph-related static analysis.
Pull from me: git pull upstream master
Take a look at an old midterm for practice purposes: start exams/Midterm1.pdf
Had any difficulty? Let’s discuss.
- Bootstrapping: Making a compiler "self-hosting," so that the compiler can be written in the language it compiles. The first C compiler was written in a different language.
- Nondeterminism: Having more than one option about which state to visit next.
- Ambiguity: Having more than one parse tree for a given input.
SSA is a transformation on code that is a prerequisite for many low-level optimizations, such as dead/duplicate code elimination. Think of it like version control for variables. Each variable gets a new version number when an assignment is made, hence single assignment. If we have multiple branches (i.e., loops or conditionals), we need to merge different variable versions together (denoted by the phi function).
Basic block (pseudocode):
a = 5
a = a + 10
print a

Basic block (SSA form):
a_0 = 5
a_1 = a_0 + 10
print a_1

Conditional (pseudocode):
a = 5
if (a < 10) { a++ } else { a-- }
a = a * 2
print a

Conditional (SSA form):
a_0 = 5
if (a_0 < 10) { a_1 = a_0 + 1 } else { a_2 = a_0 - 1 }
a_3 = phi(a_1, a_2) * 2
print a_3
Ah memory management. Regardless of how it happens, it must happen, unless you like leaking memory.
It helps to remember modern computer systems give us three kinds of memory:
- Static memory
- Stack memory
- Heap memory
Static memory is pretty straightforward: it’s a chunk of memory that comes and goes with the program itself, and thus does not grow or shrink over the lifetime of the program. Stack memory is managed using, ahem, a stack. (Who’da thunk it?)
When we think of memory management, we’re almost certainly thinking about the heap: dynamically-allocated memory from the operating system with no pre-set lifespan. Therefore, either the programmer has to specify when the memory is no longer needed, or we can rely on a garbage collector to clean up after our mess.
Garbage collection algorithms must know the difference between a pointer and an integer. This is why C doesn’t have it. Just kidding: you can do garbage collection in C, but it must be conservative; it can’t guarantee that it collected all the garbage.
Reference counting:
- Strategy: Just count how many things point to this object, and when that count drops to 0, free the object.
- Pros:
  - Simple to implement
  - Reasonably fast
  - Reasonably good (if Python uses it, it must be somewhat good)
- Cons:
  - Every object has to carry an extra integer just for the reference count.
  - What happens when you’ve got two objects pointing to each other (like in a circular linked list)? Crap! The reference count never drops to zero, that’s what!
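To make the bookkeeping concrete, here is a minimal C++ sketch of the idea; the Object type and the retain/release names are made up for illustration, not from any lab code.

#include <cstddef>

// Hypothetical illustration: an intrusively reference-counted object.
struct Object {
    std::size_t refcount = 0;  // the extra integer every object must carry
    Object* other = nullptr;   // a reference this object holds to another object
};

void retain(Object* o) {
    if (o) o->refcount++;
}

void release(Object* o) {
    if (o && --o->refcount == 0) {
        release(o->other);     // drop our reference to whatever we point at
        delete o;              // count hit zero: nobody points here anymore, so free it
    }
}

int main() {
    Object* a = new Object();
    retain(a);                 // refcount(a) == 1
    release(a);                // refcount(a) == 0, so a is freed
    // If a and b pointed at each other, both counts would stay at 1 forever:
    // that is the cycle problem described above.
    return 0;
}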
There are many variations of tracing (mark-sweep) garbage collection.
- Strategy:
  - Maintain a root set (a set of objects reachable throughout the program and in the current scope of the program).
  - Traverse (trace) the object graph starting from the root set, looking for garbage (objects unreachable from the root set).
- Pros:
  - It can deal properly with all garbage, including circular linked lists that nobody else references.
  - No space overhead of reference counts.
- Cons:
  - Naive implementations are slow, and briefly hang programs.
  - Not what you’d use when precise timing is important (e.g., launching a rocket, autonomous cars).
  - Essentially, this algorithm is what gave garbage collection its bad reputation.
- Naive mark-sweep: Tracing garbage collection that runs when we’re out of memory, and stops the program during garbage collection.
- Concurrent/incremental mark-sweep: The program still runs during GC (which happens in a separate thread), but marked objects are locked as necessary.
- Generational: Most objects on the heap are short-lived: they’re dynamically allocated and freed almost right away. Other objects, fewer in number, live long, productive and happy lives. This form of GC moves reachable objects between two or more memory pools called generations, without touching garbage.
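As a rough illustration, here is a self-contained C++ sketch of naive mark-sweep; the Obj, mark, sweep, and collect names are made up, not from any lab code.

#include <vector>

// Hypothetical illustration of naive mark-sweep.
struct Obj {
    bool marked = false;
    std::vector<Obj*> refs;             // outgoing pointers to other heap objects
};

void mark(Obj* o) {
    if (o == nullptr || o->marked) return;
    o->marked = true;                   // reachable: keep it
    for (Obj* r : o->refs) mark(r);     // trace the object graph
}

void sweep(std::vector<Obj*>& heap) {
    std::vector<Obj*> live;
    for (Obj* o : heap) {
        if (o->marked) { o->marked = false; live.push_back(o); }
        else delete o;                  // unreachable from the root set: garbage
    }
    heap = live;
}

void collect(std::vector<Obj*>& rootSet, std::vector<Obj*>& heap) {
    for (Obj* root : rootSet) mark(root);  // 1. mark everything reachable
    sweep(heap);                           // 2. free everything that wasn't marked
}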
Note: Good compilers will optimize away as much heap allocation as possible using escape analysis, checking at compile time to see if an object could be referenced outside a function. If not, allocate on the stack.
Before we begin… Will it optimize? Programming language inventor or serial killer? Also, The programming language network
A symbol table is a map among identifiers, scopes, and other information (e.g., a symbol’s type, where it’s defined).
- In an interpreter, these can be used for data storage.
- In a compiler, these are used to generate code.
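As a sketch, a scoped symbol table in C++ might look something like this; all names here are hypothetical, not part of the starter code.

#include <map>
#include <string>
#include <vector>

// Hypothetical sketch of a scoped symbol table.
struct Symbol {
    std::string type;      // e.g., "int"
    int declaredAtLine;    // where it's defined
};

class SymbolTable {
    std::vector<std::map<std::string, Symbol> > scopes;  // innermost scope at the back
public:
    void enterScope() { scopes.push_back(std::map<std::string, Symbol>()); }
    void exitScope()  { scopes.pop_back(); }
    void declare(const std::string& name, const Symbol& s) { scopes.back()[name] = s; }
    // Look up an identifier from the innermost scope outward.
    const Symbol* lookup(const std::string& name) const {
        for (auto it = scopes.rbegin(); it != scopes.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return &found->second;
        }
        return nullptr;  // undeclared identifier
    }
};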
Type checking ensures that no types are mismatched.
- Strong vs. weak typing: How rigidly are types enforced? Strongly-typed languages enforce types rigidly (e.g., Haskell, Rust, Python). Weakly-typed languages allow some implicit coercion between mismatched types (e.g., PHP, C).
- Dynamic vs. static typing: When does type checking happen? Dynamically-typed languages check for type mismatches at run time (e.g., Python, Ruby, JavaScript). Statically-typed languages check for type mismatches at compile time (e.g., Java, C++, Haskell), by traversing (and decorating) the AST.
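Here is a small, hypothetical C++ sketch of static checking by traversing (and decorating) an expression AST; the classes are made up for illustration.

#include <stdexcept>
#include <string>

// Hypothetical sketch: each node reports its type; mismatches are caught before running anything.
struct Expr {
    virtual ~Expr() {}
    virtual std::string typeOf() const = 0;  // decorate the tree with a type
};

struct IntLiteral : Expr {
    std::string typeOf() const { return "int"; }
};

struct StringLiteral : Expr {
    std::string typeOf() const { return "string"; }
};

struct Add : Expr {
    Expr* left;
    Expr* right;
    Add(Expr* l, Expr* r) : left(l), right(r) {}
    std::string typeOf() const {
        if (left->typeOf() != right->typeOf())
            throw std::runtime_error("type mismatch in +");  // reported at compile time
        return left->typeOf();
    }
};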
How to get this wrong: Useing you’re types good
- Globals: Memory that comes preallocated with the program (i.e., global constants or variables; the data area in assembly).
- Stack: Memory allocated on the stack frame (i.e., local variables in a function). Deallocation happens on function exit.
- Heap: Dynamically-allocated memory (i.e., memory allocated with new or malloc). Deallocation happens either manually with delete or free, or with a garbage collector. Rust tracks ownership in the type system, allowing the type checker to determine where to place deallocation code at compile time.
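A tiny C++ example showing all three kinds of memory in one place (illustrative only):

#include <string>

std::string greeting = "hello";    // global/static memory: lives in the data area for the whole program

void example() {
    int local = 42;                // stack memory: freed automatically when example() returns
    int* dynamic = new int(local); // heap memory: lives until it is explicitly freed (or a GC frees it)
    delete dynamic;                // manual deallocation
}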
Optimize your compiler and interpreter developed in Lab 3.
- Modify CommandNode so that it includes a counter (presumably an int or the like).
- Modify the parser a bit so that it only emits a CommandNode after it has encountered a full run of the same command (e.g., ----- becomes CommandNode('-', 5)).
- Modify the interpreter and compiler accordingly.
In short: do an optimization that performs run-length encoding on Brainfuck code.
Then, optimize away certain loops (e.g., [-] or [+]) with a CommandNode to assign zero to the current memory location.
- Modify Command to include another command type: ZERO
- Modify the CommandNode constructor
- Either traverse through the tree with an Optimizer visitor to do replacements, preprocess the input to replace [-] or [+] with new node types, or, in the recursive call to parse, check the Loop that we get and emit the proper CommandNode
Use peek to check when to add a command node to the current container.
You can tell the optimizer is working if the code your compiler generates includes numeric literals, and the printer and interpreter still work.
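Here is a minimal, self-contained sketch of the run-length-encoding idea; Run and runLengthEncode are made-up names, and the actual lab does this inside CommandNode and parse in src/brainfuck.cpp.

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: collapse runs of identical Brainfuck commands into (command, count) pairs.
struct Run { char command; int count; };

std::vector<Run> runLengthEncode(std::istream& in) {
    std::vector<Run> runs;
    char c;
    while (in.get(c)) {
        if (std::string("+-<>,.").find(c) == std::string::npos) continue;  // comments
        if (!runs.empty() && runs.back().command == c)
            runs.back().count++;           // extend the current run
        else
            runs.push_back(Run{c, 1});     // start a new run
    }
    return runs;
}

int main() {
    std::istringstream program("----- +++ >>");
    for (const Run& r : runLengthEncode(program))
        std::cout << "CommandNode('" << r.command << "', " << r.count << ")\n";
    // Prints: CommandNode('-', 5), CommandNode('+', 3), CommandNode('>', 2)
    return 0;
}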
Test out your old brainfuck interpreter on src/99bottles.bf
and compare it with your optimized brainfuck interpreter. Is it faster?
Lab 3 has a new test program, echo.bf
that just prints out what you type when run.
rot13.bf
may not actually work as advertised, derp.
Let’s talk about quines. And quine relays.
I added quine.bf
to test lab 3 using the is-lab-2-done.sh
script because I’m that lazy.
git commit -am "WIP" # Commit your stuff if you need to git fetch upstream git merge upstream/lab3 git mergetool # if you see a CONFLICT
- Copypasta the Printer visitor in src/brainfuck.cpp.
- Rename it to Compiler.
- Instead of printing out Brainfuck code, print out equivalent code for a different language. For languages that need it (e.g., Java), pick a name for your program class (e.g., Default).
For example, in Java:
./brainfuck.exe echo.bf > Default.java   # Translate brainfuck to Java
javac Default.java                       # Compile translated Java code
java Default                             # Run translated Java bytecode (it should do what echo.bf does)
Done!
For example, to C:
git fetch upstream
git merge upstream/brainfuck2c
git mergetool
cd src
g++ brainfuck.cpp -o brainfuck.exe
./brainfuck.exe echo.bf > echo.c   # translate brainfuck to C
gcc echo.c -o myecho.exe
myecho.exe                         # The compiled executable form of echo.bf
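To make the mapping concrete, here is a hypothetical sketch of the per-command translation to C as a standalone function; the real lab does the same thing inside the Compiler visitor, and emitC is a made-up name.

#include <iostream>
#include <string>

// Sketch: print C code equivalent to each Brainfuck command.
void emitC(const std::string& program) {
    std::cout << "#include <stdio.h>\n"
              << "int main() { char tape[30000] = {0}; char *p = tape;\n";
    for (char c : program) {
        switch (c) {
            case '+': std::cout << "(*p)++;\n";         break;
            case '-': std::cout << "(*p)--;\n";         break;
            case '>': std::cout << "p++;\n";            break;
            case '<': std::cout << "p--;\n";            break;
            case '.': std::cout << "putchar(*p);\n";    break;
            case ',': std::cout << "*p = getchar();\n"; break;
            case '[': std::cout << "while (*p) {\n";    break;
            case ']': std::cout << "}\n";               break;
            default:  break;  // anything else is a comment
        }
    }
    std::cout << "return 0; }\n";
}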
Note: Read through chapter 5.
Note: In Lab 3, use cin.get(mumble) to read in a char; cin >> mumble skips whitespace.
A parser generator is a tool that takes a grammar specification in a file, and produces parser code.
There are many of them. Each has severe limitations. Since these require a grammar spec, you need to understand the grammar’s grammar.
Here’s one for Java: ANTLR.
Parser combinators are another option, and there are many of them, too.
With a parser combinator, the parser isn’t a separate tool; it’s a library you embed in your program. So, you specify the grammar in your code, and let the library do the parsing for you.
Examples: Spirit, Parsec
Go ahead and fetch and merge from me (don’t forget to commit your work first):
cd ~/COMP603-2015
git fetch upstream
# Unable to merge? Stage and commit your changes
git merge upstream/master
git merge upstream/lab3
# Have a CONFLICT?
git mergetool
Do you have Visual Studio or Code Blocks or XCode installed?
The starter code, src/brainfuck.cpp, is in C++.
Pull from upstream and study in-class/AST.java. Play code golf.
See updated lab description and hints below.
Also, your favorite language sucks, and here’s why.
Go ahead and pull from me:
cd ~/COMP603-2015
git pull upstream master
Do you have Visual Studio or Code Blocks or XCode installed?
The starter code, src/brainfuck.cpp, is in C++.
Modify src/brainfuck.cpp to parse Brainfuck using recursive descent.
Brainfuck’s LL(1) grammar is:
Program  -> Sequence
Sequence -> Command Sequence
Sequence -> Loop Sequence
Sequence -> any other character, ignore (treat as a comment)
Sequence -> "" (empty string)
Command  -> '+' | '-' | '<' | '>' | ',' | '.'
Loop     -> '[' Sequence ']'
Brainfuck in EBNF is:
Program  -> Sequence
Sequence -> ( Command | Loop | Comment )*
Command  -> '+' | '-' | '<' | '>' | ',' | '.'
Loop     -> '[' Sequence ']'
Comment  -> any character other than '+' | '-' | '<' | '>' | ',' | '.' | '[' | ']'
The parser will probably be no longer than 20-30 lines; the solution is shorter than the problem statement.
To read characters in a loop, while(file >> c) { … }
If your C is rusty, see the C Reference: http://www.cplusplus.com/reference/
Write the recursive descent parser using any of these strategies:
- Write parse recursively.
- Use mutually recursive functions as done in in-class/RecursiveDescent.java. For each nonterminal in the grammar, write a function with the name of the nonterminal. Peek at the next character and figure out which production (rule) to apply based on the first and/or follow sets.
- Maintain an explicit stack of nodes inside the existing parse function.
- Use an implicit stack by modifying Node to include a pointer to a parent Node.
Note: Your parser cannot avoid using recursion or a stack (implicit or explicit). Don’t even.
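For reference, here is a minimal, self-contained sketch of strategy 2 (mutually recursive functions) for the EBNF above; the Node type is made up, and the real lab uses the classes already in src/brainfuck.cpp.

#include <iostream>
#include <string>
#include <vector>

// Hypothetical tree node: a command character, or 0 for a loop that holds children.
struct Node {
    char command;
    std::vector<Node> children;
};

std::vector<Node> sequence(std::istream& in);  // forward declaration (mutual recursion)

Node loop(std::istream& in) {                  // Loop -> '[' Sequence ']'
    Node n = { 0, sequence(in) };
    in.get();                                  // consume the closing ']'
    return n;
}

std::vector<Node> sequence(std::istream& in) { // Sequence -> ( Command | Loop | Comment )*
    std::vector<Node> nodes;
    char c;
    while (in.peek() != EOF && in.peek() != ']') {
        in.get(c);
        if (std::string("+-<>,.").find(c) != std::string::npos)
            nodes.push_back(Node{ c, {} });    // Command
        else if (c == '[')
            nodes.push_back(loop(in));         // Loop
        // anything else is a Comment: ignore it
    }
    return nodes;
}

std::vector<Node> program(std::istream& in) {  // Program -> Sequence
    return sequence(in);
}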
You are done if your program builds a tree structure correctly.
You need to place child nodes into the appropriate Container. This means Program at the top level, and inside a new Loop in the appropriate spots.
To check your implementation, use the is-lab2-done.sh
script, or compare program output with input.
The program traverses the tree your parser built and prints it out with the Printer
visitor.
If the program shows any discrepancy between the program output and input, it means your parser formed the tree improperly.
Of course, printing out the input file without forming a tree fools the script, but nobody else.
cd ~/COMP603-2015
cd src
g++ -o brainfuck.exe brainfuck.cpp
brainfuck.exe helloworld.bf
chmod +x is-lab2-done.sh
./is-lab2-done.sh
LR(k) means Left to right, Rightmost derivation, with k tokens of lookahead.
LR(k) grammars are a subset of the context-free grammars, and a proper superset of the LL(k) grammars. For a grammar to be LR(k):
- It must be unambiguous
LR(k) grammars can be parsed using 'shift-reduce'.
Shift-reduce parsing is also known as bottom up parsing, because the parser works from the terminals up to the starting nonterminal. A shift-reduce parser shifts terminals onto a stack, and reduces the stack to a nonterminal when the stack matches the right hand side of a production (rule). Programmers rarely write shift-reduce parsers by hand, and use parser generators or parser combinators instead.
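For example, here is a shift-reduce trace for the tiny (made-up) grammar E -> E '+' 'a' | 'a' on the input a + a:

Stack          Remaining input    Action
(empty)        a + a              shift 'a'
a              + a                reduce E -> 'a'
E              + a                shift '+'
E +            a                  shift 'a'
E + a          (empty)            reduce E -> E '+' 'a'
E              (empty)            accept

Each shift pushes the next terminal onto the stack; each reduce replaces a matched right-hand side on top of the stack with its nonterminal.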
Pull from me.
cd ~/COMP603-2015
git pull upstream master
# Windows
start responses/may-18.txt
# Mac
open -e responses/may-18.txt
Open responses/may-18.txt
in your local repository.
Modify the file to answer the questions.
git commit -am "I got this." git push origin master
- First set: the set of terminals (excluding the empty string) that can appear first in any derivation of a nonterminal.
- Follow set: the set of terminals (excluding the empty string) that can appear immediately after a derivation of a nonterminal.
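For example, in this small (made-up) grammar:

S -> A 'b'        First(S)  = { 'a', 'c' }
A -> 'a' | 'c'    First(A)  = { 'a', 'c' }
                  Follow(A) = { 'b' }

Any derivation of A starts with 'a' or 'c', and A is always followed by 'b', so 'b' is in Follow(A).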
LL(k) means parse from Left to right, Leftmost derivation, with at most k tokens of lookahead.
LL(k) grammars are a subset of the context-free grammars. For a grammar to be LL(k):
- The first and follow sets for each nonterminal must be disjoint
- It must be unambiguous
- No left-recursion is allowed
- No common prefixes on the right-hand side are allowed
LL(k) grammars can be parsed using 'recursive descent'.
Recursive descent parsing is also known as top-down parsing, because the parse starts from the starting nonterminal.
Each nonterminal is a function, and the first and follow sets determine which production (rule) to choose.
See in-class/RecursiveDescent.java
for an example recursive descent parser.
Challenge: What’s the parse tree for int a = 5;
using the C Grammar?
Hint: It’s a declaration.
Do this individually, or in pairs.
Note: If working in a pair, run ./main.sh from your repo. Log in and click on the added collaborator link. Then, go to the next page and copy the command line instructions.
- Choose a single compiler implementation to review (suggestions welcome!).
- Identify which files/functions are responsible for each phase in the compiler source (scan/lex/tokenize, parse, AST, optimization, code generation).
- What was the most ridiculous thing you found? (Funny comments? Awful code?)
- Take notes along the way (if you find something that’s unrelated to a compiler phase, try to infer what it’s doing).
- Write up your findings in a short document and post it to your repository (no more than two pages, please). For example:
git add findings.txt
git commit -m "Lab 1 findings."
git push origin master
Try to get this done today.
The Chomsky hierarchy is a containment hierarchy of languages. Restrictions placed on grammar production rules (or the underlying automaton) distinguish among language categories.
Language category | Restrictions on grammar productions | Equivalent automaton
---|---|---
Recursively-enumerable | None. Sequences of terminals and nonterminals may derive sequences of terminals and nonterminals. | Finite automaton with infinite tape (Turing machine)
Context-sensitive | The same context (terminals or nonterminals) surrounds the nonterminal on the left-hand side and its derivation on the right-hand side. | Finite automaton with finite tape (linear-bounded Turing machine)
Context-free | A nonterminal derives sequences of terminals and nonterminals. | Finite automaton with a stack (pushdown automaton)
LR | Context-free, but forbids ambiguity. | Shift-reduce (bottom-up) parser
LL | Context-free, the first and follow sets are disjoint, and forbids ambiguity, left-recursion, and common prefixes. | Recursive descent (top-down) parser
Regular | A nonterminal may derive either terminals followed by a single nonterminal, or the empty string. | Finite automaton
Finite | A nonterminal may derive terminals or the empty string. | Finite automaton without cycles
Compilers consist of these 'phases':
Phase | Description | Input | Output
---|---|---|---
Scan / tokenize / lexical analysis | Split source code into small chunks (tokens) such as identifiers, reserved words, literals, operators, etc. | Source code | Token stream
Parse | Check the syntax of the source code | Token stream | Parse tree
Translate | Translate the low-level syntax into a high-level abstract syntax tree | Parse tree | Abstract syntax tree, symbol table
Optimize | Improve performance or structure | Abstract syntax tree, symbol table | Abstract syntax tree, symbol table
Generate code | Traverse the AST to generate target code | Abstract syntax tree, symbol table | Target code
The front-end of a compiler consists of scanning and parsing; the back-end consists of translation, optimization and code generation.
Visitors visit (traverse) nodes in a tree to do some computation, without mixing computation into the nodes themselves.
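Here is a minimal C++ sketch of the pattern; the node and visitor names are made up, but the lab’s Printer visitor in src/brainfuck.cpp has the same shape.

#include <iostream>

// Hypothetical illustration of the visitor pattern.
struct Number;   // forward declarations so the Visitor can name them
struct Sum;

struct Visitor {
    virtual void visit(Number& n) = 0;
    virtual void visit(Sum& s) = 0;
};

struct Node {
    virtual void accept(Visitor& v) = 0;   // nodes only know how to accept a visitor
    virtual ~Node() {}
};

struct Number : Node {
    int value;
    Number(int v) : value(v) {}
    void accept(Visitor& v) { v.visit(*this); }
};

struct Sum : Node {
    Node* left;
    Node* right;
    Sum(Node* l, Node* r) : left(l), right(r) {}
    void accept(Visitor& v) { v.visit(*this); }
};

// The computation (printing) lives in the visitor, not in the nodes.
struct Printer : Visitor {
    void visit(Number& n) { std::cout << n.value; }
    void visit(Sum& s) {
        std::cout << "(";
        s.left->accept(*this);
        std::cout << " + ";
        s.right->accept(*this);
        std::cout << ")";
    }
};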
Cheat at today’s crossword puzzle, the easy way with regexes!
cd ~/COMP603-2015
git pull upstream master
grep -E "^regex-goes-here$" american-english.txt
A regular expression (regex) defines a language with these primitives and operators.
Name | Notation | Meaning
---|---|---
Primitives | | Regular expression building block.
Empty Set | {} | Reject everything.
Empty String | "" | Match the empty string.
Symbol | a | Match a single character (here, the character a).
Operator | | Make a new regex from existing regexes.
Sequence | ab | Match regex a, then match regex b.
Alternation | a\|b | Match regex a or regex b.
Kleene Star | a* | Match regex a zero or more times.
The primitives and operators above are complete:
we can define other regular expression operators in terms of them.
For example, a? optionally matches a; a? = a|"".
Another example: a+ matches a one or more times; a+ = a*a.
Trivially, finite languages are regular:
finite language: { "hello", "cruel", "world" }
equivalent regex: hello|cruel|world
Regular languages can also be infinite, so they properly contain the finite languages. For example:
.* (matches everything)
Regular languages can’t express everything; for example, they cannot check matching brackets in the general case. Hence, the other classes of languages.
The Chomsky hierarchy is a containment hierarchy of languages. What distinguishes one language category from another is restrictions placed on grammars or the underlying automaton.
A grammar consists of a finite set of nonterminals (variables), a starting nonterminal, terminals (literals, words or symbols), and productions (rules) that map among terminals and nonterminals. Grammars define languages: they generate the set of strings in the language and test membership of a string in the language.
The example grammar below defines a small subset of English, with an example sentence. The example grammar is context-free because the left side of each arrow is a nonterminal.
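For instance, a context-free grammar in that spirit, with an example sentence it generates (this is an illustrative reconstruction, not necessarily the exact grammar):

Sentence   -> NounPhrase VerbPhrase
NounPhrase -> Article Noun
VerbPhrase -> Verb NounPhrase
Article    -> "the" | "a"
Noun       -> "dog" | "ball"
Verb       -> "chases" | "catches"

Example sentence: "the dog chases a ball"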
Note: Please read Chapters 1, 2 and 3 (pages 6-16) or Chapters 1, 2 and 3 of the Crafting a Compiler textbook by next week. If this is overwhelming, read the first sentence of each paragraph, then skip subsequent sentences if it made sense; otherwise, read on. See this list for other free books.
These are all collections.
A set is unordered and has no duplicates (no repeated values).
{ "hello", "world" } == { "world", "hello" }
A bag is unordered and allows duplicates (repeated values).
{ "buffalo", "my", "buffalo" } == { "my", "buffalo", "buffalo" }
A sequence is ordered and allows duplicates.
[ "hello", "cruel", "world" ] != [ "cruel", "world", "hello" ]
An ordered set is ordered and has no duplicates.
To summarize:

Collection | Ordered? | Duplicates allowed?
---|---|---
Set | No | No
Bag | No | Yes
Sequence | Yes | Yes
Ordered set | Yes | No
English subset
{ "This is a sentence in English.", "This is another sentence in English." }
Espanol subseto?
{ "Yo quiero Taco Bell", "Donde esta el bano?" }
An alphabet is a set of symbols (e.g., char).
A string is a sequence of symbols chosen from some alphabet.
Languages are (possibly infinite) sets of strings. A grammar constructs a language; regular expressions construct regular languages.
A compiler transforms source language into a target language.
javac, gcc, clang, etc.
- Windows: Install MsysGit, install KDiff, and choose OpenSSH (not PuTTY); otherwise, stick to the default settings.
- Mac OS X: Install GitX-dev, then install the XCode developer tools, which ship with git (recommended), or install git from here.
- Linux: Install git using your package manager. QGit, a git frontend, may also be available for your distribution. (Note: don’t forget to use sudo with your package manager.)
Starterupper sets up git and project hosting for this class; it is safe to run even if you already have git and SSH keys set up on your machine.
Open Git Bash (Windows) or Terminal (Linux, Mac OS X) and paste in the command below.
Press Insert
to paste in Git Bash.
curl https://raw.githubusercontent.com/lawrancej/COMP603-2015/master/main.sh | bash
Open up prequiz.adoc
in your favorite text editor (it is in your local git repository).
Warning: Do not use Notepad or Word. Use a real text editor. Suggestions: Notepad++ (Windows), Atom, or Sublime.
Then, save your changes and submit your work to your repository.
cd ~/COMP603-2015
# The easy way
git gui &
# The leet way
git add .
git commit -m "Finished prequiz"
git push -u --all origin