Introduction
Chapter 1 introduces you to the Patina programming language, for which you will be writing your own compiler!
Patina is an imperative language whose syntax is very similar to the Rust programming language. We designed it so that it is big enough to let you write many interesting programs (e.g. your favorite sorting algorithm), but small enough that a compiler for Patina can be implemented in a quarter-long course.
Chapter 2 describes the language more formally, specifying the syntax, dynamic semantics, and static semantics of each language construct.
Chapter 3 walks you through setting up OCaml on your computer, which you will need for the programming assignments, described in Chapter 4. The assignments build on each other, and in the end you will have a complete, optimizing compiler that can translate any valid Patina program down to x86 assembly, which your computer will be able to execute.
Overview
This chapter gives an informal overview of the Patina language. More precise specifications can be found in the next chapter.
A Patina program consists of a list of function definitions, one of is a function called main
that acts as the entry point of the program:
fn my_function(<arg_name>: <arg_type>) -> <return_type> {
... // body of my_function
}
... // more function definitions
fn main() -> unit {
... // body of main
}
Patina is an expression-oriented language. This means that
- Statements (like those in Java or C) don't have a distinguished status in Patina. Instead, everything is an expression; statements are just expressions that don't produce values, and they can be used anywhere an expression can be used.
- The body of each function is simply an expression (i.e. there are no explicit
return
statements like in Java or C). Executing a function simply means evaluating the expression that is the body of the function.
Let's see what kinds of expressions we can write in Patina.
Integer Expressions
The most basic expressions in Patina are integer constants, and they can be composed into larger expressions using binary arithmetic operators +
, -
, *
and /
. For example,
(1 + 2) / 3 * 4
is a valid Patina expression with value 4
. Negative integers are written in the usual way, e.g. -2
.
Integer expressions have type int
.
Boolean Expressions
The most basic boolean expressions are boolean constants true
and false
. More complex boolean expressions can be constructed in two ways:
- by comparing two integer expressions using the comparison operators:
==
(equal),!=
(not equal),<
(less than),>
(greater than),<=
(less than or equal to) and>=
(greater than or equal to), - by composing smaller boolean expressions using the boolean operators:
!
(not),&&
(and),||
(or).
For instance, the following expression evaluates to true
:
(3 > 4 && (1 == 1)) || !false
Boolean expressions have type bool
.
Branching
Patina supports if-then-else expressions, which evaluates to the then-expression when the condition expression evaluates to true
or the else-expression otherwise. For example, the expression
if 0 <= 1 then {
2 + 3
} else {
3 * 4
}
evaluates to 5
.1
Note that it is possible to simulate "else if" clauses by nesting multiple if-then-else. For example,
if 0 == 1 then {
3
} else {
if 0 == 2 then {
5
} else {
7
}
}
evaluates to 7
.
You may have noticed the extra pair of curly braces that enclose the then
and the else
branch. They are not just there for readability; in fact, { ... }
is another kind of expression in Patina, called sequence expressions. You will learn more about them in the next section.
For those of you familiar with C++ (or Java), Patina's if-then-else expressions correspond to C++'s conditional expressions. For example, the if-then-else expression here would translate to 0<=1 ? (2+3) : (3*4)
in C++. Importantly, Patina's if-then-else expressions are different from C++'s if-statements. Statements don't produce values, but expressions do.
Unit expressions
Patina expressions don't always need to evaluate to an int
or bool
value. For some expressions, their only purpose is to have some side effect upon the execution of the program. Printing something to standard output is among one of those scenarios. In Patina, evaluating the expression
print_int(8)
has the effect of printing the integer 8
to the standard output, but the expression itself doesn't evaluate to any integer value (i.e., it is meaningless to write print_int(8) + 2
).
In Patina, expressions like this have a special type called unit
. An expression of unit
type evaluates to a special value called the unit value, written as ()
. Note that the unit value ()
is the only value that inhabits type unit
. Compare it with type int
, which harbors many values, such as 0
, -3
, 42
, 1024
, ...
Sequencing
It is often useful to string together multiple unit expressions with side effects, and evaluate them one by one. Say we would like to print two integers. In Patina, we can express this using a sequence expression:
{
print_int(8);
print_int(9)
}
This should be understood as a single overall expression e
written as { e1 ; e2 }
. The expression e
contains two sub-expressions, namely, print_int(8)
as e1
and print_int(9)
as e2
.
To evaluate a sequence expression { e1; e2; ...; en }
, we evaluate each ei
sequentially, discarding the values of all expressions except for the last expression en
.
The value of en
will become the value of the overall sequence expression.
So in the previous example, the value of the overall sequence expression is the value of print_int(9)
, which is ()
, the unit value. The value of print_int(8)
is suppressed.1
A sequence has to have at least one expression, but it may very well have just a single expression, as you have encountered in a previous if-then-else expression:
if 0 <= 1 then {
2 + 3
} else {
3 * 4
}
Here, the then
branch contains the sequence { 2 + 3 }
consisting of a single expression 2 + 3
(and similarly with the else
branch).
Sequences also appear as while expressions's body and functions' body, both of which we will introduce shortly.
Variable Declaration
Sequence expressions not only are useful for specifying sequential computation with side effects, but they also delineate how long variables declared inside them should live. Let's first see how variables are declared in Patina.
Variable declaration is another kind of side effect in addition to printing to standard output. In Patina, variables are created using let-expressions:
let x: T = e
This expression will create a new variable x
of type T
, which is initialized with the value of expression e
. Then, x
becomes available ("in-scope") in subsequent expressions within the same sequence.
For example,
{
let x: int = 5;
let y: int = x + 1;
print_int(x * y)
}
will print out 30
, but
{
{
let x: int = 5;
let y: int = x + 1
};
print_int(x * y)
}
will result in a compile-time error due to the last line: variables x
and y
are only available inside the inner sequence, and are not available in the outer sequence.
Let-expressions themselves have type unit
.
Variable Assignments
In addition to reading the value of an variable, you might also want to mutate it. Patina has assignment expressions for this purpose. For example,
{
let x: int = -1;
x = x + 2;
print_int(x)
}
initializes a new integer variable x
with value -1
, increments the value of x
by 2
, and prints 1
to the screen.
While Expressions
Patina's while expressions are very similar to the while loops in other imperative languages like C or Java. A while expression has a condition and a body sequence:
while cond {
e1;
e2;
..;
en
}
To evaluate a while expression, we first evaluate its condition:
- If it is
true
, then we evaluate the body sequence repeatedly until the condition evaluates tofalse
. - Otherwise, the body sequence will not be evaluated.
For example,
{
let i: int = 0;
while i < 3 {
i = i + 1
};
print_int(i)
}
will print 3
to the standard output.
The condition should have type bool
, and the body have type unit
. The overall while expression will have type unit
.
For simplicity, Patina has no break
or continue
constructs.
For those of you familiar with Java or C, the semicolon has a different meaning here. In Java or C, a semicolon signals the end of a simple statement. In Patina, a semicolon is part of the syntax for sequence expressions, so they must appear together with a surrounding pair of curly braces. Moreover, a semicolon is always sandwiched between two expressions, and it means that "we don't care the value of the first expression, but please evaluate it before you evaluate the expression after the semicolon."
Arrays
Patina supports dynamically allocated, zero-indexed arrays of integers.
Array Allocation
We can create an array by calling the built-in function alloc(..)
with a desired length. All elements will be initialized to 0
. For example,
{
let len: int = 3;
let xs: [int] = alloc(len)
}
creates an array of length 3
and binds it to variable xs
.
Reading from an Array
Array elements can be accessed using array-access expressions, written as
xs[i]
where xs
is an array, and i
is an integer expression.
Writing to an Array Element
When an array access appears on the left-hand side of an assignment, the selected array element can be mutated, just like how the value of a variable is mutated. For example,
{
let singleton: [int] = alloc(1);
singleton[0] = 100
}
changes the 0-th element of singleton
from the default value of 0
to 100
.
Aliasing
Aliasing is permitted in Patina. Two array variables can simultaneously refer to the same array object:
{
let xs: [int] = alloc(1);
let ys: [int] = xs; // ys refers to the same array as xs
...
}
Arrays are passed by reference. So [int]
is like a pointer to integer (int *
) in C++. In the following example, both reverse_1
and reverse_2
reverses the input array. However, reverse_1
modifies the input array in place, while reverse_2
returns a new array that's the reverse of the input array.
fn reverse_1(xs: [int], len: int) -> unit {
let mid: int = len / 2;
let i: int = 0;
while i < mid {
xs[i] = xs[len - i - 1];
i = i + 1
}
}
fn reverse_2(xs: [int], len: int) -> [int] {
let ys: [int] = alloc(len);
let i: int = 0;
while i < len {
ys[i] = xs[len - i - 1];
i = i + 1
};
ys
}
Extent
Because you're not required to implement a garbage collector for your compiler, arrays will have infinite extent. In other words, an array cannot be destroyed once it has been allocated.
In the following example, even after the function f
has returned, the array referred to by xs
inside the body of f
continues to exist in heap memory. But we can't access the array again, because its only name xs
is no longer available outside the function body:
fn f() -> unit {
let xs: [int] = alloc(1); // xs refers to an array in heap memory
()
}
fn main() -> unit {
f();
... // can't refer to that array anymore, since xs is out-of-scope
}
Once we've created an array, unless we always remember its name (or one of its names), the array would become an unnameable identity forever lost in heap memory. This is what's called a memory leak, which plagues programs written in unmanaged languages like C and C++. You will learn about solutions to this problem in the second half of the course, although you won't be required to implement them 1.
Also, learn how Rust prevents resource leaks using a completely different approach. It's super cool!
Functions
In Patina, functions are the only creatures that are not expressions. They are declared at the top level, along with the mandatory main
function.
By convention, main
doesn't take in any argument, and always has return type unit
:
fn main() -> unit
Other functions can have any return type (unit
, bool
, int
, or [int]
), and their arguments also need to be annotated with types. The body of each function is a sequence expression whose type must match the declared return type.
Recursion
Functions can be recursive. This program will print out 55
, the tenth fibonacci number (indexing from zero).
fn fibonacci(x: int) -> int {
if x == 0 || x == 1 then {
x
} else {
fibonacci(x-1) + fibonacci(x-2)
}
}
fn main() -> unit {
print_int(fibonacci(10))
}
You can also define mutually recursive functions:
fn even(x: int) -> bool {
x == 0 || odd(x - 1)
}
fn odd(x: int) -> bool {
x == 1 || even(x - 1)
}
fn main() -> unit {
print_bool(odd(3))
}
This program will print out true
.
Built-in Functions
Patina also provides the following built-in functions, which can be called freely in Patina programs:
// print a boolean to stdout
fn print_bool(x: bool) -> unit
// print an integer to stdout
fn print_int(x: int) -> unit
// print an array of integers to stdout
fn print_arr(xs: [int], length: int) -> unit
// print the newline character '\n' to stdout
fn print_ln() -> unit
// allocate a new, zero-initialized array of specified length
fn alloc(length: int) -> [int]
For example, the following program
fn main() -> unit {
print_bool(!true);
print_ln();
print_int(0);
print_int(1);
print_ln();
let xs: [int] = alloc(3);
xs[0] = 3;
xs[1] = 2;
xs[2] = 1;
print_arr(xs, 3);
print_ln()
}
will print out
false
01
3 2 1
Reference Manual
Concrete Syntax
Program ::= Function+
Function+ ::= a non-empty list of "Function"
Function ::= fn ID ( Param* ) -> Type Sequence
Param* ::= a list of "ID : Type" separated by ","
Type ::= unit | bool | int | [int]
Sequence ::= { Expr+ }
Expr+ ::= a non-empty list of "Expr" separated by ";"
Expr ::= ()
| true | false
| NON_NEG_INT
| ID
| ( Expr )
| - Expr
| ! Expr
| Expr ⊕ Expr ⊕ ∈ { +, -, *, /, &&, ||, ==, !=, >=, >, <=, < }
| Sequence
| if Expr then Sequence else Sequence
| let ID : Type = Expr
| while Expr Sequence
| ID [ Expr ]
| ID = Expr
| ID [ Expr ] = Expr
| ID ( Expr* )
Expr* ::= a list of "Expr" separated by ","
ID = a letter or underscore, followed by any number of letters, numbers, or underscores
Precedence and Associativity
Unary operators have the highest precedence, followed by the arithmetic operators, integer relations, and logical operators, with their usual precedence. That is, multiplication and division have higher precedence than addition and subtraction, and logical conjunction has higher precedence over disjunction.
Operators associate to the left. Relations are non-associative.
Typing Rules
Changelog
None
Please view the typing rules here.
Setting up OCaml
This guide covers how to set up OCaml on your computer.
-
Installing
opam
walks you through how to installopam
, the package manager for OCaml. You can think of this as a gateway to all official and third-party OCaml packages. You will need to go throughopam
to install anything OCaml-related. -
Once you have
opam
, Installing OCaml walks you through setting up the core tools necessary to run any OCaml program. -
Interacting with OCaml code introduces common ways of interacting with OCaml code.
Installing opam
Adapted from this guide written by Bryan Tan.
opam
is a package manager for OCaml. You will later use it to install OCaml compilers, interpreters, and other third-party libraries needed for this course.
Installing on CSIL Machines
opam
is a little trickier to set up on CSIL. If you don't have an account yet, create one using this link.
Note: If you previously had an unsuccessful installation of
opam
, please runrm -rf ~/.opam
to clean up partial installation before retrying the steps below.
Log onto a CSIL machine (you can do it remotely). Once you're logged in, fire up a terminal if you don't have one open, and type the following command to download the opam
binary:
curl -LR 'https://github.com/ocaml/opam/releases/download/2.1.0/opam-2.1.0-x86_64-linux' -o opam
Then make sure the downloaded binary is executable, and move it to ~/bin/opam
:
chmod +x opam
mkdir -p ~/bin/
mv opam ~/bin/opam
Check to make sure it's on PATH
:
[junrui@csil-08 ~]$ opam --version
2.1.0
Lastly, run opam init
. This will take 20-40 minutes because it has to download a bunch of stuff. It will prompt you once or twice afterwards; you can safely respond with Y to make your life more convenient.
Installing for Non-CSIL Linux
Follow the steps here, under the "Binary distribution" section header.
Installing for macOS
If you have homebrew, just run
brew install gpatch
brew install opam
Otherwise, follow the same steps in "Installing for Linux" section.
Installing for Windows
Windows users are recommended to use CSIL instead, since OCaml doesn't work too well on Windows.
Installing OCaml
Once you have opam
, you can create a "switch" so that anything you installed for this course won't contaminate the global environment.
Inside your terminal, run
opam switch create cs160 ocaml-base-compiler.4.13.0
eval $(opam env)
This will create a new switch called cs160
. It will also compile the tools necessary to run OCaml programs, including a compiler (ocamlc
) and an interpreter (ocaml
).
Once the switch is created, you can play with the interpreter by running ocaml
in your terminal.
But we recommend using utop
as an alternative. To install it, run
opam install utop
opam
will install any additional dependencies required by utop
.
Interacting with OCaml Code
Interacting without Source Files
Simply run utop
on your terminal. You can enter OCaml expressions and let bindings.
Running an OCaml Program
Open your favorite code editor, and create a file called hello.ml
that contains the following lines:
let s = "hello"
let _ = print_endline s
Run utop hello.ml
to let utop
interpret your program from start to finish. You will see "hello" printed on your terminal.
Alternatively, you can run utop
with no arguments, and load your program dynamically with
#use "hello.ml";;
You will also see "hello" printed on your screen, but this time utop
won't exit. So you can continue to use the interpreter, with everything defined in hello.ml
now available for use. For instance, the variable s
containing the string "hello"
is still available, so you can continue to do
print_endline s;;
Building an OCaml Project with Multiple Files
You will need a build system, like dune, to manage OCaml programs that spread over more than one file. But for the course assignments we will set up everything for you. So no need to worry about this scenario!
Programming Assignments
CS160 Assignment 1: OCaml
Assignment due: Monday, October 18 11:59PM
Changelog
- Problem 3:
- For
Let (x, e)
, furtherLet
-bindings created insidee
will not be inherited.
- For
- Problem 4:
- For
While
's condition, treat the zero value as false, and non-zero values as true.
- For
- Problem 2:
- You don't need to worry about division by zero.
Div
is to be interpreted as integer division.
Introduction
In this assignment, you will get some practice with recursion and pattern matching by building a sequence of interpreters for increasingly large subsets of the Patina programming language.
There are 3 main parts to the assignment (Problems 1-3), as well as a bonus problem (Problem 4). For each problem, we'll provide a file containing some starter code. They can be found in the CS160 GitHub repo:
assoc.ml
(Problem 1)patina_arith.ml
(Problem 2)patina_let.ml
(Problem 3)patina_ref.ml
(Problem 4)
You will fill in the missing parts of the code that are marked with (* your code here *)
.
If you're stuck for more than 15 minutes on a problem, try to ask a general question on slack, or if you can't, direct message a TA.
Respect academic integrity; no sharing solutions, code or otherwise. You may use the internet, but not for finding solutions to these particular problems.
Submission and Grading
Please submit the following files to Gradescope:
assoc.ml
patina_arith.ml
patina_let.ml
patina_ref.ml
You are free to leave the assertions in your files uncommented, as long as they don't break your code.
Each test case is worth 1 point.
The test cases for the bonus problem will be available soon.
Problem 1 [★]
Goal of this exercise: Practice tuples, options, and recursion over lists.
In this problem, you will implement the association list data structure, which you may need for a later problem.
Association lists are the simplest kind of lookup tables. In OCaml, they are ('k * 'v) list
that map keys of type 'k
to values of type 'v
. To insert a key-value pair into an association list, we simply insert the key-value pair to the beginning of the list:
let insert (k: 'k) (v: 'v) (al: ('k * 'v) list) : ('k * 'v) list =
(k,v) :: al
That is, insert
takes a key and a value, as well as a list of keys and values, and inserts the key-value pair to the beginning of the list.
Note that the key's and value's types are not concrete, because the insert
function is designed to work over any two key value types (an example of polymorphism). However, the list that the key-value pair is being inserted into must be a list of the correct type.
You will implement the complementary lookup_opt
function. It looks like
let rec lookup_opt (k: 'k) (al: ('k * 'v) list) : 'v option =
match al with
| _ -> failwith "Not yet implemented" (* your code here *)
To query the value associated with a key k
in the list, we find the frontmost pair whose key matches k
, and we return the associated value.
However, our query k
might not be in our list. That's why the return type is 'v option
, instead of 'v
.
In the example below, we insert
("x", 1)
,("y", 2)
,("x", 3)
to the list al
sequentially. Because ("x", 3)
is more recent than ("x", 1)
, the lookup lookup_opt "x" al
returns the Some 3
, instead of Some 1
.
let al = insert "x" 3 (insert "y" 2 (insert "x" 1 []))
(* al is now [("x", 3), ("y", 2), ("x", 1)] *)
let _ = assert (lookup_opt "z" al = None)
let _ = assert (lookup_opt "y" al = Some 2)
let _ = assert (lookup_opt "x" al = Some 3)
Provide a recursive implementation for lookup_opt
. Your code may not call any OCaml library functions.
Hints:
- On
option
types. A value of an'a option
type can either beNone
, orSome
of some value of type'a
. Here is an example demonstrating theoption
type. If we want to divide integers without risking an exception, one solution is to use option types.let safe_divide (x: int) (y: int) : int option = if y = 0 then None else Some (x / y)
- On recursion. Recursion is simpler to use than loops in OCaml, and often better. You need to use
let rec
rather thanlet
to declare a recursive function. Other than that, a function can call itself just as any other function. Here is a recursive function that computes the Hailstone sequence:let rec hailstone n = if n = 1 then print_string "1 reached" else if n mod 2 = 1 then hailstone (3*n+1) else hailstone (n / 2)
- On pattern-matching. As seen in the course slides, the primary way that you interact with values is by case analysis. The syntax for case analysis is via
match .. with ..
which maps each possible case to the desired expression. All values that are part of the case need to be assigned to variables if you want to use them in the expression. For instance, the following function uses pattern matching to sum the first two elements of a list:let sum_of_first_2_elements (l: int list) : int = match l with | [] -> 0 | x :: [] -> x | x :: y :: _ -> x + y
Problem 2 [★★]
Goal of this exercise: Practice pattern matching and functions as first-class objects.
Patina\(^{arith}\) is a subset of Patina that only contains integer constants and binary expressions. Expressions will be represented in OCaml as abstract syntax trees as follows:
type binop = Add | Sub | Mul | Div
type expr = Const of int
| Binary of binop * expr * expr
Below are two Patina\(^{arith}\) expressions represented as expr
:
e1
represents the concrete expression "0 + 2 * 3"e2
represents the concrete expression "3 * 4 - 30 / 6"
let e1 =
Binary (
Add,
Const 0,
Binary (Mul, Const 2, Const 3))
let e2 =
Binary (
Sub,
Binary (Mul, Const 3, Const 4),
Binary (Div, Const 30, Const 6))
We provide you with a partially complete interpreter for Patina\(^{arith}\). The main interpreter function is interpret
, which evaluates a Patina\(^{arith}\) expression to an OCaml integer.
let rec interpret (e: expr) : int =
match e with
| Const n -> n
| Binary (op, e1, e2) ->
(interpret_op op) (interpret e1) (interpret e2)
and interpret_op (op: binop) : int -> int -> int =
match op with
| _ -> failwith "Not yet implemented" (* your code here *)
Here, interpret
calls a helper function, interpret_op
, which evaluates binary integer operators (binop
) to their interpretations as OCaml functions (int -> int -> int
). The binary operators have the usual interpretations, and Div
should be interpreted as integer division. For instance, interpret e1
should return 6
, and interpret e2
should return 7
.
Complete the definition for interpret_op
. You don't need to handle division by zero, since the test cases won't contain input expressions that cause division by zero errors.
Hint: You are likely to find anonymous functions helpful here.
As an example, the following variable
three_anonymous
is a tuple of 3 anonymous functions so you can see how they're created:let three_anonymous = ( (fun x -> if x then 3 else 6), (fun x -> fun y -> x ^ y), (fun (a,b,c) -> if a > b then a-b else c) )
The first takes a
bool
to anint
, the second concatenates twostring
s, and the third compares integers. You will notice that the second is an anonymous function returning an anonymous function, whereas the third takes all its arguments as a triple. The former format is called "curried" and the latter "uncurried". Be careful to use the right one.
Problem 3 [★★★★]
Goal of this exercise: Practice pattern matching and simulating "state changes" in a functional way.
Now let us extend Patina\(^{arith}\) into Patina\(^{let}\), which additionally supports variable bindings and sequences:
type binop = Add | Sub | Mul | Div
type expr = Const of int
| Binary of binop * expr * expr
| Id of string (* new *)
| Let of string * expr (* new *)
| Seq of expr list (* new *)
A Let
expression is a string * expr
pair, where the value of the expr
will be bound to the string
name. The name will available in subsequent expressions in the parent Seq
expression. Any binding created in the expr
will not be inherited. For example, the following expression
Seq [
Let ("x", 0);
(* Current environment: x |-> 0 *)
Let ("y", Let ("x", 1));
(* Current environment: y |-> 0, x |-> 0 *)
x
]
evaluates to 0, because the second binding of y
is created in the right-hand side of binding of x
, hence is ignored.
The evaluation rules for the newly added constructs are as follows:
- A
Let
expression evaluates to0
. - A
Seq
expression evaluates to the value of the last expression in the sequence. - An
Id
expression evaluates the value associated with the identifier string.
In the example below,
e1
represents the concrete Patina expression{ let x: int = 2; x * 3 }
,e2
represents the concrete Patina expression{ let x: int = -1; { let x: int = 2; x }; x }
let e1 =
Seq [
Let ("x", Const 2);
Binary (Mul, Id "x", Const 3)
]
let e2 =
Seq [
Let ("x", Const (-1));
Seq [ Let ("x", Const 2); Id "x" ];
Id "x"
]
let _ = assert (interpret e1 = 6)
let _ = assert (interpret e2 = (-1))
e1
evaluates to 6
, because the let-expression binds "x"
to Const 2
, and the binary expression Binary (Mul, Id "x", Const 3)
evaluates to 6
when Id "x"
is replaced with Const 2
.
e2
evaluates to -1
, because the last "x"
refers to the variable bound in the outer sequence.
Extend your interpreter in Problem 2 to support the new language constructs. Your interpret
will have the following type:
let rec interpret (e: expr) : int =
failwith "Not yet implemented" (* your code here *)
You may assume that the test cases won't contain semantic or runtime errors, such as reference to unbound variables, or empty sequences.
Hints:
You may want to use some kind of environment (recall what you did in Problem 1) to keep track of the values of variables that are in scope:
type environment = (string * int) list
interpret
itself can be non-recursive, and can call another helper function that actually evaluates the expressions recursively. The helper function may look like:let rec interpret' (e: expr) (env: environment) : (environment * int) = match e with ...
Compared to
interpret
, the functioninterpret'
additionally takes in anenvironment
, and additionally returns anenvironment
augmented byLet
expressions 1.Be very careful how
Seq
and your environment interact. If you are unsure, the Variable Declaration section in the Patina language overview contains more examples and explanations.
(Bonus) Problem 4 [★★★]
Extend your interpreter to support Patina\(^{ref}\), which further supports assignments and while loops:
type binop = Add | Sub | Mul | Div
type expr = Const of int
| Binary of binop * expr * expr
| Id of string
| Let of string * expr
| Seq of expr list
| Assign of string * expr (* new *)
| While of expr * expr (* new *)
Both Assign
and While
evaluate to 0
. A While
expression is an expr * expr
pair, where the first item of the pair is the condition, and the second item is the loop body.
Because we don't model booleans here (for simplicity), simply treat the zero value as false
, and non-zero values as true
.
For more information about the semantics of assignment and while expressions, refer to the relevant parts in the section on Unit Expressions in the Patina language overview.
Hint: You may want to use
ref
cells for your environment:type environment = (string * int ref) list
This helper function is an example of what's called a state monad, which is a functional way of simulating "state changes" without using mutable variables. (The linked article is for people learning a different functional language called Haskell, but the syntax involved in that particular discussion is quite similar to OCaml's syntax.)
CS160 Assignment 2: Lexing and Parsing
Assignment due: Friday, October 29 11:59PM
Changelog
- The regular expression for
ID
should match underscores. - Due date has been extended to Friday.
Introduction
In this assignment, you will first write some toy programs in the Patina language to get more familiar with the language's syntax. Then, you will build a lexer and a parser to parse well-formed Patina programs into abstract syntax trees.
Instructions
-
Download the starter code here.
-
Unlike Assignment 1, the starter code is no longer individual files, but a self-contained
dune
project. You will need to installdune
throughopam
:opam install dune
There is guide on how to install
opam
on your machine or CSIL. If you encounter any difficulty installingopam
ordune
, please let us know on Slack. -
The only files you need to modify are
scanner.mll
andparser.mly
. -
A driver program (
patina.ml
) is provided to help you test your lexer/parser. It requiresppx_deriving.show
for pretty printing, which you can install withopam install ppx_deriving
. Once you have that, you can run the driver on a Patina source file withdune exec ./patina.exe -- <filename>
Submission and Grading
Submit your scanner.mll
and parser.mly
to gradescope.
Please make sure that your code compiles before submitting them.
There are 21 tests in total, 16 of which are public. You can find the public tests here.
Part 1
You will write some simple programs in the Patina language. We'll provide you with a prototype interpreter for you to validate your programs. Be aware that the prototype is quite crappy; by the end of this course will have a compiler that's much better than ours. Use patina-darwin
if your machine uses macOS, and patina-linux
if your machine uses Linux (including CSIL). Run it with ./patina-<your-os> -i <patina-source-file>
.
For each of these, we recommend looking at the syntax reference of Patina, as well as the overview of Patina in the first section of the page. They should contain all the information that is necessary to complete the assignments. For anything that's unclear after that, ask on Slack!
Program 1
Write a simple function to practice the oddities of Patina syntax. This function should take an int as argument, create an array with length equal to the integer, fill it with 7s, and then return it. The main function should call this function with 3 as its input and print out the 2nd value in the result.
Program 2
Let's follow up on the section - write a function that takes in an integer and returns whether or not the integer is a perfect number. You can do so by creating a list of all the factors of the input integer, and then sums them together to check whether a number is perfect or not, returning true
if it is and false
otherwise.
In your main
function, use a while loop to print out perfect numbers up to 1000, each on a new line. For example, if your source file is named perfect.pt
, then running ./patina-<your-os> -i perfect.pt
should output
6
28
496
Part 2
The first step of a compiler is splitting the string of the whole code into pieces which represent all of the important parts. Since Patina has no syntax for comments, this should be straightforward.
Ocaml has the very useful tool for doing lexing of ocamllex
, which takes a definition of a mapping from token strings to tokens, and produces an automaton that can actually do the lexing. This takes advantage of the insight you've learned in class about how simple enough languages can be parsed with simple automata.
Each of the lines of code will look something like:
| '/' { DIV }
Each of these tokens is atomic for the purpose of parsing - it can't be divided further without losing some utility. For example, you don't want to make each digit in an integer a separate token, because you'd have to put them all together when parsing any case where they're used. On the other hand, you wouldn't want to count "(2" as a token, because that parenthesis is going to work just as any other parenthesis in the problem.
Fill out the rows of your template scanner.mll
file to write a lexer for Patina's full syntax.
Part 3: Parser
Parsing is the central work of this assignment, and the purpose of good lexing is to make parsing easier.
ocamlyacc
is a tool which automatically generates a parser for a language if you define how it works. You will use it to parse textual file containing Patina programs into their abstract syntax tree representation.
Each line of the parser should be inside of a block defining a non-terminal or terminal symbol, and should include both the symbols that make it up on the left, as well as OCaml code (using the constructors of the AST) which represents the value on the right. This value is not the result of interpretation of the code, so no addition or other calculation should be done.
For example, to say that an expression formed by two sub-expressions separated by the DIV
token, we write
expr:
...
| expr DIV expr { Binary(Div, $1,$3) }
We have defined an abstract syntax tree (AST) type for you in ast.ml
, but it's up to you do define the intermediate nodes of parsing. You can use the patina concrete reference not only as a reference of how Patina code looks, but also as hints for what parsing structure to use.
Modify the template parser.mly
file to write a parser for Patina's full syntax. Specifically, you will need to add three kinds of stuff:
- Since the parser is connected to the lexer your wrote for Part 2, you need to declare in the parser file the tokens you defined in the lexer:
%token // TODO: Your tokens here
- Define your grammar rules:
// TODO: Your rules here E.g.
- Your grammar will likely be ambiguous (i.e. there will be shift/reduce or even reduce/reduce conflicts). You can resolve a lot of ambiguities by simply specifying the precedence and associativity of the elements in your grammar:
// TODO: Your associativity rules here
CS160 Assignment 3: Type Checking
Assignment due: Monday, November 15 11:59PM
In this assignment, you will implement a type checker for your Patina compiler.
Instructions
- Download the starter code here.
- Make sure you have installed
dune
andppx_deriving
viaopam
, as required in AS2. You don't need to install anything new for this assignment. - Replace
scanner.mll
andparser.mly
with your own implementation from AS2. - Complete the type checker in
typecheck.ml
(described in greater details later). - As in the previous assignment, we provide you with a driver program (
patina.ml
) that takes in Patina source files and invokes your parser and type checker on them. Run the driver program withdune exec ./patina.exe -- <patina-source-file>
Submission and Grading
We will update this section once the autograder is ready.
What You Need to Implement
If you are feeling adventurous, you can start with an empty
typecheck.ml
file and build your type checker from scratch. The only requirement is that the file should export a function calledcheck_prog
with the following signature:val check_prog : Ast.prog -> unit
This function takes in an
Ast.prog
. It returns unit if there's no type error, and raisesError.TypeError
otherwise.
You will implement the typing rules described in the reference manual, as well as a few more semantic checks.
The typecheck.ml
file contains a skeleton of the type checking algorithm. On a high level, it is quite similar to the interpret'
function you wrote in AS1:
- The main type checking function,
check
, takes in an expression and some environment, and outputs a pair consisting of a new environment and someresult
. - Previously, the
result
type wasint
, because your interpreter was evaluating the input expression to an integer. Now theresult
type isAst.typ
(which can be eitherTUnit
,TBool
,TInt
, orTArr
), because the type checker computes the type of the input expression (if the expression is well-typed). - Previously, the
environment
type was(string * int) list
, which maps identifiers to their integer values1. Now the environment is calledtenv
, which stands for typing environment. This is the \(\Gamma\) you have encountered in the typing rules, which maps identifiers to their declared types. Thecheck
function also takes in a function environment, which is the \(\Delta\) in the typing rules.
The structure of check
is also quite similar to interpret'
. It traverses the AST in a post-order/bottom-up manner, by pattern-matching on the input expression, recursively type-checking the sub-expressions (if any), and aggregating the types of the sub-expressions. You can see a concrete example in the partially written cases we've provided. For example, in the case of unary Not
expressions,
| Unary (Not, e) ->
let te = type_of e in
expect e TBool te;
return TBool
we first recursively type check the sub-expressions e
. Then we assert that e
have type TBool
. Finally, we return the type of the overall unary expression, which is also TBool
.
However, it is not always the case that an expression is well-typed. What should check
return in those cases? Clearly, we can't return a type as we normally would, because that would mean that the ill-typed expression has a valid type. Instead, the way we handle ill-typed expressions in this assignment is through exceptions. You have probably encountered exceptions in other languages. In OCaml, to raise an exception exn
, we simply say raise exn
. To handle exception exn
that might arise during the evaluation of an expression e
, we write
try
e
with
| exn -> (* exception handling code *)
If your type checker encounters a type error, it should raise a TypeError
, which is defined in error.ml
. However, you don't need to raise exceptions explicitly using raise
(and we recommend against doing that). Instead, use the helper functions error.ml
, who will raise the appropriate exceptions on your behalf.
Fill in the missing parts that are marked with hole ()
in typecheck.ml
.
In addition to the typing rules described in the reference manual, your type checker should also perform a few other semantic checks. For example, a program should only have one main
function, which takes no argument and returns a unit. You can get some hints as to what kinds of semantic checks need to be performed, by looking at the helper functions in error.ml
.
You can think of the environment in AS1 as a lazy substitution table.
CS160 Assignment 4: Code Generation
Assignment due: Monday, November 29 11:59PM
Your compiler will generate x86 assembly code for Patina programs.
Submission and Grading
TBF
Introduction
For this assignment, you will be tackling the most central part of a compiler: the generation of code of a lower level language. In this case, we will be using x86 as the target language. See the Resources section
This would normally be far more difficult than each of the other individual stages of the compiler. However, we have introduced a number of simplifications to make it doable.
-
Function calls are disallowed, and any functions besides main. In essence, the patina files to be compiled will represent a simple script. (Functions will be included in bonus part #1.)
-
Arrays are no longer included in the language specification. This allows all data to be stored on the stack as supposed to on the heap. (Arrays will be included in bonus part #2.)
-
This matters for performance but not to correctness, no register allocation algorithm needs to be written. All data will be stored inside a single frame of the stack.
Overview of approach
However, this still leaves a fair amount of complexity. For example, the following function calculating fibonacci numbers is still allowed.
fn main() -> unit {
let fibn : int = 10;
let firstval : int = 1;
let secondval : int = 1;
if (fibn < 0) then {
let res : int = 0/0
} else {
while (fibn > 0){
let sum : int = firstval + secondval;
firstval = secondval;
secondval = sum;
fibn = fibn - 1
};
let res : int = secondval
}
}
As you can see, this still contains the challenges for implementation of
- jumps required for while loops and if expressions
- binop expressions which when compiled require multiple lines of assembly.
For the former, you will be adding labels into the assembly to allow appropriate jumps. For example, the expression if false then {2} else {1} + 1
should be translated into something equivalent to the following (ignore the binop compilation there for now).
cmp $1, $0
je L1
mov $1, %rax
jmp L2
L1:
mov $2, %rax
L2:
add $1, %rax
For the latter, you will be reading variables from the stack into some fixed registers, pushing them onto the stack temporarily to allow for arbitrary depth arithmetic trees, and then writing them back there. For example, the expression x = x * 4
should be translated into something equivalent to the following (which is inefficient, but you only need correctness for now.)
mov 16(%rbp), %rax
push %rax
mov $4, %rax
mov %rax, %rbx
pop %rax,
add %rbx, %rax
mov %rax, 16(%rbp)
What you need to implement
In each of these cases, you can test your code (download the files from the github repository) by running
./patinac {yourfile.pt}; ./prog
.
The easiest way to test is likely to be testing purely on expressions. In order to do this you must designate expr as a start symbol in your parser, just as prog is already.
Inside patinac, you can delete or keep the -e flag to determine whether the file is being parsed as a program with a single function in it or as just an expression, respectively.
Mandatory Part - Loops, Conditionals, and Variables
Fill in the holes in the provided code, so that the compile
function correctly converts an AST from the following subset of Patina to x86 assembly.
type expr =
| Const of const
| Id of string
| Let of string * typ * expr
| Assign of string * expr
| Unary of unop * expr
| Binary of binop * expr * expr
| Ite of expr * expr * expr
| Seq of expr list
| While of expr * expr
Bonus Part 1 - Functions
TBF
Bonus Part 2 - Arrays
TBF
Resources
x86 Assembly
x86 Calling Convention (cdecl
)
- Wikipedia page on x86 calling conventions
CS160 Assignment 5: Optimizations
Assignment due: Friday, December 10th 11:59PM
Submission and Grading
TBF
What you need to implement
In this assignment, you will implement the optimization phase for your compiler. Optimizations are commonly performed on intermediate representations. However, for simplicity you will directly optimize on the following subset of Patina AST, which excludes variable mutation, arrays, and while loops:
type expr =
| Const of const
| Id of string
| Let of string * typ * expr
| Unary of unop * expr
| Binary of binop * expr * expr
| Ite of expr * expr * expr
| Seq of expr list
| Call of string * expr list
On minimum, you should implement constant folding and propagation:
- Constant folding partially evaluates an expression as much as it can. For example, the nested binary expression
100 * 100 * 100
will be replaced with a single constant,1000000
. - Constant propagation replaces references to constant variables with their actual values. For instance, the sequence
{ let x: int = 1; print_int(x) }
will be transformed into{ let x: int = 1; print_int(1) }
.
Fill in the body of function fold_constant
in optimize.ml
. As always, you will likely need some kind of environment (to do constant propagation). Think about what kind of environment might be good. Then, replace bottom
with a type of your choice on the following line in optimize.ml
:
type env = (string * bottom) list
and perform fold_constant
using this environment.
We provided a driver that performs your optimizations on Patina source programs. You can invoke it with
dune exec ./patina.exe -- test.pt
It will print out the ASTs for the original program as well as the optimized program. It also makes sure that the optimized program is well-typed.