Introduction

Chapter 1 introduces you to the Patina programming language, for which you will be writing your own compiler!

Patina is an imperative language whose syntax is very similar to the Rust programming language. We designed it so that it is big enough to let you write many interesting programs (e.g. your favorite sorting algorithm), but small enough that a compiler for Patina can be implemented in a quarter-long course.

Chapter 2 describes the language more formally, specifying the syntax, dynamic semantics, and static semantics of each language construct.

Chapter 3 walks you through setting up OCaml on your computer, which you will need for the programming assignments, described in Chapter 4. The assignments build on each other, and in the end you will have a complete, optimizing compiler that can translate any valid Patina program down to x86 assembly, which your computer will be able to execute.

Overview

This chapter gives an informal overview of the Patina language. More precise specifications can be found in the next chapter.

A Patina program consists of a list of function definitions, one of is a function called main that acts as the entry point of the program:

fn my_function(<arg_name>: <arg_type>) -> <return_type> {
  ... // body of my_function
}

... // more function definitions

fn main() -> unit {
  ... // body of main
}

Patina is an expression-oriented language. This means that

Statements (like those in Java or C) don't have a distinguished status in Patina. Instead, everything is an expression; statements are just expressions that don't produce values, and they can be used anywhere an expression can be used.
The body of each function is simply an expression (i.e. there are no explicit return statements like in Java or C). Executing a function simply means evaluating the expression that is the body of the function.

Let's see what kinds of expressions we can write in Patina.

Integer Expressions

The most basic expressions in Patina are integer constants, and they can be composed into larger expressions using binary arithmetic operators +, -, * and /. For example,

(1 + 2) / 3 * 4

is a valid Patina expression with value 4. Negative integers are written in the usual way, e.g. -2.

Integer expressions have type int.

Boolean Expressions

The most basic boolean expressions are boolean constants true and false. More complex boolean expressions can be constructed in two ways:

by comparing two integer expressions using the comparison operators: == (equal), != (not equal), < (less than), > (greater than), <= (less than or equal to) and >= (greater than or equal to),
by composing smaller boolean expressions using the boolean operators: ! (not), && (and), || (or).

For instance, the following expression evaluates to true:

(3 > 4 && (1 == 1)) || !false

Boolean expressions have type bool.

Branching

Patina supports if-then-else expressions, which evaluates to the then-expression when the condition expression evaluates to true or the else-expression otherwise. For example, the expression

if 0 <= 1 then {
  2 + 3
} else {
  3 * 4
}

evaluates to 5.¹

Note that it is possible to simulate "else if" clauses by nesting multiple if-then-else. For example,

if 0 == 1 then {
  3
} else {
  if 0 == 2 then {
    5
  } else {
    7
  }
}

evaluates to 7.

You may have noticed the extra pair of curly braces that enclose the then and the else branch. They are not just there for readability; in fact, { ... } is another kind of expression in Patina, called sequence expressions. You will learn more about them in the next section.

For those of you familiar with C++ (or Java), Patina's if-then-else expressions correspond to C++'s conditional expressions. For example, the if-then-else expression here would translate to 0<=1 ? (2+3) : (3*4) in C++. Importantly, Patina's if-then-else expressions are different from C++'s if-statements. Statements don't produce values, but expressions do.

Unit expressions

Patina expressions don't always need to evaluate to an int or bool value. For some expressions, their only purpose is to have some side effect upon the execution of the program. Printing something to standard output is among one of those scenarios. In Patina, evaluating the expression

print_int(8)

has the effect of printing the integer 8 to the standard output, but the expression itself doesn't evaluate to any integer value (i.e., it is meaningless to write print_int(8) + 2).

In Patina, expressions like this have a special type called unit. An expression of unit type evaluates to a special value called the unit value, written as (). Note that the unit value () is the only value that inhabits type unit. Compare it with type int, which harbors many values, such as 0, -3, 42, 1024, ...

Sequencing

It is often useful to string together multiple unit expressions with side effects, and evaluate them one by one. Say we would like to print two integers. In Patina, we can express this using a sequence expression:

{
  print_int(8);
  print_int(9)
}

This should be understood as a single overall expression e written as { e1 ; e2 }. The expression e contains two sub-expressions, namely, print_int(8) as e1 and print_int(9) as e2.

To evaluate a sequence expression { e1; e2; ...; en }, we evaluate each ei sequentially, discarding the values of all expressions except for the last expression en. The value of en will become the value of the overall sequence expression.

So in the previous example, the value of the overall sequence expression is the value of print_int(9), which is (), the unit value. The value of print_int(8) is suppressed.¹

A sequence has to have at least one expression, but it may very well have just a single expression, as you have encountered in a previous if-then-else expression:

if 0 <= 1 then {
  2 + 3
} else {
  3 * 4
}

Here, the then branch contains the sequence { 2 + 3 } consisting of a single expression 2 + 3 (and similarly with the else branch).

Sequences also appear as while expressions's body and functions' body, both of which we will introduce shortly.

Variable Declaration

Sequence expressions not only are useful for specifying sequential computation with side effects, but they also delineate how long variables declared inside them should live. Let's first see how variables are declared in Patina.

Variable declaration is another kind of side effect in addition to printing to standard output. In Patina, variables are created using let-expressions:

let x: T = e

This expression will create a new variable x of type T, which is initialized with the value of expression e. Then, x becomes available ("in-scope") in subsequent expressions within the same sequence.

For example,

{
  let x: int = 5;
  let y: int = x + 1;
  print_int(x * y)
}

will print out 30, but

{
  {
    let x: int = 5;
    let y: int = x + 1
  };
  print_int(x * y)
}

will result in a compile-time error due to the last line: variables x and y are only available inside the inner sequence, and are not available in the outer sequence.

Let-expressions themselves have type unit.

Variable Assignments

In addition to reading the value of an variable, you might also want to mutate it. Patina has assignment expressions for this purpose. For example,

{
  let x: int = -1;
  x = x + 2;
  print_int(x)
}

initializes a new integer variable x with value -1, increments the value of x by 2, and prints 1 to the screen.

While Expressions

Patina's while expressions are very similar to the while loops in other imperative languages like C or Java. A while expression has a condition and a body sequence:

while cond {
  e1;
  e2;
  ..;
  en
}

To evaluate a while expression, we first evaluate its condition:

If it is true, then we evaluate the body sequence repeatedly until the condition evaluates to false.
Otherwise, the body sequence will not be evaluated.

For example,

{
  let i: int = 0;
  while i < 3 {
    i = i + 1
  };
  print_int(i)
}

will print 3 to the standard output.

The condition should have type bool, and the body have type unit. The overall while expression will have type unit.

For simplicity, Patina has no break or continue constructs.

For those of you familiar with Java or C, the semicolon has a different meaning here. In Java or C, a semicolon signals the end of a simple statement. In Patina, a semicolon is part of the syntax for sequence expressions, so they must appear together with a surrounding pair of curly braces. Moreover, a semicolon is always sandwiched between two expressions, and it means that "we don't care the value of the first expression, but please evaluate it before you evaluate the expression after the semicolon."

Arrays

Patina supports dynamically allocated, zero-indexed arrays of integers.

Array Allocation

We can create an array by calling the built-in function alloc(..) with a desired length. All elements will be initialized to 0. For example,

{
  let len: int = 3;
  let xs: [int] = alloc(len)
}

creates an array of length 3 and binds it to variable xs.

Reading from an Array

Array elements can be accessed using array-access expressions, written as

xs[i]

where xs is an array, and i is an integer expression.

Writing to an Array Element

When an array access appears on the left-hand side of an assignment, the selected array element can be mutated, just like how the value of a variable is mutated. For example,

{
  let singleton: [int] = alloc(1);
  singleton[0] = 100
}

changes the 0-th element of singleton from the default value of 0 to 100.

Aliasing

Aliasing is permitted in Patina. Two array variables can simultaneously refer to the same array object:

{
  let xs: [int] = alloc(1);
  let ys: [int] = xs; // ys refers to the same array as xs
  ...
}

Arrays are passed by reference. So [int] is like a pointer to integer (int *) in C++. In the following example, both reverse_1 and reverse_2 reverses the input array. However, reverse_1 modifies the input array in place, while reverse_2 returns a new array that's the reverse of the input array.

fn reverse_1(xs: [int], len: int) -> unit {
  let mid: int = len / 2;
  let i: int = 0;
  while i < mid {
    xs[i] = xs[len - i - 1];
    i = i + 1
  }
}

fn reverse_2(xs: [int], len: int) -> [int] {
  let ys: [int] = alloc(len);
  let i: int = 0;
  while i < len {
    ys[i] = xs[len - i - 1];
    i = i + 1
  };
  ys
}

Extent

Because you're not required to implement a garbage collector for your compiler, arrays will have infinite extent. In other words, an array cannot be destroyed once it has been allocated.

In the following example, even after the function f has returned, the array referred to by xs inside the body of f continues to exist in heap memory. But we can't access the array again, because its only name xs is no longer available outside the function body:

fn f() -> unit {
  let xs: [int] = alloc(1); // xs refers to an array in heap memory
  ()
}

fn main() -> unit {
  f();
  ... // can't refer to that array anymore, since xs is out-of-scope
}

Once we've created an array, unless we always remember its name (or one of its names), the array would become an unnameable identity forever lost in heap memory. This is what's called a memory leak, which plagues programs written in unmanaged languages like C and C++. You will learn about solutions to this problem in the second half of the course, although you won't be required to implement them ¹.

Also, learn how Rust prevents resource leaks using a completely different approach. It's super cool!

Functions

In Patina, functions are the only creatures that are not expressions. They are declared at the top level, along with the mandatory main function.

By convention, main doesn't take in any argument, and always has return type unit:

fn main() -> unit

Other functions can have any return type (unit, bool, int, or [int]), and their arguments also need to be annotated with types. The body of each function is a sequence expression whose type must match the declared return type.

Recursion

Functions can be recursive. This program will print out 55, the tenth fibonacci number (indexing from zero).

fn fibonacci(x: int) -> int {
  if x == 0 || x == 1 then {
    x
  } else {
    fibonacci(x-1) + fibonacci(x-2)
  }
}

fn main() -> unit {
  print_int(fibonacci(10))
}

You can also define mutually recursive functions:

fn even(x: int) -> bool {
  x == 0 || odd(x - 1)
}

fn odd(x: int) -> bool {
  x == 1 || even(x - 1)
}

fn main() -> unit {
  print_bool(odd(3))
}

This program will print out true.

Built-in Functions

Patina also provides the following built-in functions, which can be called freely in Patina programs:

// print a boolean to stdout
fn print_bool(x: bool) -> unit

// print an integer to stdout
fn print_int(x: int) -> unit

// print an array of integers to stdout
fn print_arr(xs: [int], length: int) -> unit
 
// print the newline character '\n' to stdout
fn print_ln() -> unit

// allocate a new, zero-initialized array of specified length
fn alloc(length: int) -> [int]

For example, the following program

fn main() -> unit {
  print_bool(!true);
  print_ln();

  print_int(0);
  print_int(1);
  print_ln();
  
  let xs: [int] = alloc(3);
  xs[0] = 3;
  xs[1] = 2;
  xs[2] = 1;
  print_arr(xs, 3);
  print_ln()
}

will print out

false
01
3 2 1

Reference Manual

Concrete Syntax

Program   ::= Function+
Function+ ::= a non-empty list of "Function"
Function  ::= fn ID ( Param* ) -> Type Sequence
Param*    ::= a list of "ID : Type" separated by ","
Type      ::= unit | bool | int | [int]
Sequence  ::= { Expr+ }
Expr+     ::= a non-empty list of "Expr" separated by ";"
Expr      ::= ()
           | true | false 
           | NON_NEG_INT
           | ID
           | ( Expr )
           | - Expr
           | ! Expr
           | Expr ⊕ Expr      ⊕ ∈ { +, -, *, /, &&, ||, ==, !=, >=, >, <=, < }
           | Sequence
           | if Expr then Sequence else Sequence
           | let ID : Type = Expr
           | while Expr Sequence
           | ID [ Expr ]
           | ID = Expr
           | ID [ Expr ] = Expr
           | ID ( Expr* )
Expr*     ::= a list of "Expr" separated by ","

ID          = a letter or underscore, followed by any number of letters, numbers, or underscores

Precedence and Associativity

Unary operators have the highest precedence, followed by the arithmetic operators, integer relations, and logical operators, with their usual precedence. That is, multiplication and division have higher precedence than addition and subtraction, and logical conjunction has higher precedence over disjunction.

Operators associate to the left. Relations are non-associative.

Typing Rules

Changelog

None

Please view the typing rules here.

Setting up OCaml

This guide covers how to set up OCaml on your computer.

Installing opam walks you through how to install opam, the package manager for OCaml. You can think of this as a gateway to all official and third-party OCaml packages. You will need to go through opam to install anything OCaml-related.
Once you have opam, Installing OCaml walks you through setting up the core tools necessary to run any OCaml program.
Interacting with OCaml code introduces common ways of interacting with OCaml code.

Installing `opam`

Adapted from this guide written by Bryan Tan.

opam is a package manager for OCaml. You will later use it to install OCaml compilers, interpreters, and other third-party libraries needed for this course.

Installing on CSIL Machines

opam is a little trickier to set up on CSIL. If you don't have an account yet, create one using this link.

Note: If you previously had an unsuccessful installation of opam, please run rm -rf ~/.opam to clean up partial installation before retrying the steps below.

Log onto a CSIL machine (you can do it remotely). Once you're logged in, fire up a terminal if you don't have one open, and type the following command to download the opam binary:

curl -LR 'https://github.com/ocaml/opam/releases/download/2.1.0/opam-2.1.0-x86_64-linux' -o opam

Then make sure the downloaded binary is executable, and move it to ~/bin/opam:

chmod +x opam
mkdir -p ~/bin/
mv opam ~/bin/opam

Check to make sure it's on PATH:

[junrui@csil-08 ~]$ opam --version
2.1.0

Lastly, run opam init. This will take 20-40 minutes because it has to download a bunch of stuff. It will prompt you once or twice afterwards; you can safely respond with Y to make your life more convenient.

Installing for Non-CSIL Linux

Follow the steps here, under the "Binary distribution" section header.

Installing for macOS

If you have homebrew, just run

brew install gpatch
brew install opam

Otherwise, follow the same steps in "Installing for Linux" section.

Installing for Windows

Windows users are recommended to use CSIL instead, since OCaml doesn't work too well on Windows.

Installing OCaml

Once you have opam, you can create a "switch" so that anything you installed for this course won't contaminate the global environment.

Inside your terminal, run

opam switch create cs160 ocaml-base-compiler.4.13.0
eval $(opam env)

This will create a new switch called cs160. It will also compile the tools necessary to run OCaml programs, including a compiler (ocamlc) and an interpreter (ocaml).

Once the switch is created, you can play with the interpreter by running ocaml in your terminal.

But we recommend using utop as an alternative. To install it, run

opam install utop

opam will install any additional dependencies required by utop.

Interacting with OCaml Code

Interacting without Source Files

Simply run utop on your terminal. You can enter OCaml expressions and let bindings.

Running an OCaml Program

Open your favorite code editor, and create a file called hello.ml that contains the following lines:

let s = "hello" 
let _ = print_endline s

Run utop hello.ml to let utop interpret your program from start to finish. You will see "hello" printed on your terminal.

Alternatively, you can run utop with no arguments, and load your program dynamically with

#use "hello.ml";;

You will also see "hello" printed on your screen, but this time utop won't exit. So you can continue to use the interpreter, with everything defined in hello.ml now available for use. For instance, the variable s containing the string "hello" is still available, so you can continue to do

print_endline s;;

Building an OCaml Project with Multiple Files

You will need a build system, like dune, to manage OCaml programs that spread over more than one file. But for the course assignments we will set up everything for you. So no need to worry about this scenario!

Programming Assignments

CS160 Assignment 1: OCaml

Assignment due: Monday, October 18 11:59PM

Changelog

Problem 3:
- For Let (x, e), further Let-bindings created inside e will not be inherited.
Problem 4:
- For While's condition, treat the zero value as false, and non-zero values as true.
Problem 2:
- You don't need to worry about division by zero.
- Div is to be interpreted as integer division.

Introduction

In this assignment, you will get some practice with recursion and pattern matching by building a sequence of interpreters for increasingly large subsets of the Patina programming language.

There are 3 main parts to the assignment (Problems 1-3), as well as a bonus problem (Problem 4). For each problem, we'll provide a file containing some starter code. They can be found in the CS160 GitHub repo:

assoc.ml (Problem 1)
patina_arith.ml (Problem 2)
patina_let.ml (Problem 3)
patina_ref.ml (Problem 4)

You will fill in the missing parts of the code that are marked with (* your code here *).

If you're stuck for more than 15 minutes on a problem, try to ask a general question on slack, or if you can't, direct message a TA.

Respect academic integrity; no sharing solutions, code or otherwise. You may use the internet, but not for finding solutions to these particular problems.

Submission and Grading

Please submit the following files to Gradescope:

assoc.ml
patina_arith.ml
patina_let.ml
patina_ref.ml

You are free to leave the assertions in your files uncommented, as long as they don't break your code.

Each test case is worth 1 point.

The test cases for the bonus problem will be available soon.

Problem 1 [★]

Goal of this exercise: Practice tuples, options, and recursion over lists.

In this problem, you will implement the association list data structure, which you may need for a later problem.

Association lists are the simplest kind of lookup tables. In OCaml, they are ('k * 'v) list that map keys of type 'k to values of type 'v. To insert a key-value pair into an association list, we simply insert the key-value pair to the beginning of the list:

let insert (k: 'k) (v: 'v) (al: ('k * 'v) list) : ('k * 'v) list =
  (k,v) :: al

That is, insert takes a key and a value, as well as a list of keys and values, and inserts the key-value pair to the beginning of the list.

Note that the key's and value's types are not concrete, because the insert function is designed to work over any two key value types (an example of polymorphism). However, the list that the key-value pair is being inserted into must be a list of the correct type.

You will implement the complementary lookup_opt function. It looks like

let rec lookup_opt (k: 'k) (al: ('k * 'v) list) : 'v option =
  match al with
  | _ -> failwith "Not yet implemented" (* your code here *)

To query the value associated with a key k in the list, we find the frontmost pair whose key matches k, and we return the associated value. However, our query k might not be in our list. That's why the return type is 'v option, instead of 'v.

In the example below, we insert

("x", 1),
("y", 2),
("x", 3)

to the list al sequentially. Because ("x", 3) is more recent than ("x", 1), the lookup lookup_opt "x" al returns the Some 3, instead of Some 1.

let al = insert "x" 3 (insert "y" 2 (insert "x" 1 []))
(* al is now [("x", 3), ("y", 2), ("x", 1)] *)

let _ = assert (lookup_opt "z" al = None)
let _ = assert (lookup_opt "y" al = Some 2)
let _ = assert (lookup_opt "x" al = Some 3)

Provide a recursive implementation for lookup_opt. Your code may not call any OCaml library functions.

Hints:
On option types. A value of an 'a option type can either be None, or Some of some value of type 'a. Here is an example demonstrating the option type. If we want to divide integers without risking an exception, one solution is to use option types.
let safe_divide (x: int) (y: int) : int option = 
  if y = 0 then None else Some (x / y)
On recursion. Recursion is simpler to use than loops in OCaml, and often better. You need to use let rec rather than let to declare a recursive function. Other than that, a function can call itself just as any other function. Here is a recursive function that computes the Hailstone sequence:
let rec hailstone n = 
  if n = 1 then
    print_string "1 reached"
  else if n mod 2 = 1 then
    hailstone (3*n+1)
  else
    hailstone (n / 2)
On pattern-matching. As seen in the course slides, the primary way that you interact with values is by case analysis. The syntax for case analysis is via match .. with .. which maps each possible case to the desired expression. All values that are part of the case need to be assigned to variables if you want to use them in the expression. For instance, the following function uses pattern matching to sum the first two elements of a list:
let sum_of_first_2_elements (l: int list) : int =
  match l with
  | []          -> 0
  | x :: []     -> x
  | x :: y :: _ -> x + y

Problem 2 [★★]

Goal of this exercise: Practice pattern matching and functions as first-class objects.

Patina\(^{arith}\) is a subset of Patina that only contains integer constants and binary expressions. Expressions will be represented in OCaml as abstract syntax trees as follows:

type binop = Add | Sub | Mul | Div
type expr = Const of int
          | Binary of binop * expr * expr

Below are two Patina\(^{arith}\) expressions represented as expr:

e1 represents the concrete expression "0 + 2 * 3"
e2 represents the concrete expression "3 * 4 - 30 / 6"

let e1 =
  Binary (
    Add,
    Const 0,
    Binary (Mul, Const 2, Const 3))

let e2 =
  Binary (
    Sub, 
    Binary (Mul, Const 3, Const 4),
    Binary (Div, Const 30, Const 6))

We provide you with a partially complete interpreter for Patina\(^{arith}\). The main interpreter function is interpret, which evaluates a Patina\(^{arith}\) expression to an OCaml integer.

let rec interpret (e: expr) : int =
  match e with
  | Const n -> n
  | Binary (op, e1, e2) -> 
    (interpret_op op) (interpret e1) (interpret e2)

and interpret_op (op: binop) : int -> int -> int =
  match op with
  | _ -> failwith "Not yet implemented" (* your code here *)

Here, interpret calls a helper function, interpret_op, which evaluates binary integer operators (binop) to their interpretations as OCaml functions (int -> int -> int). The binary operators have the usual interpretations, and Div should be interpreted as integer division. For instance, interpret e1 should return 6, and interpret e2 should return 7.

Complete the definition for interpret_op. You don't need to handle division by zero, since the test cases won't contain input expressions that cause division by zero errors.

Hint: You are likely to find anonymous functions helpful here.

As an example, the following variable three_anonymous is a tuple of 3 anonymous functions so you can see how they're created:
let three_anonymous =
  ( (fun x -> if x then 3 else 6),
    (fun x -> fun y -> x ^ y),
    (fun (a,b,c) -> if a > b then a-b else c) )
The first takes a bool to an int, the second concatenates two strings, and the third compares integers. You will notice that the second is an anonymous function returning an anonymous function, whereas the third takes all its arguments as a triple. The former format is called "curried" and the latter "uncurried". Be careful to use the right one.

Problem 3 [★★★★]

Goal of this exercise: Practice pattern matching and simulating "state changes" in a functional way.

Now let us extend Patina\(^{arith}\) into Patina\(^{let}\), which additionally supports variable bindings and sequences:

type binop = Add | Sub | Mul | Div
type expr = Const of int
          | Binary of binop * expr * expr
          | Id of string             (* new *)
          | Let of string * expr     (* new *)
          | Seq of expr list         (* new *)

A Let expression is a string * expr pair, where the value of the expr will be bound to the string name. The name will available in subsequent expressions in the parent Seq expression. Any binding created in the expr will not be inherited. For example, the following expression

Seq [
  Let ("x", 0);
  (* Current environment: x |-> 0 *)
  
  Let ("y", Let ("x", 1));
  (* Current environment: y |-> 0, x |-> 0 *)
  x
]

evaluates to 0, because the second binding of y is created in the right-hand side of binding of x, hence is ignored.

The evaluation rules for the newly added constructs are as follows:

A Let expression evaluates to 0.
A Seq expression evaluates to the value of the last expression in the sequence.
An Id expression evaluates the value associated with the identifier string.

In the example below,

e1 represents the concrete Patina expression { let x: int = 2; x * 3 },
e2 represents the concrete Patina expression { let x: int = -1; { let x: int = 2; x }; x }

let e1 = 
  Seq [
    Let ("x", Const 2);
    Binary (Mul, Id "x", Const 3)
  ]

let e2 =
  Seq [
    Let ("x", Const (-1));
    Seq [ Let ("x", Const 2); Id "x" ];
    Id "x"
  ]

let _ = assert (interpret e1 = 6)
let _ = assert (interpret e2 = (-1))

e1 evaluates to 6, because the let-expression binds "x" to Const 2, and the binary expression Binary (Mul, Id "x", Const 3) evaluates to 6 when Id "x" is replaced with Const 2.

e2 evaluates to -1, because the last "x" refers to the variable bound in the outer sequence.

Extend your interpreter in Problem 2 to support the new language constructs. Your interpret will have the following type:

let rec interpret (e: expr) : int = 
  failwith "Not yet implemented" (* your code here *)

You may assume that the test cases won't contain semantic or runtime errors, such as reference to unbound variables, or empty sequences.

Hints:
You may want to use some kind of environment (recall what you did in Problem 1) to keep track of the values of variables that are in scope:
type environment = (string * int) list
interpret itself can be non-recursive, and can call another helper function that actually evaluates the expressions recursively. The helper function may look like:
let rec interpret' (e: expr) (env: environment) : (environment * int) =
   match e with
   ...
Compared to interpret, the function interpret' additionally takes in an environment, and additionally returns an environment augmented by Let expressions ¹.
Be very careful how Seq and your environment interact. If you are unsure, the Variable Declaration section in the Patina language overview contains more examples and explanations.

(Bonus) Problem 4 [★★★]

Extend your interpreter to support Patina\(^{ref}\), which further supports assignments and while loops:

type binop = Add | Sub | Mul | Div
type expr = Const of int
          | Binary of binop * expr * expr
          | Id of string
          | Let of string * expr
          | Seq of expr list
          | Assign of string * expr    (* new *)
          | While of expr * expr       (* new *)

Both Assign and While evaluate to 0. A While expression is an expr * expr pair, where the first item of the pair is the condition, and the second item is the loop body.

Because we don't model booleans here (for simplicity), simply treat the zero value as false, and non-zero values as true.

For more information about the semantics of assignment and while expressions, refer to the relevant parts in the section on Unit Expressions in the Patina language overview.

Hint: You may want to use ref cells for your environment:
type environment = (string * int ref) list

This helper function is an example of what's called a state monad, which is a functional way of simulating "state changes" without using mutable variables. (The linked article is for people learning a different functional language called Haskell, but the syntax involved in that particular discussion is quite similar to OCaml's syntax.)

CS160 Assignment 2: Lexing and Parsing

Assignment due: Friday, October 29 11:59PM

Changelog

The regular expression for ID should match underscores.
Due date has been extended to Friday.

Introduction

In this assignment, you will first write some toy programs in the Patina language to get more familiar with the language's syntax. Then, you will build a lexer and a parser to parse well-formed Patina programs into abstract syntax trees.

Instructions

Download the starter code here.
Unlike Assignment 1, the starter code is no longer individual files, but a self-contained dune project. You will need to install dune through opam:
```
 opam install dune
```
There is guide on how to install opam on your machine or CSIL. If you encounter any difficulty installing opam or dune, please let us know on Slack.
The only files you need to modify are scanner.mll and parser.mly.
A driver program (patina.ml) is provided to help you test your lexer/parser. It requires ppx_deriving.show for pretty printing, which you can install with opam install ppx_deriving. Once you have that, you can run the driver on a Patina source file with
```
 dune exec ./patina.exe -- <filename>
```

Submission and Grading

Submit your scanner.mll and parser.mly to gradescope.

Please make sure that your code compiles before submitting them.

There are 21 tests in total, 16 of which are public. You can find the public tests here.

Part 1

You will write some simple programs in the Patina language. We'll provide you with a prototype interpreter for you to validate your programs. Be aware that the prototype is quite crappy; by the end of this course will have a compiler that's much better than ours. Use patina-darwin if your machine uses macOS, and patina-linux if your machine uses Linux (including CSIL). Run it with ./patina-<your-os> -i <patina-source-file>.

For each of these, we recommend looking at the syntax reference of Patina, as well as the overview of Patina in the first section of the page. They should contain all the information that is necessary to complete the assignments. For anything that's unclear after that, ask on Slack!

Program 1

Write a simple function to practice the oddities of Patina syntax. This function should take an int as argument, create an array with length equal to the integer, fill it with 7s, and then return it. The main function should call this function with 3 as its input and print out the 2nd value in the result.

Program 2

Let's follow up on the section - write a function that takes in an integer and returns whether or not the integer is a perfect number. You can do so by creating a list of all the factors of the input integer, and then sums them together to check whether a number is perfect or not, returning true if it is and false otherwise.

In your main function, use a while loop to print out perfect numbers up to 1000, each on a new line. For example, if your source file is named perfect.pt, then running ./patina-<your-os> -i perfect.pt should output

6
28
496

Part 2

The first step of a compiler is splitting the string of the whole code into pieces which represent all of the important parts. Since Patina has no syntax for comments, this should be straightforward.

Ocaml has the very useful tool for doing lexing of ocamllex, which takes a definition of a mapping from token strings to tokens, and produces an automaton that can actually do the lexing. This takes advantage of the insight you've learned in class about how simple enough languages can be parsed with simple automata.

Each of the lines of code will look something like:

  | '/'  { DIV }

Each of these tokens is atomic for the purpose of parsing - it can't be divided further without losing some utility. For example, you don't want to make each digit in an integer a separate token, because you'd have to put them all together when parsing any case where they're used. On the other hand, you wouldn't want to count "(2" as a token, because that parenthesis is going to work just as any other parenthesis in the problem.

Fill out the rows of your template scanner.mll file to write a lexer for Patina's full syntax.

Part 3: Parser

Parsing is the central work of this assignment, and the purpose of good lexing is to make parsing easier.

ocamlyacc is a tool which automatically generates a parser for a language if you define how it works. You will use it to parse textual file containing Patina programs into their abstract syntax tree representation.

Each line of the parser should be inside of a block defining a non-terminal or terminal symbol, and should include both the symbols that make it up on the left, as well as OCaml code (using the constructors of the AST) which represents the value on the right. This value is not the result of interpretation of the code, so no addition or other calculation should be done.

For example, to say that an expression formed by two sub-expressions separated by the DIV token, we write

expr:
    ...
    | expr DIV expr      { Binary(Div, $1,$3) }

We have defined an abstract syntax tree (AST) type for you in ast.ml, but it's up to you do define the intermediate nodes of parsing. You can use the patina concrete reference not only as a reference of how Patina code looks, but also as hints for what parsing structure to use.

Modify the template parser.mly file to write a parser for Patina's full syntax. Specifically, you will need to add three kinds of stuff:

Since the parser is connected to the lexer your wrote for Part 2, you need to declare in the parser file the tokens you defined in the lexer:
```
%token // TODO: Your tokens here
```
Define your grammar rules:
```
// TODO: Your rules here E.g.
```
Your grammar will likely be ambiguous (i.e. there will be shift/reduce or even reduce/reduce conflicts). You can resolve a lot of ambiguities by simply specifying the precedence and associativity of the elements in your grammar:
```
// TODO: Your associativity rules here
```

CS160 Assignment 3: Type Checking

Assignment due: Monday, November 15 11:59PM

In this assignment, you will implement a type checker for your Patina compiler.

Instructions

Download the starter code here.
Make sure you have installed dune and ppx_deriving via opam, as required in AS2. You don't need to install anything new for this assignment.
Replace scanner.mll and parser.mly with your own implementation from AS2.
Complete the type checker in typecheck.ml (described in greater details later).
As in the previous assignment, we provide you with a driver program (patina.ml) that takes in Patina source files and invokes your parser and type checker on them. Run the driver program with
```
dune exec ./patina.exe -- <patina-source-file> 
```

Submission and Grading

We will update this section once the autograder is ready.

What You Need to Implement

If you are feeling adventurous, you can start with an empty typecheck.ml file and build your type checker from scratch. The only requirement is that the file should export a function called check_prog with the following signature:
val check_prog : Ast.prog -> unit
This function takes in an Ast.prog. It returns unit if there's no type error, and raises Error.TypeError otherwise.

You will implement the typing rules described in the reference manual, as well as a few more semantic checks.

The typecheck.ml file contains a skeleton of the type checking algorithm. On a high level, it is quite similar to the interpret' function you wrote in AS1:

The main type checking function, check, takes in an expression and some environment, and outputs a pair consisting of a new environment and some result.
Previously, the result type was int, because your interpreter was evaluating the input expression to an integer. Now the result type is Ast.typ (which can be either TUnit, TBool, TInt, or TArr), because the type checker computes the type of the input expression (if the expression is well-typed).
Previously, the environment type was (string * int) list, which maps identifiers to their integer values¹. Now the environment is called tenv, which stands for typing environment. This is the \(\Gamma\) you have encountered in the typing rules, which maps identifiers to their declared types. The check function also takes in a function environment, which is the \(\Delta\) in the typing rules.

The structure of check is also quite similar to interpret'. It traverses the AST in a post-order/bottom-up manner, by pattern-matching on the input expression, recursively type-checking the sub-expressions (if any), and aggregating the types of the sub-expressions. You can see a concrete example in the partially written cases we've provided. For example, in the case of unary Not expressions,

| Unary (Not, e) ->
    let te = type_of e in
    expect e TBool te;
    return TBool

we first recursively type check the sub-expressions e. Then we assert that e have type TBool. Finally, we return the type of the overall unary expression, which is also TBool.

However, it is not always the case that an expression is well-typed. What should check return in those cases? Clearly, we can't return a type as we normally would, because that would mean that the ill-typed expression has a valid type. Instead, the way we handle ill-typed expressions in this assignment is through exceptions. You have probably encountered exceptions in other languages. In OCaml, to raise an exception exn, we simply say raise exn. To handle exception exn that might arise during the evaluation of an expression e, we write

try
  e
with
| exn -> (* exception handling code *)

If your type checker encounters a type error, it should raise a TypeError, which is defined in error.ml. However, you don't need to raise exceptions explicitly using raise (and we recommend against doing that). Instead, use the helper functions error.ml, who will raise the appropriate exceptions on your behalf.

Fill in the missing parts that are marked with hole () in typecheck.ml.

In addition to the typing rules described in the reference manual, your type checker should also perform a few other semantic checks. For example, a program should only have one main function, which takes no argument and returns a unit. You can get some hints as to what kinds of semantic checks need to be performed, by looking at the helper functions in error.ml.

You can think of the environment in AS1 as a lazy substitution table.

CS160 Assignment 4: Code Generation

Assignment due: Monday, November 29 11:59PM

Your compiler will generate x86 assembly code for Patina programs.

Submission and Grading

TBF

Introduction

For this assignment, you will be tackling the most central part of a compiler: the generation of code of a lower level language. In this case, we will be using x86 as the target language. See the Resources section

This would normally be far more difficult than each of the other individual stages of the compiler. However, we have introduced a number of simplifications to make it doable.

Function calls are disallowed, and any functions besides main. In essence, the patina files to be compiled will represent a simple script. (Functions will be included in bonus part #1.)
Arrays are no longer included in the language specification. This allows all data to be stored on the stack as supposed to on the heap. (Arrays will be included in bonus part #2.)
This matters for performance but not to correctness, no register allocation algorithm needs to be written. All data will be stored inside a single frame of the stack.

Overview of approach

However, this still leaves a fair amount of complexity. For example, the following function calculating fibonacci numbers is still allowed.

fn main() -> unit {
	let fibn : int = 10;
	let firstval : int = 1;
	let secondval : int = 1;
	if (fibn < 0) then {
		let res : int = 0/0
	} else {
   	while (fibn > 0){
   		let sum : int = firstval + secondval;
   		firstval = secondval;
   		secondval = sum;
   		fibn = fibn - 1
	};
	let res : int = secondval
   }
}

As you can see, this still contains the challenges for implementation of

jumps required for while loops and if expressions
binop expressions which when compiled require multiple lines of assembly.

For the former, you will be adding labels into the assembly to allow appropriate jumps. For example, the expression if false then {2} else {1} + 1 should be translated into something equivalent to the following (ignore the binop compilation there for now).

cmp $1, $0
je L1
mov $1, %rax
jmp L2
L1:
mov $2, %rax
L2:
add $1, %rax

For the latter, you will be reading variables from the stack into some fixed registers, pushing them onto the stack temporarily to allow for arbitrary depth arithmetic trees, and then writing them back there. For example, the expression x = x * 4 should be translated into something equivalent to the following (which is inefficient, but you only need correctness for now.)

mov 16(%rbp), %rax
push %rax
mov $4, %rax
mov %rax, %rbx
pop %rax,
add %rbx, %rax
mov %rax, 16(%rbp)

What you need to implement

In each of these cases, you can test your code (download the files from the github repository) by running

./patinac {yourfile.pt}; ./prog.

The easiest way to test is likely to be testing purely on expressions. In order to do this you must designate expr as a start symbol in your parser, just as prog is already.

Inside patinac, you can delete or keep the -e flag to determine whether the file is being parsed as a program with a single function in it or as just an expression, respectively.

Mandatory Part - Loops, Conditionals, and Variables

Fill in the holes in the provided code, so that the compile function correctly converts an AST from the following subset of Patina to x86 assembly.

type expr =
   | Const of const
   | Id of string
   | Let of string * typ * expr
   | Assign of string * expr
   | Unary of unop * expr
   | Binary of binop * expr * expr
   | Ite of expr * expr * expr
   | Seq of expr list
   | While of expr * expr

Bonus Part 1 - Functions

TBF

Bonus Part 2 - Arrays

TBF

Resources

x86 Assembly

x86 Calling Convention (`cdecl`)

Wikipedia page on x86 calling conventions

CS160 Assignment 5: Optimizations

Assignment due: Friday, December 10th 11:59PM

Submission and Grading

TBF

What you need to implement

In this assignment, you will implement the optimization phase for your compiler. Optimizations are commonly performed on intermediate representations. However, for simplicity you will directly optimize on the following subset of Patina AST, which excludes variable mutation, arrays, and while loops:

type expr =
  | Const of const
  | Id of string
  | Let of string * typ * expr
  | Unary of unop * expr
  | Binary of binop * expr * expr
  | Ite of expr * expr * expr
  | Seq of expr list
  | Call of string * expr list

On minimum, you should implement constant folding and propagation:

Constant folding partially evaluates an expression as much as it can. For example, the nested binary expression 100 * 100 * 100 will be replaced with a single constant, 1000000.
Constant propagation replaces references to constant variables with their actual values. For instance, the sequence { let x: int = 1; print_int(x) } will be transformed into { let x: int = 1; print_int(1) }.

Fill in the body of function fold_constant in optimize.ml. As always, you will likely need some kind of environment (to do constant propagation). Think about what kind of environment might be good. Then, replace bottom with a type of your choice on the following line in optimize.ml:

type env = (string * bottom) list

and perform fold_constant using this environment.

We provided a driver that performs your optimizations on Patina source programs. You can invoke it with

dune exec ./patina.exe -- test.pt

It will print out the ASTs for the original program as well as the optimized program. It also makes sure that the optimized program is well-typed.

The Patina Programming Language