Due: Wednesday, October 10 at 4:00pm
Submission cutoff: Saturday, October 13 at 4:00pm

$$\newcommand{\lamarr}{\lambda_\rightarrow}$$

In this assignment, you will implement an interpreter for a variant of the simply typed lambda calculus or as presented in class. An interpreter is a program that implements the semantics of a programming language: it parses strings into syntax trees (or “a program”), typechecks the program, and then evaluates the program to a value. Through this assignment, you will become more familiar with the OCaml programming language as well as the process of writing interpreters for small functional languages.

Before you begin this assignment, we strongly recommend that you complete the OCaml lab to familiarize yourself with the logistics of programming in OCaml.

Setup

In the f18-assignments repository, run git pull to get the assignment materials, located in the assign2 directory.

Code overview

Take a moment to familiarize yourself with all of the starter code provided. The flow of the interpreter is:

  1. Raw source code enters as a string at the top-level main.ml .
  2. The source string is tokenized (lexed) into tokens in lexer.mll.
  3. The tokens are parsed in parser.mly into an abstract syntax tree (AST) defined in ast.mli.
  4. The AST is typechecked in typecheck.ml.
  5. The AST is interpreted to produce a final result in interpreter.ml.

Each set of .ml and .mli files represent a module, with the .mli as the interface and .ml as the implementation. For example, typecheck.mli provides the interface to the typechecker, whereas main.ml has no corresponding interface as no other functions call the main.

The basic structure of this assignment is that you will implement a complete interpreter for a “core” typed lambda calculus equivalent to the language discussed in the type systems lecture, i.e. integers and functions. Afterwards, you will then go back and add additional semantics for algebraic data types.

Parsing

We have implemented the parser for you. It maps string literals to syntax trees. In the grammar description below, we have provided both the concrete and abstract syntax for each part of the language.

For example, the string (concrete syntax):

(fn (x : int) . x * 7) 20

will map to the abstract syntax:

App(Lam("x", Int, Binop(Mul, Var("x"), Int(7))), Int(20))

Part 1: Substitution

For your first task, you will implement variable substitution on syntax trees. Specifically, you will implement the following function in ast.ml:

val substitute : string -> Expr.t -> Expr.t -> Expr.t

This is approximate translation of substitute x e' e. The substitute function replaces every usage of the free variable x in e with e'. As a reminder, here are a few examples of substitutions:

$$ \begin{align} \subst{x}{2}{(1 + x)} &= 1 + 2 \\ \subst{x}{2}{((\msf{fn} \ (x : \msf{int}) \ . \ x) \ x)} &= (\msf{fn} \ (x : \msf{int}) \ . \ x) \ 2 \\ \subst{x}{2}{((\msf{fn} \ (y : \msf{int}) \ . \ y + x) \ x)} &= (\msf{fn} \ (y : \msf{int}) \ . \ y + 2) \ 2 \end{align} $$

Part 2: Typechecker

The next step is to validate that a given expression is well-typed according to the statics semantics provided below.

$$ \ir{T-Var}{\hasType{x}{\tau} \in \Gamma }{\typeJ{\ctx}{x}{\tau}} \s \ir{T-Int}{ \ }{\hasType{n}{\tint}} \s \ir{T-Binop} {\typeJC{e_1}{\msf{int}} \s \typeJC{e_2}{\msf{int}}} {\typeJC{e_1 \oplus e_2}{\msf{int}}} \nl \ir{T-Lam} {\typeJ{\ctx,\hasType{x}{\tau_1}}{e}{\tau_2}} {\typeJC{(\funt{x}{\tau_1}{e})}{\tau_1 \rightarrow \tau_2}} \s \ir{T-App} { \typeJC{e_1}{\tau_1 \rightarrow \tau_2} \s \typeJC{e_2}{\tau_1} } {\typeJC{(\app{e_1}{e_2})}{\tau_2}} \nl $$

Your goal is to implement the typecheck function in typechecker.ml. It adheres to the following signature:

val typecheck : Expr.t -> (Type.t, string) Result.t

The typecheck function takes in an Expr.t and returns a Result.t, a sum type that is either Ok(the_type) if the expression typechecks or Error(the_error) if it does not.

Again, your implementation of typecheck should mirror exactly the semantics described by the typing rules in the language specification. You will need to carry around a typing context, a mapping from variables to types. This can be concretely implemented using a String.Map.t, documentation here.

Here’s a few examples:

typecheck(Expr.Int(3)) = Ok(Type.Int)  (* 3 : int *)
typecheck(Expr.App(Expr.Int(3), Expr.Int(3))) = Error("something informative")
typecheck(Expr.Lam("x", Type.Int, Expr.Int(3))) = Ok(Type.Fn(Type.Int, Type.Int))  (* (fn (x : int) . 3) : int -> int *)

Part 3: Interpreter

Once you have verified that a expression is well-typed, the last step is to write an interpreter in interpreter.ml that attempts to reduce the expression to a value. Your interpreter should follow the small-step operational semantics as given below.

$$ \ir{D-Int}{ \ }{\val{n}} \s \ir{D-Lam}{ \ }{\val{(\funt{x}{\tau}{e})}} \nl \ir{D-App$_1$}{\steps{e_1}{e_1'}}{\steps{\app{e_1}{e_2}}{\app{e_1'}{e_2}}} \s \ir{D-App$_2$} { \ } {\steps{(\funt{x}{\tau}{e_1}) \ e_2}{\subst{x}{e_2}{e_1}}} \nl \ir{D-Binop$_1$}{\steps{e_1}{e_1'}}{\steps{e_1 \oplus e_2}{e_1' \oplus e_2}} \s \ir{D-Binop$_2$}{\val{e_1} \s \steps{e_2}{e_2'}}{\steps{e_1 \oplus e_2}{e_1 \oplus e_2'}} \nl \ir{D-Binop$_3$}{n_3 = \binopm{n_1}{n_2}}{\steps{n_1 \oplus n_2}{n_3}} \nl $$

Specifically, you will implement trystep:

type outcome =
  | Step of Expr.t
  | Val

val trystep : Expr.t -> outcome

The trystep function attempts to take a single step on a well-typed expression t strictly following the dynamics above. If e is value then the outcome is Val. If e successfully stepped, then the outcome is Step e' where .

Here’s a few examples:

trystep(Expr.Int(3)) = Val  (* 3 val *)

let lam = Expr.Lam("x", Type.Int, Expr.Int(3)) in
trystep(Expr.App(lam, Expr.Int(2))) = Step(Expr.Int(3))  (* (fn (x : int) . 3) 2 |-> 3 *)
trystep(Expr.Binop(Expr.Add, Expr.Binop(Expr.Add, Expr.Int(1), Expr.Int(3)), Expr.Int(7)))
        = Step(Expr.Binop(Expr.Add, Expr.Int(4), Expr.Int(7)))
        (* (1 + 3) + 7 |-> 4 + 7 *)

After this part, you should be able to pass all basic tests (see Testing for more on this).

Note: in these semantics, you can assume that a number divided by 0 is 0.

Part 4: Algebraic data types

Now, we’ll extend our specification to include sum and product types (see the lecture notes for a refresher). As above, implement the given syntax, statics, and dynamics in ast.ml, typecheck.ml, and interpreter.ml.

Additional syntax

Additional statics

$$ \ir{T-Pair} {\typeJC{e_1}{\tau_1} \s \typeJC{e_2}{\tau_2}} {\typeJC{\pair{e_1}{e_2}}{\tprod{\tau_1}{\tau_2}}} \s \ir{T-Project-L} {\typeJC{e}{\tprod{\tau_1}{\tau_2}}} {\typeJC{\proj{e}{L}}{\tau_1}} \s \ir{T-Project-R} {\typeJC{e}{\tprod{\tau_1}{\tau_2}}} {\typeJC{\proj{e}{R}}{\tau_2}} \nl \ir{T-Inject-L} {\typeJC{e}{\tau_1}} {\typeJC{\inj{e}{L}{\tsum{\tau_1}{\tau_2}}}{\tau_1 + \tau_2}} \s \ir{T-Inject-L} {\typeJC{e}{\tau_2}} {\typeJC{\inj{e}{R}{\tsum{\tau_1}{\tau_2}}}{\tau_1 + \tau_2}} \nl \ir{T-Case} {\typeJC{e}{\tau_1 + \tau_2} \s \typeJ{\ctx,\hasType{x_1}{\tau_1}}{e_1}{\tau} \s \typeJ{\ctx,\hasType{x_2}{\tau_2}}{e_2}{\tau}} {\typeJC{\case{e}{x_1}{e_1}{x_2}{e_2}}{\tau}} $$

Additional dynamics

$$ \ir{D-Pair$_1$}{\steps{e_1}{e_1'}}{\steps{\pair{e_1}{e_2}}{\pair{e_1'}{e_2}}} \s \ir{D-Pair$_2$}{\val{e_1} \s \steps{e_2}{e_2'}}{\steps{\pair{e_1}{e_2}}{\pair{e_1}{e_2'}}} \s \ir{D-Pair$_3$}{\val{e_1} \s \val{e_2}}{\val{\pair{e_1}{e_2}}} \nl \ir{D-Project$_1$}{\steps{e}{e'}}{\steps{\proj{e}{d}}{\proj{e'}{d}}} \s \ir{D-Project$_2$}{\val{\pair{e_1}{e_2}}}{\steps{\pair{e_1}{e_2}.L}{e_1}} \s \ir{D-Project$_3$}{\val{\pair{e_1}{e_2}}}{\steps{\pair{e_1}{e_2}.R}{e_2}} \nl \ir{D-Inject$_1$} {\steps{e}{e'}} {\steps{\inj{e}{d}{\tau}}{\inj{e'}{d}{\tau}}} \s \ir{D-Inject$_2$}{\val{e}}{\val{\inj{e}{d}{\tau}}} \nl \ir{D-Case$_1$} {\steps{e}{e'}} {\steps {\case{e}{x_1}{e_1}{x_2}{e_2}} {\case{e'}{x_1}{e_1}{x_2}{e_2}}} \nl \ir{D-Case$_2$} {\val{e}} {\steps {\case{(\inj{e}{L}{\tau})}{x_1}{e_1}{x_2}{e_2}} {\subst{x_1}{e}{e_1}}} \nl \ir{D-Case$_3$} {\val{e}} {\steps {\case{(\inj{e}{R}{\tau})}{x_1}{e_1}{x_2}{e_2}} {\subst{x_2}{e}{e_2}}} $$

After this part, all tests (including advanced tests) should pass.

Part 5: Language extensions

This is the written portion of the assignment. Always eager to Move Fast and Break Things™, Will has submitted two new potential extensions to the language above. For each extension, he wrote down the proposed statics and dynamics. Unfortunately, only one of these extensions is type safe—the other violates at least one of progress and preservation. Your task is to identify which extension violates these theorems and provide a counterexample, then for the other extension provide a proof of both progress and preservation for the given rules.

First, recall from lecture the definitions of progress and preservation.

Progress: if , then either or there exists an such that .

Preservation: if and then .

The first extension adds let statements to the language, just like in OCaml:

$$ \ir{T-Let}{\typeJ{\ctx,\hasType{x}{\tau_1}}{e_2}{\tau_2}}{\typeJC{(\lett{x}{\tau_1}{e_1}{e_2})}{\tau_2}} \s \ir{D-Let}{\ }{\steps{\lett{x}{\tau}{e_1}{e_2}}{\subst{x}{e_1}{e_2}}} $$

For example, this would allow us to write:

The second extension adds a recursor (rec) to the language. A recursor is like a for loop (or more accurately a “fold”) that runs an expression from 0 to :

$$ \ir{T-Rec}{\typeJC{t}{\msf{int}} \s \typeJC{e_0}{\tau} \s \typeJ{\ctx,\hasType{x}{\msf{int}},\hasType{y}{\tau}}{e_1}{\tau}}{\typeJC{\rec{e_0}{e_1}{t}}{\tau}} \nl \ir{D-Rec$_1$}{\steps{t}{t'}}{\steps{\rec{e_0}{e_1}{t}}{\rec{e_0}{e_1}{t'}}} \s \ir{D-Rec$_2$}{\ }{\steps{\rec{e_0}{e_1}{\msf{Int}(0)}}{e_0}} \nl \ir{D-Rec$_3$}{n \neq 0}{\steps{\rec{e_0}{e_1}{\msf{Int}(n)}}{[x \rightarrow \msf{Int}(n), y \rightarrow \rec{e_0}{e_1}{\msf{Int}(n-1)}] \ e_1}} $$

Here’s a few examples of using a recursor:

If you know LaTeX, we strongly recommend that you use LaTeX to write your proofs. We have provided you a file tex/solution.tex in which you may write your solutions. You may, however, use a word processor or handwrite your solutions—we simply require that your submission be a PDF and legible.

Testing

To build the interpreter, run make. This will create an executable, main.native.

We’ve supplied a set of tests in the tests directory. To test an individual file, you can use main.native to invoke the interpreter, e.g.

./main.native tests/function.lam1

Binaries for Linux (Ubuntu) and Mac of our reference solution are provided at sol_linux and sol_mac and can be used similarly (i.e. sol_mac tests/function.lam1).

For each of the functions you have to implement, we have provided a few additional unit tests inside of an inline_tests function in each file. Uncomment the line that says:

let () = inline_tests ()

And then execute ./main.native (without arguments) to run those tests. If it does not fail with an assertion error, you have passed those tests.

Running run_tests.py after building will run your main.native executable over all files in tests and compare them to our solution. To run just basic tests (up through Part 3), run

python3 run_tests.py

To run both basic and advanced tests (up through Part 4), run

python3 run_tests.py adt

Note that the tests are just a representative sample of the tests we will run your code on. We highly recommend writing your own tests as well.

Submission

To submit your work, we have two Gradescope assignments for the written and programming sections. For the written section, upload your assign2.pdf file. For the programming section, run ./make_submission.sh and upload the newly created assign2.zip file.