Due: Wednesday, October 25, 2017 at 4:20pm
Submission deadline: Saturday, October 28, 2017 at 4:20pm

Now that you’ve got some interpreter experience under your belt, it’s time to take your language to the next level. In this assignment, you will implement an interpreter for the exciting simply typed lambda calculus, or $\lamarr$ (a name of my own invention). Specifically, you will extend our previous language $\lambda_\rightarrow$ with algebraic data types, pattern matching, and polymorphic/existential types.

Setup

Setup is the same as always: copy the assignment folder out of our folder on Rice.

On the Rice machines:

cp -r /afs/ir/class/cs242/assignments/assign4 assign4


Or, from your local machine:

scp -r <SUNetID>@rice.stanford.edu:/afs/ir/class/cs242/assignments/assign4 assign4


Requirements

You must implement the translation, typechecking, and interpretation stages of the interpreter. You will turn in these files:

• translator.ml
• typechecker.ml
• interpreter.ml

There are no written portions to this assignment.

Submitting

Upload the files in your copy of assign4 to Rice. To submit, navigate to the assignment root and execute:

python /afs/ir/class/cs242/bin/submit.py 4


Note: We grade tests that do not result in an error using exact text matching, and we will not fix submissions that have extra print statements or the like. It is your responsibility to test before submitting.

Language specification

Adding algebraic data types, pattern matching, and first-order type logic to our language introduces a substantial level of complexity in both the language and its implementation. A common technique for mitigating complexity in language implementation is to use intermediate representations (IRs), or mini-languages internal to the interpreter. For our language $\lamarr$, we have the top-level language, defined by what the programmer writes, as well as an IR for our typechecker and interpreter.

First, we will examine the grammar for our new language. Specifically, we have added new types: $\ttt{Product}, \ttt{Sum}, \ttt{ForAll}, \ttt{Exists}$ and the corresponding terms necessary to implement those types. These are precisely the same formulations as we discussed in class. Additionally, this language introduces pattern matching, a restricted version of the match clause you’ve used in OCaml.

\begin{alignat*}{4} \tsf{Type}~\tau \Coloneqq \qamp \ttt{Var}(X) \qqamp X \qqamp \text{variable} \\ \qamp \ttt{Int} \qqamp \ttt{int} \qqamp \text{integer}\\ \qamp \ttt{Fn}(\tau_1, \tau_2) \qqamp \tau_1 \rightarrow \tau_2 \qqamp \text{function} \\ \qamp \ttt{Product}(\tau_1, \tau_2) \qqamp \tau_1 * \tau_2 \qqamp \text{product} \\ \qamp \ttt{Sum}(\tau_1, \tau_2) \qqamp \tau_1 + \tau_2 \qqamp \text{sum} \\ \qamp \ttt{ForAll}(X, \tau) \qqamp \forall X.\tau \qqamp \text{forall} \\ \qamp \ttt{Exists}(X, \tau) \qqamp \exists X.\tau \qqamp \text{exists} \\ \\ \tsf{Term}~t \Coloneqq \qamp \ttt{Var}(x) \qqamp x \qqamp \text{variable} \\ \qamp \ttt{Int}(n) \qqamp n \qqamp \text{integer} \\ \qamp \ttt{Binop}(\oplus, t_1, t_2) \qqamp t_1 \oplus t_2 \qqamp \text{binary operation} \\ \qamp \ttt{Lam}(x, \tau, t) \qqamp \ttt{fn} \ (x : \tau) \ . \ t \qqamp \text{function} \\ \qamp \ttt{App}(t_1, t_2) \qqamp t_1 \ t_2 \qqamp \text{application} \\ \qamp \ttt{Pair}(t_1, t_2) \qqamp (t_1, t_2) \qqamp \text{tuple} \\ \qamp \ttt{Project}(t, d) \qqamp t.d \qqamp \text{projection} \\ \qamp \ttt{Inject}(t, d, \tau) \qqamp \ttt{inj} \ t = d \ \ttt{as} \ \tau \qqamp \text{injection} \\ \qamp \ttt{TLam}(X, t) \qqamp \ttt{tfn} \ X \ . 
\ t \qqamp \text{type function} \\ \qamp \ttt{TApp}(t, \tau) \qqamp t \ [\tau] \qqamp \text{type application} \\ \qamp \ttt{TPack}(\tau_1, t, \tau_2) \qqamp \{ \tau_1, t \} \ \ttt{as} \ \tau_2 \qqamp \text{type abstraction} \\ \qamp \ttt{Let}(\rho, t_1, t_2) \qqamp \ttt{let} \ \rho = t_1 \ \ttt{in} \ t_2 \qqamp \text{let} \\ \qamp \ttt{Match}(t, (\rho_1, t_1), (\rho_2, t_2)) \qqamp \ttt{match} \ t \ \{ \ \rho_1 \hookrightarrow t_1 \mid \rho_2 \hookrightarrow t_2 \ \} \qqamp \text{match} \\ \\ \tsf{Pattern}~\rho \Coloneqq \qamp \ttt{Wildcard} \qqamp \text{--} \qqamp \text{wildcard} \\ \qamp \ttt{Var}(x, \tau) \qqamp x : \tau \qqamp \text{variable binding} \\ \qamp \ttt{Alias}(\rho, x, \tau) \qqamp \rho \ \ttt{as} \ x : \tau \qqamp \text{alias} \\ \qamp \ttt{Pair}(\rho_1,\rho_2) \qqamp (\rho_1, \rho_2) \qqamp \text{tuple} \\ \qamp \ttt{TUnpack}(X, x) \qqamp \{X, x\} \qqamp \text{type unpack} \\ \\ \tsf{Binop}~\oplus ::= \qamp \ttt{Add} \qqamp + \qqamp \text{addition} \\ \qamp \ttt{Sub} \qqamp - \qqamp \text{subtraction} \\ \qamp \ttt{Mul} \qqamp * \qqamp \text{multiplication} \\ \qamp \ttt{Div} \qqamp / \qqamp \text{division} \\ \\ \tsf{Direction}~d ::= \qamp \ttt{Left} \qqamp L \\ \qamp \ttt{Right} \qqamp R \end{alignat*}

As a refresher from lecture, $\ttt{Inject}$ “injects” the term $t$ as the left or right side of the sum type $\tau$, depending on the direction $d$. $\ttt{TLam}$ and $\ttt{TApp}$ are similar to $\ttt{Lam}$ and $\ttt{App}$, except that they operate on types rather than terms: $\ttt{TLam}$ binds a variable $X$ that ranges over types instead of terms, and $\ttt{TApp}$ supplies a type $\tau$ as the argument to some $\ttt{TLam}$.

$\ttt{TPack}$, $\ttt{Exists}$, and $\ttt{TUnpack}$ are intimately related. $\ttt{TPack}$ is always used with the type $\ttt{Exists}$. $\ttt{Exists}$ states that there exists some type $X$ for which the body $\tau$ describes the packed value. This essentially allows us to define abstract interfaces. $\ttt{TPack}$ “packs” a concrete implementation of some type $\tau_1$ into some existential type $\tau_2$, and $\ttt{TUnpack}$ allows us to retrieve the concrete implementation.
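For instance (an illustrative package of our own, not part of the assignment), we can hide $\ttt{int}$ as the representation type of a tiny counter interface:

$$\{\ttt{int}, (0, \ttt{fn} \ (x : \ttt{int}) \ . \ x + 1)\} \ \ttt{as} \ \exists X . X * (X \rightarrow X)$$

A client can only manipulate the packed value through $\ttt{TUnpack}$, so the choice of $\ttt{int}$ remains hidden.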

As mentioned, all of the rules for algebraic data types and first-order type logic mirror the semantics discussed in class, so refer back to those notes for more details. The pattern matching mechanism is novel, however, as this is the first time we will give it a formal treatment. Here are a few examples using patterns in this language:
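For instance (examples constructed from the grammar above), the pattern in

$$\lett{(x : \ttt{int}, \text{--})}{(1, 2)}{x + 5}$$

binds $x$ to the first component of the tuple and discards the second, so the whole expression evaluates to $6$, while

$$\ttt{match} \ (\ttt{inj} \ 3 = L \ \ttt{as} \ \ttt{int} + \ttt{int}) \ \{ \ x : \ttt{int} \hookrightarrow x + 1 \mid \text{--} \hookrightarrow 0 \ \}$$

takes the left branch with $x = 3$ and evaluates to $4$.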

If we wanted to, we could stop here and define a full statics and semantics for the language above. However, we need not, since we can translate some parts of the language into others. Specifically, we will eliminate pattern matching in a translation pass. For example, we could define the following translation of a tuple match (where $\leadsto$ means “translates to”):
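For instance (an illustrative instance of the rules formalized in Part 1), a tuple match on a pair-typed term $t$ could translate as:

$$\lett{(x : \tau_1, y : \tau_2)}{t}{x + y} \leadsto \app{(\fnt{x}{\tau_1}{\app{(\fnt{y}{\tau_2}{x + y})}{t.R}})}{t.L}$$

The pattern disappears entirely: each component is projected out of $t$ and bound by an ordinary, immediately applied function.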

To formalize this translation, we will first define a grammar for a new IR:

\begin{alignat*}{4} \tsf{Type}~\tau \Coloneqq \qamp \ttt{Var}(X) \qqamp X \qqamp \text{variable} \\ \qamp \ttt{Int} \qqamp \ttt{int} \qqamp \text{integer}\\ \qamp \ttt{Fn}(\tau_1, \tau_2) \qqamp \tau_1 \rightarrow \tau_2 \qqamp \text{function} \\ \qamp \ttt{Product}(\tau_1, \tau_2) \qqamp \tau_1 * \tau_2 \qqamp \text{product} \\ \qamp \ttt{Sum}(\tau_1, \tau_2) \qqamp \tau_1 + \tau_2 \qqamp \text{sum} \\ \qamp \ttt{ForAll}(X, \tau) \qqamp \forall X.\tau \qqamp \text{forall} \\ \qamp \ttt{Exists}(X, \tau) \qqamp \exists X.\tau \qqamp \text{exists} \\ \\ \tsf{Term}~t \Coloneqq \qamp \ttt{Var}(x) \qqamp x \qqamp \text{variable} \\ \qamp \ttt{Int}(n) \qqamp n \qqamp \text{integer} \\ \qamp \ttt{Binop}(\oplus, t_1, t_2) \qqamp t_1 \oplus t_2 \qqamp \text{binary operation} \\ \qamp \ttt{Lam}(x, \tau, t) \qqamp \ttt{fn} \ (x : \tau) \ . \ t \qqamp \text{function} \\ \qamp \ttt{App}(t_1, t_2) \qqamp t_1 \ t_2 \qqamp \text{application} \\ \qamp \ttt{Pair}(t_1, t_2) \qqamp (t_1, t_2) \qqamp \text{tuple} \\ \qamp \ttt{Project}(t, d) \qqamp t.d \qqamp \text{projection} \\ \qamp \ttt{Inject}(t, d, \tau) \qqamp \ttt{inj} \ t = d \ \ttt{as} \ \tau \qqamp \text{injection} \\ \qamp \ttt{Case}(t, (x_1, t_1), (x_2, t_2)) \qqamp \ttt{case} \ t \ \{ \ x_1 \hookrightarrow t_1 \mid x_2 \hookrightarrow t_2 \ \} \qqamp \text{case} \\ \qamp \ttt{TLam}(X, t) \qqamp \ttt{tfn} \ X \ . 
\ t \qqamp \text{type function} \\ \qamp \ttt{TApp}(t, \tau) \qqamp t \ [\tau] \qqamp \text{type application} \\ \qamp \ttt{TPack}(\tau_1, t, \tau_2) \qqamp \{\tau_1, t\} \ \ttt{as} \ \tau_2 \qqamp \text{type abstraction} \\ \qamp \ttt{TUnpack}(X, x, t_1, t_2) \qqamp \ttt{unpack} \ \{X, x\} = t_1 \ \ttt{in} \ t_2 \qqamp \text{type open} \\ \\ \tsf{Binop}~\oplus ::= \qamp \ttt{Add} \qqamp + \qqamp \text{addition} \\ \qamp \ttt{Sub} \qqamp - \qqamp \text{subtraction} \\ \qamp \ttt{Mul} \qqamp * \qqamp \text{multiplication} \\ \qamp \ttt{Div} \qqamp / \qqamp \text{division} \\ \\ \tsf{Direction}~d ::= \qamp \ttt{Left} \qqamp L \\ \qamp \ttt{Right} \qqamp R \end{alignat*}

This IR has three differences from the top-level language:

1. Patterns are removed entirely.
2. TUnpack has moved from Pattern to Term.
3. Match has been replaced by Case.

In terms of the code, the first grammar corresponds to the Lang module in ast.ml, whereas the second corresponds to the IR module.

Part 1: Translator

Your first job is to implement the translation function in translator.ml:

val translate : Lang.Term.t -> IR.Term.t


This function takes a term of the top-level language Lang.Term.t and converts it into an IR.Term.t. It cannot fail, so we do not return a Result.t.

The translation is trivial for most of the language; the only major differences are the Lang.Term.Let and Lang.Term.Match cases. We have implemented all of the trivial cases, as well as the Match case, for you.

We can formalize the Let and Match translations by defining a series of translation rules:

$$\ir{Tr-let_1}{\trans{t_2}{t_2'}}{\trans{\lett{\text{--}}{t_1}{t_2}}{t_2'}} \s \ir{Tr-let_2} {\trans{t_1}{t_1'} \s \trans{t_2}{t_2'}} {\trans{\lett{\hasType{x}{\tau}}{t_1}{t_2}}{\app{(\fnt{x}{\tau}{t_2'})}{t_1'}}} \nl \ir{Tr-let_3} {\trans{t_1}{t_1'} \s \trans{\lett{\rho}{t_1}{t_2}}{t'}} {\trans{\lett{\rho \ \ttt{as} \ x : \tau}{t_1}{t_2}}{\app{(\fnt{x}{\tau}{t'})}{t_1'}}} \s \ir{Tr-let_4} {\trans{(\lett{\rho_1}{t_1.L}{\lett{\rho_2}{t_1.R}{t_2}})}{t'}} {\trans{\lett{(\rho_1, \rho_2)}{t_1}{t_2}}{t'}} \nl \ir{Tr-let_5} {\trans{t_1}{t_1'} \s \trans{t_2}{t_2'}} {\trans{\lett{\{X, x\}}{t_1}{t_2}}{\ttt{unpack} \ \{X, x\} = t_1' \ \ttt{in} \ t_2'}} \nl \ir{Tr-match} {\trans{t}{t'} \s \trans{\lett{\rho_1}{x_1}{t_1}}{t_1'} \s \trans{\lett{\rho_2}{x_2}{t_2}}{t_2'}} {\trans {\ttt{match} \ t \ \{ \ \rho_1 \hookrightarrow t_1 \mid \rho_2 \hookrightarrow t_2 \ \}} {\ttt{case} \ t' \ \{ \ x_1 \hookrightarrow t_1' \mid x_2 \hookrightarrow t_2' \ \}}}$$

Look at Match for an example of implementing one of these translation rules. After implementing these rules, you will have a complete pipeline to move from source code to your IR.
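To make the shape of these rules concrete, here is a self-contained sketch of the Let cases over pared-down versions of the two grammars. The constructor names here are simplifications and will not match ast.ml exactly, and the real translate must also handle sums, type functions, and Match:

```ocaml
(* Minimal stand-ins for the Lang and IR grammars above. *)
type ty = TInt | TFn of ty * ty

module Lang = struct
  type pattern =
    | Wildcard
    | PVar of string * ty
    | Alias of pattern * string * ty
    | PPair of pattern * pattern

  type term =
    | Var of string
    | Int of int
    | Lam of string * ty * term
    | App of term * term
    | Pair of term * term
    | Project of term * [ `L | `R ]
    | Let of pattern * term * term
end

module IR = struct
  type term =
    | Var of string
    | Int of int
    | Lam of string * ty * term
    | App of term * term
    | Pair of term * term
    | Project of term * [ `L | `R ]
end

let rec translate (t : Lang.term) : IR.term =
  match t with
  | Lang.Var x -> IR.Var x
  | Lang.Int n -> IR.Int n
  | Lang.Lam (x, tau, body) -> IR.Lam (x, tau, translate body)
  | Lang.App (t1, t2) -> IR.App (translate t1, translate t2)
  | Lang.Pair (t1, t2) -> IR.Pair (translate t1, translate t2)
  | Lang.Project (t, d) -> IR.Project (translate t, d)
  | Lang.Let (Lang.Wildcard, _, t2) ->
    (* Tr-let1: the bound value is discarded entirely *)
    translate t2
  | Lang.Let (Lang.PVar (x, tau), t1, t2) ->
    (* Tr-let2: a let becomes an immediately applied function *)
    IR.App (IR.Lam (x, tau, translate t2), translate t1)
  | Lang.Let (Lang.Alias (rho, x, tau), t1, t2) ->
    (* Tr-let3: bind the alias, then keep translating the inner
       pattern (note t1 is duplicated, exactly as in the rule) *)
    IR.App (IR.Lam (x, tau, translate (Lang.Let (rho, t1, t2))),
            translate t1)
  | Lang.Let (Lang.PPair (r1, r2), t1, t2) ->
    (* Tr-let4: match each component against its sub-pattern
       via projections, then translate the result *)
    translate
      (Lang.Let (r1, Lang.Project (t1, `L),
                 Lang.Let (r2, Lang.Project (t1, `R), t2)))
```

Note how Tr-let3 and Tr-let4 are not directly compositional: they build a new Lang term and recurse on it, which is exactly how the inference rules with a nested $\leadsto$ premise read.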

Note: if we wanted to formally verify our translation rules, we could define a dynamics for both the top-level Lang and the IR, and then prove that for all $\hasType{t}{\tau}$, if $t_L \Rightarrow t_L'$ in Lang and $t_L \leadsto t_{IR}$, then $t_{IR} \Rightarrow t_{IR}'$ implies that $t_L' \leadsto t_{IR}'$.

Part 2: Typechecker

Your next task is to implement a typechecker for the IR. Because we translate before typechecking, the typechecker is simpler to write (although this obscures type errors: it’s a tricky business to map a type error in generated code back to the original source).

The format of the typechecker is mostly the same as in the previous assignment, but now the type system is more complex. The statics are:

$$\ir{T-var}{ \ }{\typeJ{\ctx,\hasType{x}{\tau}}{x}{\tau}} \s \ir{T-n}{ \ }{\hasType{\num{n}}{\ttt{int}}} \s \ir{T-fn} {\typeJ{\ctx,\hasType{x}{\tau_1}}{t}{\tau_2}} {\typeJC{(\ttt{fn} \ (x : \tau_1) \ . \ t)}{\tau_1 \rightarrow \tau_2}} \nl \ir{T-app} {\typeJC{t_1}{\tau_1 \rightarrow \tau_2} \s \typeJC{t_2}{\tau_1}} {\typeJC{(t_1 \ t_2)}{\tau_2}} \s \ir{T-tuple} {\typeJC{t_1}{\tau_1} \s \typeJC{t_2}{\tau_2}} {\typeJC{(t_1, t_2)}{\tau_1 * \tau_2}} \nl \ir{T-project-L} {\typeJC{t}{\tau_1 \times \tau_2}} {\typeJC{t.L}{\tau_1}} \s \ir{T-project-R} {\typeJC{t}{\tau_1 \times \tau_2}} {\typeJC{t.R}{\tau_2}} \nl \ir{T-inject-L} {\typeJC{t}{\tau_1}} {\typeJC{\ttt{inj} \ t = L \ \ttt{as} \ \tau_1 + \tau_2}{\tau_1 + \tau_2}} \s \ir{T-inject-R} {\typeJC{t}{\tau_2}} {\typeJC{\ttt{inj} \ t = R \ \ttt{as} \ \tau_1 + \tau_2}{\tau_1 + \tau_2}} \nl \ir{T-case} {\typeJC{t}{\tau_1 + \tau_2} \s \typeJ{\ctx,\hasType{x_1}{\tau_1}}{t_1}{\tau} \s \typeJ{\ctx,\hasType{x_2}{\tau_2}}{t_2}{\tau}} {\typeJC{\ttt{case} \ t \ \{ \ x_1 \hookrightarrow t_1 \mid x_2 \hookrightarrow t_2 \ \}}{\tau}} \nl \ir{T-tfn} {\typeJ{\ctx,X}{t}{\tau}} {\typeJC{\ttt{tfn} \ X \ . \ t}{\forall X.\tau}} \s \ir{T-tapp} {\typeJC{t}{\forall X.\tau_1}} {\typeJC{t \ [\tau_2]}{\subst{X}{\tau_2}{\tau_1}}} \s \ir{T-pack} {\typeJC{t}{\subst{X}{\tau_1}{\tau_2}}} {\typeJC{\{\tau_1, t\} \ \ttt{as} \ \exists X.\tau_2}{\exists X.\tau_2}} \nl \ir{T-unpack} {\typeJC{t_1}{\exists X.\tau_1} \s \typeJ{\ctx,X,\hasType{x}{\tau_1}}{t_2}{\tau_2}} {\typeJC{\ttt{unpack} \ \{X, x\} = t_1 \ \ttt{in} \ t_2}{\tau_2}}$$

There are two major additions: sum/product types and polymorphic/existential types. Typechecking the former follows straightforwardly from rules like those in the previous assignment, but the latter introduces a new concept: type variables. Your typing context now keeps track of two things:

1. Mappings from term variables to types, notated by $\typeJ{\Gamma,\hasType{x}{\tau}}{e}{\tau}$. As before, term variables are introduced by lambdas, and now also by case expressions.
2. The set of type variables in scope, notated by $\typeJ{\Gamma,X}{e}{\tau}$. Type variables do not map to anything; the context just tracks when a type variable is in scope (to catch type expressions that use an undefined type variable).

Concretely, you will implement the following function:

val typecheck_term : String.Set.t -> Type.t String.Map.t -> Term.t -> Type.t


The call typecheck_term tenv env t returns the type of t if it exists. Note that instead of using a Result.t, this time you can take a simpler approach: whenever a type error occurs, raise a TypeError exception with an appropriate error message, e.g. raise (TypeError "Does not typecheck"). Otherwise, assume that when you call typecheck_term recursively you get back a valid type.

Here, env is the same mapping from term variables to types as in the previous assignment, and tenv is a new set of type variables. Type variables do not map to anything (they do not themselves have types); the set just records which type variables are in scope while typechecking an expression.
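To illustrate how the two environments are threaded, here is a self-contained sketch over a cut-down term language, using the standard library’s Set and Map in place of Core’s String.Set and String.Map. The names here (check_type, the constructors) are hypothetical stand-ins, and the real typechecker must compare types with Type.aequiv rather than =:

```ocaml
module SS = Set.Make (String)
module SM = Map.Make (String)

type ty = TVar of string | TInt | TFn of ty * ty | TForAll of string * ty

type term =
  | Var of string
  | Int of int
  | Lam of string * ty * term
  | App of term * term
  | TLam of string * term

exception TypeError of string

(* Reject user-written types that mention out-of-scope type variables
   (this is the job of the provided typecheck_type). *)
let rec check_type (tenv : SS.t) (tau : ty) : unit =
  match tau with
  | TVar x ->
    if not (SS.mem x tenv) then
      raise (TypeError ("unbound type variable " ^ x))
  | TInt -> ()
  | TFn (a, b) -> check_type tenv a; check_type tenv b
  | TForAll (x, body) -> check_type (SS.add x tenv) body

let rec typecheck (tenv : SS.t) (env : ty SM.t) (t : term) : ty =
  match t with
  | Var x ->                                 (* T-var *)
    (match SM.find_opt x env with
     | Some tau -> tau
     | None -> raise (TypeError ("unbound variable " ^ x)))
  | Int _ -> TInt                            (* T-n *)
  | Lam (x, tau, body) ->                    (* T-fn: validate the annotation,
                                                then extend the term env *)
    check_type tenv tau;
    TFn (tau, typecheck tenv (SM.add x tau env) body)
  | App (t1, t2) ->                          (* T-app; real code: Type.aequiv *)
    (match typecheck tenv env t1 with
     | TFn (a, b) when a = typecheck tenv env t2 -> b
     | _ -> raise (TypeError "expected a function of matching domain"))
  | TLam (x, body) ->                        (* T-tfn: extend the type-var set *)
    TForAll (x, typecheck (SS.add x tenv) env body)
```

The key pattern: Lam extends env, TLam extends tenv, and every user-written type annotation passes through check_type before it is trusted.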

We have provided you a function typecheck_type : String.Set.t -> Type.t -> Type.t. Whenever you typecheck a term that contains a type specified by the user (for example, the type annotation on a function or an injection), you will need to check that the provided type does not use unbound type variables. For example, the following expression should not typecheck because $Y$ is unbound:
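One such expression (a hypothetical instance reconstructed to match the description) is

$$\ttt{tfn} \ X \ . \ \ttt{fn} \ (x : Y) \ . \ x$$

since $X$ is the only type variable in scope when the annotation $Y$ is checked.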

Lastly, during typechecking there will be points where you want to compare two types for equality. Do not use the equality operator (e.g. tau1 = tau2), as that would be incorrect! Instead use Type.aequiv : Type.t -> Type.t -> bool, which compares two types for equality up to renaming of bound variables (alpha-equivalence). For example, the types $\forall X.X$ and $\forall Y.Y$ should be considered equal, but the = operator returns false because the variable names differ.
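As an illustration of what alpha-equivalence checks, here is a sketch over a cut-down type grammar (the provided Type.aequiv covers the full grammar and is what you should actually call):

```ocaml
type ty = Var of string | Int | Fn of ty * ty | ForAll of string * ty

(* Compare two types up to renaming of bound variables by walking both
   in lockstep, pairing up binders as we pass under them. *)
let aequiv (t1 : ty) (t2 : ty) : bool =
  let rec go env t1 t2 =
    match (t1, t2) with
    | (Var x, Var y) ->
      (match List.assoc_opt x env with
       | Some y' -> y = y'   (* bound on the left: must be its paired binder *)
       | None ->             (* free on the left: must be the same free name *)
         x = y && not (List.exists (fun (_, b) -> b = y) env))
    | (Int, Int) -> true
    | (Fn (a1, b1), Fn (a2, b2)) -> go env a1 a2 && go env b1 b2
    | (ForAll (x, b1), ForAll (y, b2)) -> go ((x, y) :: env) b1 b2
    | _ -> false
  in
  go [] t1 t2
```

On this representation, structural equality distinguishes $\forall X.X$ from $\forall Y.Y$, while aequiv identifies them, which is exactly the behavior the typechecker needs.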

Note: Technically, it’s possible for match statements to introduce variables that are already free variables in the expression, which is not legal. However, for the sake of this assignment, assume that match statements will always use fresh variables. That is, variables introduced in cases will not be found elsewhere in the expression.

Part 3: Interpreter

Lastly, you must implement the interpreter. At this point, all of the terms relating to type abstraction (polymorphic/existential types) have trivial dynamics, since we’ve already verified our required type properties in the typechecker. Most of the legwork is in supporting algebraic data types. The dynamics are as follows:

$$\ir{V-n}{ \ }{\val{\num{n}}} \s \ir{V-fn}{ \ }{\val{(\ttt{fn} \ (x : \tau_1) \ . \ e)}} \s \ir{D-app_1}{\steps{t_1}{t_1'}}{\steps{(t_1 \ t_2)}{(t_1' \ t_2)}} \s \ir{D-app_2}{\val{t_1} \s \steps{t_2}{t_2'}}{\steps{(t_1 \ t_2)}{(t_1 \ t_2')}} \nl \ir{D-app_3}{\val{t_2}}{\steps{((\ttt{fn} \ (x : \tau) \ . \ t_1) \ t_2)}{\subst{x}{t_2}{t_1}}} \s \ir{D-binop_1}{\steps{t_1}{t_1'}}{\steps{t_1 \oplus t_2}{t_1' \oplus t_2}} \s \ir{D-binop_2}{\val{t_1} \s \steps{t_2}{t_2'}}{\steps{t_1 \oplus t_2}{t_1 \oplus t_2'}} \nl \ir{D-div}{\ }{\error{t_1 \ / \ \ttt{Int}(0)}} \s \ir{D-binop_3}{\ }{\steps{\ttt{Int}(n_1) \oplus \ttt{Int}(n_2)}{\ttt{Int}(n_1 \oplus n_2)}} \nl \ir{D-tuple_1}{\steps{t_1}{t_1'}}{\steps{(t_1, t_2)}{(t_1', t_2)}} \s \ir{D-tuple_2}{\val{t_1} \s \steps{t_2}{t_2'}}{\steps{(t_1, t_2)}{(t_1, t_2')}} \s \ir{D-tuple_3}{\val{t_1} \s \val{t_2}}{\val{(t_1, t_2)}} \nl \ir{D-project_1}{\steps{t}{t'}}{\steps{t.d}{t'.d}} \s \ir{D-project_2}{\val{(t_1, t_2)}}{\steps{(t_1, t_2).L}{t_1}} \s \ir{D-project_3}{\val{(t_1, t_2)}}{\steps{(t_1, t_2).R}{t_2}} \nl \ir{D-inject_1} {\steps{t}{t'}} {\steps{\ttt{inj} \ t = d \ \ttt{as} \ \tau}{\ttt{inj} \ t' = d \ \ttt{as} \ \tau}} \s \ir{D-inject_2}{\val{t}}{\val{\ttt{inj} \ t = d \ \ttt{as} \ \tau}} \nl \ir{D-case_1} {\steps{t}{t'}} {\steps {\ttt{case} \ t \ \{ \ x_1 \hookrightarrow t_1 \mid x_2 \hookrightarrow t_2 \ \}} {\ttt{case} \ t' \ \{ \ x_1 \hookrightarrow t_1 \mid x_2 \hookrightarrow t_2 \ \}}} \nl \ir{D-case_2} {\val{t}} {\steps {\ttt{case} \ \ttt{inj} \ t = L \ \ttt{as} \ \tau \ \{ \ x_1 \hookrightarrow t_1 \mid x_2 \hookrightarrow t_2 \ \}} {\subst{x_1}{t}{t_1}}} \nl \ir{D-case_3} {\val{t}} {\steps {\ttt{case} \ \ttt{inj} \ t = R \ \ttt{as} \ \tau \ \{ \ x_1 \hookrightarrow t_1 \mid x_2 \hookrightarrow t_2 \ \}} {\subst{x_2}{t}{t_2}}} \nl \ir{D-tfn}{\ }{\steps{\ttt{tfn} \ X \ . 
\ t}{t}} \s \ir{D-tapp}{\ }{\steps{t \ [\tau]}{t}} \s \ir{D-pack}{\ }{\steps{\{\tau_1, t\} \ \ttt{as} \ \tau_2}{t}} \nl \ir{D-unpack}{\ }{\steps{\ttt{unpack} \ \{X, x\} = t_1 \ \ttt{in} \ t_2}{\subst{x}{t_1}{t_2}}}$$

As in the previous assignment, you will implement these dynamics in the trystep function in interpreter.ml:

val trystep : Term.t -> outcome


The dynamics for Int and Var as well as the dynamics for the polymorphic/existential cases are done for you.
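As an illustration of the sum-type rules (D-inject and D-case), here is a self-contained sketch over a pared-down term language. The term and outcome types are simplifications of the real definitions in ast.ml and interpreter.ml, and the type annotation on injections is omitted:

```ocaml
type term =
  | Int of int
  | Var of string
  | Inject of term * [ `L | `R ]
  | Case of term * (string * term) * (string * term)

type outcome = Step of term | Val | Err of string

(* The assignment provides capture-avoiding substitution; this naive
   version (which only respects shadowing by case binders) suffices here. *)
let rec substitute (x : string) (v : term) (t : term) : term =
  match t with
  | Var y -> if String.equal x y then v else Var y
  | Int n -> Int n
  | Inject (t, d) -> Inject (substitute x v t, d)
  | Case (t, (x1, t1), (x2, t2)) ->
    Case (substitute x v t,
          (x1, (if String.equal x1 x then t1 else substitute x v t1)),
          (x2, (if String.equal x2 x then t2 else substitute x v t2)))

let rec trystep (t : term) : outcome =
  match t with
  | Int _ -> Val
  | Var x -> Err ("unbound variable " ^ x)
  | Inject (t, d) ->
    (match trystep t with
     | Step t' -> Step (Inject (t', d))   (* D-inject1: step under inj *)
     | Val -> Val                         (* D-inject2: inj v is a value *)
     | err -> err)
  | Case (scrut, (x1, t1), (x2, t2)) ->
    (match trystep scrut with
     | Step scrut' ->                     (* D-case1: step the scrutinee *)
       Step (Case (scrut', (x1, t1), (x2, t2)))
     | Val ->
       (match scrut with
        | Inject (v, `L) -> Step (substitute x1 v t1)   (* D-case2 *)
        | Inject (v, `R) -> Step (substitute x2 v t2)   (* D-case3 *)
        | _ -> Err "case applied to a non-sum value")
     | err -> err)
```

The structure mirrors the rules directly: congruence cases step a subterm and rebuild the surrounding term, while the reduction cases fire only once the scrutinee is a value.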

Testing

To build the interpreter, run make. This will create an executable, main.byte. This is OCaml bytecode that needs to be passed to ocamlrun to execute. We’ve provided two scripts, user.sh and solution.sh, for running main.byte and solution.byte, respectively.

Running run_tests.py will run the test suite over all the files in the tests directory and compare them to our solution.

python3 run_tests.py


Note that the tests are just a representative sample of the tests we will run your code on.

To test an individual file, you can use user.sh to invoke the interpreter manually, e.g.

./user.sh tests/exists1.lam2


For each of the functions you have to implement, we have provided a few additional unit tests inside an inline_tests function in each file. Uncomment the line that says:

let () = inline_tests ()


Then execute ./user.sh to run those tests. If it does not fail with an assertion error, you have passed those tests.