Due: Wednesday, October 2 at 4:00pm
Submission cutoff: Saturday, October 5 at 4:00pm


On Wednesday, we learned about the basics of how to formally describe and reason about syntax and semantics. In this assignment, you will enter a dark corner of computer science devoid of such logic or reason: data description languages. Specifically, we will look at using schema and accessors to safely validate and access JSON objects.

## Setup

For each assignment, the writeup will be hosted on the course web site (as with this page), and any material such as starter code or LaTeX templates will be provided in the course repository: https://github.com/stanford-cs242/f19-assignments

To get started, install Git if you haven’t already, then clone the repository:

$git clone https://github.com/stanford-cs242/f19-assignments$ cd f19-assignments


For all future assignments, you’ll just need to run git pull once the assignment is released. Each assignment will have an appropriately named directory—this one is assign1. This assignment has only a written portion, for which you must create a PDF file containing your answers.

In assign1/written we have provided you a LaTeX template which you are heavily encouraged to use. If you are unfamiliar and do not wish to learn LaTeX, then you may also hand-write your assignment. Your handwriting must be legible, and the scan must be clear.

## 0. JSON

Formally, a JSON object is a purely declarative object (i.e. fully reduced, no computation) that we can inductively define as follows.

For example, the following are JSON objects:

•   $1$
•   $\jobj{\jstr{x}: \jstr{hello}}$
•   $\jarr{\truet, \falset, \truet}$

A few notes on grammar notation. First, the metavariable $s$ ranges over strings, like "hello world" and "foobar". Second, like in a regular expression, the star indicates zero or more occurrences of a syntax object. In any rules or proofs, you can treat identifiers within a star as keyed sets or ordered arrays as needed. The following shorthands are all acceptable:

• Accessing key $s'$ on an dictionary $\jobj{(s: j)^*}$
• Pattern matching: $\jobj{s': j', \ldots}$
• Indexing: $j_{s'}$
• Accessing index $i$ on an array $\jarr{j^*}$
• Pattern matching: $\jarr{\ldots, j_i, \ldots}$
• Indexing: $j_i$

For example, the following are hypothetical step operations that equivalently extract the key “foo” out of a JSON dictionary:

For rules that need to quantify universal properties over sets/arrays, you can use the standard forall syntax:

## 1. JSON schemas

After receiving too many malformed JSON objects, Will wants to devise a schema language that lets him express the expected shape/properties of a JSON object. The schema should have a corresponding judgment $j \sim \tau$ that expresses whether a JSON object $j$ adheres to a schema $\tau$. (To draw an analogy to a standard programming language, a JSON object is like an expression, and a schema is like a type.) Here are a few examples he sketched out:

1.   $\tau = \tnum \prop{(\pgt{0}~\wedge \plt{5})~\vee \peq{10}}$
•   $\matches{1}{\tau}$
•   $\matches{4}{\tau}$
•   $\matches{10}{\tau}$
•   $\nmatches{0}{\tau}$
•   $\nmatches{11}{\tau}$
2.   $\tau = \jobj{\jstr{a}: \tbool, \jstr{b}: \tstr\prop{\peq{\jstr{X}}~\vee\peq{\jstr{Y}}}}$
•   $\matches{\jobj{\jstr{a}: \truet, \jstr{b}: \jstr{X}}}{\tau}$
•   $\matches{\jobj{\jstr{a}: \falset, \jstr{b}: \jstr{Y}}}{\tau}$
•   $\nmatches{\jobj{\jstr{a}: 0, \jstr{b}: \jstr{X}}}{\tau}$
•   $\nmatches{\jobj{\jstr{a}: \falset, \jstr{b}: \jstr{X}, \jstr{c}: 0}}{\tau}$
•   $\nmatches{\jobj{\jstr{a}: \falset}}{\tau}$
•   $\nmatches{\jobj{\jstr{a}: \truet, \jstr{b}: \jstr{Z}}}{\tau}$
3.   $\tau = \jarr{\jobj{\jstr{x}: \tnum\prop{\varepsilon}}}$
•   $\matches{[]}{\tau}$
•   $\matches{\jarr{\jobj{\jstr{x}: 0}, \jobj{\jstr{x}: 1}}}{\tau}$
•   $\nmatches{\jobj{\jstr{x}: 0}}{\tau}$
•   $\nmatches{\jarr{\jobj{\jstr{x}: 0}, \jobj{\jstr{y}: 0}}}{\tau}$

### 1.1 Schema grammar (15%)

Your first task is to formalize these prototypes into a grammar describing the structure of the schema language. Your grammar should be concise, composable, and fully capture the feature set prototyped above. A few notes:

• Only numbers and strings have properties, not booleans, arrays, or dictionaries.
• $\varepsilon$ means the empty property, or match any value.
• Lists can only have elements of a single schema (similar to how lists in most programming languages have a type List<T>, not a type List<T1, T2, ...>), while each key of an dictionary can correspond to a different schema (similar to how structs have struct {k1: int, k2: bool, ...}).
• We have provided you LaTeX macros that encode the syntax above at the top of the starter template.

To get you started, you can build off this skeleton:

If you’re not sure how to get started, here’s an alternative example to get you thinking. If I showed you the following strings as an example of language:

•   $1 * 2 + 3$
•   $\left\lceil \frac{5}{2} \right\rceil * 4$
•   $\frac{1}{2^3}$

Then you could reconstruct a concise, composable grammar that captures the underlying structure of these examples.

### 1.2 Schema validator (25%)

Next, you will define the judgment $j \sim \tau$ that defines whether a JSON object $j$ adheres to a schema $\tau$ from your language in 1.1. If JSON is like an expression, and a schema is like a type, then this relation is like a typechecker. Your solution will use the inference rule format as shown in class. For example, the rules for booleans:

These rules say, unconditionally, the expressions $\truet$ and $\falset$ match the schema $\tbool$.

When checking number and string properties, you may assume your rules have access to a theory of arithmetic including a total order on the integers, and a theory of strings including string equality. The reference solution consists of 13 inference rules.

## 2. JSON accessors

Inspired by the eternally useful jq tool, Will wants to build a safe accessor language on top of the schema language. An accessor represents a declarative specification of an element to retrieve in a JSON object, for example .x[1].y on the object {'x': [{'y': 0}, {'y': 2}]} would return 2. Putting in a little more effort this time, Will writes down a grammar for his accessors:

Note: when we write in concrete syntax an accessor like $a = .\jstr{x}$, there is an implicit $\varepsilon$ at the end, so this comes from the grammar pattern $.s\,a$ where $s = \jstr{x}$ and $a = \varepsilon$.

Your first task is to define a small-step operational semantics for pairs of accessors and JSON objects. For example, your semantics should prove:

Formally, a step is defined as $\steps{(a, j)}{(a', j')}$ between an accessor $a$ and JSON object $j$ to a new accessor $a'$ and new JSON object $j'$. The termination state is always $(\varepsilon, j)$ for some $j$.

### 2.1 Operational semantics (15%)

Define an operational semantics for accessors. An English specification for each operation:

• $\varepsilon$ returns the object $j$ unchanged, e.g. $(\varepsilon, j)$ does not step and is a final state.
• $.s\,a$ returns the value for the key $s$ on the dictionary $j$, e.g. $\steps{(.\jstr{x}[1], \jobj{\jstr{x}: [1, 2]})}{([1], [1, 2])}$
• $[n]\,a$ returns the $n$-th index in the array $j$, e.g. $\steps{([1], [1, 2])}{(\varepsilon, 2)}$
• $\mid a$ maps an accessor over an array, e.g. $\evals{(.\jstr{x} \mid\,.\jstr{y}, \jobj{\jstr{x}: [\jobj{\jstr{y}: 5}, \jobj{\jstr{y}: 6}]})}{(\varepsilon, [5, 6])}$

The reference solution contains four (not 3!) inference rules. Your rules should not use the zero-or-more-steps rule $\overset{*}{\mapsto}$, only single steps.

### 2.2 Accessor safety (45%)

Finally, your task is to build a bridge between the schema and accessor languages. Specifically, we want to make a judgment $\accmatches{a}{\tau}$ that defines whether an accessor is “valid” with respect to a schema $\tau$. For example, it should hold that $\accmatches{.\jstr{x}[0]}{\jobj{\jstr{x}: \jarr{\tnum\prop{\varepsilon}}}}$. Rather than relying on examples to define the correctness of our judgment, we can now provide a formal definition of accessor safety:

Accessor safety: For all $a, j, \tau$, if $\accmatches{a}{\tau}$ and $\matches{j}{\tau}$, then there exists a $j'$ such that $\evals{(a, j)}{(\varepsilon, j')}$.

This says, if both an accessor and a JSON object validate against a schema, then the accessor should terminate when evaluated on the object. (All the properties the accessor expects should be on the object.)

Your task is to both define the accessor validation judgment $\accmatches{a}{\tau}$, as well as provide an inductive proof of the accessor safety theorem.

First, please ensure that your proof is as simple and rigorous as possible. PL kind of proofs are often quite mechanical—you have a very limited logical system, it’s just a matter of putting the puzzle pieces together in the right way. Write out all your assumptions (especially the inductive hypothesis) to keep track of what you know. We encourage the use of bullet points with one logical leap per bullet. We will penalize overly complex proofs and unstated assumptions.

Second, we have provided a small example demonstrating the use of the various needed LaTeX commands. Please use them as appropriate.

Third, to make your life easy, you may assume an array access will always be inbounds.

Finally, an extended note on inversion lemmas. If you have a logical system, e.g.

Then if you assume you have a proof that $x \sim C$ for some $x$, you know two things: first, $x = A$, since the only way to derive a proof of anything related to $C$ is through $A$. Second, you know that proof must have been derived knowing $A \sim B$, so you can assume that to be true as well. More generally, inversion says: I have enough information about this assumed fact to deduce information about how it was proved. So from $x \sim C$ you can use $A \sim C$ and $A \sim B$ in your proof.

You may assume any (true) inversion lemma without explicit proof, but carefully cite which rule is being inverted.