CS 242: Stack machines and assembly

To this point, we have discussed a particular model of computation, the many variants of lambda calculus. Lambda calculus is useful as a computational model for its simplicity: it has very few moving parts, no separate runtime stack/heap, and so can be formalized without significant complexity. However, simplicity in the language doesn’t mean simplicity in usage, just simplicity in analysis.

Moreover, mathematics isn’t what makes programming languages run in the real world—physics does. By the laws of electricity and subsequently analog and digital circuit design, there are certain kinds of computations that are easier than others to represent with a physical computing apparatus. This is why assembly (e.g. x86, ARM, etc.) looks so different from lambda calculus, because assembly was designed with hardware in mind, while lambda calculus was designed with mathematics in mind. Assembly is a computer-oriented language, while lambda calculus is a computation-oriented language.

This means that bringing our PL toolkit to assembly languages, or more broadly low-level languages, will require us to formally model systems that look closer to how a modern computer actually works. At this point in the course, we’re shifting gears to look at how we can apply the ideas you learned in the last three weeks to low-level programming¹. We’ll do that in two ways: first, you’ll learn about WebAssembly and how we can model typed assembly languages. Then you’ll learn (at a higher/less-formal level) about Rust, and how it uses novel type theories and functional programming patterns to improve correctness and safety in low-level code.

As with the lambda calculus/OCaml, we’re looking at WebAssembly for two reasons: first, to ask the question of how ideas in PL theory can improve the practice of everyday languages. For example, algebraic data types are both a useful construct theoretically, but also practically implement safe error handling. For WebAssembly, we’re asking: what are issues in today’s assembly languages, and how can we improve them? And second, we want to understand how to take a programming language and formalize its semantics. This improves our ability to reason about the language and communicate it to others.

History of WebAssembly

The origin and motivation of WebAssembly is described well in the introduction of Bringing the Web up to Speed with WebAssembly, a PLDI’17 paper jointly authored by employees at Google, Microsoft, Mozilla, and Apple that laid out the design principles for the language. I will reproduce a section of it here:

The Web began as a simple document exchange network but has now become the most ubiquitous application platform ever, accessible across a vast array of operating systems and device types. By historical accident, JavaScript is the only natively supported programming language on the Web, its widespread usage unmatched by other technologies available only via plugins like ActiveX, Java or Flash. Because of JavaScript’s ubiquity, rapid performance improvements in modern VMs, and perhaps through sheer necessity, it has become a compilation target for other languages. Through Emscripten [43], even C and C++ programs can be compiled to a stylized low-level subset of JavaScript called asm.js [4]. Yet JavaScript has inconsistent performance and a number of other pitfalls, especially as a compilation target. WebAssembly addresses the problem of safe, fast, portable low-level code on the Web. Previous attempts at solving it, from ActiveX to Native Client to asm.js, have fallen short of properties that a low-level compilation target should have:

Safe, fast, and portable semantics:

safe to execute

fast to execute

language-, hardware-, and platform-independent

deterministic and easy to reason about

simple interoperability with the Web platform

Safe and efficient representation:

compact and easy to decode

easy to validate and compile

easy to generate for producers

streamable and parallelizable

The history here is quite fascinating. Essentially, as websites became increasingly complex (moving from web “sites” to web “apps”), browsers needed to faster and faster execution for Javascript. The Javascript V8 engine is probably the most scripting language execution engine in history, considering Google has poured millions of people-hours into optimizing every last corner of Javascript.

Still, scripting languages are fundamentally slow for a number of reasons (dynamic type checks, garbage collections, unpredictable interactions with the JIT compiler, …), so in 2011-12 a group of devs at Mozilla carved out a subset of Javascript called asm.js that could be statically typed and compiled/executed similar to C/C++. This led to the development of Emscripten, the world’s premier C++ to Javascript compiler. In 2013, Mozilla partnered with Epic to compile the Unreal Engine into the asm.js-subset of Javascript. Sadly, the original demo seems to have been lost to time, but you can still find other demos from around then. However, realizing the inanity of these solutions, the browser vendors managed to join forces and collaboratively design a “portable low-level bytecode”, WebAssembly.

Today, my goal is to informally introduce you to WebAssembly. You should get a sense of what the language is, how it works, and how you could program it. Next time, we’ll see how to formalize its semantics.

Stack machines

At its core, WebAssembly embodies a stack machine model of computation, where at runtime we walk through a program that modifies a separate stack of values. Here’s an example of a WebAssembly function that computes the quantity $f(x) = 2x^2 + 1$ .

(func $f (param $x i32) (result i32)
                 ;; stack: []
  (get_local $x) ;; stack: [x]
  (get_local $x) ;; stack: [x, x]
  (i32.mul)      ;; stack: [x*x]

  (i32.const 2)  ;; stack: [2, x*x]
  (i32.mul)      ;; stack: [2*x*x]

  (i32.const 1)  ;; stack: [1, 2*x*x]
  (i32.add))     ;; stack: [2*x*x+1]

A number of things to note about our first WebAssembly program. First, the syntax: this is the WebAssembly text format, which is derived from S-expressions, the same format commonly used for the Lisp programming language and its many flavors. An S-expression is either an “atom” (e.g. get_local, 2, or $x), or a list of S-expressions surrounded by parentheses (e.g. (get_local $x), (param $x i32)). In composite expressions, usually the first element is an atom that indicates the kind of thing being represented, and the remaining elements are parameters of that thing. For example, get_local and i32.mul are S-expressions representing instructions.

Most instructions in WebAssembly modify the value stack in some way. In the function above, get_local pushes the parameter x onto the stack. i32.mul pops two values from the stack, multiplies them, and pushes the result back onto the stack. i32.const N pushes the value N onto the stack. The function implicitly returns the value at the top of the stack. Notationally, when writing the stack left to right, the leftmost value is the top of the stack. For operations that take multiple arguments, they are in reverse order as taken off the stack. For example:

(i32.const 1)
(i32.const 2)
(i32.sub)

This program computes $1 - 2$ . Note that while we’re consistently using i32, WebAssembly understands numeric types of different sizes, both integers (i8, i16, i32, i64, i128) and floats (f32, f64).

The assemblies you’ve seen before are likely register-based, i.e. they use a (probably small, fixed) set of registers which represent independent storage locations to hold data. For example, the above program in x86 would be:

f:
	movl	%edi, -4(%rbp) ; *(rbp-4) = x
	shll	$1, %edi       ; edi = edi << 1 (= edi * 2)
	imull	-4(%rbp), %edi ; edi = edi * *(rbp-4)
	addl	$1, %edi       ; edi = edi + 1
	movl	%edi, %eax     ; eax = edi
	retq                   ; return eax

Of course, x86 still has a stack (e.g. -4(%rbp)), but it uses that in tandem with registers.

Names and stores

WebAssembly has three places to store data: on the stack, in a function context, or in a single global memory. The first two you saw in the above program. Function parameters and explicitly defined “locals” are managed with the get_local and set_local commands. For example, the C function:

int f(int x) {
  int i = x + 1;
  return i/x;
}

Could be written in WebAssembly as:

(func $f (param $x i32) (local $i i32) (result i32)
  (get_local $x)
  (i32.const 1)
  (i32.add)
  (set_local $i)

  (get_local $x)
  (get_local $i)
  (i32.div_s))

Here, the syntax (local $i i32) declares a local variable within the function f. All names are prefixed with a dollar sign. WebAssembly programs can also access a byte-addressable memory, just as with regular assemblies. For example, the C function:

void incr(int* x) {
  *x = *x + 1;
}

Could be written in WebAssembly as:

(func $incr (param $x i32)
  (get_local $x)
  (i32.load)

  (i32.const 1)
  (i32.add)

  (get_local $x)
  (i32.store))

The load and store instructions read and write to memory. Like in normal assemblies, there is a single memory, so load/store are not parameterized by which memory they are accessing.

Control flow

Besides the value stack part, most of what you’ve seen isn’t too far from normal assemblies. Integer operations, memory, functions. For WebAssembly, the biggest departure from tradition is its mechanisms for control flow. In x86, control flow works by maintaining a program counter, essentially a pointer to the current instruction. By default, after executing an instruction, the counter increments by 1 to the next instruction, but jump instructions allow the program counter to be set an arbitrary value. Moreover, jumps can be indirect, or data-dependent, e.g. with virtual method tables in C++.

However, it turns out that arbitrary jumps are a bad idea. In fact, they are an unmitigated disaster. Dijkstra knew this back in 1968 when he wrote the classic Go To Statement Considered Harmful. Years of buffer overflows and ROP attacks have only confirmed his early intuition. Today, no higher-level language (even C) allows arbitrary jumps/gotos.

WebAssembly follows suit by choosing structured control flow, where the language only allows jumps to specific points in the program. For example, here’s a WebAssembly program that infinitely increments a counter:

(func $inf_incr (local $x i32)
  ;; x = 0
  (i32.const 0)
  (set_local $x)

  (loop $incr_loop
    ;; x++
    (get_local $x)
    (i32.const 1)
    (i32.add)
    (set_local $x)

    (br $incr_loop)))

Here, the loop construct defines a new label (or break point) $incr_loop which we can use the br instruction to unconditionally jump (or break) to. The label incr_loop is at the top of the loop, so jumping to it restarts the loop sequence. Unlike normal insructions, the loop construct is parameterized by a list of instructions. This is because the loop defines a label scope, i.e. a range within the $incr_loop label can be accessed. You can’t, for example, br $incr_loop outside of the loop.

The complementary control flow construct to loop is block, which defines a label at the end of an instruction sequence. For example, to emulate a C break statement:

(func $inf_incr (local $x i32)
  ;; x = 0
  (i32.const 0)
  (set_local $x)

  (block $incr_loop_break
    (loop $incr_loop
      ;; x++
      (get_local $x)
      (i32.const 1)
      (i32.add)
      (set_local $x)

      ;; if (x == 10) { break; }
      (get_local $x)
      (i32.const 10)
      (i32.eq)
      (br_if $incr_loop_break);

      (br $incr_loop))))

Note the use of br_if–this allows us to take conditional jumps. WebAssembly also comes with an if statement, but this is essentially syntactic sugar for a br_if with surrounding blocks.

The last main control flow structure is the function, including function calls and returns. Functions in WebAssembly are top-level constructs (no nested functions) that take multiple inputs and return a single output. As mentioned above, functions can also declare local variables only accessible inside the function. Here’s an example of a function call:

(func $incr (param $x i32) (result i32)
  (get_local $x)
  (i32.const 1)
  (i32.add))

(func $f (result i32)
  (i32.const 1)
  (call $incr))

Similar to x86 calling conventions (minus registers), arguments to a function are pushed onto the value stack. The top value on the stack becomes the first argument to the function. Note that although parameters are pushed onto the stack by the caller, the callee must access them through get_local just like local variables. A function’s return value is implicitly the value at the top of the stack when the function exits.

Anonymity and nested instructions

At this point, you’re now acquainted with the core features of WebAssembly. I want to briefly highlight two more concepts that you will find useful when writing and debugging WebAssembly programs. The first is anonymity: when we write parameters, locals, and loops, we will always use names to increase the clarity of the program. However, WebAssembly also supports anonymity through automatic numbering of otherwise named things. For example, these two functions are equivalent:

(func $foo1 (param $x i32) (param $y i32) (local $i i32)
  (block $loop_end
    (loop $loop_start
      (get_local $x)
      (get_local $y)
      (i32.add)
      (set_local $i)

      (get_local $i)
      (br_if $loop_end)

      (br $loop_start))))

(func $foo2 (param i32) (param i32) (local i32)
  (block
    (loop
      (get_local 0)
      (get_local 1)
      (i32.add)
      (set_local 2)

      (get_local 2)
      (br_if 1)

      (br 0))))

For parameters and locals, they are numbered left to right as in the function definition, e.g. x, y, i maps to 0, 1, 2. Labels are numbered in increasing order from the closest scope to the outermost scope, e.g. loop_start is 0 and loop_end is 1 in the inner scope in the example above. (Note that this is identical to the idea of de Bruijn indices for unambiguous representation of variables in the lambda calculus!)

Knowing the anonymous representation of variables is primarily useful for debugging tools, because (at least today) most browser debuggers for WebAssembly will strip the names and only leave you with numbers as above. Hence, knowing the mapping from names to numbers is necessary for effective debugging.

The second concept to know is instruction nesting. Above, the only WebAssembly constructs that supported nested instructions were control flow: functions, loops, and blocks. However, the textual format actually supports nesting more generally for instructions that use stack values as input. For example, these two functions are equivalent:

(func $foo1 (param $x i32) (param $y i32) (local $i i32)
  (block $loop_end
    (loop $loop_start
      (get_local $x)
      (get_local $y)
      (i32.add)
      (set_local $i)

      (get_local $i)
      (br_if $loop_end)

      (br $loop_start))))

(func $foo2 (param $x i32) (param $y i32) (local $i i32)
  (block $loop_end
    (loop $loop_start
      (set_local $i
        (i32.add (get_local $y) (get_local $x)))

      (br_if $loop_end (get_local $i))

      (br $loop_start))))

The semantics are identical in both cases—when instructions are nested, the runtime will flatten them behind the scenes in the correct order. When writing WebAssembly programs, you will likely find the nested instruction style more concise, and also more naturally corresponds to a traditional imperative style like in C.

Also known as “systems programming”, but I’m going to avoid that term. ↩

Sections

History of WebAssembly

Stack machines

Names and stores

Control flow

Anonymity and nested instructions