For most of the quarter, type systems have been our central organizing principle for understanding the design of a programming language. We specifically looked at static typing, or how we can use formal methods to ensure that a program is type safe before it ever runs. This wasn’t just for basic types (“is this variable an integer?”), but also for complex invariants (“does this function adhere to a communication protocol?”).

However, as with many good things in life, static types come with a cost. Their analysis involves fundamentally conservative methods, optimized for soundness (if a program is well-typed, then it will never enter a stuck state) instead of completeness (if a program will never enter a stuck state, then it is well typed). For example, consider the following if expression:

fn foo() -> i32 {
  if false { "Hello" } else { 0 }
}

fn main() {
  println!("{}", foo() + 1);
}
error[E0308]: mismatched types
 --> test.rs:2:14
  |
1 | fn foo() -> i32 {
  |             --- expected `i32` because of return type
2 |   if false { "Hello" } else { 0 }
  |              ^^^^^^^ expected i32, found reference
  |
  = note: expected type `i32`
             found type `&'static str`

The program above will never enter a stuck state, since the first branch of the if expression is guaranteed to never execute. However, the Rust typechecker is conservative—it doesn’t try to make judgments on the actual value of the condition to the if expression, only checking that false is of boolean type. Hence, the Rust type system requires that the two branches of the if expression must have the same type.
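
For comparison, here is a sketch of the same function in Lua (the dynamically-typed language introduced below). It runs without complaint, since a dynamic language only checks types when an expression actually executes:

-- Lua
local function foo()
  if false then return "Hello" else return 0 end
end

print(foo() + 1) -- 1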

Now, that may not seem like such a bad restriction to impose. If our if expression has different types on its branches, that’s more likely user error than an intentional design choice. But on a larger scale, the imposition of static typing can severely limit the expressiveness of a language. Travel back to the start of the quarter, to the untyped and typed lambda calculus. While the untyped lambda calculus was far from safe, it was (by our Church encodings) a Turing-complete language. However, the imposition of static typing (in the typed lambda calculus) actually eliminated this property, as we could no longer express recursive functions without further language extensions.

In that light, the question we will ask this week is: what programming patterns can we express in a dynamically-typed language that are difficult to express in today’s modern statically-typed languages?

Intro to Lua

To answer that question, first we need to understand the basics of a dynamically-typed language. In this course, we are going to look at a language called Lua. You might ask: why not Python or Javascript (or even Ruby)? In short, Lua is simple—it has far fewer corner cases than Javascript and certainly Python. As a language, it embodies the lambda calculus ideal of maximum expressiveness with minimum concepts. Let’s dive in.

The basics of Lua are pretty similar to most scripting languages. It has variables with lexical scoping:

local x = 1
local x = x + 1 -- a new local x that shadows the previous one
print(x) -- 2

It has the usual primitive data types.

local n = 1
local s = "A string"
local b = false
local p = nil -- similar to None in Python, null in Javascript, () in Rust/OCaml
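
Since types are attached to values rather than variables, you can inspect them at runtime with the built-in type function:

print(type(n), type(s), type(b), type(p)) -- number  string  boolean  nil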

And the usual control flow.

if true == false then
  print('Contradiction!')
else
  print('Logically sound.')
end

local i = 0
while i < 10 do
  i = i + 1
end

for i = 0, 10 do -- numeric ranges are inclusive: prints 0 through 10
  print(i)
end

Functions are the normal un-curried kind.

local n = 2
local f = function(i)
  return i + n -- closures are fine
end

-- Equivalent to above
local function f(i)
  return i + n
end

print(f(2)) -- 4
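
One more feature worth noting, since it appears in later examples (pcall in particular): Lua functions can return multiple values. A small, hypothetical divmod helper:

-- Lua
local function divmod(a, b)
  return math.floor(a / b), a % b
end

local q, r = divmod(7, 2)
print(q, r) -- 3  1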

In contrast to other languages, Lua has pretty relaxed semantics: if you’re not careful, it will silently auto-fill undefined variables and missing arguments with nil.

print(x) -- not an error, prints "nil"

local function f(a, b)
  print(a, b)
end

f(1) -- not an error, prints "1  nil"

local function f(optional_arg)
  -- common Lua idiom: `a or b` evaluates to a if a is "truthy" (neither nil
  -- nor false), and to b otherwise
  local arg = optional_arg or 1
  print(arg)
end

f() -- 1

The central organizing principle of Lua is the table, a magnificent data structure that touches every corner of Lua’s design. Tables are like dictionaries (or associative arrays) in that they map keys to values.

-- Table literal syntax
local t = {a = "b", ["c"] = "d", [1] = 2, [false] = true}

-- Accessing elements of a table
print(t.a)      -- b
print(t["a"])   -- b
print(t.c)      -- d
print(t["c"])   -- d
print(t[1])     -- 2
print(t[false]) -- true

-- print each key, value pair in the table
for k, v in pairs(t) do
  print(k, v)
end

See Stateless Iterators for a more nuanced discussion of the various iterators in Lua. Tables also double as lists in Lua.

local t = {1, 2, 3} -- no key means implicitly integer indexed
table.insert(t, 4)
assert(#t == 4)     -- "#" is the length operator
assert(t[4] == 4)   -- alas, Lua is 1-indexed

Note: this list vs. table distinction can get hazy. See Array Size for a discussion of the edge cases in computing the size of a list.
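
For example, once a list contains a nil hole, the # operator is only guaranteed to return some "border" of the table:

-- Lua
local t = {1, 2, nil, 4}
print(#t) -- may print 4 or 2; # is unspecified for lists with holes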

Lastly, Lua has a little syntactic sugar to allow tables to emulate objects.

local t = {x = 1}

-- version 1, anonymous function
t.incr = function(self, n)
  self.x = self.x + n
end

-- version 2, named function (explicit self)
function t.incr(self, n)
  self.x = self.x + n
end

-- version 3, named method (implicit self)
function t:incr(n)
  self.x = self.x + n
end

t.incr(t, 3)
t:incr(3)
assert(t.x == 7)

Dynamic heterogeneous data structures

Now that we’re equipped with some Lua knowledge, let’s look at a few dynamically-typed programming idioms and see how they contrast with statically-typed languages. In dynamic languages, it’s common to have data structures like dictionaries that mix and match values of different types. This is particularly common when dealing with formats like JSON, or any data where it’s hard to predict ahead of time what the shape will be.

-- Lua
local t = {}
t[100] = 0
t[200] = "Hello world!"
-- No problem!

We can use two axes to describe the kind of behavior occurring here and how it relates to static languages. The first is static vs. dynamic: whether the size/shape of the data structure can change at runtime. For example, if we have:

// Rust
struct Person {
  name: String,
  age: i32
}

let will = Person { name: "Will".to_string(), age: 25 };
will.job = "Student"; // Not valid!

In a static language like Rust, a predefined struct is a static data structure, as we cannot add or remove keys at runtime. By contrast:

// Rust
let mut job_map: HashMap<String, String> = HashMap::new();
job_map.insert(will.name, "Student".to_string());

A map (or a set, vector, etc.) is a dynamic data structure, as you can add/remove key/value pairs at runtime. The other axis is homogeneous vs. heterogeneous, or whether a data structure can contain multiple values of different types. A struct can be heterogeneous, e.g. the Person above contains a string and a number, while the HashMap is homogeneous:

// Rust
let mut map = HashMap::new();
map.insert(100, 0);
map.insert(200, "Hello world"); // Error! Map value must be i32, not String

Depending on the language, there are multiple ways to implement the desired heterogeneity.

  1. Subtype polymorphism: In a class-based language like Java, one can cast all the objects to a superclass like Object:
    // Java
    Map<Integer, Object> map = new HashMap<Integer, Object>();
    map.put(100, (Object) 0);
    map.put(200, (Object) "Hello world");
    System.out.println((String) map.get(200));
    

    Pros: little syntactic overhead, no restriction on sets of types that can be placed in the container.
    Cons: requires explicit typecasts in and out of the container, runtime errors possible.

  2. Any types: Static languages that support dynamic typing at runtime can use an Any type (or better yet, Obj.magic):
    // Rust
    use std::any::Any;
    let mut map: HashMap<i32, Box<Any>> = HashMap::new();
    map.insert(100, Box::new(0));
    map.insert(200, Box::new("Hello world".to_string()));
    
    match map[&200].downcast_ref::<String>() {
      Some(ref s) => println!("{}", s),
      None => panic!("Type error")
    };
    

    Fairly similar to the Object approach in class-based languages, with roughly the same set of tradeoffs (although here, Rust uses an explicit error handling approach for cast errors instead of exceptions as in Java).

  3. Trait polymorphism: In a trait/typeclass-based language like Rust or Haskell, one can upcast to a trait:
    // Rust
    // std::string::ToString, shown for reference:
    // trait ToString { fn to_string(&self) -> String; }
    let mut map: HashMap<i32, Box<ToString>> = HashMap::new();
    map.insert(100, Box::new(0));
    map.insert(200, Box::new("Hello world"));
    println!("{}", map[&200].to_string());
    

    Pros: type-safe, no runtime errors, little syntactic overhead.
    Cons: requires identifying, up front, the subset of functionality common to each type that you need, and coercing values to that trait (this is also true in subtyping systems that don’t permit runtime downcasts).

    Note: this idea is related to the idea of “existential types” in programming language theory.

  4. Sum types: Sum types (or unions, enums, variants, etc.) allow the definition of types that could be one of many things. Sum types take many forms, with three primary tradeoffs:
    1. Named vs. anonymous: sum types in OCaml, Haskell, and Rust all have named branches. For example:
      // Rust
      enum MyValue {
        Int(i32),
        String(String)
      }
      
      let mut map: HashMap<i32, MyValue> = HashMap::new();
      map.insert(100, MyValue::Int(0));
      map.insert(200, MyValue::String("Hello world".into()));
      
      match map[&200] {
        MyValue::Int(ref n) => println!("{}", n),
        MyValue::String(ref s) => println!("{}", s)
      };
      

      Here, MyValue is the sum type name, and Int/String are the branch names. By contrast, languages like Ceylon have support for anonymous sum types:

      // Ceylon
      HashMap<Integer, Integer|String> map =
          HashMap { 100->0, 200->"Hello world" }
      

      Most languages that have fully-featured sum types tend to only have named sums, one benefit being that it allows branches with the same inner type to have different names, e.g. Foo(i32), Bar(i32).

    2. Declared vs. inferred: some languages with sum types, like Rust, require an explicit declaration of the enum ahead of time, as in the previous example. Others, like Ceylon, as well as OCaml’s polymorphic variants, allow the sum type to be inferred by the compiler.
      (* OCaml *)
      let l = [`Int 0; `String "Hello world"] ;;
      (* l inferred to have type [ `Int of int | `String of string ] list *)
      
    3. Matched vs. casted: some languages like OCaml and Haskell allow potentially incorrect casts from a sum type to one of its branches:
      (* OCaml *)
      type foo = X of int | Y of string ;;
      let x = X 10 ;;
      let X y = x ;; (* This raises an exception if incorrect *)
      print_int y ;;
      

      By contrast, Rust requires exhaustive matching:

      // Rust
      let MyValue::String(ref s) = map[&200]; // Invalid!
      

    There’s a surprisingly large design space here, and the pros/cons relative to dynamic languages vary based on the particular choices a language makes. Casted, inferred, and anonymous sums arguably come quite close to equivalent dynamic code, although such a type system appears uncommon in practice.
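
For comparison, the dynamic counterpart to all of these encodings is just a table plus a runtime check at the point of use. A sketch in Lua:

-- Lua
local map = {[100] = 0, [200] = "Hello world"}
local v = map[200]
if type(v) == "number" then
  print(v + 1)
else
  print(v) -- Hello world
end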

Among these many cases, implementing a heterogeneous data structure in a static language generally gives rise to two kinds of conceptual impedance mismatch:

  1. The programmer either cannot easily encode the desired heterogeneity in a particular static language, e.g. OCaml does not provide a simple solution for an Any type.
  2. The programmer must decide too early (i.e. while initially describing the data structure) what kind of data types are allowed, e.g. in the enum case, or what functionality of the data types is allowed, e.g. in the traits case.

By contrast, using a dynamic language like Lua, you do not have to think about whether your data structures are polymorphic or not–you simply get it for free, no opinions necessary.
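
For instance, a sketch of the earlier Person example in Lua: the record can grow new fields and mix value types at runtime, and consumers just inspect what is actually there:

-- Lua
local will = {name = "Will", age = 25}
will.job = "Student" -- no problem: fields can be added at runtime

for k, v in pairs(will) do
  print(k, type(v), v)
end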

Ad-hoc interfaces

When writing polymorphic functions, i.e. functions that could apply to many different types, it’s common to specify a function over types that provide a specific set of functionality. For example:

-- Lua
local function table_print(t)
  for k, v in pairs(t) do
    -- assumes every key and value provides a tostring method
    print(k:tostring(), v:tostring())
  end
end
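
For instance (a small sketch), any two unrelated values that happen to provide a tostring method can be printed, with no declared relationship between them:

-- Lua
local point = {x = 1, y = 2}
function point:tostring()
  return "(" .. self.x .. ", " .. self.y .. ")"
end

local name = {first = "Ada"}
function name:tostring()
  return self.first
end

table_print({[point] = name}) -- (1, 2)    Ada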

This function is implicitly defined over all tables (or technically iterables) that contain keys and values which can be converted to a string, using :tostring(). (This idea is also called “duck typing”, although I never found the term very informative.) In a static language, all variables (including function arguments) must have known types at compile time. But what should the type of t be in the above example? How we write down that type depends on the facilities of the type system.

  1. Subtype polymorphism: if a base class contains all the functionality necessary for a generic function, then the concrete input type can be the base class. In Java, the Object class has a toString method, so this could be defined as:
    // Java
    static void printList(ArrayList<Object> l) {
      // l.get(0).toString() is valid
      ..
    }
    
    ArrayList<Object> l = new ArrayList<Object>();
    l.add((Object) 0);
    l.add((Object) "Hello world");
    printList(l);
    

    Pros: low syntactic overhead, flexible (no fixed set of types at implementation time).
    Cons: any object must be an instance of a class which derives from the required base class, which gets awkward/unwieldy for ad-hoc combinations of different methods. Interface must be explicitly specified in a class definition/interface.

  2. Trait polymorphism: this is basically the prime use case for traits, e.g. in Rust:
    // Rust
    // std::string::ToString, shown for reference:
    // trait ToString { fn to_string(&self) -> String; }
    
    fn print_list(l: Vec<impl ToString>) {
      // l[0].to_string() is valid
      ..
    }
    
    let l1: Vec<String> = vec!["Hello world".to_string()];
    print_list(l1);
    
    // impl ToString approach doesn't allow mixed-type containers
    let l2: Vec<Box<ToString>> = vec![Box::new("hi"), Box::new(0)];
    print_list(l2); // Error! Box<ToString> isn't impl ToString (confusingly?)
    
    fn print_list_boxed(l: Vec<Box<ToString>>) {
      // l[0].to_string() is valid
      ..
    }
    
    print_list_boxed(l2); // Now it works
    

    Pros: zero syntactic/performance overhead in impl Trait case, logic is type-checked when implemented, not when used (e.g. a bug in print_list will be caught when the function is written).
    Cons: different syntaxes/APIs for homogeneous vs. heterogeneous data structures (i.e. static vs. dynamic dispatch). Interface must be explicitly specified in a trait.

  3. Compile-time metaprogramming: if a language supports staging like C++ with templates, then one can write:
    // C++
    template<typename T>
    void print_list(std::vector<T> l) {
      // l[0].to_string() is POSSIBLY valid
    }
    
    class Foo {
    public:
      std::string to_string() { .. }
    };
    
    std::vector<Foo> foos;
    foos.push_back(Foo());
    print_list<Foo>(foos);
    
    class Bar {
    public:
      // No to_string() method
    };
    
    std::vector<Bar> bars;
    bars.push_back(Bar());
    print_list<Bar>(bars); // Error! to_string() method is not defined on Bar
    

    Pros: no explicit interface must be defined (entirely ad-hoc).
    Cons: type-checking occurs when the API is used, not when it is defined (and compiler errors are forced to expose API implementation details). Does not allow polymorphic containers (T must be a concrete type like Foo, not like impl ToString). Also, template errors suck.

For static languages that support traits/typeclasses, the only overhead versus ad-hoc interfaces in dynamic languages is the verbosity of writing down the explicit interface and the type system complexity of parameterizing types by the interface. Not perfect, but not too bad. The bigger issue is that this approach views programs compositionally, i.e. a given type’s methods are built up from many smaller traits, as opposed to hierarchically, where a type’s methods are derived from a class inheritance tree. If a programmer wants to write an ad-hoc interface in an object-oriented language, she must reshape her natural mental model of the program into the hierarchy required by the language, a clear conceptual impedance mismatch.

Reflection

A fairly unique feature of dynamic languages is that the entire language stack is usually available at runtime, e.g. it’s trivially easy to parse/translate/execute new code at runtime as well as to inspect/analyze language constructs like class members (the latter process is generally called “reflection”). This contrasts with compiled languages like C which produce standalone binaries that run outside the context of the compiler’s runtime.

In the Lua ecosystem, for example, reflection is used to serialize arbitrary objects, implement dynamic scoping, and even build your own debugger. Importantly, all of these tasks are accomplished by simple, intra-linguistic mechanisms, i.e. all you need is a Lua interpreter to use these libraries. No additional parsers, AST libraries, code generators, etc. need to be downloaded or run external to the language runtime. For example, to print out all the fields and methods of an object:

-- Lua
local object = {x = 1}
function object:incr()
  self.x = self.x + 1
end

for k, v in pairs(object) do
  if type(v) == 'function' then
    print(k, 'function')
  else
    print(k, 'field')
  end
end
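
And to demonstrate the "execute new code at runtime" part, Lua’s load compiles a string into a callable function (in Lua 5.1 the equivalent is loadstring):

-- Lua
local chunk = load("return 1 + 1")
print(chunk()) -- 2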

In static languages with limited or zero reflection, these tasks are instead performed by metaprogramming mechanisms. In Java, for example, tools like IDEs are frequently used to generate code for class definitions, getter/setter methods, and more. Other static languages use programmable macro systems to define a code generation pre-compilation step, e.g. the C preprocessor for C/C++, macro_rules/procedural macros for Rust, PPX for OCaml, and Template Haskell for Haskell. For example, to create a default method for turning an object into a string for debugging in Rust:

// Rust
#[derive(Debug)]
struct Foo {
  x: i32,
  y: String
}

// ^ this compiles into something like:
impl fmt::Debug for Foo {
  fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
    write!(f, "Foo {{ x: {:?}, y: {:?} }}", self.x, self.y)
  }
}

You can implement a custom derive pass yourself, although the API is pretty rough (see an example) compared to Lua’s, and it’s not accessible at runtime, only compile time. In particular, macro systems are of wildly varying quality and flexibility, with some static languages lacking any macro system or reflection entirely.
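
For contrast, a sketch of the Lua analogue: no code generation step is needed, since a single generic function (here, a made-up debug_string helper) can walk any table’s fields via runtime reflection:

-- Lua
local function debug_string(t)
  local parts = {}
  for k, v in pairs(t) do
    table.insert(parts, tostring(k) .. " = " .. tostring(v))
  end
  return "{ " .. table.concat(parts, ", ") .. " }"
end

print(debug_string({x = 1, y = "hi"})) -- { x = 1, y = hi } (field order unspecified)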

Here, the core impedance mismatch is that metaprogramming is fundamentally at odds with static typing. It’s simply difficult to express reflective language constructs in a verifiably safe manner. Type-safe metaprogramming has been the subject of decades of research in the academic programming languages community (e.g. MetaML, Scala LMS), and no clear system has emerged as the “right way” to do it. Compiled languages (and subsequently most static languages) seem empirically averse to exposing any compiler internals like tokens, syntax trees, types, etc. to the language runtime, although the tide is slowly changing in that direction.

Exceptions

Exceptions are a mechanism that allows non-local control flow, jumping from a raised exception to the closest catch on the call stack. Exceptions are not unique to dynamic languages—bog-standard languages like Java, niche functional languages like OCaml, and even systems languages like C++ support exceptions.

Exceptions are still worth talking about because not all static languages have them, and it’s important to understand how critical they are to the dynamic programming style. Generally speaking, much of the productivity gain of dynamic languages could boil down to “not forcing the programmer to think about an issue until it arises,” or in the Python community, “it’s better to ask forgiveness than permission.” For example, in the following snippet:

-- Lua
local f = assert(io.open('test.txt', 'r'))
local s = f:read("a")
print(1 + tonumber(s))

If I execute this program, it could fail with:

  1. (line 1) test.txt: No such file or directory
  2. (line 1) test.txt: Permission denied
  3. (line 3) attempt to perform arithmetic on a nil value

In a dynamic language, we choose to deal with these exceptions if we want to. Languages like Python have try/except, while Lua has the simpler pcall:

local status, err = pcall(function()
  local f = assert(io.open('test.txt', 'r'))
  local s = f:read("a")
  print(1 + tonumber(s))
end)

if not status then
  print(err)
end
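
The raising side of this mechanism is the built-in error function (assert is essentially sugar over it):

-- Lua
local ok, err = pcall(function() error("boom") end)
print(ok, err) -- false   <file>:<line>: boom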

However, assuming I set everything up correctly, I don’t have to think about any of those edge cases until they occur. By contrast, a language like Rust chooses to encode many of these cases in the type system such that the programmer is forced to handle possible errors:

// Rust
let mut f = File::open("test.txt").expect("File not found");
let mut contents = String::new();
f.read_to_string(&mut contents).expect("Failed to read");
let n = contents.parse::<i32>().expect("Failed to become int");
println!("{}", 1 + n)

All of those expect calls are saying “if this operation failed, then abort the program, otherwise get the function’s output.” (See The Book for more on error handling in Rust, and how this could be made more concise.) This sort of pattern is common in languages like Haskell (but with monads) and OCaml (although the exceptions vs. monads debate is more fiery there).

There’s also the “error code” approach that C, Go, and sometimes Java adopt where calling a function returns both a possible pointer to the value and an error object, e.g.:

// Go
func CopyFile(dstName, srcName string) (written int64, err error) {
    src, err := os.Open(srcName)
    if err != nil {
        return
    }

    dst, err := os.Create(dstName)
    if err != nil {
        return
    }

    written, err = io.Copy(dst, src)
    dst.Close()
    src.Close()
    return
}

While you could theoretically ignore the errors and program as if they didn’t exist, this style of programming seems strictly worse than either exceptions or result types, since both of those approaches require that an error be handled, rather than letting a user attempt operations on an invalid handle to an object.

Exceptions are also a good example of how the dynamic feel of a language comes not just from making it possible to express certain idioms in your type system, but from pervasively adopting those idioms across the standard libraries. Even if exceptions were added to Rust today, there’s no chance the standard library would be changed from result types to exceptions in cases of failure, which means a new Rust programmer would never feel the benefits of exceptions outside their own code. When your type system permits many different sets of opinions and the canonical APIs all hold different opinions than yours, there’s simply no recourse.

Partial programs

During development, a program is often in an incomplete or inconsistent state. A programmer naturally builds a program one piece at a time, leaving the other pieces as stubs or comments. For example, if I wrote a program to divide each element of a list by a number, it might start like:

-- Lua
local function divide_table(l, n)
  if n == 0 then
    -- TODO
  else
    for k, v in ipairs(l) do
      l[k] = v / n
    end
  end
end
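
For instance, both of these calls execute without complaint; the stubbed branch just silently does nothing:

-- Lua
divide_table({2, 4}, 2) -- fine: the list becomes {1, 2}
divide_table({2, 4}, 0) -- also runs: the TODO branch is a no-op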

The program runs as-is. I want to remember to handle the n = 0 edge case later, but don’t want to implement the error handling logic quite yet. However, if I wrote a similar program in a static language like Rust:

// Rust
fn divide_list(l: Vec<i32>, n: i32) -> Vec<i32> {
  if n == 0 {
    // TODO
  } else {
    return l.into_iter().map(|x| x / n).collect::<Vec<_>>();
  }
}

Then this fails with a compile error:

error[E0308]: mismatched types
 --> test.rs:2:13
  |
2 |     if n == 0 {
  |  _____________^
3 | |     // TODO
4 | |   } else {
  | |___^ expected struct `std::vec::Vec`, found ()
  |
  = note: expected type `std::vec::Vec<i32>`
             found type `()`

The issue here is that the compiler requires that all values have a provably consistent type at compile time. Here, if the first if branch doesn’t also return a vector, then the type checker cannot say the function reliably returns a vector, a fair statement. However, this forces the programmer to add extra code to satisfy the type checker while iterating. In Rust, the simplest approach is to use the ! type:

// Rust
fn divide_list(l: Vec<i32>, n: i32) -> Vec<i32> {
  if n == 0 {
    // TODO
    return panic!("Divide by zero");
  } else {
    return l.into_iter().map(|x| x / n).collect::<Vec<_>>();
  }
}

This is similar to how exceptions are type-checked in a language like OCaml, i.e. the language has a type which is considered a valid replacement (or subtype) for any other type.

More broadly, any static analysis tool, like a type checker (as in all static languages) or a borrow checker (as in Rust), requires complete knowledge of types/lifetimes/etc. by the end of its analysis, either via explicit annotation or inference. In Rust, if I write:

// Rust
fn main() {
  let v: Vec<_> = Vec::new();
}

Then this fails with the error:

error[E0282]: type annotations needed
 --> test.rs:2:19
  |
2 |   let v: Vec<_> = Vec::new();
  |       -           ^^^^^^^^^^ cannot infer type for `T`
  |       |
  |       consider giving `v` a type
  |

Most type checkers would not, for example, decide T = Any and then insert runtime type checks wherever a value from v is used. This is again an example of the type system requiring a programmer to make decisions about the types of variables before it is necessary. Partial programs are not easily amenable to static analysis.