This lab will introduce you to the logistics of programming in OCaml. To get the starter code for this lab, go to the assignments repository and run git pull, then go into the lab1/starter directory.

Installation

First, you need to get OCaml installed on your computer.

1. Install the binaries

You will install OPAM, the package manager for OCaml.

  • OS X

    brew install opam
    
  • Ubuntu 16.04

    sudo apt-get install opam -y
    
  • Windows 10

    Get Bash on Ubuntu on Windows using these instructions then follow the Ubuntu guide.

  • Rice

    OCaml and OPAM are already installed with the correct versions on the Rice machines if you followed the Rice setup guide.

2. Install OCaml packages

We use OPAM to manage OCaml packages and install new OCaml versions.

opam init -y
opam switch 4.06.1
eval `opam config env`
opam install core menhir utop merlin ocp-indent user-setup -y

NOTE: If opam switch 4.06.1 doesn’t work and you get a message asking you to run opam switch create 4.06.1 instead, do that.

This step will take a while!

3. Configure your editor

The most crucial component of the OCaml development toolchain is merlin, a tool that drives many IDE features like getting the types of variables, jumping to symbols, showing syntax errors, and so on. It will make your life much easier, so you really should get it installed.

4. Setup persistent OCaml env

Add to your .bashrc (or equivalent)

eval `opam config env`
  • Vim and Emacs

    Configuring Merlin for Vim and Emacs is simple:

    opam user-setup install
    
  • Anything else

    Any other editor (Atom, VSCode, Sublime) requires more work. See the Merlin wiki page for installation instructions.

Running code

There are two main ways of running OCaml. It has a basic REPL (ocaml on the command line) and a more fully-featured REPL (utop on the command line). For example, try this:

$ utop
──────────────────────────┬─────────────────────────────────────────────────────────────┬───────────────────────────
                          │ Welcome to utop version 2.2.0 (using OCaml version 4.06.1)! │
                          └─────────────────────────────────────────────────────────────┘

...

─( 22:37:04 )─< command 0 >──────────────────────────────────────────────────────────────────────────{ counter: 0 }─
utop # let x = 1 + 1 ;;
val x : int = 2

Note that at the REPL, you have to add the double semicolon ;; after each expression for OCaml to evaluate it.

OCaml also has a compiler. If you create a file test.ml that contains:

let () = Printf.printf "test: %d\n" 0

And then run ocamlbuild test.native, this will create an executable test.native that runs the program. Generally we will provide you a Makefile that wraps ocamlbuild to ensure you have the appropriate dependencies.

Expressions

OCaml is basically an extended version of the typed lambda calculus. (I will reiterate this point until the end of time.) For example:

Is almost exactly the same as the equivalent OCaml expression:

(fun (x : int) -> x + 1) 2

OCaml has functions (just fun instead of ) and integers. It also has a few useful built-in types like string.

let s1 : string = "Hello " in       (* string literals *)
let s2 : string = s1 ^ " world!" in (* string concatenation *)
Printf.printf "%s" s2               (* string formatting *)

As a reminder, OCaml is an expression-oriented language. In a traditional programming language, variable bindings don’t “have a value.” For example, in C, if I write:

int x = 1;
int y = 2;
return x + y;

Then these would be imperative “statements” that have side effects, i.e. they affect some state not explicit in the program. By contrast, lets in OCaml are expressions:

let x : int = 1 in
let y : int = 2 in
x + y

The above is an expression that evaluates to the number 3. While OCaml is still (mostly) immutable, i.e. you can’t change the value of a variable like in imperative programming, you can emulate mutable variables by repeated bindings:

let x : int = 1 in
let x : int = x + 1 in
x

Sometimes it’s still useful to have side effects, most notably printing strings to the console. Functions with side effects, like Printf.printf, return the unit type (). We can use the semicolon operator to interleave prints in our code:

let x = 1 in
Printf.printf "%d" x;
let y = 2 in
x + y

Note: you will look at the semicolon and think “aha! a statement!” But in fact, the semicolon is an infix operator that takes two expressions, executes the left one, discards its result, and then returns the right one.

To create functions, we have a few options. Here are several ways to implement the same add function:

(* Most explicit, anonymous function syntax *)
let add : (int -> int -> int) = fun (m : int) : (int -> int) -> fun (n : int) : int -> m + n in

(* Example usage. Note that like in the lambda calculus, we can partially
 * apply our function to get a new one that remembers the original argument. *)
let add1 : int -> int = add 1 in
Printf.printf "%d" (add1 2);

(* Elide the argument types *)
let add : (int -> int -> int) = fun m -> fun n -> m + n in

(* Functions with multiple arguments are automatically converted to multiple
 * functions. *)
let add = fun m n -> m + n in

(* Most explicit, named function syntax. Note that we specify a return type *)
let add (m : int) (n : int) : int = m + n in

(* Elide the argument and return type *)
let add m n = m + n in

()

Lastly, note that to create a recursive function, you have to use the special let rec syntax:

let rec sum n =
  if n = 0 then 0
  else n + (sum (n - 1))

Type inference and debugging

Note that in the above function definitions, we were allowed to leave off type annotations in variousplaces. This is because OCaml has a feature called type inference, which can guess which types you intended based on the contents of the expression, e.g. because 1 is a number and x = 1, then x should be a number.

Rely on type inference as little as possible. Seriously, try to remember to annotate everything. If you have a logic bug, or even a syntax error, sometimes OCaml will try to infer a ridiculous type for your program. You will sit there staring at the error until giving up. For example, here’s a correct, fully typed factorial function written into the interpreter:

# let rec fact (n : int) : int = 1 + fact (n - 1) ;;
val fact : int -> int = <fun>

If we omit all the types, the function is still correct.

# let rec fact n = 1 + fact (n - 1) ;;
val fact : int -> int = <fun>

However, let’s say you made a typo, like typing rect instead of rec.

# let rect fact n = 1 + fact (n - 1) ;;
val rect : (int -> int) -> int -> int = <fun>

Uh-oh! The OCaml syntax thinks that rect is the function name, and fact is the first argument. What if we forgot to put in parentheses?

# let rect fact n = 1 + fact n - 1 ;;
val rect : ('a -> int) -> 'a -> int = <fun>

This is actually parenthesized as 1 + (fact n) - 1. So it can’t guess that n is an integer because n is actually never used, giving you this confusing type signature where n is any type 'a (to be explained in later lectures).

In summary, be a good programmer, always annotate your functions and lets. Your future self will thank you.

Exercises, part 1

Now it’s time to apply the knowledge in the above section. In the lab1 directory, we have provided you a file exercise1.ml that contains starter code for the following exercises. To run it, type make and then ./exercise1.native.

First, implement a function gcd m n that returns the greatest common denominator between two numbers. Translate it from the corresponding C implementation:

int gcd(int a, int b) {
  while (b != 0) {
    int t = b;
    b = a % b;
    a = t;
  }
  return a;
}

A key part of the functional programming paradigm is understanding how to map imperative code (like for loops) to functional concepts (like recursion). Don’t try to fit a square peg into a round hole.

Second, implement a function fizz_buzz n that iterates from 0 to n, and for each number, prints “fizz” if it’s divisible by 3, “buzz” if by 5, and “fizzbuzz” if by both 3 and 5. Again, the imperative C code:

void fizz_buzz(int n) {
  for (int i = 0; i < n; ++i) {
    if (i % 3 == 0 && i % 5 == 0) {
      printf("fizzbuzz\n");
    } else if (i % 3 == 0) {
      printf("fizz\n");
    } else if (i % 5 == 0) {
      printf("buzz\n");
    }
  }
}

Third, implement a function read_password password that continually reads lines from stdin until the provided password is read.

Hint: you will want to use the provided function read_line : unit -> string.

Challenge: implement a function substring_match pattern source that returns the first index in the string source where the substring pattern occurs (or None otherwise). You will need the following functions:

val String.length : string -> int
val String.slice : string -> int -> int -> string

Data structures

Now that we’ve seen how to use the basic data types (ints, strings, functions), we’re going to move on to composite data structures. We’ll talk about the theory behind these features in class on Wednesday, but for now we’ll just focus on the practical bits of how to use them in OCaml.

Tuples and records

The basic building block of composite data types is a tuple, or fixed-size collection of N objects. For example, if we’re emulating 3-dimensional integer points:

type point = int * int * int

let p : point = (1, 2, 3) in
let (x, y, z) = p in
Printf.printf "%d %d %d" x y z

This example shows a few key features: first, you can alias types with the type keyword. Here, int * int * int means a tuple with three fields, each an integer. Then we can make a tuple with the parentheses syntax (1, 2, 3). Finally, tuples can be accessed using a pattern-match syntax let (x, y, z) = p. Pervasive pattern matching is one of the best parts of functional programming, allowing you to concisely and flexibly extract data from data structures.

Instead of using a tuple, we can give each component a name with a record (i.e. a struct):

type point = { x : int; y : int; z : int }
let p : point = { x = 1; y = 2; z = 3; } in (* record *)
Printf.printf "%d %d %d" p.x p.y p.z;
let {x; y; z} = p in
Printf.printf "%d %d %d" x y z;

We can use OCaml’s match construct to do take different actions based on the contents of the point structures.

let point_pair : point * point = ({x=0; y=0; z=0}, {x=1; y=2; z=3}) in
match point_pair with
| ({x = 0; y; _}, p2) -> y + p2.x           (* pattern match nested structures, ignore fields *)
| (p1, p2) when p1.x < p2.x -> p2.x - p1.x  (* use guards for complex conditions *)
| _ -> assert false                         (* wildcard for catch-all *)

Variants

Like in most Turing languages, OCaml has a notion of a variant, or enum. This is a datatype that can be one of many options. For example:

type weather = Sunny | Rainy | Cloudy

let todays_weather : weather = Sunny in
match todays_weather with
| Sunny -> Printf.printf "It's sunny! :D"
| Rainy -> Printf.printf "It's rainy... :<"
| Cloud -> Printf.printf "It's cloudy. :|"

OCaml’s variants have two key features that distinguish them from standard enums: first, each branch can have associated values, and second, the enum can be defined recursively. For example, we can define the set of messages a server can receive:

type Message = Handshake | Broadcast of string | Login of id * password

let rec server msg =
  match msg with
  | Handshake -> ...
  | Broadcast sentence -> ...
  | Login (id, pwd) -> ...

And we can recursively define a binary tree of integers:

type tree = Leaf | Node of tree * int * tree

(*   2
 *  / \
 * 3   1 *)
let t = Node(Node(Leaf, 3, Leaf), 2, Node(Leaf, 1, Leaf)) in
match t with
| Node (l, n, r) -> Printf.printf "%d" n
| Leaf -> Printf.printf "Leaf"

Lists

In OCaml, the pervasive sequential data structure is a list, which is really a linked list (as opposed to an array which easily permits random access). This is because linked lists have a natural inductive interpretation:

type int_list = Null | Node of int * int_list

let empty_list : int_list = Null                     (* [] *)
let one_two_list : int_list = Node(1, Node(2, Null)) (* [1, 2] *)

In the OCaml standard library, an equivalent definition exists, except Null is written [] and Node is written ::. For example:

let empty_list : int list = []
let one_two_list : int list = 1 :: 2 :: []
let one_two_list : int list = [1; 2] (* shorthand for above *)

(Note the space between int and list here, since list is technically a polymorphic data type. To be explained next week.)

Since a list is “just” a variant type, you can use a match the same way as with Message or Weather. For example:

let l : int list = [1; 2; 3] in
match l with
| [] -> Printf.printf "empty list"
| [x] -> Printf.printf "list with one element %d" x
| x :: xs -> Printf.printf "list with head %d" x

Here, the last case will execute where x = 1 and xs = [2; 3]. With this tool in hand, we can now start to write interesting functions on lists. For example, a function that removes odd numbers out of a list:

let rec remove_odd (l : int list) : int list =
  match l with
  | [] -> []
  | x :: xs ->
    if x mod 2 = 0 then x :: (remove_odd xs)
    else remove_odd xs

assert (remove_odd [1; 2; 3] = [2])

The key idea is to use recursion instead of a for/while loop as you might do in an imperative language. However, it’s common in functional programming to have higher-order functions that allow you to avoid doing recursion:

let remove_odd (l : int list) : int list =
  List.filter l ~f:(fun (x : int) : bool -> x mod 2 = 0)

(* Or even more simply... *)
let remove_odd : int list -> int list = List.filter ~f:(fun x -> x mod 2 = 0)

assert (remove_odd [1; 2; 3] = [2])

Exercises, part 2

You’re part of the Functional Weather Service, tasked with analyzing the forecasts for the next few days. A forecast is defined as:

type weather_type = Sunny | Cloudy | Rainy of float
type forecast = {
  weather: weather_type;
  temperature: float;
  humidity: float;
}

(* For example... *)
let today : forecast = {
  weather = Rainy 0.6; (* 60% chance of rain *)
  temperature = 70;
  humidity = 0.1;
}

Your goal is to implement the following functions:

(* Return just the temperature for each day *)
val just_temp : forecast list -> float list

(* Get the average temperature across all days *)
val average_temp : forecast list -> float

(* Return a list of all days where the forecast is for rain above 70% chance
   and humidity above 50% *)
val humid_rainy_days : forecast list -> forecast list

(* Return a frequency count (histogram) of the number of days with each
 * weather type, e.g. [Sunny, Sunny, Cloudy] -> [(Sunny, 2), (Cloudy, 1)] *)
val weather_histogram : forecast list -> (weather_type * int) list

Some notes:

  • Implement just_temp without using any higher-order functions.
  • Implement average_temp without using recursion.
  • You will find the following functions useful throughout the problems:
    • List.length l: returns the length of a list as an integer.
    • List.map l ~f: runs the function f over each element of the list l, returning a new list. For example:
      (List.map [1; 2] ~f:(fun n -> n + 1)) = [2; 3]
      
    • List.filter l ~f: returns a new list with only elements in l that match the predicate f. (See the remove_odd example above.)
    • List.fold_left l ~init ~f: applies the function f to each element of the list while accumulating some state. For example:
      (List.fold_left [1; 2; 3] ~init:0 ~f:(fun (n : int) (accum : int) : int -> accum + n)) = 6
      

Programming patterns

I want to briefly highlight a few useful programming patterns/idioms that show up in OCaml (and functional programming more broadly).

The first is partial application. Like the lambda calculus, all functions in OCaml have one input and one output. A function can choose to have multiple inputs using a tuple, but it isn’t required:

let div1 m n = m / n in
let div2 (m, n) = m / n in
(div1 3 2) + (div2 (6, 2))

When functions take tuple as inputs, they require all the arguments to be present at the same time in the function call. By contrast, functions that use the non-tuple style don’t have this requirement. You can give a function some of its arguments, but not all, and get back a function that remembers the inputs while waiting for more.

let add m n = m + n in
let add2 = add 2 in
assert(add2 3 = 5)

This style of programming with functions is called partial application, or “currying” (after the mathematician Haskell Curry). It’s particularly useful with higher-order functions, or functions that take other functions as input (e.g. map). You’ll see more of this on the third assignment.

The second pattern is monads. A common issue when dealing with variants is nested matches. For example, let’s say we have a safe division that uses an int option to represent a divion that succeeds (Some n) or fails (None):

let div (m : int) (n : int) : int option =
  if n = 0 then None
  else Some (m / n)
in

match div 100 3 with
| Some n ->
  (match div n 5 with
   | Some n -> div n 2
   | None -> None)
| None -> None

For long chains of error-prone operations, this can get incredibly verbose. Instead, we can use a particular kind of programming idiom to simplify this logic.

let bind (opt : int option) (f : int -> int option) : int option =
  match opt with
  | None -> None
  | Some x -> f x
in
bind
  (div 100 3)
  (fun n -> bind
    (div n 5)
    (fun n -> (div n 2)))

This “bind” function essentially says: take as input an option, and then a function that we run on the option if it is a Some, returning a new option. (For theoretical reasons beyond the scope of this tutorial, this is part of a “monadic” interface, but for now just roll with it as a programming pattern.) This is still a little verbose, but if we change bind to an infix operator >>=:

let bind (opt : int option) (f : int -> int option) : int option =
  match opt with
  | None -> None
  | Some x -> f x
in

(* Parentheses indicate a custom infix operator *)
let (>>=) = bind in

div 100 3
>>= fun n ->
div n 5
>>= fun n ->
div n 2

When used as an infix operator, the bind function allowus to linearize our nested matches which produces more readable code.