An executable monadic specification for Spark aggregation

Yu-Fang Chen, Chih-Duo Hong, Ondřej Lengál, Shin-Cheng Mu, Nishant Sinha, and Bow-Yaw Wang. Submitted.
[Code | Supplementary Proofs]

Spark is a promising new platform for scalable data-parallel computation that provides several high-level APIs for efficient data aggregation. Distributed computation is inherently non-deterministic, yet programmers generally wish their aggregations to yield the same result regardless of how the data is distributed. We present an executable, monadic, formal specification of Spark aggregate combinators in Haskell. By algebraic, equational reasoning about monadic programs, we precisely characterize the requirements for deterministic outcomes of distributed Spark aggregation. We report several case studies analyzing the determinism and correctness of Spark programs, and also illustrate how our executable specification helps in developing a distributed Spark vertex coloring program.
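To give a flavour of the setting, here is a toy, sequential Haskell model of Spark's aggregate combinator (an illustrative simplification, not the monadic specification from the paper): each partition is folded with seqOp, and the per-partition results are then combined with combOp.

```haskell
-- A toy, sequential model of Spark's aggregate combinator:
-- fold each partition with seqOp, then combine the per-partition
-- results with combOp.  (An illustrative simplification, not the
-- paper's monadic specification.)
aggregate :: b -> (b -> a -> b) -> (b -> b -> b) -> [[a]] -> b
aggregate z seqOp combOp partitions =
  foldl combOp z (map (foldl seqOp z) partitions)
```

With z = 0 and seqOp = combOp = (+), any partitioning of the same data yields the same result; with a combOp that is not commutative and associative, or a z that is not its unit, different partitionings may disagree — the kind of requirement such a specification lets one state precisely.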

Queueing and glueing for optimal partitioning

Shin-Cheng Mu, Yu-Hsi Chiang, and Yu-Han Lyu. In the 21st ACM SIGPLAN International Conference on Functional Programming (ICFP 2016), Sep. 2016. Accepted.
[Paper | Code]

The queueing-glueing algorithm is the nickname we give to an algorithmic pattern that provides amortised linear time solutions to a number of optimal list partition problems that have a peculiar property: at various moments we know that two of three candidate solutions could be optimal. The algorithm works by keeping a queue of lists, glueing them from one end, while chopping from the other end, hence the name. We give a formal derivation of the algorithm, and demonstrate it with several non-trivial examples.

Code accompanying this paper is available on GitHub.

Formal derivation of greedy algorithms from relational specifications: a tutorial

Yu-Hsi Chiang and Shin-Cheng Mu. Journal of Logical and Algebraic Methods in Programming, in press.
[Paper (doi: 10.1016/j.jlamp.2015.12.003) | Code]

Many programming tasks can be specified as optimisation problems in which a relation is used to generate all possible solutions, from which we wish to choose an optimal one. A relational operator “shrink”, developed by José N. Oliveira, is particularly suitable for constructing greedy algorithms from such specifications. Meanwhile, it has become standard in many sub-fields of programming languages that proofs must be machine-verified. This tutorial leads readers through the development of algebraic derivations of three greedy algorithms, one fold-based and two unfold-based, using AoPA, a library designed for machine-verified relational program calculation.

Modular reifiable matching: a list-of-functors approach to two-level types

Bruno C. d. S. Oliveira, Shin-Cheng Mu, and Shu-Hung You. In the 8th ACM SIGPLAN Symposium on Haskell (Haskell 2015), pages 82-93. Sep. 2015.
[Paper (doi: 10.1145/2804302.2804315) | Code]

This paper presents Modular Reifiable Matching (MRM): a new approach to two-level types using a fixpoint of list-of-functors representation. MRM allows the modular definition of datatypes and functions by pattern matching, using a style similar to the widely popular Datatypes a la Carte (DTC) approach. However, unlike DTC, MRM uses a fixpoint of list-of-functors approach to two-level types. This approach has advantages that help with various aspects of extensibility, modularity and reuse. Firstly, modular pattern matching definitions are collected using a list of matches that is fully reifiable. This allows extensible pattern matching definitions to be easily reused/inherited, and particular matches to be overridden. Such flexibility is used, among other things, to implement extensible generic traversals. Secondly, the subtyping relation between lists of functors is quite simple, does not require backtracking, and is easy to model in languages like Haskell. MRM is implemented as a Haskell library, and its use and applicability are illustrated through various examples in the paper.

Code accompanying this paper is available on GitHub.

Calculating a linear-time solution to the densest segment problem

Sharon Curtis and Shin-Cheng Mu. Journal of Functional Programming, Vol. 25, 2015.
[Paper (doi: 10.1017/S095679681500026X) | Supplementary Proofs | Code]

The problem of finding a densest segment of a list is similar to the well-known maximum segment sum problem, but its solution is surprisingly challenging. We give a general specification of such problems, and formally develop a linear-time online solution, using a sliding window style algorithm. The development highlights some elegant properties of densities, involving partitions that are decreasing and all right-skew.
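For reference, the problem is easy to specify naively in Haskell. The following quadratic-time specification (the names mds and density, and the lower bound l on segment length, are mine; this is not the linear-time algorithm derived in the paper) returns a densest segment of length at least l:

```haskell
import Data.List (inits, tails, maximumBy)
import Data.Ord (comparing)

-- All contiguous segments of a list.
segments :: [a] -> [[a]]
segments = concatMap inits . tails

-- Density of a segment: its sum divided by its length.
density :: [Double] -> Double
density xs = sum xs / fromIntegral (length xs)

-- Naive specification: a densest segment of length at least l.
-- Assumes l >= 1 and that the input has a segment of length l.
-- Quadratic-time; the paper derives a linear-time online version.
mds :: Int -> [Double] -> [Double]
mds l = maximumBy (comparing density) . filter ((l <=) . length) . segments
```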

The Code

The accompanying Haskell programs and Agda proofs are available on GitHub.
The Haskell program consists of the following files:

  • FDSList.hs: a main program that, for expository purposes, represents sequences by lists.
  • FDSSeq.hs: a linear-time implementation of the algorithm, using refined data structures as described in the paper.
  • ProbMDS.hs: problem specification for the MDS problem. To be imported by FDSList.hs or FDSSeq.hs.
  • ProbMMS.hs: problem specification for the MMS problem. To be imported by FDSList.hs or FDSSeq.hs.
  • MDSTest.hs: QuickCheck tests for the MDS problem.
  • MMSTest.hs: QuickCheck tests for the MMS problem.

Programming from Galois connections

Shin-Cheng Mu and José Nuno Oliveira. In the Journal of Logic and Algebraic Programming, Vol 81(6), pages 680–704, August 2012.

Problem statements often resort to superlatives, as in e.g. “… the smallest such number”, “… the best approximation”, “… the longest such list”, which lead to specifications made of two parts: one defining a broad class of solutions (the easy part) and the other requesting an optimal such solution (the hard part).

This article introduces a binary relational combinator which mirrors this linguistic structure and exploits its potential for calculating programs by optimization. This applies in particular to specifications written in the form of Galois connections, in which one of the adjoints delivers the optimal solution.

The framework encompasses re-factoring of results previously developed by Bird and de Moor for greedy and dynamic programming, in a way which makes them less technically involved and therefore easier to understand and play with.

This is an extended version of our conference paper published in RAMiCS 2011.

Proving the Church-Rosser Theorem Using a Locally Nameless Representation

Around 2 years ago, for an earlier project of mine (which has not seen the light of day yet!) in which I had to build a language with variables and prove its properties, I surveyed a number of ways to handle binders. For some background: people have noticed that, when proving properties of a language with bound variables, one often ends up spending most of the effort proving “uninteresting” things about names, substitution, weakening, etc. For formal theorem proving to be practical, this effort has to be reduced. A number of approaches have been proposed. Among them, the “locally nameless” style appeared to be interesting and promising.

With the only reference I used, Engineering Formal Metatheory by Aydemir et al., which outlines the principal ideas of the approach, I imagined how it works and tried to implement my own version in Agda. My first few implementations, however, all ended up in a mess. It appeared that there was an endless number of properties to prove. Besides the complexity of the language I was implementing, there must have been something I got wrong about the locally nameless representation. Realising that I could not finish the project this way, I eventually decided to learn from the basics.

I started with the tutorial by Chargueraud, with complete code in Coq, and followed in his footsteps using Agda. The task is a classic one: define the untyped λ-calculus and its reduction rules, and prove the Church-Rosser theorem.


From an abstract level, there is nothing too surprising. We define a syntax for untyped λ-calculus that distinguishes between free and bound variables:

data Term : Set where
  bv : (i : BName) → Term
  fv : (x : FName) → Term
  ƛ : (e : Term) → Term
  _·_ : (e₁ : Term) → (e₂ : Term) → Term

where BName = ℕ represents bound variables by de Bruijn indices, while FName is the type of free variables. The latter can be any type that supports an equality check and a method that generates a fresh variable not in a given set (in fact, a List) of variables. If one takes FName = String, the expression λ x → x y, where y occurs free, is represented by ƛ (bv 0 · fv "y"). For ease of implementation, one may take FName = ℕ as well.

Not all terms one can build are valid. For example, ƛ (bv 1 · fv "y") is not a valid term since there is only one ƛ binder. How do we distinguish the valid terms from the invalid ones? I would (and did) switch to a dependent datatype Term n, indexed by the number of enclosing binders, and let BName = Fin n. The index is passed top-down and is incremented each time we encounter a ƛ. Closed terms are then represented by the type Term 0.

The representation above, if it works, has the advantage that any term that can be built at all must be valid. Choosing such a representation was perhaps the first thing I did wrong, however. Chargueraud mentions a similar predicate that also passes the “level” information top-down, and claims that the predicate, on which we will have to perform induction to prove properties about terms, does not fit the usual pattern of induction. This was probably why I had so much trouble proving properties about terms.

The way to go, instead, is to use a predicate that assembles the information bottom up. The predicate LC (“locally-closed” — a term is valid if it is locally closed) is defined by:

data LC : Term → Set where
  fv : ∀ x → LC (fv x)
  ƛ : (L : FNames) → ∀ {e} → 
       (fe : ∀ {x} → (x∉L : x ∉ L) → LC ([ 0 ↦ fv x ]  e)) → LC (ƛ e)
  _·_ : ∀ {e₁ e₂} → LC e₁ → LC e₂ → LC (e₁ · e₂)

A free variable alone is a valid term. An application f · e is valid if both f and e are. And an abstraction ƛ e is valid if e becomes a valid term after we substitute any free variable x for the first (0-th) bound variable. There can be an additional constraint on x: that it is not in L, a finite set of “protected” variables — such co-finite quantification is one of the features of the locally nameless style.

The “open” operator [ n ↦ t ] e substitutes the term t for the n-th bound variable in e. It is defined by

[_↦_] : ℕ → Term → Term → Term
[ n ↦ t ] (bv i) with n ≟ i
...        | yes _ = t
...        | no _ = bv i
[ n ↦ t ] (fv y) = fv y
[ n ↦ t ] (ƛ e) = ƛ ([ suc n ↦ t ] e)
[ n ↦ t ] (e₁ · e₂) = [ n ↦ t ] e₁ · [ n ↦ t ] e₂

Note how n is incremented each time we go into a ƛ. A dual operator,

[_↤_] : ℕ → FName → Term → Term

abstracts the free variable x, replacing its occurrences by the n-th bound variable (the “close” operator).
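For readers more comfortable with Haskell, the term syntax and the two operators can be transliterated as follows (a simply-typed sketch of mine, so term validity is not enforced by the types):

```haskell
-- Locally nameless terms: bound variables are de Bruijn indices,
-- free variables are names.  (A sketch; the Agda version above is
-- the authoritative one.)
data Term = BV Int | FV String | Lam Term | App Term Term
  deriving (Eq, Show)

-- "open": substitute term t for the n-th bound variable.
open :: Int -> Term -> Term -> Term
open n t (BV i)      = if n == i then t else BV i
open _ _ (FV y)      = FV y
open n t (Lam e)     = Lam (open (n + 1) t e)    -- n grows under a binder
open n t (App e1 e2) = App (open n t e1) (open n t e2)

-- "close": the dual, replacing free variable x by the n-th bound variable.
close :: Int -> String -> Term -> Term
close _ _ (BV i)      = BV i
close n x (FV y)      = if x == y then BV n else FV y
close n x (Lam e)     = Lam (close (n + 1) x e)
close n x (App e1 e2) = App (close n x e1) (close n x e2)
```

For instance, open 0 (FV "x") (App (BV 0) (FV "y")) yields App (FV "x") (FV "y"), and opening with fv x after closing on x restores the original term.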

β Reduction and Church-Rosser Theorem

Small-step β reduction can be defined by:

data _β→_ : Term → Term → Set where
  β-red : ∀ {t₁ t₂} → Body t₁ → LC t₂ 
          → ((ƛ t₁) · t₂) β→ [ 0 ↦ t₂ ] t₁
  β-app₁ : ∀ {t₁ t₁' t₂} → LC t₂ 
           → t₁ β→ t₁'
           → (t₁ · t₂) β→ (t₁' · t₂)
  β-app₂ : ∀ {t₁ t₂ t₂'} → LC t₁ 
           → t₂ β→ t₂'
           → (t₁ · t₂) β→ (t₁ · t₂')
  β-ƛ : ∀ L {t₁ t₁'}
           → (∀ x → x ∉ L → ([ 0 ↦ fv x ] t₁) β→ ([ 0 ↦  fv x ] t₁'))
           → ƛ t₁ β→ ƛ t₁'

where β-red contracts a redex, β-app₁ and β-app₂ allow reduction on the left and right hand sides of an application respectively, and β-ƛ goes under a ƛ abstraction — again we use co-finite quantification.

Given _β→_ we can define its reflexive, transitive closure _β→*_, and the reflexive, transitive, symmetric closure _β≣_. The aim is to prove that _β→*_ is confluent:

β*-confluent : 
  ∀ {m s t} → (m β→* s) → (m β→* t)
                  → ∃ (λ u → (s β→* u) × (t β→* u))

which leads to the Church-Rosser property:

β-Church-Rosser : ∀ {t₁ t₂} → (t₁ β≣ t₂)
                    → ∃ (λ t → (t₁ β→* t) × (t₂ β→* t))

At an abstract level, the proof follows the classical route: it turns out to be easier to prove the confluence of a “parallel reduction” relation _⇉_, which allows β reduction to happen in several places of a term in one step. We then prove that _β→*_ is equivalent to _⇉*_, thereby establishing the confluence of _β→*_ as well. All this can be carried out relatively cleanly.

The gory details, however, hide in proving the infrastructural properties supporting the abstract view of the proofs.


Confluence, Church-Rosser… these are the interesting results we want to prove. However, we often end up spending most of the time on the infrastructural properties we have to prove along the way — which is why there has been so much recent research seeking better representations that simplify them. The locally nameless style is supposed to be such a representation. (Another, orthogonal topic is to seek generic representations such that the proofs can be done once for all languages.)

In my code, most of these properties are piled up in the file Infrastructure.agda. They range from properties you might expect to have:

open-term : ∀ k t {e} → LC e → e ≡ [ k ↦ t ] e
close-var-open-aux : ∀ k x e → LC e → e ≡ [ k ↦ fv x ] ([ k ↤ x ] e)

to ones that are less obvious:

open-var-inj : ∀ k x t u → x ∉ fvars t → x ∉ fvars u
               → [ k ↦ fv x ] t ≡ [ k ↦ fv x ] u → t ≡ u
open-term-aux : ∀ j t i u e → ¬ (i ≡ j)
                → [ j ↦ t ] e ≡ [ i ↦ u ] ([ j ↦ t ] e) 
                → e ≡ [ i ↦ u ] e

The lemma open-var-inj is one of the early lemmas appearing in Chargueraud’s tutorial, which might give one the impression that it is an easy first lemma to prove. On the contrary, it is among the tedious ones — I needed a 40-line proof (though most of the cases were simply eliminated by contradiction).

It takes experience and intuition to know which lemmas are needed. Had I not known that the approach would eventually work, I would have thought something had gone wrong when I found myself having to prove weird-looking lemmas like:

close-var-rec-open : 
  ∀ x y z t i j 
  → ¬(i ≡ j) → ¬(y ≡ x) → y ∉ fvars t
  → [ i ↦ fv y ] ([ j ↦ fv z ] ([ j ↤ x ] t))
    ≡ [ j ↦ fv z ] ([ j ↤ x ] ([ i ↦ fv y ] t))

which is not easy to prove either.

The Bottom Line?

So, is the locally nameless representation what it claims to be — a way to represent binders that simplifies the infrastructural proofs and scales up more easily? When I was struggling with some of the proofs in Infrastructure.agda I did wonder whether the claim holds only for Coq, with its cleverly designed proof tactics, and not for Agda, where everything is done by hand (so far). Once the infrastructural proofs were done, however, the rest was carried out very pleasantly.

To make a fair comparison, I should re-implement everything using de Bruijn notation. That will have to wait till some other time, though. (Anyone want to give it a try?)

It could be the case that, while some proofs are easily dismissed in Coq using tactics, in Agda the programmer should develop some more abstractions. I did find myself repeating some proof patterns, and found one or two lemmas that are not present in Chargueraud’s tutorial which, when used in Agda, simplify the proofs a bit. There could be more, but at this moment I am perhaps too involved in the details to see the patterns from a higher viewpoint.

The exercise does pay off, though. Now I feel I am much more familiar with this style, and am perhaps more prepared to use it in my own project.


A zip file containing all the code.


  1. Brian Aydemir, Arthur Chargueraud, Benjamin Pierce, Randy Pollack, and Stephanie Weirich. Engineering Formal Metatheory. POPL ’08.
  2. Arthur Chargueraud. The Locally Nameless Representation. Journal of Automated Reasoning, May 2011.

Constructing list homomorphisms from proofs

Yun-Yan Chi and Shin-Cheng Mu. In the 9th Asian Symposium on Programming Languages and Systems (APLAS 2011), LNCS 7078, pages 74-88.

The well-known third list homomorphism theorem states that if a function h is both an instance of foldr and an instance of foldl, it is a list homomorphism. Plenty of previous work devoted to constructing list homomorphisms, however, overlooks the fact that proving h to be both a foldr and a foldl is often the hardest part, which, once done, already provides a useful hint about what the resulting list homomorphism could be. In this paper we propose a new approach: to construct a possible candidate for the associative operator and, at the same time, to transform a proof that h is both a foldr and a foldl into a proof that h is a list homomorphism. The effort spent constructing the proof is thus not wasted, and the resulting program is guaranteed to be correct.
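As a tiny illustration of the theorem (my example, not one from the paper): sum is both a foldr and a foldl, and it is indeed a list homomorphism, with (+) as the associative operator.

```haskell
sumR, sumL :: [Int] -> Int
sumR = foldr (+) 0   -- sum as an instance of foldr
sumL = foldl (+) 0   -- sum as an instance of foldl

-- The property guaranteed by the third list homomorphism theorem:
-- h (xs ++ ys) = h xs `op` h ys for some associative op, here (+).
homomorphic :: [Int] -> [Int] -> Bool
homomorphic xs ys = sumR (xs ++ ys) == sumR xs + sumR ys
```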

Generalising and dualising the third list-homomorphism theorem

Shin-Cheng Mu and Akimasa Morihata. In the 16th ACM SIGPLAN International Conference on Functional Programming (ICFP 2011), pages 385-391.

The third list-homomorphism theorem says that a function is a list homomorphism if it can be described as an instance of both a foldr and a foldl. We prove a dual theorem for unfolds and generalise both theorems to trees: if a function generating a list can be described both as an unfoldr and an unfoldl, the list can be generated from the middle, and a function that processes or builds a tree both upwards and downwards may independently process/build a subtree and its one-hole context. The point-free, relational formalism helps to reveal the beautiful symmetry hidden in the theorem.

Calculating Programs from Galois Connections

Galois connections are ubiquitous in mathematics and computing science. One is often amazed that, once two functions are identified as a Galois connection, a long list of nice and often useful properties follows from one concise, elegant defining equation. But how does one construct a program from a specification given as a Galois connection? This is the topic of a recent work by José Nuno Oliveira and me, and this article is an advertisement.

Galois Connections as Specifications

In program construction one often encounters program specification of the form “… the smallest such number”, “the longest prefix of the input list satisfying …”, etc. A typical example is whole number division: given a natural number x and a positive integer y, x / y is the largest natural number that, when multiplied by y, is at most x. For another example, the Haskell function takeWhile p returns the longest prefix of the input list such that all elements satisfy predicate p.

Such specifications can be seen as consisting of two parts. The easy part specifies a collection of solution candidates: numbers that are at most x after multiplication with y, or all prefixes of the input list. The hard part, on the other hand, picks one optimal solution, such as the largest, the longest, etc., among the collection.

Our goal is to calculate programs from such specifications. But how best should the specification be given in the first place? Take division for example: one might start from a specification that literally translates our description above into mathematics:

    x / y = ⋁{ z | z * y ≤ x } 

As we know, however, suprema are in general not easy to handle. One could also explicitly name the remainder:

    z = x / y  ≡  (∃ r : 0 ≤ r < y : x = z * y + r)

at the cost of existentially quantifying over the remainder.

A third option looks surprisingly simpler: given x and y, the value x / y is such that for all z,

    z * y ≤ x  ≡  z ≤ x / y        (1)

Why is this sufficient as a definition of x / y? Firstly, substituting x / y for z, the right hand side of (1) reduces to true, and we obtain on the left hand side (x / y) * y ≤ x. This tells us that x / y is a candidate --- it satisfies the easy part of the specification. Secondly, read the definition from left to right: z * y ≤ x ⇒ z ≤ x / y. It says that x / y is the largest among all the numbers satisfying the easy part.
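The equivalence (1) is also easy to check mechanically; a small Haskell test (names are mine) confirms it exhaustively on a range of naturals:

```haskell
-- Check the Galois connection (1):  z * y <= x  ≡  z <= x / y,
-- for natural x, z and positive y, with / being whole number division.
galoisDiv :: Integer -> Integer -> Integer -> Bool
galoisDiv z y x = (z * y <= x) == (z <= x `div` y)

-- Exhaustive check on a small range.
checkDiv :: Bool
checkDiv = and [ galoisDiv z y x | z <- [0..20], y <- [1..20], x <- [0..20] ]
```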

Equations of the form (1) are called Galois connections. Given preorders (⊑) and (≤), functions f and g form a Galois connection if for all x and z we have

    f z ⊑ x  ≡  z ≤ g x        (2)

The function f is called the lower adjoint and g the upper adjoint.

The definition of division above is a Galois connection where f = (* y) and g = (/ y). For another example, takeWhile p can be specified as an upper adjoint:

    map p? zs ⊑ xs  ≡  zs ⊑ takeWhile p xs        (3)

where (⊑) is the prefix ordering: ys ⊑ xs if ys is a prefix of xs, and map p? is a partial function: map p? xs = xs if p x holds for each x in xs, and is undefined otherwise.
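Connection (3) can be checked in the same mechanical fashion, modelling the partial function map p? with Maybe (a sketch of mine, reading an undefined left hand side as false):

```haskell
-- The prefix ordering on lists.
isPrefix :: Eq a => [a] -> [a] -> Bool
isPrefix []     _      = True
isPrefix (y:ys) (x:xs) = y == x && isPrefix ys xs
isPrefix _      []     = False

-- map p? as a partial function: defined only when every element satisfies p.
mapP :: (a -> Bool) -> [a] -> Maybe [a]
mapP p xs = if all p xs then Just xs else Nothing

-- The Galois connection (3): (map p? zs ⊑ xs) ≡ (zs ⊑ takeWhile p xs),
-- with an undefined left hand side read as False.
galoisTakeWhile :: Eq a => (a -> Bool) -> [a] -> [a] -> Bool
galoisTakeWhile p zs xs =
  maybe False (`isPrefix` xs) (mapP p zs) == (zs `isPrefix` takeWhile p xs)
```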

We love Galois connections because, once two functions are identified as such, a long list of useful properties follows: f (g x) ⊑ x, z ≤ g (f z), f and g are monotonic and are inverses of each other in each other's range... etc.

These are all very nice. But can one calculate a program from a Galois connection? Given (⊑), (≤), and f, how does one construct g?

The "Shrink" Operator

José discovered and proposed a relational operator to handle such calculations. To use the operator, we have to turn the Galois connection (2) into point-free style. We look at its left hand side, f z ⊑ x, and try to write it as a relation between z and x. Let f° denote the relational converse of f -- roughly, think of it as the inverse function of f, that is, it maps f z to z -- and let (∘) denote relational composition -- function composition extended to relations. Thus f z ⊑ x translates to

    f° ∘ (⊑)

It is a relation between z and x: putting x on the left hand side of f° ∘ (⊑), it relates, through (⊑), to f z, which is then mapped to z through f°.

Then we wish that f° ∘ (⊑) can be transformed into a (relational) fold or unfold, which is often the case because the defining components, (⊑), (≤), and f, are often folds or unfolds. Consider the lower adjoint of takeWhile p in (3). Since (⊑), the relation that takes a list and returns a prefix of the list, can be defined as a fold on lists, (map p?)° ∘ (⊑) is, by fold fusion, also a fold. Consider (1): since (≤) and (* y) are both folds on natural numbers, (* y)° ∘ (≤) can be both a fold and an unfold.

In our paper we showed that a Galois connection (2) can be transformed into

    g = (f° ∘ (⊑)) ↾ (≥)

where (↾) is the new operator José introduced. The relation S ↾ R, pronounced "S shrunk by R", is a sub-relation of S that yields, for each input, an optimal result under relation R. Note that the equation makes the easy/hard division explicit: f° ∘ (⊑) is the easy part -- we want a solution z that satisfies f z ⊑ x -- while (≥) is the criterion we use, in the hard part, to choose an optimal solution.
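To get a feel for (↾), one can model relations as set-valued functions in Haskell and compute the shrink directly. In this toy model (mine, not the paper's relational formulation), applying it to the division example recovers x / y:

```haskell
-- A relation from a to b, modelled as a set-valued function.
type Rel a b = a -> [b]

-- S ↾ R keeps, for each input, only those outputs of S that are
-- optimal: related by R to every other candidate output.
shrink :: Rel a b -> (b -> b -> Bool) -> Rel a b
shrink s r a = [ b | b <- s a, all (b `r`) (s a) ]

-- The easy part for division: all z with z * y <= x ...
divCandidates :: Integer -> Rel Integer Integer
divCandidates y x = [ z | z <- [0 .. x], z * y <= x ]

-- ... shrunk by (>=) picks the largest candidate, i.e. x / y.
divShrink :: Integer -> Rel Integer Integer
divShrink y = divCandidates y `shrink` (>=)
```

For instance, divShrink 3 14 yields [4], the whole number quotient of 14 by 3.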

The (↾) operator is similar to the min operator of Bird and de Moor, but without using sets (which would need a power allegory). It satisfies a number of useful properties. In particular, we have theorems stating when (↾ R) promotes into folds and unfolds. For example,

    (fold S) ↾ R ⊇ fold (S ↾ R)

if R is transitive and S is monotonic on R.

With the theorems we can calculate g. Given g, specified as an upper adjoint in a Galois connection with lower adjoint f, we first try to turn f° ∘ (⊑) into a fold or an unfold, and then apply the theorems to promote (↾ (≥)). For more details, take a look at our paper!