Calculating Programs from Galois Connections

Galois connections are ubiquitous in mathematics and computing science. One is often amazed that, once two functions are identified as a Galois connection, a long list of nice and often useful properties follows from one concise, elegant defining equation. But how does one construct a program from a specification given as a Galois connection? This is the topic of a recent work by José Nuno Oliveira and me, and this article is an advertisement.

Galois Connections as Specifications

In program construction one often encounters program specification of the form “… the smallest such number”, “the longest prefix of the input list satisfying …”, etc. A typical example is whole number division: given a natural number x and a positive integer y, x / y is the largest natural number that, when multiplied by y, is at most x. For another example, the Haskell function takeWhile p returns the longest prefix of the input list such that all elements satisfy predicate p.

Such specifications can be seen as consisting of two parts. The easy part specifies a collection of solution candidates: numbers that are at most x after multiplication with y, or all prefixes of the input list. The hard part, on the other hand, picks one optimal solution, such as the largest, the longest, etc., among the collection.
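Both parts can be read off directly as (very inefficient) executable Haskell — a sanity check rather than a program; the names `divSpec` and `takeWhileSpec` are mine:

```haskell
import Data.List (inits)

-- Easy part: enumerate all candidates; hard part: pick the optimal one.
divSpec :: Int -> Int -> Int
divSpec x y = maximum [ z | z <- [0 .. x], z * y <= x ]

-- inits lists all prefixes; filter keeps those in which p always holds
-- (the empty prefix always qualifies, so `last` is safe); last picks the longest.
takeWhileSpec :: (a -> Bool) -> [a] -> [a]
takeWhileSpec p = last . filter (all p) . inits
```

Both functions search the entire candidate space — exactly the inefficiency a calculated program should eliminate.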

Our goal is to calculate programs for such specifications. But how best should the specification be given in the first place? Take division, for example: one might start from a specification that literally translates our description above into mathematics:

    x / y = ⋁{ z | z * y ≤ x }

As we know, however, suprema are in general not easy to handle. One could also explicitly name the remainder:

    z = x / y  ≡  (∃ r : 0 ≤ r < y : x = z * y + r)

at the cost of existentially quantifying over the remainder.

A third option looks surprisingly simple: given x and y, the value x / y is such that for all z,

    z * y ≤ x  ≡  z ≤ x / y        (1)

Why is this sufficient as a definition of x / y? Firstly, by substituting x / y for z, the right hand side of (1) reduces to true, and we obtain on the left hand side (x / y) * y ≤ x. This tells us that x / y is a candidate — it satisfies the easy part of the specification. Secondly, read the definition from left to right: z * y ≤ x ⇒ z ≤ x / y. It says that x / y is the largest among all the numbers satisfying the easy part.
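The equivalence can be spot-checked against Haskell's built-in `div` over small numbers — a finite test, not a proof:

```haskell
-- Check  z * y <= x  ≡  z <= x `div` y  for all small x, y, z.
gcDivHolds :: Bool
gcDivHolds = and [ (z * y <= x) == (z <= x `div` y)
                 | x <- [0 .. 50], y <- [1 .. 10], z <- [0 .. 60] ]
```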

Equations of the form (1) are called Galois connections. Given preorders (⊑) and (≤), functions f and g form a Galois connection if for all x and z we have

    f z ⊑ x  ≡  z ≤ g x        (2)

The function f is called the lower adjoint and g the upper adjoint.

The definition of division above is a Galois connection where f = (* y) and g = (/ y). For another example, takeWhile p can be specified as an upper adjoint:

    map p? zs ⊑ xs  ≡  zs ⊑ takeWhile p xs        (3)

where (⊑) is the prefix ordering: ys ⊑ xs if ys is a prefix of xs, and map p? is a partial function: map p? xs = xs if p x holds for each x in xs, and is undefined otherwise.
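Instance (3) can also be spot-checked in Haskell: the left hand side, map p? zs ⊑ xs, holds exactly when every element of zs satisfies p and zs is a prefix of xs. Here is a finite check for p = even (my own test harness, quantifying zs over the prefixes of xs):

```haskell
import Data.List (inits, isPrefixOf)

-- map p? zs ⊑ xs  ≡  zs ⊑ takeWhile p xs, for p = even.
gc3 :: [Int] -> Bool
gc3 xs = and [ (all even zs && zs `isPrefixOf` xs)
                 == (zs `isPrefixOf` takeWhile even xs)
             | zs <- inits xs ]
```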

We love Galois connections because once two functions are identified as such, a long list of useful properties follows: f (g x) ⊑ x, z ≤ g (f z), f and g are monotonic, and are inverses of each other in the other’s range… etc.

These are all very nice. But can one calculate a program from a Galois connection? Given (⊑), (≤), and f, how does one construct g?

The “Shrink” Operator

José discovered and proposed a relational operator to handle such calculations. To use the operator, we have to turn the Galois connection (2) into point-free style. We look at the left hand side of (2): f z ⊑ x, and try to write it as a relation between z and x. Let f° denote the relational converse of f — roughly, think of it as the inverse function of f, that is, it maps f z to z — and let (∘) denote relational composition, function composition extended to relations. Thus f z ⊑ x translates to

    f° ∘ (⊑)

It is a relation between z and x: putting x on the left hand side of f° ∘ (⊑), it relates, through (⊑), to f z, which is then mapped to z through f°.

Then we wish that f° ∘ (⊑) can be transformed into a (relational) fold or unfold, which is often the case because the defining components — (⊑), (≤), and f — are often folds or unfolds. Consider the lower adjoint of takeWhile p in (3). Since (⊑), the relation that takes a list and returns a prefix of the list, can be defined as a fold on lists, (map p?)° ∘ (⊑) is, by fold fusion, also a fold. As for (1), since (≤) and (* y) are both folds on natural numbers, (* y)° ∘ (≤) can be both a fold and an unfold.

In our paper we showed that a Galois connection (2) can be transformed into

    g = (f° ∘ (⊑)) ↾ (≥)

where (↾) is the new operator José introduced. The relation S ↾ R, pronounced “S shrunk by R”, is a sub-relation of S that yields, for each input, an optimal result under relation R. Note that the equation makes the easy/hard division explicit: f° ∘ (⊑) is the easy part — we want a solution z that satisfies f z ⊑ x — while (≥) is the criterion we use, in the hard part, to choose an optimal solution.

The (↾) operator is similar to the min operator of Bird and de Moor, without having to use sets (which would need a power allegory). It satisfies a number of useful properties. In particular, we have theorems stating when (↾ R) promotes into folds and unfolds. For example,

    (fold S) ↾ R ⊇ fold (S ↾ R)

if R is transitive and S is monotonic on R.
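To get a feel for (↾), here is a toy model over finite relations, encoded as lists of input/output pairs — my own encoding for illustration, not the paper's relational calculus:

```haskell
import Data.List (inits)

type Rel a b = [(a, b)]   -- a finite relation from inputs a to outputs b

-- shrink s r keeps, for each input, only the outputs that are
-- r-better than (or equal to) every other output for that input.
shrink :: Eq a => Rel a b -> (b -> b -> Bool) -> Rel a b
shrink s r = [ (a, b) | (a, b) <- s
             , and [ r b b' | (a', b') <- s, a' == a ] ]

-- Easy part of takeWhile even: all prefixes whose elements are all even.
easy :: [Int] -> Rel [Int] [Int]
easy xs = [ (xs, zs) | zs <- inits xs, all even zs ]
```

Shrinking the easy part by “is at least as long as” — shrink (easy [2,4,5]) (\b b' -> length b >= length b') — leaves only the longest qualifying prefix, [2,4].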

With the theorems we can calculate g. Given g, specified as an upper adjoint in a Galois connection with lower adjoint f, we first try to turn f° ∘ (⊑) into a fold or an unfold, and then apply the theorems to promote (↾ (≥)). For more details, take a look at our paper!
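For division, such a calculation ends in a program like repeated subtraction — shown here only as the kind of end product one obtains, not as the paper's actual derivation:

```haskell
-- divide x y: the largest z with z * y <= x, computed by repeated subtraction.
divide :: Int -> Int -> Int
divide x y = if x < y then 0 else 1 + divide (x - y) y
```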

14 thoughts on “Calculating Programs from Galois Connections”

  1. isn’t the ↾ operator just the right adjoint of relation composition?

    f z < x
    ============
    x --(>)--> f z --(f~)--> z
    x --(g)--> g x --(>=)--> z

    f~ : converse of f
    read as: x > f z, and applying f~ to f z gives z

    ; is reverse relation composition, that is, R ; S = S . R

    so > ; f~ == g ; >=
    (> ; f~ ) / >= == g

    / is the adjoint of right composition.

    Hope this is clear.

    Regards.

  2. I don’t see how it captures maximality. In the example
    z * y ≤ x ≡ z ≤ x / y
    If y = 3 and x = 9,
    z * 3 ≤ 9 ≡ z ≤ 9 / 3
    if you put 2 as candidate:
    2 * 3 ≤ 9 ≡ 2 ≤ 9 / 3
    6 ≤ 9 ≡ 2 ≤ 3
    True ≡ True

    1. This is perhaps the source of confusion: in the equivalence:

          z * 3 ≤ 9  ≡  z ≤ 9 / 3

      the variable z is not 9 divided by 3, the value we aim to compute — 9 / 3 is. Variable z is merely a universally quantified dummy variable.

      Pretend that we do not know what the value of 9 / 3 is. We must pick a value for it such that z * 3 ≤ 9 ≡ z ≤ 9 / 3 always evaluates to either true ≡ true or false ≡ false. Let’s try:
      z = 0: 0 * 3 ≤ 9, thus we shall have 0 ≤ 9 / 3;
      z = 1,2: (omitted);
      z = 3: 3 * 3 ≤ 9, thus we shall have 3 ≤ 9 / 3
      z = 4: 4 * 3 ≰ 9, thus we shall have 4 ≰ 9 / 3
      Thus we conclude that 9 / 3 could only be 3.
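      The exhaustive argument above can be scripted (my own sketch): among all candidate values d for 9 / 3, only one makes the equivalence hold for every z.

```haskell
-- Values d for which  z * 3 <= 9  ≡  z <= d  holds for all small z.
candidates :: [Int]
candidates = [ d | d <- [0 .. 9]
             , and [ (z * 3 <= 9) == (z <= d) | z <- [0 .. 10] ] ]
```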

      This way it is still hard to see where maximality comes in (if 9 / 3 were not the largest candidate, the cases above for z = 3 or 4, for example, would not hold. But such an explanation still feels like an accident). To see how maximality is captured, I’d still prefer the explanation in the post: look at this direction:

          z * 3 ≤ 9  ⇒  z ≤ 9 / 3

      if any z satisfies z * 3 ≤ 9, it must be the case that z ≤ 9 / 3. For any z such that z ≰ 9 / 3, it cannot be the case that z * 3 ≤ 9. So 9 / 3 itself is the largest value z could be while still making z * 3 ≤ 9 true.

      I’ve altered the text a bit to mention that z is a dummy variable. It might have avoided some confusion had I explicitly quantified z:

          (∀ z :: z * 3 ≤ 9  ≡  z ≤ 9 / 3)

      Functions f and g form a Galois connection iff:

          (∀ x z :: f z ⊑ x   ≡   z ≤ g x) 

  3. I think there’s a typo in the paper: on pages 2 and 3 when giving
    the informal specification for whole division you write:

    x / y is the largest natural number that, if multiplied by x, is at most y

    I guess it should be the other way around:

    x / y is the largest natural number that, if multiplied by y, is at most x

    Also in the blog post, the first formal specification (i.e. using a supremum)
    says that z * y < x – shouldn’t it be z * y <= x ?

    Anyway, I haven’t read the whole paper yet, but it seems quite interesting so far.

  4. If we try to translate the paper to an imperative setting, would the easy part be the invariant and the hard part the guard of the loop? I’ve only just glanced over the article, so the question could be absolutely irrelevant.

    1. Let’s see! If I were to express ((map p?)° ∘ (⊑)) ↾ (⊒) (the specification of takeWhile p) in an imperative language, (map p?)° ∘ (⊑) would be a loop generating all prefixes. Let last xs denote the rightmost element of a non-empty list and init xs denote all other elements; the loop is something like:

        xs, ys := XS, []; 
        do xs ≠ [] → 
           if p (last xs) → xs, ys := init xs, xs □
                            xs, ys := init xs, [] 
            | ¬ (p (last xs)) → xs, ys := init xs, []
           fi
        od

      The □ operator denotes non-deterministic choice (thus if p (last xs) holds, ys can be either xs or []). The loop non-deterministically returns a prefix of XS. The loop invariant is indeed the “easy part”: that ys is a prefix of xs\XS (the part of XS with xs removed) in which p is always true. The loop guard, however, is that we have not finished processing XS. After the loop, stored in ys is some prefix of XS in which p is always true. We have not specified which one.

      Fusing (↾(⊒)) into the loop, if possible, eliminates the non-determinism so that the program always yields the longest prefix. For this case the fusion is easy: simply remove the [] branch:

        xs, ys := XS, []; 
        do xs ≠ [] → 
           if p (last xs) → xs, ys := init xs, xs
            | ¬ (p (last xs)) → xs, ys := init xs, []
           fi
        od

      But now the invariant is exactly "ys is the longest prefix of xs\XS in which p is always true", that is, the easy part and the hard part. The guard stays the same.

      In summary, in the specification, the easy part is a loop invariant and the hard part is a condition imposed afterwards. The program derivation fuses the hard part into the loop so it becomes part of the invariant too.
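      In a functional setting the same fusion yields the familiar fold — this is just the standard definition of takeWhile, restated to mirror the fused loop:

```haskell
-- The fused, deterministic program: keep taking while p holds, else stop.
takeWhile' :: (a -> Bool) -> [a] -> [a]
takeWhile' p = foldr (\x ys -> if p x then x : ys else []) []
```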

      1. I hope Joao will soon put a publicly available version of his thesis online. I recommend it to anyone interested in Galois connections and (imperative) program construction.

  5. Excellent advert — will definitely read the paper (with Hinze’s Adjoint Folds at the back of my mind, natch!). One thing I’d look into is how do the galois-kan adjunctions generalize category-parametrically?

    1. I am not that familiar with categorical adjunctions — José knows them much better. Hinze’s paper is on my list of papers to read. Should have read it earlier…

      1. Categorial adjunctions generalize Galois connections. A very nice account of this generalization can be found in the following technical report by M. Fokkinga and L. Meertens: “Adjunctions”, TR 94-31, Memoranda Informatica, University of Twente, 1994.

  6. These “specifications as Galois connections” look extremely unintuitive to me. Especially the specification given for takeWhile seems downright contrived, and far less readable than the straightforward Haskell implementation of takeWhile. Hence, as far as I’m concerned these examples are not much of an advertisement for your approach–quite the contrary.

    1. Thanks for the comment. That may be true for many people and deserves some clarification, so here is another way to tell the story. Traditional arguments preferring Galois connections do not emphasise whether such definitions are more intuitive (which is rather subjective — one gets intuition after getting used to this kind of thinking), but that Galois connections appear often (after you get used to the thinking, you notice more Galois connections hidden everywhere), and that once you identify a Galois connection, many properties follow. For example, if takeWhile is defined by (3), we do not need a separate proof that takeWhile is monotonic with respect to the prefix ordering.

      However, one thing bothered me: if takeWhile is defined by (3), how do I know it is the same thing as the one in the Haskell prelude? Or, how do I know that the straightforward implementation of takeWhile does satisfy (3)? It is a relief for me to know that we can derive the implementation from (3), and the same method in fact applies to many such functions.

      There was also the typical dilemma of choosing examples. By choosing easy examples (division and takeWhile), people wonder whether it is worth spending that much effort just to derive a simple, well-known two-line program. Non-trivial programs, on the other hand, are much harder to follow.
