Galois connections are ubiquitous in mathematics and computing science. One is often amazed that, once two functions are identified as a Galois connection, a long list of nice and often useful properties follow from one concise, elegant defining equation. But how does one construct a program from a specification given as a Galois connection? This is the topic of a recent work of José Nuno Oliveira and I, and this article is an advertisement.
Galois Connections as Specifications
In program construction one often encounters program specification of the form “… the smallest such number”, “the longest prefix of the input list satisfying …”, etc. A typical example is whole number division: given a natural number x
and a positive integer y
, x / y
is the largest natural number that, when multiplied by y
, is at most x
. For another example, the Haskell function takeWhile p
returns the longest prefix of the input list such that all elements satisfy predicate p
.
Such specifications can be seen as consisting of two parts. The easy part specifies a collection of solution candidates: numbers that are at most x
after multiplication with y
, or all prefixes of the input list. The hard part, on the other hand, picks one optimal solution, such as the largest, the longest, etc., among the collection.
Our goal is to calculate programs for such specifications. But how best should the specification be given in the first place? Take division for example, one might start from a specification that literally translates our description above into mathematics:
x / y = ⋁{ z | z * y ≤ x }
As we know, however, suprema is in general not easy to handle. One could also explicitly name the remainder:
z = x / y ≡ (∃ r : 0 ≤ r < y : x = z * y + r)
at the cost of existentially quantifying over the remainder.
A third option looks surprising simpler: given x
and y
, the value x / y
is such that for all z
,
z * y ≤ x ≡ z ≤ x / y(1)
Why is this sufficient as a definition of
x / y
? Firstly, by substituting x / y
for z
, the right hand side of ≡
reduces to true, and we obtain on the left hand side (x / y) * y ≤ x
. This tell that x / y
is a candidate — it satisfies the easy part of the specification. Secondly, read the definition from left to right: z * y ≤ x ⇒ z ≤ x / y
. It says that x / y
is the largest among all the numbers satisfying the easy part.
Equations of the form are called Galois connections. Given preorders ⊑
and ≤
, Functions f
and g
form a Galois connection if for all x
and z
we have
f z ⊑ x ≡ z ≤ g x(2)
The function
f
is called the lower adjoint and g
the upper adjoint.
The definition of division above is a Galois connection where f = (* y)
and g = (/ y)
. For another example, takeWhile p
can be specified as an upper adjoint:
map p? zs ⊑ xs ≡ zs ⊑ takeWhile p xs(3)
where
⊑
is the prefix ordering: ys ⊑ xs
if ys
is a prefix of xs
, and map p?
is a partial function: map p? xs = xs
if p x
holds for each x
in xs
.
We love Galois connections because once two functions are identified as such, a long list of useful properties follows: f (g x) ⊑ x
, z ≤ g (f z)
, f
and g
are monotonic, and are inverses of each other in the other’s range… etc.
These are all very nice. But can one calculate a program from a Galois connection? Given ⊑
, ≤
, and f
, how does one construct g
?
The “Shrink” Operator
José discovered and proposed a relational operator to handle such calculations. To use the operator, we have to turn the Galois connection (1)
into point-free style. We look at the left hand side of (1)
: f z ⊑ x
, and try to write it as a relation between z
and x
. Let f°
denote the relational converse of f
— roughly, think of it as the inverse function of f
, that it, it maps f z
to z
, and let ∘
denote relational composition — function composition extended to relations. Thus f z ⊑ x
translates to
f° ∘ (⊑)
It is a relation between
z
and x
: putting x
on the left hand side of f° ∘ (⊑)
, it relates, through ⊑
, to f z
, which is then mapped to z
through f°
.
Then we wish that f° ∘ (⊑)
can be transformed into a (relational) fold or unfold, which is often the case because the defining components: ⊑
, ≤
, and f
, are often folds or unfolds. Consider the lower adjoint of takeWhile p
in (3)
. Since ⊑
, the relation that takes a list and returns a prefix of the list, can be defined as a fold on lists, (map p?)° ∘ (⊑)
, by fold fusion, is also a fold. Consider (1)
, since ≤
and (* y)
are both folds on natural numbers, (* y)° ∘ (≤)
can be both a fold and an unfold.
In our paper we showed that a Galois connection (2)
can be transformed into
g = (f° ∘ (⊑)) ↾ (≥)
where
↾
is the new operator José introduced. The relation S ↾ R
, pronounced “S
shrunk by R
“, is a sub-relation of S
that yields, for each input, an optimal result under relation R
. Note that the equation made the easy/hard division explicit: f° ∘ (⊑)
is the easy part: we want a solution z
that satisfies f z ⊑ x
, while ≥
is the criteria we use, in the hard part, to choose an optimal solution.
The ↾
operator is similar to the min
operator of Bird and de Moor, without having to use sets (which needs a power allegory). It satisfies a number of useful properties. In particular, we have theorems stating when (↾ R)
promotes into folds and unfolds. For example,
(fold S) ↾ R ⊇ fold (S ↾ R)
if
R
is transitive and S
is monotonic on R
.
With the theorems we can calculate g
. Given g
, specified as an upper adjoint in a Galois connection with lower adjoint f
, we first try to turn f° ∘ (⊑)
into a fold or an unfold, and then apply the theorems to promote (↾ (≥))
. For more details, take a look at our paper!
isn’t the ↾ operator just the right adjoint of relation composition?
f z < x
============
z f~
x —–> f z ——> z
x ——> gx ——> z
g >=
f~ : converse of f
readed as x > fz and fz apply by f~ got z
; is the reverse relation composition , that is R ; S = S . R
so > ; f~ == g ; >=
(> ; f~ ) / >= == g
/ is the adjoin of right composition .
Hope this is clear.
Regards.
I don’t see how it captures maximality. In the example
z * y ≤ x ≡ z ≤ x / y
If
y = 3
andx = 9
,z * 3 ≤ 9 ≡ z ≤ 9 / 3
if you put
2
as candidate:2 * 3 ≤ 9 ≡ 2 ≤ 9 / 3
6 ≤ 9 ≡ 2 ≤ 3
True ≡ True
This is perhaps the source of confusion: in the equivalence:
the variable
z
is not9
divided by3
, the value we aim to compute —9 / 3
is. Variablez
is merely a universally quantified dummy variable.Pretend that we do not know what the value of
9 / 3
is. We must pick a value for it such thatz * 3 ≤ 9 ≡ z ≤ 9 / 3
always evaluates to eithertrue ≡ true
orfalse ≡ false
. Let’s try:z = 0:
0 * 3 ≤ 9
, thus we shall have0 ≤ 9 / 3
;z = 1,2: (omitted);
z = 3:
3 * 3 ≤ 9
, thus we shall have3 ≤ 9 / 3
z = 4:
4 * 3 ≰ 9
, thus we shall have4 ≰ 9 / 3
Thus we conclude that
3 / 9
could only be3
.This way it is still hard to see where maximality comes in (if
9/3
were not the largest, the cases above for z = 2 or 3, for example, would not be true. But such an explanation still feels like an accident). To see how maximality is captured, I’d still prefer the explanation in the post: look at this direction:if any
z
satisfiesz * 3 ≤ 9
, it must be the case thatz ≤ 9 / 3
. For anyz
such thatz ≰ 9 / 3
, it cannot be the case thatz * 3 ≤ 9
. So9 / 3
itself is the largest valuez
could be while still makingz * 3 ≤ 9
true.I’ve altered the text a bit to mention that
z
is a dummy variable. It might avoid some confusion had I explicitly quantifyz
.
Functions
f
andg
form a Galois connection iff:I think there’s a typo in the paper: on pages 2 and 3 when giving
the informal specification for whole division you write:
I guess it should be the other way around:
Also in the blog post, the first formal specification (i.e. using a supremum)
says that
z * y < x
– shouldn’t it bez * y <= x
?Anyway, I haven’t read the whole paper yet, but it seems quite interesting so far.
Thanks for checking out the paper and spotting the errors! I’ve put corrected versions online. Hope you like the rest of the paper too.
If we try to translate the paper to an imperative setting, would be the easy part the invariant and the hard part the guard for the cycle? I’ve just only glanced over the article, so the question could be absolutely irrelevant.
Let’s see! If I were to express
((map p?)° ∘ (⊑)) ↾ (⊒)
(specification oftakeWhile p
) in an imperative language,(map p?)° ∘ (⊑)
would be a loop generating all prefixes. Letlast xs
denote the right most element of a non-empty list,init xs
denote all other elements, the loop is something like:The
□
operator denotes non-deterministic choice (thus ifp (last xs)
holds,ys
can be eitherxs
or[]
). The loop non-deterministically returns a prefix ofXS
. The loop invariant is indeed the “easy part”: thatys
is a prefix ofxs\XS
(the part ofXS
withxs
removed) in whichp
is always true. The loop guard, however, is that we have not finished processingXS
. After the loop, stored inys
is some prefix ofXS
in whichp
is always true. We have not specified which one.Fusing
(↾(⊒))
into the loop, if possible, eliminates the non-determinism so that the program always yields the longest prefix. For this case the fusion is easy: simply remove the[] branch:
But now the invariant is exactly "
ys
is the longest prefix ofxs\XS
in whichp
is always true", that is, the easy part and the hard part. The guard stays the same.In summary, in the specification, the easy part is a loop invariant and the hard part is a condition imposed afterwards. The program derivation fuses the hard part into the loop so it becomes part of the invariant too.
A while ago I challenged Joao Ferreira (http://www.cs.nott.ac.uk/~jff) in the direction you mention (imperative programming from Galois connections). See chapter 4 of his PhD thesis (2010).
I hope Joao will soon put a publicly available version of his thesis online. I recommend it to anyone interested in Galois connections and (imperative) program construction.
Excellent advert — will definitely read the paper (with Hinze’s Adjoint Folds at the back of my mind, natch!). One thing I’d look into is how do the galois-kan adjunctions generalize category-parametrically?
I am not that familiar with categorical adjunctions. José appears to know about them better? Hinze’s paper is on my list of papers to read. Should have read it earlier…
Categorial adjunctions generalize Galois connections. A very nice account of this generalization can be found in the following technical report by M. Fokkinga and L. Meertens: “Adjunctions”, TR 94-31, Memoranda Informatica, University of Twente, 1994.
These “specifications as Galois connections” look extremely unintuitive to me. Especially the specification given for takeWhile seems downright contrived, and far less readable than the straightforward Haskell implementation of takeWhile. Hence, as far as I’m concerned these examples are not much of an advertisement for your approach–quite the contrary.
Thanks for the comment. That may be true for many people and deserves some clarification, so here is another way to tell the story. Traditional arguments preferring Galois connections do not emphasise on whether such definitions are more intuitive (which is rather subjective — one gets intuition after getting used to this kind of thinking), but that Galois connections appears often (after you get used to the thinking, you notice more Galois connections hidden everywhere), and once you identify a Galois connection, many properties follow. For example, if
takeWhile
is defined as (3), we do not need a separate proof thattakeWhile
is monotonic with respect to the prefix ordering.However, one thing bothered me: if
takeWhile
is defined by (3), how do I know it is the same thing as the one in the Haskell prelude? Or, how do I know that the straightforward implementation oftakeWhile
does satisfy (3)? It is a relief for me to know that we can derive the implementation from (3), and the same method in fact applies to many such functions.There was also the typical dilemma of choosing examples. By choosing easy examples (division and
takeWhile
) people wonder whether it is worth spends that much effort just to derive the simple, well-known two-line program. Non-trivial programs, on the other hand, are much harder to follow.