In the previous post we reviewed Hans Zantema’s algorithm for solving longest segment problems with suffix and overlap-closed predicates. For predicates that are not overlap-closed, Zantema derived a so-called “windowing” technique, which will be the topic of this post.

A brief review: the longest segment problem takes the form:

`max# ∘ p ◁ ∘ segs`

where `segs :: [a] → [[a]]`

, defined by `segs = concat ∘ map inits ∘ tails`

returns all consecutive segments of the input list; `p ◁`

is an abbreviation for `filter p`

, and `max# :: [[a]] → [a]`

returns the longest list from the input list of lists. In words, the task is to compute the longest consecutive segment of the input that satisfies predicate `p`

.

A predicate `p`

is *suffix-closed* if `p (xs ⧺ ys) ⇒ p ys`

. For suffix-closed `p`

, Zantema proposed a technique that, from a high-level point of view, looks just like the right solution to such problems. We scan through the input list using a `foldr`

from the right to the left, during which we try to maintain the longest segment satisfying `p`

so far. Also, we keep a prefix of the list that is as long as the currently longest segment, which we call the *window*.

If, when we move one element to the right, the window (now one element longer than the currently longest segment) happens to satisfy `p`

, it becomes the new optimal solution. Otherwise we drop the right-most element of the window so that it slides leftwards, retaining the length. Notice that it implies that we’d better represent the window using a queue, so we can efficiently add elements from the left and drop from the right.

Derivation of the algorithm is a typical case of *tupling*.

### Tupling

Given a function `h`

, we attempt to compute it efficiently by turning it into a `foldr`

. It would be possible if the value of the inductive case `h (x : xs)`

were determined solely by `x`

and `h xs`

, that is:

`h (x : xs) = f x (h xs)`

for some `f`

. With some investigation, however, it would turn out that `h (x : xs)`

also depends on some `g`

:

`h (x : xs) = f x (g (x : xs)) (h xs)`

Therefore, we instead try to construct their *split* `⟨ h , g ⟩`

as a fold, where the split is defined by:

`⟨ h , g ⟩ xs = (h xs, g xs)`

and `h = fst . ⟨ h , g ⟩`

.

If `⟨ h , g ⟩`

is indeed a fold, it should scan through the list and construct a pair of a `h`

-value and a `g`

-value. To make it feasible, it is then hoped that `g (x : xs)`

can be determined by `g xs`

and `h xs`

. Otherwise, we may have to repeat the process again, making the fold return a triple.

### Segment/Prefix Decomposition

Let us look into the longest segment problem. For suffix-closed `p`

it is reasonable to assume that `p []`

is true — otherwise `p`

would be false everywhere. Therefore, for the base case we have `max# ∘ p ◁ ∘ segs ▪ [] = []`

. We denote function application by `▪`

to avoid too many parentheses.

Now the inductive case. It is not hard to derive an alternative definition of `segs`

:

```
segs [] = [[]]
segs (x : xs) = inits (x:xs) ⧺ segs xs
```

therefore, we derive:

```
max# ∘ p ◁ ∘ segs ▪ (x : xs)
= max# ∘ p ◁ ▪ (inits (x : xs) ⧺ segs xs)
= (max# ∘ p ◁ ∘ inits ▪ (x : xs)) ↑#
(max# ∘ p ◁ ∘ segs ▪ xs)
```

where `xs ↑# ys`

returns the longer one between `xs`

and `ys`

.

It suggests that we maintain, by a `foldr`

, a pair containing the longest segment and the longest *prefix* satisfying `p`

(that is, `max# ∘ p ◁ ∘ inits`

). It is then hoped that `max# ∘ p ◁ ∘ inits ▪ (x : xs)`

can be computed using `max# ∘ p ◁ ∘ inits ▪ xs`

. And luckily, it is indeed the case, implied by the following proposition proved in an earlier post:

Proposition 1: If`p`

is suffix-closed, we have:`p ◁ ∘ inits ▪ (x : xs) = finits (max# ∘ p ◁ ∘ inits ▪ xs)`

where

`finits ys = p ◁ ∘ inits ▪ (x : ys)`

.

Proposition 1 says that the list (or set) of all the prefixes of `x : xs`

that satisfies `p`

can be computed by the longest prefix of `xs`

(call it `ys`

) satisfying `p`

, provided that `p`

is suffix-closed. A naive way to do so is simply by computing all the prefixes of `x : ys`

and do the filtering again, as is done in `finits`

.

This was the route taken in the previous post. It would turn out, however, to come up with an efficient implementation of `f`

we need some more properties from `p`

, such as that it is also overlap-closed.

### The “Window”

Proposition 1 can be strengthened: to compute all the prefixes of `x : xs`

that satisfies `p`

using `finits`

we do not strictly have to start with `ys`

. Any prefix of `xs`

longer than `ys`

will do.

Proposition 2: If`p`

is suffix-closed, we have:`p ◁ ∘ inits ▪ (x : xs) = finits (take i xs)`

where

`finits ys = p ◁ ∘ inits ▪ (x : ys)`

, and`i ≥ length ∘ max# ∘ p ◁ ∘ inits ▪ xs`

.

In particular, we may choose `i`

to be the length of the longest segment:

Lemma 1:`length ∘ max# ∘ p ◁ ∘ segs ▪ xs ≥ length ∘ max# ∘ p ◁ ∘ inits ▪ xs`

Appealing to intuition, Lemma 1 is true because `segs xs`

is a superset of `inits xs`

.

Remark: Zantema proved Proposition 1 by contradiction. The purpose of an earlier post was to give a constructive proof of Proposition 1, which was considerably harder than I expected. I’d be interested to see a constructive proof of Proposition 2.

Now we resume the reasoning:

```
max# ∘ p ◁ ∘ segs ▪ (x : xs)
= max# ∘ p ◁ ▪ (inits (x : xs) ⧺ segs xs)
= (max# ∘ p ◁ ∘ inits ▪ (x : xs)) ↑#
(max# ∘ p ◁ ∘ segs ▪ xs)
= { Proposition 2 and Lemma 1 }
let s = max# ∘ p ◁ ∘ segs ▪ xs
in (max# ∘ finits ▪ (x : take (length s) xs)) ↑# s
```

Define `window xs = take (length ∘ max# ∘ p ◁ ∘ segs ▪ xs) xs`

, the reasoning above suggest that we may try the following tupling:

```
max# ∘ p ◁ ∘ segs
= fst ∘ ⟨ max# ∘ p ◁ ∘ segs , window ⟩
```

### Maintaining the Longest Segment and the Window

The task now is to express `⟨ max# ∘ p ◁ ∘ segs , window ⟩`

as a `foldr`

. We can do so only if both `max# ∘ p ◁ ∘ segs ▪ (x : xs)`

and `window (x : xs)`

can be determined by `max# ∘ p ◁ ∘ segs ▪ xs`

and `window xs`

. Let us see whether it is the case.

#### Maintaining the Longest Segment

Regarding `max# ∘ p ◁ ∘ segs ▪ (x : xs)`

, we have derived:

```
max# ∘ p ◁ ∘ segs ▪ (x : xs)
= { as shown above, let s = max# ∘ p ◁ ∘ segs ▪ xs }
(max# ∘ p ◁ ∘ inits ▪ (x : take (length s) xs)) ↑# s
```

Let `s = max# ∘ p ◁ ∘ segs ▪ xs`

. We consider two cases:

- Case
`p (x : take (length s) xs)`

. We reason:`(max# ∘ p ◁ ∘ inits ▪ (x : take (length s) xs)) ↑# s = { see below } (x : take (length s) xs) ↑# s = { since the LHS is one element longer than the RHS } x : take (length s) xs = { definition of window } x : window xs`

The first step is correct because, for all

`zs`

,`p zs`

implies that`max# ∘ p ◁ ∘ inits ▪ zs = zs`

. - Case
`¬ p (x : take (length s) xs)`

. In this case`(max# ∘ p ◁ ∘ inits ▪ (x : take (length s) xs)) ↑# s`

must be`s`

, since`¬ p zs`

implies that`length∘ max# ∘ p ◁ ∘ inits ▪ zs < length zs`

.

#### Maintaining the Window

Now consider the window. Also, we do a case analysis:

- Case
`p (x : take (length s) xs)`

. We reason:`window (x : xs) = take (length ∘ max# ∘ p ◁ ∘ segs ▪ (x : xs)) (x : xs) = { by the reasoning above } take (length (x : take (length s) xs)) (x : xs) = { take and length } x : take (length (take (length s)) xs) xs = { take and length } x : take (length s) xs = x : window xs`

- Case
`¬ p (x : take (length s) xs)`

. We reason:`window (x : xs) = take (length ∘ max# ∘ p ◁ ∘ segs ▪ (x : xs)) (x : xs) = { by the reasoning above } take (length s) (x : xs) = { take and length } x : take (length (s-1)) xs = x: init (window xs)`

#### The Algorithm

In summary, the reasoning above shows that

`⟨ max# ∘ p ◁ ∘ segs , window ⟩ = foldr step ([], [])`

where `step`

is given by

```
step x (s, w) | p (x : w) = (x : w, x : w)
| otherwise = (s, x : init w)
```

As is typical of many program derivations, after much work we deliver an algorithm that appears to be rather simple. The key invariants that made the algorithm correct, such as that `s`

is the optimal segment and that `w`

is as long as `s`

, are all left implicit. It would be hard to prove the correctness of the algorithm without these clues.

We are not quite done yet. The window `w`

had better be implemented using a queue, so that `init w`

can be performed efficiently. The algorithm then runs in time linear to the length of the input list, provided that `p (x : w)`

can be performed in constant time -- which is usually not the case for interesting predicates. We may then again tuple the fold with some information that helps to compute `p`

efficiently. But I shall stop here.

### Reference

- H. Zantema. Longest Segment Problems Sci. Comput. Program. 18(1): 39-66 (1992).