# Finding Maximally Dense Segments

Sharon and I have finally concluded, for now, our work on the maximally dense segment problem (draft, with an errata already!), on which we have been working on and off for the past two years. Considering the algorithm itself and its derivation/proofs, I am quite happy with what we have achieved. The algorithm is rather complex, however, and it is a challenge presenting it in an accessible way. Sharon has done a great job polishing the paper, and I do hope more people would be interested in reading it and it would, eventually, inspire more work on interesting program derivations.

The basic form of the problem looks like a natural variation of the classical maximum segment sum problem: given a list of numbers, find a consecutive segment whose average, that is, sum divided by length, is maximum. The problem would be trivial without more constraints, since one could simply return the largest element, thus we usually impose a lower bound `L` on the length of feasible segments.

It was noticed by Huang [3], that a segment having maximum average need not be longer than `2L - 1`: given a segment of `2L` elements or more, we cut it in the middle. If the two halves have different averages, we keep the larger one. Otherwise the two halves have the same average. Either way, we get a shorter, feasible segment whose average is not lower. The fact hints at a trivial `O(nL)` algorithm: for each suffix of the list, find its best prefix upto `2L - 1` elements long.

A difficult challenge, however, is to come up with an algorithm that is `O(n)`, independently of `L`. The problem can be generalised to the case where the elements do not have length 1, but each has a width, and the goal is to maximise the density — sum of the elements divided by sum of their width. It makes the problem sightly more complicated, but does not change its nature. If we go on to impose an upper bound `U` on the length as well, however, the problem becomes much more difficult. There was an published algorithm that claimed to be linear only to be found not so. We discovered that two later algorithms, which appeared to have concluded the problem, also fail for a boundary case. The bug is easy to fix for one of the algorithm, but might not be so for the other.

Our algorithm closely relates to that of Chung and Lu [1] and that of Goldwasser et al [2]. The algorithm is perhaps too complex to present in detail in a blog post (that’s why we need a paper!), but I will try to give an outline using pictures from the paper, my slides and poster.

One of the ways to visualise the problem is to see each element as a block, the number being the area of the block, and the density would be its height. The input is a list of (area, width) pairs, and the goal is to find a consecutive segment maximising the height. Shown below is the input list `[(9,6),(6,2),(14,7),(20,4),(-10,5),(20,8),(-2,2),(27,6)]`, and the dashed line is their average height:

Notice that an area can be negative. In the paper, since the alphabet `w` is used for “window” (to be explained below), we instead refer to the width as “breadth”.

### Prefixes of Suffixes, and the Window

Many optimal segment problems (finding some optimal segment of a given list) are solved by finding, for each suffix, its optimal prefix, as shown below. Each bar is a suffix of the input, and the blue part is its optimal prefix:

It is preferable that an optimal prefix of `a : x` can be computed from the optimal prefix of `x`, that is, the function computing the optimal prefix is a `foldr`. If it is true, the algorithm may keep a pair of (optimal segment, optimal prefix). Each time a new element is read, it computes the new optimal prefix using the previous optimal prefix, and update the optimal segment if the new prefix is better. If you like structured recursion (or the so-called “origami programming”), this form of computation is an instance of a zygomorphism.

For each optimal prefix to be computable from the previous optimal prefix, it may not extend further than the latter. We do not want the following to happen:

However, it appears to be possible for the maximally dense prefix! Imagining adding a very small, or even negative area. We might get a denser prefix by extending further to the right since the denominator is larger.

The first theorem we had to prove aimed to show that it does not matter — if a maximally dense prefix extends further than the previous one, it is going to be suboptimal anyway. Thus it is safe if we always start from the right end of the previous prefix. That is, we do not compute the maximally dense prefix of the entire input, but merely the maximally dense prefix of the previous prefix.

This is an instance of the sliding window scheme proposed by Zantema [4]. The blue part is like a “window” of the list, containing enough information to guarantee the correctness of the algorithm. As the algorithm progresses, the two ends of the window keeps sliding to the left, hence the name.

To formally show that the window contains enough information to compute the maximally dense segment, we have to clearly state what window is, and what invariant it satisfies. It turned out to be quite tricky to formally state the intuition that “the window does not always give you the optimal prefix, but it does when it matters,” and was the first challenge we met.

Since we aim at computing a segment at least `L` units in breadth, it might be handy to split the window into a “compulsory part” (the shortest prefix that is at least `L` units wide) and the rest, the “optional part”. The algorithm thus looks like this:

where the yellow bars are the compulsory parts and blue bars the optional parts. Each time we read an element into the compulsory part, zero or more elements (since the elements have non-uniform breadths) may be shifted from the compulsory part to the optional part. Then we compute a maximally dense prefix (the yellow and the blue parts together) that does not extend further than the previous one. The best among all these prefixes is the maximally dense segment.

We want a linear time algorithm, which means that all the computation from a pair of yellow-blue bars to the next pair has to be done in (amortised) constant time — how is that possible at all? To do so we will need to exploit some structure in the optional part, based on properties of density and segments.

### Right-Skew Segments, and the DRSP

A non-empty list of elements `x` is called right-skew if, for every non-empty `x₁` and `x₂` such that `x₁ ⧺ x₂ = x`, we have `density x₁ ≤ density x₂`. Informally, a right-skew list is drawn as the blue wavy block below:

The rising wavy slope informally hints that the right half has a higher density than the left half wherever you make it cut. Howver, the drawing is at risk from the misunderstanding that a right-skew segment is a list of elements with ascending areas or densities. Note that neither the areas nor the densities of individual elements have to be ascending. For example, the list `[(9,6),(6,2),(14,7)]`, with densities `[1.5, 3, 2]`, is right-skew.

Right-skew lists are useful because of the following property. Imagining placing a list `z` next to `x`, as depicted above. To find a maximally dense prefix of `z ⧺ x` starting with `z`, it is sufficient to consider only `z` and `z ⧺ x` — nothing in the middle, such as `z ⧺ x₁`, can be denser than the two ends!

Given a window with compulsory part `c` and optional part `x`, if we can partition `x` into `x₁ ⧺ x₂ ⧺ ... ⧺ xn`, such that `x₁`, `x₂`, … `xn` are all right-skew, then to compute the maximally dense prefix of `c ⧺ x`, we only need to consider `c`, `c ⧺ x₁`, `c ⧺ x₁ ⧺ x₂`,… and `c ⧺ x₁ ⧺ x₂ ⧺ ... ⧺ xn`.

Such a partition is always possible for any list `x` — after all, each element itself constitute a singleton right-skew list. However, there is one unique right-skew partition such that the densities of `x₁`, `x₂`, … `xn` are strictly decreasing. This is called the decreasing right-skew partition (DRSP) of `x`. We will partition the optional part of the window into its DRSP. A window now looks like the picture below:

Sharon summarised many nice properties of DRSP in the paper, for which we unfortunately do not have space here. We will only look at some properties that matters for this blog post. Firstly, consider the diagram below:

In the bottom row, the leftmost block is the density of `c`, and the second block is the density of `c ⧺ x₁`, etc. If segments `x₁`, `x₂`, … `xn` have decreasing densities, the densities of `c`, `c ⧺ x₁`, `c ⧺ x₁ ⧺ x₂`,… and `c ⧺ x₁ ⧺ x₂ ⧺ ... ⧺ xn` must be bitonic — first ascending, then descending. It helps to efficiently locate the maximally dense prefix.

Secondly, the DRSP can be built and maintained in a `foldr`. The following diagram depicts how the DRSP for the list of areas `[1,4,2,5,3]` (all with breadth `1`) can be built by adding elements from the left one by one (which eventually results in one big partition):

The rule is that blocks newly added from the left keeps merging with blocks to its right until it encounters a block shorter than itself. The top-left of the diagram indicates that the DRSP of `(3` is itself. Since `5 > 3`, adding `1` results in a partition containing two segments. When `2` is added, it is merged with `5` to form a new segment with density `3.5`. No merging is triggered with the addition of `4` since `4 > 3.5` and thus `[4,3.5,3]` form a decreasing sequence. Newly added `1` first merges `4`, forming a block having density `2.5`. Since `2.5 < 3.5`, it again merges with the block `[2,5]`. Eventually all elements are grouped into one segment with density `3`. One important thing here is that adding a new element only involves merging some initial parts of the DRSP.

### Algorithm Overview

Recall that our algorithm computes, for each suffix, a prefix (a window) that is possibly optimal and contains enough information to compute all future optimal solutions. Since a feasible prefix is wider than `L`, we split it into a (yellow) compulsory part and a (blue) optional part. To obtain a linear time algorithm, we have to compute one row from the previous row in amortised constant time (the corresponding diagram is duplicated here):

The diagram below depicts how to go from one row to the next. The blue part is partitioned into DRSP. Each time an element is added to the yellow part, some elements may be shifted to the blue part, and that may trigger some right-skew segments in the blue part to be merged (second row). Then we look for a maximally dense prefix by going from right to left, chopping away segments, until we find the peak (third row):

Note that the operation shown on the third row (chopping to find the maximum) always chop away a right-skew segment in its entirety. It is important that the merging happens at the left end of the optional part, while the chopping happens at the right end. By using a tree-like data structure, each merging can be a `O(1)` operation. With the data structure, we may argue that, since each element can be merged at most once, throughout the algorithm only `O(n)` merging could happen. Similarly, each element can be chopped away at most once, so the chopping could happen at most `O(n)` time as well. Therefore the operations in the second and third rows above are both amortised `O(1)`.

### Problem with Having an Upper Bound

The discussion so far already allows us to develop an algorithm for the maximally dense segment problem without an upper bound on the breadth of feasible segments. Having the upper bound makes the problem much harder because, different from the chopping depicted above, an upper bound may cut through a right-skew segment in the middle:

And a right-skew segment, with some elements removed, might not be right-skew anymore!

Our solution is to develop another data structure that allows efficient removal from the right end of a DRSP, while maintaining the DRSP structure. The final configuration of a window looks like the diagram below, where the new data structure is represented by the green blocks:

Unfortunately, it is inefficient to add new elements from the left into the green blocks. Therefore we have to maintain the window in a way similar to how a queue is implemented using two lists. New elements are added from the left into the blue blocks; when we need to remove element from the right of a block, it is converted to a green block in large chunks.

For more details, see the paper!

### References

1. Chung, Kai-Min and Lu, Hsueh-I. An Optimal Algorithm for the Maximum-Density Segment Problem. SIAM Journal on Computing 34(2):373-387, 2004.
2. Goldwasser, Michael H. and Kao, Ming-Yang and Lu, Hsueh-I. Linear-Time Algorithms for Computing Maximum-Density Sequence Segments with Bioinformatics Applications. Journal of Computer and System Sciences, 70(2):128-144, 2005.
3. Huang, Xiaoqui. An algorithm for identifying regions of a {DNA} sequence that satisfy a content requirement. Computer Applications in the Biosciences 3(10): 219-225, 1994.
4. Zantema, Hans. Longest segment problems. Science of Computer Programming, 18(1):39-66, 1992.