There are some engineering situations that require no more than a "textbook" data structure--such as a doubly linked list, a hash table, or a binary search tree--but many others require a dash of creativity. Only in rare situations will you need to create an entirely new type of data structure, though. More often, it will suffice to augment a textbook data structure by storing additional information in it. You can then program new operations for the data structure to support the desired application. Augmenting a data structure is not always straightforward, however, since the added information must be updated and maintained by the ordinary operations on the data structure.

This chapter discusses two data structures that are constructed by augmenting red-black trees. Section 15.1 describes a data structure that supports general order-statistic operations on a dynamic set. We can then quickly find the *i*th smallest number in a set or the rank of a given element in the total ordering of the set. Section 15.2 abstracts the process of augmenting a data structure and provides a theorem that can simplify the augmentation of red-black trees. Section 15.3 uses this theorem to help design a data structure for maintaining a dynamic set of intervals, such as time intervals. Given a query interval, we can then quickly find an interval in the set that overlaps it.

A data structure that can support fast order-statistic operations is shown in Figure 15.1. An **order-statistic tree** *T* is simply a red-black tree with additional information stored in each node. Besides the usual red-black tree fields *key*[*x*], *color*[*x*], *p*[*x*], *left*[*x*], and *right*[*x*] in a node *x*, we have another field *size*[*x*]. This field contains the number of (internal) nodes in the subtree rooted at *x* (including *x* itself), that is, the size of the subtree. If we define *size*[NIL] to be 0, then we have the identity

size[x] = size[left[x]] + size[right[x]] + 1 .

(To handle the boundary condition for NIL properly, an actual implementation might test explicitly for NIL each time the *size* field is accessed or, more simply, as in Section 14.4, use a sentinel *nil*[*T*] to represent NIL, where *size*[*nil*[*T*]] = 0.)

Before we show how to maintain this size information during insertion and deletion, let us examine the implementation of two order-statistic queries that use this additional information. We begin with an operation that retrieves an element with a given rank. The procedure OS-SELECT(*x, i*) returns a pointer to the node containing the *i*th smallest key in the subtree rooted at *x*. To find the *i*th smallest key in an order-statistic tree *T*, we call OS-SELECT(*root*[*T*]*, i*).

OS-SELECT(x, i)

1  r ← size[left[x]] + 1
2  if i = r
3     then return x
4  elseif i < r
5     then return OS-SELECT(left[x], i)
6  else return OS-SELECT(right[x], i - r)

The idea behind OS-SELECT is similar to that of the selection algorithms in Chapter 10. The value of *size*[*left*[*x*]] is the number of nodes that come before *x* in an inorder tree walk of the subtree rooted at *x*. Thus, *size*[*left*[*x*]] + 1 is the rank of *x* within the subtree rooted at *x*.
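For concreteness, OS-SELECT can be sketched in Python. The `Node` class below is illustrative scaffolding only (a plain binary search tree with no colors and no sentinel), not the red-black implementation assumed by the text; the procedure itself mirrors the pseudocode line for line.

```python
class Node:
    """A binary-search-tree node augmented with the size of its subtree."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        # size[x] = size[left[x]] + size[right[x]] + 1, with size[NIL] = 0
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)

def os_select(x, i):
    """Return the node holding the i-th smallest key in the subtree rooted at x."""
    r = (x.left.size if x.left else 0) + 1   # rank of x within its own subtree
    if i == r:
        return x
    elif i < r:
        return os_select(x.left, i)
    else:
        return os_select(x.right, i - r)

# Keys 10, 20, 30, 40, 50 arranged in a balanced shape.
root = Node(30, Node(20, Node(10)), Node(40, right=Node(50)))
print(os_select(root, 4).key)  # 40
```

Because each recursive call descends one level, the running time is proportional to the height of the tree, which is *O*(lg *n*) for a red-black tree.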

Given a pointer to a node *x* in an order-statistic tree *T*, the procedure OS-RANK returns the position of *x* in the linear order determined by an inorder tree walk of *T*.

OS-RANK(T, x)

1  r ← size[left[x]] + 1
2  y ← x
3  while y ≠ root[T]
4      do if y = right[p[y]]
5            then r ← r + size[left[p[y]]] + 1
6         y ← p[y]
7  return r

The procedure works as follows. The rank of *x* can be viewed as the number of nodes preceding *x* in an inorder tree walk, plus 1 for *x* itself. The following invariant is maintained: at the top of the **while** loop of lines 3-6, *r* is the rank of *key*[*x*] in the subtree rooted at node *y*. We maintain this invariant as follows. In line 1, we set *r* to be the rank of *key*[*x*] within the subtree rooted at *x*. Setting *y* ← *x* in line 2 makes the invariant true the first time the test in line 3 executes. In each iteration of the **while** loop, we consider the subtree rooted at *p*[*y*]. We have already counted the number of nodes in the subtree rooted at node *y* that precede *x* in an inorder walk, so we must add the nodes in the subtree rooted at *y*'s sibling that precede *x* in an inorder walk, plus 1 for *p*[*y*] if it, too, precedes *x*. If *y* is a left child, then neither *p*[*y*] nor any node in *p*[*y*]'s right subtree precedes *x*, so we leave *r* alone. Otherwise, *y* is a right child, and all the nodes in *p*[*y*]'s left subtree precede *x*, as does *p*[*y*] itself. Thus, in line 5, we add *size*[*left*[*p*[*y*]]] + 1 to the current value of *r*. Setting *y* ← *p*[*y*] in line 6 makes the invariant true for the next iteration. When *y* = *root*[*T*], the procedure returns the value of *r*, which is now the rank of *key*[*x*].

For example, calling OS-RANK on the node with key 38 yields the following values of *key*[*y*] and *r* at the top of the **while** loop's test in line 3:

iteration   key[y]    r
-----------------------
    1         38      2
    2         30      4
    3         41      4
    4         26     17
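The upward walk of OS-RANK can likewise be sketched in Python. As before, the `Node` class is illustrative scaffolding (a plain binary search tree, here with parent pointers), not the book's red-black tree:

```python
class Node:
    """A BST node with a parent pointer and an augmented subtree size."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.parent = None
        for child in (left, right):
            if child:
                child.parent = self
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)

def os_rank(root, x):
    """Return the rank of x in an inorder walk of the tree rooted at root."""
    r = (x.left.size if x.left else 0) + 1   # rank of x within its own subtree
    y = x
    while y is not root:
        if y is y.parent.right:
            # all of p[y]'s left subtree, plus p[y] itself, precede x
            r += (y.parent.left.size if y.parent.left else 0) + 1
        y = y.parent
    return r

root = Node(30, Node(20, Node(10)), Node(40, right=Node(50)))
print(os_rank(root, root.right.right))  # 5: the key 50 is largest of the five
```

Each loop iteration moves one level up the tree, so the running time is again bounded by the tree's height.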

Given the *size* field in each node, OS-SELECT and OS-RANK can quickly compute order-statistic information. But unless these fields can be efficiently maintained by the basic modifying operations on red-black trees, our work will have been for naught. We shall now show that subtree sizes can be maintained for both insertion and deletion without affecting the asymptotic running times of either operation.

To maintain subtree sizes under rotation, we add the following two lines to the code for LEFT-ROTATE:

13  size[y] ← size[x]
14  size[x] ← size[left[x]] + size[right[x]] + 1

Figure 15.2 illustrates how the fields are updated. The change to RIGHT-ROTATE is symmetric.
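In Python, the two added lines translate directly: *y* takes over *x*'s old subtree size, and *x*'s size is recomputed from its new children. The sketch below rotates a parentless subtree for illustration only; a full red-black LEFT-ROTATE would also update parent pointers and possibly the tree root.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)

def left_rotate(x):
    """Rotate x left and return the new subtree root y, keeping sizes valid."""
    y = x.right
    x.right = y.left          # y's left subtree becomes x's right subtree
    y.left = x
    # lines 13-14: y inherits x's old subtree size, then x is recomputed
    y.size = x.size
    x.size = (x.left.size if x.left else 0) + (x.right.size if x.right else 0) + 1
    return y

x = Node(10, right=Node(20, Node(15), Node(30)))
y = left_rotate(x)
print(y.key, y.size, y.left.size)  # 20 4 2
```

Only the two rotated nodes need their *size* fields recomputed, which is why each rotation costs only *O*(1) extra time.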

Deletion from a red-black tree also consists of two phases: the first operates on the underlying search tree, and the second causes at most three rotations and otherwise performs no structural changes. (See Section 14.4.) The first phase splices out one node *y*. To update the subtree sizes, we simply traverse a path from node *y* up to the root, decrementing the *size* field of each node on the path. Since this path has length *O*(lg *n*) in an *n*-node red-black tree, the additional time spent maintaining *size* fields in the first phase is *O*(lg *n*). The *O*(1) rotations in the second phase of deletion can be handled in the same manner as for insertion. Thus, both insertion and deletion, including the maintenance of the *size* fields, take *O*(lg *n*) time for an *n*-node order-statistic tree.
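The phase-1 size update is just an upward walk. A minimal Python sketch, assuming nodes carry parent pointers (the `Node` class is again hypothetical scaffolding):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.parent = None
        for c in (left, right):
            if c:
                c.parent = self
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)

def decrement_sizes(y):
    """Phase 1 of deletion: after splicing out node y, every node on the path
    from y to the root loses one node from its subtree, so decrement each."""
    while y is not None:
        y.size -= 1
        y = y.parent

root = Node(30, Node(20, Node(10)), Node(40))
decrement_sizes(root.left.left)   # pretend the node with key 10 was spliced out
print(root.size, root.left.size)  # 3 1
```

The walk touches one node per level, giving the *O*(lg *n*) bound claimed above.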

Show how OS-SELECT(*T,* 10) operates on the red-black tree *T* of Figure 15.2.

Write a nonrecursive version of OS-SELECT.

Write a recursive procedure OS-KEY-RANK(*T,k*) that takes as input an order-statistic tree *T* and a key *k* and returns the rank of *k* in the dynamic set represented by *T*. Assume that the keys of *T* are distinct.

Given an element *x* in an *n*-node order-statistic tree and a natural number *i*, how can the *i*th successor of *x* in the linear order of the tree be determined in *O*(lg *n*) time?

Consider *n* chords on a circle, each defined by its endpoints. Describe an *O*(*n* lg *n*)-time algorithm for determining the number of pairs of chords that intersect inside the circle. (For example, if the *n* chords are all diameters that meet at the center, then the correct answer is *n*(*n* - 1)/2.) Assume that no two chords share an endpoint.

Augmenting a data structure can be broken into four steps:

1. choosing an underlying data structure,

2. determining additional information to be maintained in the underlying data structure,

3. verifying that the additional information can be maintained by the basic modifying operations on the underlying data structure, and

4. developing new operations.

When red-black trees underlie an augmented data structure, we can prove that certain kinds of additional information can always be efficiently maintained by insertion and deletion, thereby making step 3 very easy. The proof of the following theorem is similar to the argument from Section 15.1 that the *size* field can be maintained for order-statistic trees.

Show how the dynamic-set queries MINIMUM, MAXIMUM, SUCCESSOR, and PREDECESSOR can each be supported in *O*(1) worst-case time on an augmented order-statistic tree. The asymptotic performance of other operations on order-statistic trees should not be affected.

Can the depths of nodes in a red-black tree be efficiently maintained as fields in the nodes of the tree? Show how, or argue why not.

We wish to augment red-black trees with an operation RB-ENUMERATE(*x*, *a*, *b*) that outputs all the keys *k* such that *a* ≤ *k* ≤ *b* in a red-black tree rooted at *x*. Describe how RB-ENUMERATE can be implemented in Θ(*m* + lg *n*) time, where *m* is the number of keys that are output and *n* is the number of internal nodes in the tree. (*Hint:* There is no need to add new fields to the red-black tree.)

In this section, we shall augment red-black trees to support operations on dynamic sets of intervals. A *closed interval* is an ordered pair of real numbers [*t*_{1}, *t*_{2}], with *t*_{1} ≤ *t*_{2}.

Intervals are convenient for representing events that each occupy a continuous period of time. We might, for example, wish to query a database of time intervals to find out what events occurred during a given interval. The data structure in this section provides an efficient means for maintaining such an interval database.

We can represent an interval [*t*_{1}, *t*_{2}] as an object *i*, with fields *low*[*i*] = *t*_{1} (the *low endpoint*) and *high*[*i*] = *t*_{2} (the *high endpoint*).

Figure 15.3 shows the three possibilities.

INTERVAL-INSERT(*T,x*) adds the element *x*, whose *int* field is assumed to contain an interval, to the interval tree *T*.

INTERVAL-DELETE(*T,x*) removes the element *x* from the interval tree *T*.

We must verify that insertion and deletion can be performed in *O*(lg *n*) time on an interval tree of *n* nodes. We can determine *max*[*x*] given interval *int*[*x*] and the *max* values of node *x*'s children:

max[x] = max(high[int[x]], max[left[x]], max[right[x]]).
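This recomputation can be sketched in Python, treating *max*[NIL] as negative infinity; the `IntervalNode` layout is an assumption made for illustration, not the text's node structure.

```python
import math

class IntervalNode:
    """A node storing an interval [low, high] plus the max endpoint
    of any interval in its subtree."""
    def __init__(self, low, high, left=None, right=None):
        self.low, self.high = low, high        # int[x]
        self.left, self.right = left, right
        self.max = self.compute_max()

    def compute_max(self):
        # max[x] = max(high[int[x]], max[left[x]], max[right[x]]),
        # with max[NIL] = -infinity
        return max(self.high,
                   self.left.max if self.left else -math.inf,
                   self.right.max if self.right else -math.inf)

x = IntervalNode(16, 21,
                 left=IntervalNode(8, 9, left=IntervalNode(5, 8)),
                 right=IntervalNode(25, 30))
print(x.max)  # 30
```

Since *max*[*x*] depends only on *x* and its two children, it can be fixed up during rotations and along the insertion or deletion path in the same *O*(lg *n*) manner as the *size* field.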

The only new operation we need is INTERVAL-SEARCH(*T, i*), which finds an interval in tree *T* that overlaps interval *i*. If there is no interval that overlaps *i* in the tree, NIL is returned.

INTERVAL-SEARCH(T, i)

1  x ← root[T]
2  while x ≠ NIL and i does not overlap int[x]
3      do if left[x] ≠ NIL and max[left[x]] ≥ low[i]
4            then x ← left[x]
5         else x ← right[x]
6  return x
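The search loop can be sketched in Python using the standard closed-interval overlap test: *i* overlaps *int*[*x*] if and only if *low*[*i*] ≤ *high*[*int*[*x*]] and *low*[*int*[*x*]] ≤ *high*[*i*]. The `IntervalNode` class is illustrative scaffolding, not the book's implementation.

```python
import math

class IntervalNode:
    def __init__(self, low, high, left=None, right=None):
        self.low, self.high = low, high
        self.left, self.right = left, right
        self.max = max(high,
                       left.max if left else -math.inf,
                       right.max if right else -math.inf)

def overlaps(x, low, high):
    """Closed intervals [x.low, x.high] and [low, high] overlap iff
    each starts no later than the other ends."""
    return x.low <= high and low <= x.high

def interval_search(root, low, high):
    """Return some node whose interval overlaps [low, high], or None."""
    x = root
    while x is not None and not overlaps(x, low, high):
        if x.left is not None and x.left.max >= low:
            x = x.left
        else:
            x = x.right
    return x

root = IntervalNode(16, 21,
                    left=IntervalNode(8, 9, left=IntervalNode(5, 8)),
                    right=IntervalNode(25, 30))
print(interval_search(root, 22, 24))  # None: nothing overlaps [22, 24]
```

For example, searching this tree for [6, 7] descends left twice (since *max*[*left*[*x*]] ≥ 6 at each step) and finds the overlapping interval [5, 8].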

Consider any iteration of the **while** loop during the execution of INTERVAL-SEARCH(*T, i*).

high[i'] ≤ max[left[x]] < low[i] ,

and thus, by the interval trichotomy, *i*' and *i* do not overlap, which completes the proof of case 2.

there must be some interval *i*' in *x*'s left subtree such that

high[i'] = max[left[x]] ≥ low[i].

high[i] < low[i'] ≤ low[i"].

By the interval trichotomy, *i* and *i*" do not overlap.

Write pseudocode for LEFT-ROTATE that operates on nodes in an interval tree and updates the *max* fields in *O*(1) time.

Rewrite the code for INTERVAL-SEARCH so that it works properly when all intervals are assumed to be open.

Describe an efficient algorithm that, given an interval *i,* returns an interval overlapping *i* that has the minimum low endpoint, or NIL if no such interval exists.

Suggest modifications to the interval-tree procedures to support the operation INTERVAL-SEARCH-EXACTLY(*T*, *i*), which returns a pointer to a node *x* in interval tree *T* such that *low*[*int*[*x*]] = *low*[*i*] and *high*[*int*[*x*]] = *high*[*i*], or NIL if *T* contains no such node. All operations, including INTERVAL-SEARCH-EXACTLY, should run in *O*(lg *n*) time on an *n*-node tree.

Show how to maintain a dynamic set *Q* of numbers that supports the operation MIN-GAP, which gives the magnitude of the difference of the two closest numbers in *Q.* For example, if *Q* = {1, 5, 9, 15, 18, 22}, then MIN-GAP(*Q*) returns 18 - 15 = 3, since 15 and 18 are the two closest numbers in *Q*. Make the operations INSERT, DELETE, SEARCH, and MIN-GAP as efficient as possible, and analyze their running times.

VLSI databases commonly represent an integrated circuit as a list of rectangles. Assume that each rectangle is rectilinearly oriented (sides parallel to the *x*- and *y*-axes), so that a representation of a rectangle consists of its minimum and maximum *x*- and *y*-coordinates. Give an *O*(*n* lg *n*)-time algorithm to decide whether or not a set of rectangles so represented contains two rectangles that overlap. Your algorithm need not report all intersecting pairs, but it must report that an overlap exists if one rectangle entirely covers another, even if the boundary lines do not intersect. (*Hint:* Move a "sweep" line across the set of rectangles.)

The * Josephus problem* is defined as follows. Suppose that