# CHAPTER 22: DATA STRUCTURES FOR DISJOINT SETS

Section 22.1 describes the operations supported by a disjoint-set data structure and presents a simple application. In Section 22.2, we look at a simple linked-list implementation for disjoint sets. A more efficient representation using rooted trees is given in Section 22.3. The running time using the tree representation is linear for all practical purposes but is theoretically superlinear. Section 22.4 defines and discusses Ackermann's function and its very slowly growing inverse, which appears in the running time of operations on the tree-based implementation, and then uses amortized analysis to prove a slightly weaker upper bound on the running time.

# 22.1 Disjoint-set operations

UNION(x, y) unites the dynamic sets that contain x and y, say Sx and Sy, into a new set that is the union of these two sets. The two sets are assumed to be disjoint prior to the operation. The representative of the resulting set is some member of Sx ∪ Sy, although many implementations of UNION choose the representative of either Sx or Sy as the new representative. Since we require the sets in the collection to be disjoint, we "destroy" sets Sx and Sy, removing them from the collection S.

Throughout this chapter, we shall analyze the running times of disjoint-set data structures in terms of two parameters: n, the number of MAKE-SET operations, and m, the total number of MAKE-SET, UNION, and FIND-SET operations. Since the sets are disjoint, each UNION operation reduces the number of sets by one. After n - 1 UNION operations, therefore, only one set remains. The number of UNION operations is thus at most n - 1. Note also that since the MAKE-SET operations are included in the total number of operations m, we have m ≥ n.

## An application of disjoint-set data structures

When the edges of the graph are "static"--not changing over time--the connected components can be computed faster by using depth-first search (Exercise 23.3-9). Sometimes, however, the edges are added "dynamically" and we need to maintain the connected components as each edge is added. In this case, the implementation given here can be more efficient than running a new depth-first search for each new edge.

`CONNECTED-COMPONENTS(G)`

`1  for each vertex v ∈ V[G]`

`2       do MAKE-SET(v)`

`3  for each edge (u,v) ∈ E[G]`

`4       do if FIND-SET(u) ≠ FIND-SET(v)`

`5             then UNION(u,v)`

`Edge processed                    Collection of disjoint sets`

`------------------------------------------------------------------------------`

` initial sets   {a}        {b}    {c}  {d}  {e}      {f}  {g}  {h}    {i}  {j}`

`    (b,d)       {a}        {b,d}  {c}       {e}      {f}  {g}  {h}    {i}  {j}`

`    (e,g)       {a}        {b,d}  {c}       {e,g}    {f}       {h}    {i}  {j}`

`    (a,c)       {a,c}      {b,d}            {e,g}    {f}       {h}    {i}  {j}`

`    (h,i)       {a,c}      {b,d}            {e,g}    {f}       {h,i}       {j}`

`    (a,b)       {a,b,c,d}                   {e,g}    {f}       {h,i}       {j}`

`    (e,f)       {a,b,c,d}                   {e,f,g}            {h,i}       {j}`

`    (b,c)       {a,b,c,d}                   {e,f,g}            {h,i}       {j}`


#### Figure 22.1 (a) A graph with four connected components: {a, b, c, d}, {e, f, g}, {h, i}, and {j}. (b) The collection of disjoint sets after each edge is processed.

`SAME-COMPONENT(u,v)`

`1  if FIND-SET(u) = FIND-SET(v)`

`2     then return TRUE`

`3     else return FALSE`

The procedure CONNECTED-COMPONENTS initially places each vertex v in its own set. Then, for each edge (u, v), it unites the sets containing u and v. By Exercise 22.1-2, after all the edges are processed, two vertices are in the same connected component if and only if the corresponding objects are in the same set. Thus, CONNECTED-COMPONENTS computes sets in such a way that the procedure SAME-COMPONENT can determine whether two vertices are in the same connected component. Figure 22.1 (b) illustrates how the disjoint sets are computed by CONNECTED-COMPONENTS.
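The procedure above can be sketched in Python. The parent-dictionary forest used here is only a stand-in for whichever disjoint-set implementation is eventually chosen (Sections 22.2 and 22.3 give real ones); the function names simply mirror the pseudocode.

```python
# A minimal sketch of CONNECTED-COMPONENTS and SAME-COMPONENT in Python.
# The disjoint-set structure here is a plain parent dictionary; any of the
# implementations from Sections 22.2-22.3 could be substituted.

def make_set(parent, v):
    parent[v] = v

def find_set(parent, v):
    while parent[v] != v:          # walk up to the representative
        v = parent[v]
    return v

def union(parent, u, v):
    parent[find_set(parent, u)] = find_set(parent, v)

def connected_components(vertices, edges):
    parent = {}
    for v in vertices:
        make_set(parent, v)
    for u, v in edges:
        if find_set(parent, u) != find_set(parent, v):
            union(parent, u, v)
    return parent

def same_component(parent, u, v):
    return find_set(parent, u) == find_set(parent, v)

# The graph of Figure 22.1(a): edges in the order processed in part (b).
vertices = "abcdefghij"
edges = [("b","d"), ("e","g"), ("a","c"), ("h","i"),
         ("a","b"), ("e","f"), ("b","c")]
parent = connected_components(vertices, edges)
print(same_component(parent, "a", "d"))   # True:  both in {a, b, c, d}
print(same_component(parent, "a", "e"))   # False: different components
```

Running it reproduces the final row of Figure 22.1(b): four components {a, b, c, d}, {e, f, g}, {h, i}, and {j}.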

## Exercises

22.1-1

Suppose that CONNECTED-COMPONENTS is run on the undirected graph G = (V, E), where V = {a, b, c, d, e, f, g, h, i, j, k} and the edges of E are processed in the following order: (d, i), (f, k), (g, i), (b, g), (a, h), (i, j), (d, k), (b, j), (d, f), (g, j), (a, e), (i, d). List the vertices in each connected component after each iteration of lines 3-5.

22.1-2

Show that after all edges are processed by CONNECTED-COMPONENTS, two vertices are in the same connected component if and only if they are in the same set.

22.1-3

During the execution of CONNECTED-COMPONENTS on an undirected graph G = (V, E) with k connected components, how many times is FIND-SET called? How many times is UNION called? Express your answers in terms of |V|, |E|, and k.

# 22.2 Linked-list representation of disjoint sets

## A simple implementation of union

The simplest implementation of the UNION operation using the linked-list set representation takes significantly more time than MAKE-SET or FIND-SET. As Figure 22.2(b) shows, we perform UNION(x, y) by appending x's list onto the end of y's list. The representative of the new set is the element that was originally the representative of the set containing y. Unfortunately, we must update the pointer to the representative for each object originally on x's list, which takes time linear in the length of x's list.

In fact, it is not difficult to come up with a sequence of m operations that requires Θ(m²) time. We let n = m/2 + 1 and q = m - n = m/2 - 1 (assuming for simplicity that m is even) and suppose that we have objects x1, x2, . . . , xn. We then execute the sequence of m = n + q operations shown in Figure 22.3. We spend Θ(n) time performing the n MAKE-SET operations. Because the ith UNION operation updates i objects, the total number of objects updated by all the UNION operations is

`Σ_{i=1}^{q} i  =  Θ(q²)  =  Θ(m²) .`

The total time for the entire sequence is thus Θ(n + q²) = Θ(m²), so each operation requires Θ(m) time on average.
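To see the quadratic blowup concretely, here is a small Python sketch of the naive linked-list representation, instrumented to count representative-pointer updates. The class and variable names are illustrative, and the driving sequence follows the spirit of Figure 22.3: each union appends the ever-growing list onto a singleton.

```python
# A sketch of the naive linked-list representation, instrumented to count
# representative-pointer updates. Appending x's list onto y's list updates
# one pointer per element of x's list, which the sequence below drives to
# 1 + 2 + ... + (n-1) = Theta(m^2) total work.

class LinkedListSets:
    def __init__(self):
        self.rep = {}        # object -> its representative
        self.members = {}    # representative -> list of set members
        self.updates = 0     # pointer updates performed by UNION

    def make_set(self, x):
        self.rep[x] = x
        self.members[x] = [x]

    def find_set(self, x):
        return self.rep[x]

    def union(self, x, y):
        rx, ry = self.rep[x], self.rep[y]
        for z in self.members[rx]:   # update every pointer on x's list
            self.rep[z] = ry
            self.updates += 1
        self.members[ry].extend(self.members[rx])
        del self.members[rx]

s = LinkedListSets()
n = 9
for i in range(1, n + 1):
    s.make_set(i)
for i in range(2, n + 1):
    s.union(1, i)        # the ith union moves the growing list: i-1 updates
print(s.updates)          # 36, i.e. 1 + 2 + ... + 8
```

With n objects the counter comes out to n(n-1)/2, quadratic in the length of the operation sequence.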

## A weighted-union heuristic

Theorem 22.1

Using the linked-list representation of disjoint sets and the weighted-union heuristic, a sequence of m MAKE-SET, UNION, and FIND-SET operations, n of which are MAKE-SET operations, takes O(m + n lg n) time.

Proof We start by computing, for each object in a set of size n, an upper bound on the number of times the object's pointer back to the representative has been updated. Consider a fixed object x. We know that each time x's representative pointer was updated, x must have started in the smaller set. The first time x's representative pointer was updated, therefore, the resulting set must have had at least 2 members. Similarly, the next time x's representative pointer was updated, the resulting set must have had at least 4 members. Continuing on, we observe that for any k ≤ n, after x's representative pointer has been updated ⌈lg k⌉ times, the resulting set must have at least k members. Since the largest set has at most n members, each object's representative pointer has been updated at most ⌈lg n⌉ times over all the UNION operations. The total time used in updating the n objects is thus O(n lg n).

The time for the entire sequence of m operations follows easily. Each MAKE-SET and FIND-SET operation takes O(1) time, and there are O(m) of them. The total time for the entire sequence is thus O(m + n lg n).
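The weighted-union heuristic is easy to instrument in the same style. In this Python sketch (the names are illustrative), each union appends the shorter list onto the longer, and we record how many times each object's representative pointer changes, so the ⌈lg n⌉ per-object bound of the theorem can be checked on a small example.

```python
# A sketch of the weighted-union heuristic: keep each list's length and
# always append the shorter list onto the longer. We track, per object,
# how many times its representative pointer is updated; the theorem bounds
# this by ceil(lg n) for every object.

import math

class WeightedUnionSets:
    def __init__(self):
        self.rep = {}
        self.members = {}
        self.touched = {}    # object -> number of pointer updates so far

    def make_set(self, x):
        self.rep[x] = x
        self.members[x] = [x]
        self.touched[x] = 0

    def find_set(self, x):
        return self.rep[x]

    def union(self, x, y):
        rx, ry = self.rep[x], self.rep[y]
        if len(self.members[rx]) > len(self.members[ry]):
            rx, ry = ry, rx                  # rx now names the smaller list
        for z in self.members[rx]:
            self.rep[z] = ry
            self.touched[z] += 1
        self.members[ry].extend(self.members[rx])
        del self.members[rx]

s = WeightedUnionSets()
n = 16
for i in range(n):
    s.make_set(i)
for i in range(0, n, 2):        # pair up singletons, then pair the pairs, ...
    s.union(i, i + 1)
for i in range(0, n, 4):
    s.union(i, i + 2)
for i in range(0, n, 8):
    s.union(i, i + 4)
s.union(0, 8)
print(max(s.touched.values()))                        # 4 = lg 16
print(max(s.touched.values()) <= math.ceil(math.log2(n)))   # True
```

This "balanced merging" pattern is the worst case for a single object: its pointer is updated once per doubling of its set, lg n times in all.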

## Exercises

Show the data structure that results and the answers returned by the FIND-SET operations in the following program. Use the linked-list representation with the weighted-union heuristic.

`1  for i ← 1 to 16`

`2       do MAKE-SET(xi)`

`3  for i ← 1 to 15 by 2`

`4       do UNION(xi,xi+1)`

`5  for i ← 1 to 13 by 4`

`6       do UNION(xi,xi+2)`

`7  UNION(x1,x5)`

`8  UNION(x11,x13)`

`9  UNION(x1,x10)`

`10  FIND-SET(x2)`

`11  FIND-SET(x9)`

Give a tight asymptotic bound on the running time of the sequence of operations in Figure 22.3 assuming the linked-list representation and the weighted-union heuristic.

# 22.3 Disjoint-set forests

## Heuristics to improve the running time

So far, we have not improved on the linked-list implementation. A sequence of n - 1 UNION operations may create a tree that is just a linear chain of n nodes. By using two heuristics, however, we can achieve a running time that is almost linear in the total number of operations m.

## Pseudocode for disjoint-set forests

To implement a disjoint-set forest with the union-by-rank heuristic, we must keep track of ranks. With each node x, we maintain the integer value rank[x], which is an upper bound on the height of x (the number of edges in the longest path between x and a descendant leaf). When a singleton set is created by MAKE-SET, the initial rank of the single node in the corresponding tree is 0. Each FIND-SET operation leaves all ranks unchanged. When applying UNION to two trees, we make the root of higher rank the parent of the root of lower rank. In case of a tie, we arbitrarily choose one of the roots as the parent and increment its rank.

Let us put this method into pseudocode. We designate the parent of node x by p[x]. The LINK procedure, a subroutine called by UNION, takes pointers to two roots as inputs.

`MAKE-SET(x)`

`1  p[x] ← x`

`2  rank[x] ← 0`

`UNION(x,y)`

`1  LINK(FIND-SET(x), FIND-SET(y))`

`LINK(x,y)`

`1  if rank[x] > rank[y]`

`2     then p[y] ← x`

`3     else p[x] ← y`

`4          if rank[x] = rank[y]`

`5             then rank[y] ← rank[y] + 1`

`FIND-SET(x)`

`1  if x ≠ p[x]`

`2     then p[x] ← FIND-SET(p[x])`

`3  return p[x]`
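The pseudocode above translates almost line for line into Python. The sketch below uses module-level dictionaries for p and rank purely for brevity; a class wrapper would be the more idiomatic packaging.

```python
# A direct Python transliteration of the disjoint-set forest pseudocode,
# with union by rank (in LINK) and path compression (in FIND-SET).

p = {}       # p[x]: parent of node x
rank = {}    # rank[x]: upper bound on the height of x

def make_set(x):
    p[x] = x
    rank[x] = 0

def find_set(x):
    if x != p[x]:
        p[x] = find_set(p[x])    # path compression: point directly at the root
    return p[x]

def link(x, y):
    if rank[x] > rank[y]:
        p[y] = x
    else:
        p[x] = y
        if rank[x] == rank[y]:   # tie: y becomes the parent and its rank grows
            rank[y] += 1

def union(x, y):
    link(find_set(x), find_set(y))

for v in "abcd":
    make_set(v)
union("a", "b")
union("c", "d")
union("a", "c")
print(find_set("a") == find_set("d"))    # True: all four nodes in one tree
print(max(rank.values()))                # 2: ranks stay logarithmic
```

Note that only root ranks matter to LINK, which is why FIND-SET can relink nodes during compression without touching any rank.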

## Effect of the heuristics on the running time

Separately, either union by rank or path compression improves the running time of the operations on disjoint-set forests, and the improvement is even greater when the two heuristics are used together. Alone, union by rank yields the same running time as we achieved with the weighted-union heuristic for the list representation: the resulting implementation runs in time O(m lg n) (see Exercise 22.4-3). This bound is tight (see Exercise 22.3-3). Although we shall not prove it here, if there are n MAKE-SET operations (and hence at most n - 1 UNION operations) and f FIND-SET operations, the path-compression heuristic alone gives a worst-case running time of Θ(f log_{1+f/n} n) if f ≥ n and Θ(n + f lg n) if f < n.

When we use both union by rank and path compression, the worst-case running time is O(m α(m, n)), where α(m, n) is the very slowly growing inverse of Ackermann's function, which we define in Section 22.4. In any conceivable application of a disjoint-set data structure, α(m, n) ≤ 4; thus, we can view the running time as linear in m in all practical situations. In Section 22.4, we prove the slightly weaker bound of O(m lg* n).

## Exercises

22.3-3

Give a sequence of m MAKE-SET, UNION, and FIND-SET operations, n of which are MAKE-SET operations, that takes Ω(m lg n) time when we use union by rank only.

22.3-4

Show that any sequence of m MAKE-SET, FIND-SET, and UNION operations, where all the UNION operations appear before any of the FIND-SET operations, takes only O(m) time if both path compression and union by rank are used. What happens in the same situation if only the path-compression heuristic is used?

# * 22.4 Analysis of union by rank with path compression

As noted in Section 22.3, the running time of the combined union-by-rank and path-compression heuristic is O(m α(m, n)) for m disjoint-set operations on n elements. In this section, we shall examine the function α to see just how slowly it grows. Then, rather than presenting the very complex proof of the O(m α(m, n)) running time, we shall offer a simpler proof of a slightly weaker upper bound on the running time: O(m lg* n).

## Ackermann's function and its inverse

In the definitions that follow, the notation 2^2^...^2 (a tower of i 2's) stands for the function g(i), defined recursively by

`g(1)  =  2 ,`

`g(i)  =  2^g(i-1)   for i ≥ 2 .`

Intuitively, the parameter i gives the "height of the stack of 2's" that make up the exponent. For example,

`g(4)  =  2^2^2^2  =  2^16  =  65536 .`

#### Figure 22.6 Values of A(i, j) for small values of i and j.

For integers i, j ≥ 1, Ackermann's function A(i, j) is defined by

`A(1,j)  =  2^j                  for j ≥ 1 ,`

`A(i,1)  =  A(i - 1,2)           for i ≥ 2 ,`

`A(i,j)  =  A(i - 1,A(i,j - 1))  for i,j ≥ 2 .`

The lg* function is essentially the inverse of repeated exponentiation:

`lg* n  =  min {i ≥ 0 : lg^(i) n ≤ 1} ,`

where lg^(i) denotes the lg function applied i times in succession.

Figure 22.6 shows the values of the function A for small values of i and j.

Figure 22.7 shows schematically why Ackermann's function has such explosive growth. The first row, exponential in the column number j, is already rapidly growing. The second row consists of the widely spaced subset of columns 2, 2^2, 2^(2^2), 2^(2^(2^2)), . . . of the first row. Lines between adjacent rows indicate columns in the lower-numbered row that are in the subset included in the higher-numbered row. The third row consists of an even more widely spaced subset of columns of the second row, which is an even sparser subset of columns of the first row. In general, the spacing between columns of row i - 1 appearing in row i increases dramatically with both the column number and the row number. Observe that A(2, j) = 2^2^...^2, a tower of j + 1 twos, for all integers j ≥ 1. Thus, for i > 2, the function A(i, j) grows even more quickly than such a tower.

#### Figure 22.7 The explosive growth of Ackermann's function. Lines between rows i - 1 and i indicate entries of row i - 1 appearing in row i. Due to the explosive growth, the horizontal spacing is not to scale. The horizontal spacing between entries of row i - 1 appearing in row i greatly increases with the column number and row number. If we trace the entries in row i to their original appearance in row 1, the explosive growth is even more evident.

`α(m,n)  =  min {i ≥ 1 : A(i, ⌊m/n⌋) > lg n} .`

Although this function α is not the inverse of Ackermann's function in the true mathematical sense, it captures the spirit of the inverse in its growth, which is as slow as Ackermann's function is fast. The reason we use the mysterious lg n threshold is revealed in the proof of the O(m α(m, n)) running time, which is beyond the scope of this book.

If we fix a value of n, then as m increases, the function α(m, n) is monotonically decreasing. To see this property, note that ⌊m/n⌋ is monotonically increasing as m increases; therefore, since n is fixed, the smallest value of i needed to bring A(i, ⌊m/n⌋) above lg n is monotonically decreasing. This property corresponds to our intuition about disjoint-set forests with path compression: for a given number of distinct elements n, as the number of operations m increases, we would expect the average find-path length to decrease due to path compression. If we perform m operations in time O(m α(m, n)), then the average time per operation is O(α(m, n)), which is monotonically decreasing as m increases.

To back up our earlier claim that α(m, n) ≤ 4 for all practical purposes, we first note that the quantity ⌊m/n⌋ is at least 1, since m ≥ n. Since Ackermann's function is strictly increasing with each argument, ⌊m/n⌋ ≥ 1 implies A(i, ⌊m/n⌋) ≥ A(i, 1) for all i ≥ 1. In particular, A(4, ⌊m/n⌋) ≥ A(4, 1). But we also have that

`A(4,1)  =  A(3,2)  =  2^2^...^2   (a tower of 17 twos) ,`

which is far greater than the estimated number of atoms in the observable universe (roughly 10^80). It is only for impractically large values of n that A(4, 1) ≤ lg n, and thus α(m, n) ≤ 4 for all practical purposes. Note that the O(m lg* n) bound is only slightly weaker than the O(m α(m, n)) bound; lg* 65536 = 4 and lg* 2^65536 = 5, so lg* n ≤ 5 for all practical purposes.
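These values are easy to check mechanically for the small cases. The Python sketch below tabulates A(i, j) directly from its recurrence and evaluates lg* by repeatedly applying lg; it deliberately stays away from A(3, 2) and beyond, which are astronomically large.

```python
# A sketch that tabulates A(i, j) for small i and j (the contents of
# Figure 22.6) and evaluates lg* by repeated application of lg.

import math
from functools import lru_cache

@lru_cache(maxsize=None)
def A(i, j):
    if i == 1:
        return 2 ** j              # A(1, j) = 2^j
    if j == 1:
        return A(i - 1, 2)         # A(i, 1) = A(i-1, 2)
    return A(i - 1, A(i, j - 1))   # A(i, j) = A(i-1, A(i, j-1))

def lg_star(n):
    i = 0
    while n > 1:                   # count applications of lg until n <= 1
        n = math.log2(n)
        i += 1
    return i

print(A(2, 1), A(2, 2), A(2, 3))   # 4 16 65536
print(A(3, 1))                     # 16
print(lg_star(65536))              # 4
print(lg_star(2 ** 65536))         # 5
```

The last line works because Python integers are arbitrary precision; a tower one level higher, lg* g(6), would still be only 6, which is the sense in which lg* "barely grows at all".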

## Properties of ranks

Lemma 22.2

For all nodes x, we have rank[x] ≤ rank[p[x]], with strict inequality if x ≠ p[x]. The value of rank[x] is initially 0 and increases through time until x ≠ p[x]; from then on, rank[x] does not change. The value of rank[p[x]] is a monotonically increasing function of time.

Proof The proof is a straightforward induction on the number of operations, using the implementations of MAKE-SET, UNION, and FIND-SET that appear in Section 22.3. We leave it as Exercise 22.4-1.

Lemma 22.3

For all tree roots x, size(x) ≥ 2^rank[x].

Proof The proof is by induction on the number of LINK operations. Note that FIND-SET operations change neither the rank of a tree root nor the size of its tree.

Basis: The lemma is true before the first LINK, since ranks are initially 0 and each tree contains at least one node.

Inductive step: Assume that the lemma holds before performing the operation LINK(x, y). Let rank denote the rank just before the LINK, and let rank' denote the rank just after the LINK. Define size and size' similarly.

If rank[x] ≠ rank[y], assume without loss of generality that rank[x] < rank[y]. Node y is the root of the tree formed by the LINK operation, and

`size'(y)  =  size(x) + size(y)`

`          ≥  2^rank[x] + 2^rank[y]`

`          ≥  2^rank[y]`

`          =  2^rank'[y] .`

No ranks or sizes change for any nodes other than y.

If rank[x] = rank[y], node y is again the root of the new tree, and

`size'(y)  =  size(x) + size(y)`

`          ≥  2^rank[x] + 2^rank[y]`

`          =  2^(rank[y] + 1)`

`          =  2^rank'[y] .`

Lemma 22.4

For any integer r ≥ 0, there are at most n/2^r nodes of rank r.

Proof Fix a particular value of r. Suppose that when we assign a rank r to a node x (in line 2 of MAKE-SET or in line 5 of LINK), we attach a label x to each node in the tree rooted at x. By Lemma 22.3, at least 2^r nodes are labeled each time. Suppose that the root of the tree containing node x changes. Lemma 22.2 assures us that the rank of the new root (or, in fact, of any proper ancestor of x) is at least r + 1. Since we assign labels only when a root is assigned a rank r, no node in this new tree will ever again be labeled. Thus, each node is labeled at most once, when its root is first assigned rank r. Since there are n nodes, there are at most n labeled nodes, with at least 2^r labels assigned for each node of rank r. If there were more than n/2^r nodes of rank r, then more than 2^r · (n/2^r) = n nodes would be labeled by a node of rank r, which is a contradiction. Therefore, at most n/2^r nodes are ever assigned rank r.

Corollary 22.5

Every node has rank at most ⌊lg n⌋.

Proof If we let r > lg n, then there are at most n/2^r < 1 nodes of rank r. Since ranks are natural numbers, the corollary follows.

## Proving the time bound

Lemma 22.6

Suppose we convert a sequence S' of m' MAKE-SET, UNION, and FIND-SET operations into a sequence S of m MAKE-SET, LINK, and FIND-SET operations by turning each UNION into two FIND-SET operations followed by a LINK. Then, if sequence S runs in O(m lg* n) time, sequence S' runs in O(m' lg* n) time.

Proof Since each UNION operation in sequence S' is converted into three operations in S, we have m' ≤ m ≤ 3m'. Since m = O(m'), an O(m lg* n) time bound for the converted sequence S implies an O(m' lg* n) time bound for the original sequence S'.

In the remainder of this section, we shall assume that the initial sequence of m' MAKE-SET, UNION, and FIND-SET operations has been converted to a sequence of m MAKE-SET, LINK, and FIND-SET operations. We now prove an O(m lg* n) time bound for the converted sequence and appeal to Lemma 22.6 to prove the O(m' lg* n) running time of the original sequence of m' operations.

Theorem 22.7

A sequence of m MAKE-SET, LINK, and FIND-SET operations, n of which are MAKE-SET operations, can be performed on a disjoint-set forest with union by rank and path compression in worst-case time O(m lg* n).

Proof We assess charges corresponding to the actual cost of each set operation and compute the total number of charges assessed once the entire sequence of set operations has been performed. This total then gives us the actual cost of all the set operations.

The charges assessed to the MAKE-SET and LINK operations are simple: one charge per operation. Since these operations each take O(1) actual time, the charges assessed equal the actual costs of the operations.

Before discussing charges assessed to the FIND-SET operations, we partition node ranks into blocks by putting rank r into block lg* r for r = 0, 1, . . . , ⌊lg n⌋. (Recall that ⌊lg n⌋ is the maximum rank.) The highest-numbered block is therefore block lg*(lg n) = lg* n - 1. For notational convenience, we define for integers j ≥ -1,

`B(j)  =  -1          if j = -1 ,`

`B(j)  =  1           if j = 0 ,`

`B(j)  =  2^B(j-1)    if j ≥ 1 ,`

so that B(1) = 2, B(2) = 4, B(3) = 16, and B(4) = 65536.

Then, for j = 0, 1, . . . , lg* n - 1, the jth block consists of the set of ranks

`{B(j - 1) + 1, B(j - 1) + 2, . . . , B(j)} .`

We use two types of charges for a FIND-SET operation: block charges and path charges. Suppose that the FIND-SET starts at node x0 and that the find path consists of nodes x0, x1, . . . , xl, where for i = 1, 2, . . . , l, node xi is p[xi-1] and xl (a root) is p[xl]. For j = 0, 1, . . . , lg* n - 1, we assess one block charge to the last node with rank in block j on the path. (Note that Lemma 22.2 implies that on any find path, the nodes with ranks in a given block are consecutive.) We also assess one block charge to the child of the root, that is, to xl-1. Because ranks strictly increase along any find path, an equivalent formulation assesses one block charge to each node xi such that p[xi] = xl (xi is the root or its child) or lg* rank[xi] < lg* rank[xi+1] (the block of xi's rank differs from that of its parent). At each node on the find path for which we do not assess a block charge, we assess one path charge.

Once a node other than the root or its child is assessed block charges, it will never again be assessed path charges. To see why, observe that each time path compression occurs, the rank of a node xi for which p[xi] ≠ xl remains the same, but the new parent of xi has a rank strictly greater than that of xi's old parent. The difference between the ranks of xi and its parent is a monotonically increasing function of time. Thus, the difference between lg* rank[p[xi]] and lg* rank[xi] is also a monotonically increasing function of time. Once xi and its parent have ranks in different blocks, they will always have ranks in different blocks, and so xi will never again be assessed a path charge.

Since we have charged once for each node visited in each FIND-SET operation, the total number of charges assessed is the total number of nodes visited in all the FIND-SET operations; this total represents the actual cost of all the FIND-SET operations. We wish to show that this total is O(m lg* n).

The number of block charges is easy to bound. There is at most one block charge assessed for each block number on the given find path, plus one block charge for the child of the root. Since block numbers range from 0 to lg* n - 1, there are at most lg* n + 1 block charges assessed for each FIND-SET operation. Thus, there are at most m(lg* n + 1) block charges assessed over all FIND-SET operations.

Bounding the path charges is a little trickier. We start by observing that if a node xi is assessed a path charge, then p[xi] ≠ xl before path compression, so that xi will be assigned a new parent during path compression. Moreover, as we have observed, xi's new parent has a higher rank than its old parent. Suppose that node xi's rank is in block j. How many times can xi be assigned a new parent, and thus assessed a path charge, before xi is assigned a parent whose rank is in a different block (after which xi will never again be assessed a path charge)? This number of times is maximized if xi has the lowest rank in its block, namely B(j - 1) + 1, and its parents' ranks successively take on the values B(j - 1) + 2, B(j - 1) + 3, . . . , B(j). Since there are B(j) - B(j - 1) - 1 such ranks, we conclude that a vertex can be assessed at most B(j) - B(j - 1) - 1 path charges while its rank is in block j.

Our next step in bounding the path charges is to bound the number of nodes that have ranks in block j for integers j ≥ 0. (Recall that by Lemma 22.2, the rank of a node is fixed once it becomes a child of another node.) Let the number of nodes whose ranks are in block j be denoted by N(j). Then, by Lemma 22.4,

`N(j)  ≤  Σ_{r = B(j-1)+1}^{B(j)}  n/2^r .`

For j = 0, this sum evaluates to

`N(0)  ≤  n/2^0 + n/2^1`

`      =  3n/2`

`      =  3n/(2B(0)) .`

For j ≥ 1, we have

`N(j)  ≤  (n/2^(B(j-1)+1)) · Σ_{r=0}^{∞} 1/2^r`

`      =  n/2^B(j-1)`

`      =  n/B(j)`

`      ≤  3n/(2B(j)) .`

Thus, N(j) ≤ 3n/(2B(j)) for all integers j ≥ 0.

We finish bounding the path charges by summing over all blocks the product of the maximum number of nodes with ranks in the block and the maximum number of path charges per node of that block. Denoting by P(n) the overall number of path charges, we have

`P(n)  ≤  Σ_{j=0}^{lg* n - 1}  (3n/(2B(j))) · (B(j) - B(j-1) - 1)`

`      ≤  Σ_{j=0}^{lg* n - 1}  (3n/(2B(j))) · B(j)`

`      =  (3/2) n lg* n`

`      =  O(n lg* n) .`

Thus, the total number of charges incurred by FIND-SET operations is O(m(lg* n + 1) + n lg* n), which is O(m lg* n) since m ≥ n. Since there are O(n) MAKE-SET and LINK operations, with one charge each, the total time is O(m lg* n).

Corollary 22.8

A sequence of m MAKE-SET, UNION, and FIND-SET operations, n of which are MAKE-SET operations, can be performed on a disjoint-set forest with union by rank and path compression in worst-case time O(m lg* n).

Proof Immediate from Theorem 22.7 and Lemma 22.6.

## Exercises

22.4-1

Prove Lemma 22.2.

22.4-2

For each node x, how many bits are necessary to store size(x)? How about rank[x]?

22.4-3

Using Lemma 22.2 and Corollary 22.5, give a simple proof that operations on a disjoint-set forest with union by rank but without path compression run in O(m lg n) time.

22.4-4

Suppose we modify the rule about assessing charges so that we assess one block charge to the last node on the find path whose rank is in block j for j = 0, 1, . . . , lg* n - 1. Otherwise, we assess one path charge to the node. Thus, if a node is a child of the root and is not the last node of a block, it is assessed a path charge, not a block charge. Show that Ω(m) path charges could be assessed to a given node while its rank is in a given block j.

# Problems

22-1 Off-line minimum

The off-line minimum problem asks us to maintain a dynamic set T of elements from the domain {1, 2, . . . , n} under the operations INSERT and EXTRACT-MIN. We are given a sequence S of n INSERT and m EXTRACT-MIN calls, where each key in {1, 2, . . . , n} is inserted exactly once. We wish to determine which key is returned by each EXTRACT-MIN call; that is, we wish to fill in an array extracted[1 . . m], where extracted[i] is the key returned by the ith EXTRACT-MIN call. The problem is "off-line" in that we are allowed to process the entire sequence S before determining any of the returned keys.

a. In the following instance of the off-line minimum problem, each INSERT is represented by a number and each EXTRACT-MIN is represented by the letter E:

`4, 8, E, 3, E, 9, 2, 6, E, E, E, 1, 7, E, 5.`

Fill in the correct values in the extracted array.

To develop an algorithm for this problem, we break the sequence S into homogeneous subsequences. That is, we represent S by

`I1,E,I2,E,I3,...,Im,E,Im+1 ,`

where each E represents a single EXTRACT-MIN call and each Ij represents a (possibly empty) sequence of INSERT calls. For each subsequence Ij, we initially place the keys inserted by these operations into a set Kj, which is empty if Ij is empty. We then do the following.

`OFF-LINE-MINIMUM(m,n)`

`1  for i ← 1 to n`

`2       do determine j such that i ∈ Kj`

`3          if j ≠ m + 1`

`4             then extracted[j] ← i`

`5                  let l be the smallest value greater than j`

`                     for which set Kl exists`

`6                  Kl ← Kj ∪ Kl, destroying Kj`

`7  return extracted`

b. Argue that the array extracted returned by OFF-LINE-MINIMUM is correct.

c. Describe how to use a disjoint-set data structure to implement OFF-LINE-MINIMUM efficiently. Give a tight bound on the worst-case running time of your implementation.
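For reference, OFF-LINE-MINIMUM can be sketched directly in Python using plain set objects for K1, . . . , Km+1; this is the straightforward version, not the disjoint-set-accelerated one that part (c) asks for, and the helper names are illustrative.

```python
# A straightforward Python sketch of OFF-LINE-MINIMUM. Plain Python sets
# stand in for K1, ..., Km+1; a disjoint-set structure (part (c)) would
# make "determine j such that i is in Kj" and the merging step fast.

def off_line_minimum(sequence, n):
    # Split the sequence into K1, E, K2, E, ..., Km, E, Km+1.
    K = [set()]
    for op in sequence:
        if op == "E":
            K.append(set())
        else:
            K[-1].add(op)
    m = len(K) - 1                       # number of EXTRACT-MIN calls
    extracted = [None] * (m + 1)         # extracted[1..m]; index 0 unused
    alive = {j: K[j - 1] for j in range(1, m + 2)}   # surviving sets Kj
    for i in range(1, n + 1):
        j = next(jj for jj in sorted(alive) if i in alive[jj])
        if j != m + 1:
            extracted[j] = i
            # line 6: merge Kj into the next surviving set Kl with l > j
            l = min(ll for ll in alive if ll > j)
            alive[l] |= alive[j]
            del alive[j]
    return extracted[1:]

seq = [4, 8, "E", 3, "E", 9, 2, 6, "E", "E", "E", 1, 7, "E", 5]
print(off_line_minimum(seq, 9))    # [4, 3, 2, 6, 8, 1]
```

Simulating the sequence with an actual priority queue gives the same answers, which is the correctness claim of part (b).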

22-2 Depth determination

In the depth-determination problem, we maintain a forest F = {Ti} of rooted trees under three operations:

MAKE-TREE(v) creates a tree whose only node is v.

FIND-DEPTH(v) returns the depth of node v within its tree.

GRAFT(r, v) makes node r, which is assumed to be the root of a tree, become a child of node v, which is assumed to be in a different tree than r but may itself be either a root or a nonroot node.

a. Suppose that we use a tree representation similar to a disjoint-set forest: p[v] is the parent of node v, except that p[v] = v if v is a root. If we implement GRAFT(r, v) by setting p[r] ← v and FIND-DEPTH(v) by following the find path up to the root, returning a count of all nodes other than v encountered, show that the worst-case running time of a sequence of m MAKE-TREE, FIND-DEPTH, and GRAFT operations is Θ(m²).

By using the union-by-rank and path-compression heuristics, we can reduce the worst-case running time. We use the disjoint-set forest S = {Si}, where each set Si (which is itself a tree) corresponds to a tree Ti in the forest F. The tree structure within a set Si, however, does not necessarily correspond to that of Ti. In fact, the implementation of Si does not record the exact parent-child relationships but nevertheless allows us to determine any node's depth in Ti.

The key idea is to maintain in each node v a "pseudodistance" d[v], which is defined so that the sum of the pseudodistances along the path from v to the root of its set Si equals the depth of v in Ti. That is, if the path from v to its root in Si is v0, v1, . . . , vk, where v0 = v and vk is Si's root, then the depth of v in Ti is

`Σ_{j=0}^{k} d[vj] .`

b. Give an implementation of MAKE-TREE.

c. Show how to modify FIND-SET to implement FIND-DEPTH. Your implementation should perform path compression, and its running time should be linear in the length of the find path. Make sure that your implementation updates pseudodistances correctly.

d. Show how to modify the UNION and LINK procedures to implement GRAFT(r, v), which combines the sets containing r and v. Make sure that your implementation updates pseudodistances correctly. Note that the root of a set Si is not necessarily the root of the corresponding tree Ti.

e. Give a tight bound on the worst-case running time of a sequence of m MAKE-TREE, FIND-DEPTH, and GRAFT operations, n of which are MAKE-TREE operations.

22-3 Tarjan's off-line least-common-ancestors algorithm

The least common ancestor of two nodes u and v in a rooted tree T is the node w that is an ancestor of both u and v and that has the greatest depth in T. In the off-line least-common-ancestors problem, we are given a rooted tree T and an arbitrary set P = {{u, v}} of unordered pairs of nodes in T, and we wish to determine the least common ancestor of each pair in P.

To solve the off-line least-common-ancestors problem, the following procedure performs a tree walk of T with the initial call LCA(root[T]). Each node is assumed to be colored WHITE prior to the walk.

`LCA(u)`

`1  MAKE-SET(u)`

`2  ancestor[FIND-SET(u)] ← u`

`3  for each child v of u in T`

`4       do LCA(v)`

`5          UNION(u,v)`

`6          ancestor[FIND-SET(u)] ← u`

`7  color[u] ← BLACK`

`8  for each node v such that {u,v} ∈ P`

`9       do if color[v] = BLACK`

`10             then print "The least common ancestor of"`

`                    u "and" v "is" ancestor[FIND-SET(v)]`

a. Argue that line 10 is executed exactly once for each pair {u, v} ∈ P.

b. Argue that at the time of the call LCA(u), the number of sets in the disjoint-set data structure is equal to the depth of u in T.

c. Prove that LCA correctly prints the least common ancestor of u and v for each pair {u, v} ∈ P.

d. Analyze the running time of LCA, assuming that we use the implementation of the disjoint-set data structure in Section 22.3.
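A Python sketch of the LCA walk, built on the forest implementation of Section 22.3, may help in reasoning about parts (a)-(d); the example tree and pair set below are illustrative.

```python
# A Python sketch of the off-line LCA tree walk. The disjoint-set
# operations are the union-by-rank, path-compression forest of Section 22.3.

def offline_lca(tree, root, pairs):
    # tree: node -> list of children; pairs: set of frozenset({u, v}) queries
    p, rank, ancestor, color = {}, {}, {}, {}
    answers = {}

    def make_set(x):
        p[x] = x
        rank[x] = 0

    def find_set(x):
        if x != p[x]:
            p[x] = find_set(p[x])      # path compression
        return p[x]

    def union(u, v):
        ru, rv = find_set(u), find_set(v)
        if rank[ru] > rank[rv]:
            p[rv] = ru
        else:
            p[ru] = rv
            if rank[ru] == rank[rv]:
                rank[rv] += 1

    def lca(u):
        make_set(u)
        ancestor[find_set(u)] = u
        for v in tree.get(u, []):
            lca(v)
            union(u, v)
            ancestor[find_set(u)] = u
        color[u] = "BLACK"
        for pair in pairs:             # line 8 of LCA: pairs involving u
            if u in pair:
                v = next(iter(pair - {u}))
                if color.get(v) == "BLACK":
                    answers[pair] = ancestor[find_set(v)]

    lca(root)
    return answers

# Example:   a        queries: {d, e} and {d, c}
#           / \
#          b   c
#         / \
#        d   e
tree = {"a": ["b", "c"], "b": ["d", "e"]}
pairs = {frozenset(("d", "e")), frozenset(("d", "c"))}
ans = offline_lca(tree, "a", pairs)
print(ans[frozenset(("d", "e"))])    # b
print(ans[frozenset(("d", "c"))])    # a
```

Scanning all of P at every node, as here, is the simple-minded reading of line 8; an efficient version would keep, per node, only the pairs that mention it.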

# Chapter notes

Tarjan [187] showed that a lower bound of Ω(m α(m, n)) time is required for operations on any disjoint-set data structure satisfying certain technical conditions. This lower bound was later generalized by Fredman and Saks [74], who showed that in the worst case, Ω(m α(m, n)) (lg n)-bit words of memory must be accessed.
