Skip to content

Module 4: Dictionaries

ADT Dictionary

Dictionary

An ADT consisting of a collection of items, each of which contains:

  • a key
  • some data (the "value")

and is called a key-value pair (KVP). Keys can be compared and are (typically) unique.

Operations:

  • search(k) (also called lookup(k))
  • insert(k, v)
  • delete(k) (also called remove(k))
  • Optional: successor, join, is-empty, size, etc.

Elementary Realizations

Common assumptions:

  • Dictionary has \(n\) KVPs
  • Each KVP uses constant space (if not, the “value” could be a pointer)
  • Keys can be compared in constant time

Unordered array or linked list

  • search: \(\Theta(n)\)
  • insert: \(\Theta(1)\) (except array occasionally needs to resize)
  • delete: \(\Theta(n)\) (need to search)

Ordered array

  • search: \(\Theta(\log n)\) (via binary search)
  • insert: \(\Theta(n)\) → find order to insert in
  • delete: \(\Theta(n)\) → need to shuffle after deletion

The binary-search only applied to a sorted array

**binary-search(A, n, k)**
Input: Sorted array A of size n, k: key

ℓ <- 0, r <- n - 1
while (ℓ <= r)
    m <- ⌊(ℓ + r) / 2⌋
    if (A[m] equals k) then return "found at A[m]"
    else if (A[m] < k) then ℓ <- m + 1
    else r <- m - 1
return "not found, but would be between A[ℓ-1] and A[ℓ]"

Binary Search Trees

BST as realization of ADT Dictionary

Structure

  • Binary tree: all nodes have two (possibly empty) subtrees
  • Every node stores a KVP
  • Empty subtrees usually not shown

Ordering

  • Every key \(k\) in \(T\).left is less than the root key.
  • Every key \(k\) in \(T\).right is greater than the root key.

BST::search(k) Start at root, compare \(k\) to current node’s key. Stop if found or subtree is empty, else recurse at subtree.

BST::insert(k, v) Search for \(k\), then insert \((k, v)\) as new node

BST

  1. No structure property
  2. Unique keys
  3. Left child < right child

Heap

  1. Structure property
  2. Can have duplicates
  3. No orders of left child vs. right child

Deletion in a BST

  1. First search for the node \(x\) that contains the key.
  2. If \(x\) is a leaf (both subtrees are empty), delete it.
  3. If \(x\) has one non-empty subtree, move child up
  4. Else, swap key at \(x\) with key at successor node and then delete that node.
    • (Successor: next-smallest among all keys in the dictionary.)

Height of a BST

BST::search, BST::insert, BST::delete all have cost \(\Theta(h)\), where \(h\) = height of the tree = max. path length from root to leaf

If \(n\) items are inserted one-at-a-time, how big is \(h\)?

  • Worst-case: \(n − 1 = \Theta(n)\) (when the tree forms a linked list)
  • Best-case: \(\Theta(\log n)\). (when the tree is balanced)
    • Any binary tree with \(n\) nodes has height \(h \geq \log(n+1)-1\) (Layer \(i\) has at most \(2^i\) nodes. So \(n\leq \sum^h_{i=0}2^i = 2^{h+1}-1)\).

AVL Trees

An AVL Tree is a BST with an additional height-balance property at every node:

The heights of the left and right subtree differ by at most \(1\).

Rephrase:

If node \(v\) has left subtree \(L\) and right subtree \(R\), then

\[ \text{balance}(v) := \text{height}(R) - \text{height}(L) \]

must be in \(\{-1, 0, 1\}\)

  • balance(\(v\)) = -1 means \(v\) is left-heavy
  • balance(\(v\)) = +1 means \(v\) is right-heavy

Need to store at each node \(v\) the height of the subtree rooted at it

  • Alternative: store balance (instead of height) at each node.

AVL Tree Example

  • Indicate the MAX height of the left & right subtree

image.png

ALV is \(-1 \leq (|\text{Right}| - |\text{Left}|) \leq 1\)

image.png

Height of an AVL tree

Theorem

An AVL tree on \(n\) nodes has \(\Theta(\log n)\) height.

  • search, BST::insert, BST::delete all cost \(\Theta(\log n)\) in the worst case!

Insertion in AVL Trees

AVL insertion

To perform AVL::insert(k,v):

  • First, insert (k,v) with the usual BST insertion.
  • We assume that this returns the new leaf \(z\) where the key was stored.
  • Then, move up the tree from \(z\).
  • Update height (easy to do in constant time):
**setHeightFromSubtrees(u)**

u.height ← 1 + max{u.left.height, u.right.height}
  • If the height difference becomes \(\pm 2\) at node \(z\), then \(z\) is unbalanced. Must re-structure the tree to rebalance.

Restructuring a BST: Rotations

Right Rotation

This is a right rotation on node \(z\):

image.png

**rotate-right(z)**

c ← z.left, z.left ←p c.right, c.right ←p z
setHeightFromSubtrees(z), setHeightFromSubtrees(c)
return c // returns new root of subtree

(Notation ←p means ‘also change parent-reference of right-hand-side’)

  • Actually, the p should be above the left arrow!

Note

Concept Explanation from ChatGPT:

Pseudocode Breakdown

  1. c ← z.left
    • This line assigns node c as the left child of node z. In the context of your tree, before the rotation, c is positioned directly to the left of z.
  2. z.left ← c.right
    • Here, the right subtree of c (labelled as C in your diagram) is moved to become the left child of z. This effectively shifts subtree C under node z. It is necessary because when c moves up to take the place of z, Ccannot remain under c without disturbing the binary search tree property.
  3. c.right ← z
    • This line makes z the right child of c, completing the primary movement of the rotation. c moves up to where z was, and z moves down to the right side of c. This alters the levels of c and z, with c becoming the parent node in place of z.

Left Rotation

Symmetrically, this is a left rotation on node \(z\):

image.png

Again, only two links need to be changed and two heights updated. Useful to fix right-right imbalance.

**rotate-left(z)**

c ← z.right, z.right ←p c.left, c.left ←p z
setHeightFromSubtrees(z), setHeightFromSubtrees(c)
return c // returns new root of subtree

Double Right Rotation

This is a double right rotation on node \(z\):

image.png

First, a left rotation at \(c\).

image.png

Second, a right rotation at \(z\).

Double Left Rotation

Symmetrically, there is a double left rotation on node \(z\):

image.png

First, a right rotation at \(c\).

Second, a left rotation at \(z\).

AVL insertion revisited

Imbalance at \(z\): do (single or double) rotation

Choose \(c\) as child where subtree has bigger height.

**AVL::insert(k , v )**

z ← BST::insert(k , v ) // leaf where k is now stored
while (z is not NULL)
    if (|z.left.height − z.right.height| > 1) then
        Let c be taller child of z
        Let g be taller child of c (so grandchild of z)
        restructure(g , c , z ) // see later
        break // can argue that we are done
    setHeightFromSubtrees(z )
    z ← z.parent

Can argue: For insertion one rotation restores all heights of subtrees.

  • No further imbalances, can stop checking.

Fixing a slightly-unbalanced AVL tree

image.png

Rule:

The middle key of \(g, c, z\) becomes the new root.

Deletion in AVL Trees

AVL Deletion

Remove the key \(k\) with BST::delete.

Find node where structural change happened. (This is not necessarily near the node that had \(k\).)

Go back up to root, update heights, and rotate if needed.

**AVL::delete(k )**

z ← BST::delete(k)
// Assume z is the parent of the BST node that was removed
while (z is not NULL)
    if (|z.left.height − z.right.height| > 1) then
        Let c be taller child of z
        Let g be taller child of c (break ties to avoid double rotation)
        z ← restructure(g,c,z)
  // Always continue up the path
    setHeightFromSubtrees(z )
    z ← z.parent

Important:

Ties must be broken to avoid double rotation (same height for both subtrees).

Note

Concept Explanation from ChatGPT:

When you have a choice of nodes to rotate (usually when the child nodes of the critical node have equal heights or balance factors), you should choose the one that allows for a single rotation, thereby "breaking ties.”

AVL Tree Summary

search: Just like in BSTs, costs \(\Theta(height)\)

insert: BST::insert, then check & update along path to new leaf

  • total cost \(\Theta(height)\)
  • restructure will be called at most once.

delete: BST::delete, then check & update along path to deleted node

  • total cost \(\Theta(height)\)
  • restructure may be called \(\Theta(height)\) times.

Worst-case cost for all operations is \(\Theta(height) = \Theta(\log n)\).

In practice, the constant is quite large.