Module 8: Range-Searching in Dictionaries for Points¶
Range Searches¶
- So far:
search($k$)looks for one specific item. -
New operation
range-search: look for all items that fall within a given range.- Input: A range, i.e., an interval \(Q = (x, x')\). It may be open or closed at the ends.
- Goal: Report all KVPs in the dictionary whose key \(k\) satisfies \(k \in Q\).

-
As usual, \(n\) denotes the number of input items.
- Let \(s\) be the output size, i.e., the number of items in the range.
- We need \(\Omega(s)\) time simply to report the items.
- Note that sometimes \(s = 0\) and sometimes \(s = n\); we therefore keep it as a separate parameter when analyzing the runtime.
Typical runtime: \(O(\log n + s)\).
Range Searches in Existing Dictionary Realizations¶
- Unsorted list/array/hash table: Range search requires \(\Omega(n)\) time:
- We have to check each item explicitly to see if it is in the range.
-
Sorted array: Range search in \(A\) can be done in \(O(\log n + s)\) time:

- Using binary search, find \(i\) such that \(x\) is at (or would be at) \(A[i]\).
- Using binary search, find \(i'\) such that \(x'\) is at (or would be at) \(A[i']\).
- Report all items \(A[i+1 \dots i' - 1]\).
- Report \(A[i]\) and \(A[i']\) if they are in range.
- BST: Range searches can similarly be done in \(O(\text{height} + s)\) time.
- We will see this in detail later.
Multi-Dimensional Data¶
Range searches are of special interest for multi-dimensional data. Example: flights that leave between 9am and noon, and cost \(400–\)600.

- Each item has \(d\) aspects (coordinates): \((x_0, x_1, \dots, x_{d-1})\), so it corresponds to a point in \(d\)-dimensional space.
- We concentrate on \(d = 2\), i.e., points in the Euclidean plane.
- (Orthogonal) \(d\)-dimensional range search: Given a query rectangle \(Q = [x_1, x_1'] \times \cdots \times [x_d, x_d']\), find all points that lie within \(Q\).
Multi-dimensional Range Search¶
The time for range searches depends on how the points are stored.
- Could store a 1-dimensional dictionary (where the key is some combination of the aspects).
- Problem: Range search on one aspect is not straightforward.
- Could use one dictionary for each aspect.
- Problem: Inefficient, wastes space.
- Better idea: Design new data structures specifically for points.
- Quadtrees
- kd-trees
- range-trees
- Assumption: Points are in general position:
- No two points on a horizontal line.
- No two points on a vertical line.
This simplifies presentation; data structures can be generalized.
Quadtrees¶
- We have \(n\) points in the plane, represented as \(P = \{(x_0, y_0), (x_1, y_1), \dots, (x_{n-1}, y_{n-1})\}\).
- Define a bounding box \(R = [0, 2^k] \times [0, 2^k]\) to contain all points.
- Assume all coordinates are non-negative.
- Choose the smallest \(k\) such that the box covers all points.
Quadtree Structure and Construction
- Root node represents region \(R\).
- If \(R\) has 0 or 1 point, the root is a leaf storing that point.
- Otherwise, split \(R\) into four quadrants: \(R_{NE}, R_{NW}, R_{SW}, R_{SE}\).
- Partition points into sets based on quadrants.
- Convention: Points on split lines go to the right/top side.
- Recursively build subtrees for each quadrant, making them children of the root.
Quadtree Dictionary Operations¶
search: Similar to binary search trees and tries.insert:- Search for the point.
- Split the leaf if there are two points in the same region.
delete:- Search for the point and remove it.
- If the parent has only one point left, delete the parent (and recursively remove ancestors with only one point left).
Quadtree Range Search¶
**QTree::range-search(r ← root, Q)**
r: The root of a quadtree, Q: Query-rectangle
R ← region associated with node r
if (R ⊆ Q) then // inside node, stop searching
report all points below r and return
else if (R ∩ Q is empty) then return // outside node, stop searching
// boundary node, recurse
if (r is a leaf) then
p ← point stored at r
if p is not NULL and in Q then report it and return
else return
for each child v of r do QTree::range-search(v, Q)
Note: We assume each node of the quadtree stores the associated square. Alternatively, these could be re-computed during the search (space-time tradeoff).
Quadtree Analysis¶
-
Height of a Quadtree:
- Height can be very large for uneven distributions of points.
- Even with as few as 3 points, height could be arbitrarily large.

-
Bound on Height:
- There exists a weaker bound based on the spread factor of points \(P\):
\[ \text{spread factor} = \frac{\text{sidelength of } R}{\text{minimum distance between points in } P} \]- Height \(h\) of the quadtree is in \(\Theta(\log(\text{spread factor}))\).
- Complexity:
- Initial tree building: \(\Theta(nh)\) in the worst case.
- Range search: \(\Theta(nh)\) in the worst case, even if no points match.
- Quadtrees can be generalized to higher dimensions by splitting into octants (resulting in octrees) but are rarely used beyond 3 dimensions.
Quadtree Summary¶
- Easy to compute and manage.
- No complex arithmetic; only divisions by 2 (bit-shift) if bounding box \(R\) width/height is a power of 2.
- Space can be wasteful but effective if points are well-distributed.
- Variations:
- Stop splitting earlier and allow up to \(K\) points per leaf (for a fixed bound \(K\)).
- Use quadtree for storing pixelated images.

kd-Trees¶
- Given \(n\) points \(P = \{(x_0, y_0), (x_1, y_1), \dots, (x_{n-1}, y_{n-1})\}\).
- Quadtree: Splits into quadrants regardless of point positions.
- KD-Tree:
- Splits at the upper median of coordinates (roughly half of the points in each subtree).
- Each node has a splitting line in one dimension (in 2D, either vertical or horizontal).
- Convention: Points on split lines belong to the right/top side.
- Alternates between vertical and horizontal splits until each point is in its own region.
- Alternative Splitting: Use the dimension with better aspect ratios for resulting regions.

Constructing KD-Trees¶
- Start with an initial split by the x-coordinate on points \(P\):
- If \(|P| \leq 1\), create a leaf node and return.
- Otherwise:
- Set \(X := \text{randomized-quick-select}(P, \lfloor n/2 \rfloor)\) to select a median point by x-coordinate.
- Partition \(P\) by x-coordinate into:
- \(P_{x < X}\): Points with x-coordinates less than \(X\).
- \(P_{x \geq X}\): Points with x-coordinates greater than or equal to \(X\).
- Distribute \(\lfloor n/2 \rfloor\) points on one side and \(\lceil n/2 \rceil\) points on the other.
- Note: Points are assumed to be in general position.
- Recursively create subtrees:
- Left subtree for \(P_{x < X}\) (split by \(y\)-coordinate).
- Right subtree for \(P_{x \geq X}\) (split by \(y\)-coordinate).
- Symmetry: Start building with an initial \(y\)-split if desired.
KD-Tree Analysis¶
Run-Time¶
- Partitioning: \(P\) is divided in \(\Theta(n)\) expected time using
randomized-quick-select. - Recurrence:
Solves to \(\Theta(n \log n)\) expected time.
Can be reduced to \(\Theta(n \log n)\) worst-case time with pre-sorting.
Height¶
- Relation: \(h(1) = 0\) and \(h(n) \leq h(\lceil n/2 \rceil) + 1\)
- Result: Resolves to \(O(\log n)\) height, specifically \(\lceil \log n \rceil\).
Space¶
- Interior Nodes: All interior nodes have two children.
- Total Nodes: \(n - 1\) interior nodes for \(n\) leaves.
- Space Complexity: \(\Theta(n)\).
KD-Tree Dictionary Operations¶
search: Similar to binary search tree, using specified coordinate.insert: Locate position, add as new leaf.delete: Locate and remove leaf.
Problem¶
- Issue: After insertions or deletions, splits may not align with the median, and height may exceed \(\lceil \log_2 n \rceil\).
- Solution: Maintain \(O(\log n)\) height by occasionally rebuilding subtrees (without details here), though this slows down range searches.
Note: KD-trees are not efficient for frequent insertions or deletions.
KD-Tree Range Search¶
- Range search in a KD-tree is similar to that in a quadtree.
- Differences:
- Only two children per node.
- Leaves always store points.
**kdTree::range-search(r ← root, Q)**
r: The root of a kd-tree, Q: Query-rectangle
R ← region associated with node r
if (R ⊆ Q) then report all points below r; return
if (R ∩ Q is empty) then return
if (r is a leaf) then
p ← point stored at r
if p is in Q return p
else return
for each child v of r do kdTree::range-search(v, Q)
- Node Storage: Each node keeps its associated region.
- Space Optimization: Pass the region as a parameter to compute each child's region based on the splitting line.
Range Search Complexity¶
- We spend \(O(1)\) time at each visited node, except in line 2
if (R ⊆ Q) then report all points below r; return. - All calls to line 2 together take \(O(s)\) time (recall: \(s\) is the output-size).
- Observe: The number of visited nodes is \(O(\beta(n))\), where \(\beta(n)\) is the number of "boundary" nodes:
kdTree::range-searchwas called.- Neither \(R \subseteq Q\) nor \(R \cap Q = \emptyset\).
-
\(\beta(n)\) satisfies the following recurrence relation:
\[ \beta(n) \leq 2\beta(n/4) + O(1) \] -
This implies \(\beta(n) \in O(\sqrt{n})\).
- Therefore, the complexity of range search in kd-trees is \(O(s + \sqrt{n})\).
Higher Dimensions¶
- kd-trees for \(d\)-dimensional space:
- At the root, the point set is partitioned based on the first coordinate.
- At the subtrees of the root, the partition is based on the second coordinate.
- At depth \(d - 1\), the partition is based on the last coordinate.
- At depth \(d\), we start all over again, partitioning on the first coordinate.
- Storage: \(O(n)\)
- Height: \(O(\log n)\)
- Construction time: \(O(n \log n)\)
- Range search time: \(O(s + n^{1 - 1/d})\)
- This assumes that \(d\) is a constant.
Range Trees¶
Towards Range Trees¶
- Current limitations:
- Quadtrees and kd-trees are intuitive and simple.
- Both may be slow for range searches.
- Quadtrees can be space-inefficient.
- New Idea: Range trees
- Tree of trees (a multi-level data structure)
- Traditional nodes store key-value pairs and references to children and possibly the parent.
- Range trees allow more information to be stored in each node.
- Each node contains another binary search tree.
- Tree of trees (a multi-level data structure)
- Benefits:
- Though space-wasteful, range trees enable much faster range searches.
2-dimensional Range Trees¶
- Primary structure:
- A balanced binary search tree \(T\) that stores \(P\) and uses \(x\)-coordinates as keys.
- Associate structure:
- Each node \(z\) in \(T\) has an associated structure \(T_{ass}(z)\).
- \(P(z)\): All points in the subtree of \(z\) in \(T\), including the point at \(z\).
- \(T_{ass}(z)\): Stores \(P(z)\) in a balanced binary search tree using \(y\)-coordinates as keys.
- Note: The point of \(z\) is not necessarily the root of \(T_{ass}(z)\).

Range Tree Space Analysis¶
- Primary tree \(T\) uses \(O(n)\) space.
- Question: How many nodes do all associate trees together have?
- Point of \(a\) is only in associate tree \(T_{ass}(a)\).
- Point of \(b\) is in associate trees \(T_{ass}(a), T_{ass}(b)\).
- Point of \(c\) is in associate trees \(T_{ass}(a), T_{ass}(b), T_{ass}(c)\).
- Key Insight:
- Point of \(z\) is in associate tree \(T_{ass}(u)\) if and only if \(u\) is an ancestor of \(z\) in \(T\).
- Therefore, each point belongs to \(O(\log n)\) associate trees.
- As a result, all associate trees together use \(O(n \log n)\) space.
- Conclusion: A range-tree with \(n\) points uses \(O(n \log n)\) space, which is tight for some primary trees.

Range Trees Operations¶
search: Search by \(x\)-coordinate in \(T\).insert:- Insert point by \(x\)-coordinate into \(T\).
- Walk back up to the root, inserting by \(y\)-coordinate in all associate trees \(T_{ass}(z)\) of nodes \(z\) on the path to the root.
delete: Analogous to insertion.- Problem: Ensuring balanced binary search trees.
- Using AVL-trees makes insert/delete slow due to rotations, which change \(P(v)\) and require re-building \(T_{ass}(v)\).
- Solution: Rebuild highly unbalanced subtrees.
- Runtime for insert/delete: \(O(\log^2 n)\) amortized.
range-search:- Search by \(x\)-range in \(T\).
- For found points, search by \(y\)-range in associated trees.
- Note: Must understand 1-dimensional range search in binary search trees.
BST Range Search recursive¶
**BST::range-search-recursive(r ← root, x₁, x₂)**
r: root of a binary search tree, x₁, x₂: search keys
Returns keys in subtree at r that are in range [x₁, x₂]
if r = NULL then return
if x₁ ≤ r.key ≤ x₂ then
L ← BST::range-search-recursive(r.left, x₁, x₂)
R ← BST::range-search-recursive(r.right, x₁, x₂)
return L ∪ r.{key} ∪ R
if r.key < x₁ then
return BST::range-search-recursive(r.right, x₁, x₂)
if r.key > x₂ then
return BST::range-search-recursive(r.left, x₁, x₂)
Note: Keys are reported in in-order, i.e., in sorted order.
BST Range Search Re-phrased¶

- Search for left boundary \(x_1\): this defines path \(P_1\).
- Search for right boundary \(x_2\): this defines path \(P_2\).
- The tree \(T\) is divided into three sections:
- Outside the range.
- On the paths \(P_1\) and \(P_2\).
- Between the paths.
This classification of nodes will be essential for further analysis.
- Boundary nodes: Nodes in paths \(P_1\) or \(P_2\)
- For each boundary node, check if it is within the range.
- Outside nodes: Nodes to the left of \(P_1\) or right of \(P_2\)
- These nodes are outside the range and are not visited.
- Inside nodes: Nodes to the right of \(P_1\) and left of \(P_2\)
- Keep a list of the topmost inside nodes.
- All descendants of these nodes are within the range, so for a 1-dimensional range search, report them directly.
BST Range Search Analysis¶
Assuming a balanced binary search tree:
- Path Searches:
- Search for path \(P_1\): \(O(\log n)\)
- Search for path \(P_2\): \(O(\log n)\)
- Boundary Nodes:
- There are \(O(\log n)\) boundary nodes, each taking \(O(1)\) time.
- Topmost Inside Nodes:
- Each topmost inside node \(v\) requires \(O(1)\) time.
- They are children of boundary nodes, thus taking \(O(\log n)\) time.
- Descendants:
- For 1D range search, also report descendants of topmost inside nodes.
- Sum of descendants across topmost inside nodes is \(\leq s\), as their subtrees are disjoint. This takes \(O(s)\) time.
Run-time for 1D Range Search: \(O(\log n + s)\). While not faster overall, the concept of topmost inside nodes is critical for 2D range search.
Range Trees: Range Search¶
To perform a range search for \(Q = [x_1, x_2] \times [y_1, y_2]\), follow these steps:
- Range Search on \(~~x~~\)-coordinates: Perform a range search on the \(x\)-coordinates for the interval \([x_1, x_2]\) in the primary tree \(T\):
BST::range-search(T, x1, x2) - Boundary and Topmost Inside Nodes:
- Identify boundary and topmost inside nodes as usual.
- Processing Boundary Nodes:
- For each boundary node, check if the corresponding point lies within region \(Q\).
- Processing Topmost Inside Nodes:
- For each topmost inside node \(v\):
- Let \(P(z)\) represent points in the subtree of \(z\) in \(T\).
- Since all \(x\)-coordinates in \(P(z)\) are within range, proceed with \(y\)-coordinates.
- Range Search on \(~~y~~\)-coordinates:
- Use \(T_{ass}(z)\), the associated tree of \(z\), to find points where \(y\)-coordinates fall within \([y_1, y_2]\):
BST::range-search(T_ass(z), y1, y2)
- Use \(T_{ass}(z)\), the associated tree of \(z\), to find points where \(y\)-coordinates fall within \([y_1, y_2]\):
- For each topmost inside node \(v\):
Range Trees: Range Search Run-time¶
- \(O(\log n)\) time to find boundary and topmost inside nodes in the primary tree.
- There are \(O(\log n)\) such nodes.
- \(O(\log n + s_v)\) time for each topmost inside node \(v\), where \(s_v\) is the number of points in \(T_{\text{ass}}(v)\) that are reported.
- Two topmost inside nodes have no common point in their trees:
- \(\Rightarrow\) every point is reported in at most one associate structure
- \(\Rightarrow \sum_{v \text{ topmost inside}} s_v \leq s\)
Time for range search in range-tree is proportional to
Range Trees: Higher Dimensions¶
- Range trees can be generalized to \(d\)-dimensional space.
- Space: \(O(n (\log n)^{d-1})\)
- Construction time: \(O(n (\log n)^d)\)
- Range search time: \(O(s + (\log n)^d)\)
(Note: \(d\) is considered to be a constant.)
- Space/Time trade-off compared to kd-trees.

Conclusion¶
Range search data structures summary¶
- Quadtrees
- Simple (also for dynamic set of points)
- Work well only if points are evenly distributed
- Wastes space for higher dimensions
- kd-trees
- Linear space
- Range search time \(O(\sqrt{n} + s)\)
- Inserts/deletes destroy balance and range search time (no simple fix)
- Range-trees
- Range search time \(O(\log^2 n + s)\)
- Wastes some space
- Inserts/deletes destroy balance (can fix this with occasional rebuild)
Convention:
Points on split lines belong to the right/top side.
