Module 3: Sorting, Average-case and Randomization¶
Analyzing average-case run-time¶
Recall the definition of average-case run-time: the run-time averaged over all size-\(n\) instances, weighted by how frequently each instance occurs.
For this module:
- Assume that the set \(\mathcal{I}_n\) of size-\(n\) instances is finite (or can be mapped to a finite set in a natural way)
- Assume that all instances occur equally frequently
Then we can use the following simplified formula: \(T^{\text{avg}}(n) = \frac{1}{|\mathcal{I}_n|} \sum_{I \in \mathcal{I}_n} T(I)\)
Selection and quick-select¶
quick-select and the related quick-sort rely on two subroutines:
choose-pivot(A): Return an index \(p\) in \(A\). We will use the pivot-value \(v ← A[p]\) to rearrange the array.
partition(A, p): Rearrange \(A\) and return pivot-index \(i\) so that
- the pivot-value \(v\) is in \(A[i]\),
- all items in \(A[0, \ldots,i-1]\) are \(\leq v\), and
- all items in \(A[i+1,\ldots, n-1]\) are \(\geq v\).

where \(p=\) index of pivot-value before partition (we choose it), and \(i=\) index of pivot-value after partition (no control).
Partition Algorithm
**partition(A, p)**
A: array of size n, p: integer s.t. 0 ≤ p < n
Create empty lists smaller, equal and larger.
v ← A[p]
for each element x in A do
if x < v then smaller.append(x)
else if x > v then larger.append(x)
else equal.append(x).
i ← smaller.size
j ← equal.size
Overwrite A[0...i−1] by elements in smaller
Overwrite A[i...i+j−1] by elements in equal
Overwrite A[i+j ...n−1] by elements in larger
return i
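The bucket-based partition above can be sketched in Python (a minimal illustration using three lists, mirroring the pseudocode):

```python
def partition(A, p):
    """Partition A around the pivot-value v = A[p].

    Rewrites A so items < v come first, then items equal to v,
    then items > v, and returns the pivot-index i = smaller.size.
    Uses Theta(n) auxiliary space for the three lists.
    """
    v = A[p]
    smaller = [x for x in A if x < v]
    equal = [x for x in A if x == v]
    larger = [x for x in A if x > v]
    A[:] = smaller + equal + larger   # overwrite A in the three blocks
    return len(smaller)
```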
Efficient In-Place partition (Hoare)
**partition(A, p)**
A: array of size n, p: integer s.t. 0 ≤ p < n
swap(A[n−1], A[p])
i ← −1, j ← n−1, v ← A[n−1]
loop
do i ← i+1 while A[i] < v
do j ← j−1 while j > i and A[j] > v
if i ≥ j then break
else swap(A[i], A[j])
end loop
swap(A[n−1], A[i])
return i

Running time: \(\Theta(n)\).
Observe: \(n\) key-comparisons (comparing two input-items).
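The in-place version translates almost line-for-line into Python (a sketch; each do-while scan becomes an increment followed by a while-loop):

```python
def partition(A, p):
    """In-place Hoare-style partition of A around pivot A[p].

    Parks the pivot at the end, scans inward from both sides
    swapping out-of-place pairs, then swaps the pivot into its
    final slot. Theta(n) time, O(1) auxiliary space.
    """
    n = len(A)
    A[n - 1], A[p] = A[p], A[n - 1]   # move pivot to the end
    i, j, v = -1, n - 1, A[n - 1]
    while True:
        i += 1                         # do i <- i+1 while A[i] < v
        while A[i] < v:
            i += 1
        j -= 1                         # do j <- j-1 while j > i and A[j] > v
        while j > i and A[j] > v:
            j -= 1
        if i >= j:
            break
        A[i], A[j] = A[j], A[i]
    A[n - 1], A[i] = A[i], A[n - 1]   # pivot into its final position i
    return i
```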
quick-select Algorithm¶
Selection: Want the item \(m\) that would be at index \(k\) if \(A\) were sorted, i.e., such that (after rearranging \(A\)) we have \(A[0,\ldots,k-1] \leq A[k] = m \leq A[k+1,\ldots,n-1]\).
**quick-select(A, k)**
A: array of size n, k: integer s.t. 0≤k<n
1. p ← choose-pivot(A)
2. i ← partition(A, p)
3. if i = k then return A[i]
4. else if i > k then return quick-select(A[0...i−1], k)
5. else if i < k then return quick-select(A[i+1...n−1], k−(i+1))
Note
Concept Explanation from ChatGPT:
This is an implementation of the quick-select algorithm, which finds the k-th smallest element in an unordered list. It is closely related to the quick-sort sorting algorithm and uses the same idea of partitioning the array around a pivot element. Step by step:
- Choose a pivot: choose-pivot(A) picks a pivot element of A. Common strategies include picking the first, last, or middle element, or a random element of the array.
- Partition the array: partition(A, p) rearranges A so that all elements less than the pivot come before it and all elements greater come after it, and returns the index i where the pivot element finally resides.
- Determine the k-th smallest element:
- Case 1 (pivot is the k-th element): if i = k, the pivot element is the k-th smallest element, and it is returned.
- Case 2 (pivot index greater than k): if i > k, the k-th smallest element lies in the left partition; recurse on the sub-array A[0...i−1] with the same k.
- Case 3 (pivot index less than k): if i < k, the k-th smallest element lies in the right partition A[i+1...n−1]; since the i+1 elements to the left are discarded, we now look for the (k−(i+1))-th smallest element in that sub-array.
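The steps above can be sketched in Python. This sketch bundles a simple bucket-based partition for self-containment and chooses the pivot uniformly at random, i.e., it is randomized-quick-select:

```python
import random

def partition(A, p):
    """Rearrange A around pivot-value A[p]; return the pivot-index."""
    v = A[p]
    smaller = [x for x in A if x < v]
    equal = [x for x in A if x == v]
    larger = [x for x in A if x > v]
    A[:] = smaller + equal + larger
    return len(smaller)

def quick_select(A, k):
    """Return the item of A that would be at index k in sorted order."""
    p = random.randrange(len(A))   # choose-pivot: uniform at random
    i = partition(A, p)
    if i == k:
        return A[i]
    elif i > k:
        return quick_select(A[:i], k)            # recurse left
    else:
        return quick_select(A[i + 1:], k - (i + 1))  # recurse right
```

For clarity the recursion here works on sliced copies; the lecture's version recurses on index ranges of the same array instead, avoiding the copies.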

Analysis of quick-select¶
Let \(T(n,k)\) be the number of key-comparisons in a size-\(n\) array with parameter \(k\). This is asymptotically the same as the run-time. partition uses \(n\) key-comparisons.
Worst-case run-time:
- Sub-array always gets smaller, so \(\leq n\) recursions.
- Each takes \(\leq n\) comparisons \(\Rightarrow O(n^2)\) time.
- This is tight: if the pivot-value is always the maximum and \(k = 0\), each recursion shrinks the array by only one element, giving \(\Theta(n^2)\) time.
Best-case run-time:¶
- First chosen pivot could be the \(k\)th element
- No recursive calls; \(T(n, k) = n \in \Theta(n)\)
Randomized algorithms¶
A randomized algorithm is one which relies on some random numbers in addition to the input.
- Doing randomization is often a good idea if an algorithm has bad worst-case time but seems to perform much better on most instances.
- It can also (with restrictions) be used to bound the avg-case run-time.
Goal: Shift the dependency of run-time from what we can’t control (the input) to what we can control (the random numbers).
No more bad instances, just unlucky numbers.
Expected run-time¶
The run-time of the algorithm now depends on the random numbers.
Define \(T_\mathcal{A}(I,R)\) to be the run-time of a randomized algorithm \(\mathcal{A}\) for an instance \(I\) and the sequence \(R\) of outcomes of random trials.
The expected run-time \(T^{\text{exp}}(I)\) for instance \(I\) is the expected value \(T^{\text{exp}}(I) = \mathbf{E}\big[T_{\mathcal{A}}(I,R)\big] = \sum_{R} \Pr[R] \cdot T_{\mathcal{A}}(I,R)\).
Now take the maximum over all instances of size \(n\) to define the expected run-time (or formally: worst-instance expected-luck run-time) of \(\mathcal{A}\): \(T^{\text{exp}}(n) = \max_{I \in \mathcal{I}_n} T^{\text{exp}}(I)\).
In contrast to average-case analysis, the natural guess usually is correct for the expected run-time.
Are average-case run-time and expected run-time the same? No!

There is a relationship only if the randomization effectively achieves “choose the input instance randomly”.
Summary of SELECTION¶
- randomized-quick-select has expected run-time \(\Theta(n)\).
- The run-time bound is tight since partition takes \(\Omega(n)\) time.
- If we're unlucky in the random numbers then the run-time is still \(\Omega(n^2)\).
- So the expected run-time of shuffled-quick-select is \(\Theta(n)\).
- So the run-time of quick-select on randomly chosen input is \(\Theta(n)\).
- So the average-case run-time of quick-select is \(\Theta(n)\).
- randomized-quick-select is generally the fastest solution to SELECTION.
Sorting and quick-sort¶
**quick-sort(A)**
A: array of size n
if n ≤ 1 then return
p ← choose-pivot(A)
i ← partition(A, p)
quick-sort(A[0...i−1])
quick-sort(A[i+1...n−1])
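A runnable Python sketch of quick-sort on index ranges; for brevity it uses a Lomuto-style in-place partition scan (a substitute for the Hoare partition earlier in these notes) and a random pivot, i.e., randomized-quick-sort:

```python
import random

def quick_sort(A, lo=0, hi=None):
    """Sort A[lo..hi] in place; random pivot gives expected Theta(n log n)."""
    if hi is None:
        hi = len(A) - 1
    if hi - lo + 1 <= 1:
        return
    p = random.randrange(lo, hi + 1)   # choose-pivot at random
    A[hi], A[p] = A[p], A[hi]          # park pivot at the right end
    v = A[hi]
    i = lo
    for j in range(lo, hi):            # Lomuto scan: grow the "< v" prefix
        if A[j] < v:
            A[i], A[j] = A[j], A[i]
            i += 1
    A[i], A[hi] = A[hi], A[i]          # pivot to its final index i
    quick_sort(A, lo, i - 1)
    quick_sort(A, i + 1, hi)
```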

quick-sort Analysis¶
Set \(T(A) :=\) # of key-comparison for quick-sort in array \(A\).
Worst-case run-time:
\(\Theta(n^2)\)
- Sub-arrays get smaller \(\Rightarrow \leq n\) levels of recursions.
- On each level there are \(\leq n\) items in total \(\Rightarrow \leq n\) key-comparisons.
- So run-time in \(O(n^2)\); this is tight exactly as for quick-select.
Best-case run-time: \(\Theta(n)\)
- If pivot-index is always in the middle, then we recurse in two sub-arrays of size \(\leq n/2\).
- \(T(n) \leq n + 2T(n/2) \in O(n \log n)\) exactly as for merge-sort.
- This can be shown to be tight.
Summary of quick-sort¶
- randomized-quick-sort has expected run-time \(\Theta(n \log n)\).
- The run-time bound is tight since the best-case run-time is \(\Omega(n \log n)\)
- If we're unlucky in the random numbers then the run-time is still \(\Omega(n^2)\)
- Can show: This has the same expected run-time as quick-select on randomly chosen input
- So the average-case run-time of quick-sort is \(\Theta(n \log n)\).
Auxiliary space?¶
- Each nested recursion-call requires \(O(1)\) space on the call stack.
- As described, quick-sort/randomized-quick-sort use \(\Omega(n)\) nested recursion-calls in the worst case.
- So \(O(n)\) auxiliary space (can be improved to \(O(\log n)\)).
**randomized-quick-sort-improved(A, n)**
Initialize a stack S of index-pairs with { (0, n−1) }
while S is not empty
(ℓ,r) ← S.pop()
while (r−ℓ+1 > 10) do
p ← ℓ + random(r−ℓ+1)
i ← partition(A, ℓ,r, p)
if (i−ℓ > r−i) do
S.push( (ℓ, i−1) )
ℓ ← i+1
else
S.push( (i+1,r) )
r ← i−1
insertion-sort(A)
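A Python sketch of the improved variant: we always push the larger sub-range and continue with the smaller one, so the continued range at least halves each time and the explicit stack holds only \(O(\log n)\) pairs; ranges of size at most 10 (the pseudocode's cutoff) are left for a final insertion-sort pass. The Lomuto-style scan stands in for the range-based partition(A, ℓ, r, p):

```python
import random

def randomized_quick_sort_improved(A, cutoff=10):
    """Iterative quick-sort with an explicit O(log n) stack of index-pairs."""
    stack = [(0, len(A) - 1)]
    while stack:
        l, r = stack.pop()
        while r - l + 1 > cutoff:
            p = random.randrange(l, r + 1)   # p <- l + random(r-l+1)
            A[r], A[p] = A[p], A[r]          # Lomuto partition of A[l..r]
            v = A[r]
            i = l
            for j in range(l, r):
                if A[j] < v:
                    A[i], A[j] = A[j], A[i]
                    i += 1
            A[i], A[r] = A[r], A[i]
            if i - l > r - i:                # left side larger: push it,
                stack.append((l, i - 1))     # continue with the right side
                l = i + 1
            else:                            # right side larger: push it,
                stack.append((i + 1, r))     # continue with the left side
                r = i - 1
    # one final insertion-sort cleans up all remaining small runs
    for k in range(1, len(A)):
        x, j = A[k], k - 1
        while j >= 0 and A[j] > x:
            A[j + 1] = A[j]
            j -= 1
        A[j + 1] = x
```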
Lower Bound for Comparison-Based Sorting¶

Comparison-based sorting algorithm lower bound is \(\Omega(n\log n)\), but non-comparison-based sorting can achieve \(O(n)\) (under restrictions!).
Theorem
Any comparison-based sorting algorithm requires in the worst case \(\Omega(n \log n)\) comparisons to sort \(n\) distinct items.
Note
Any comparison-based algorithm can be expressed as a decision tree.
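The counting argument behind the theorem, in brief: the decision tree must have a distinct leaf for each of the \(n!\) possible orderings of the input, so its height \(h\) (the worst-case number of comparisons) satisfies

```latex
2^{h} \;\ge\; \#\text{leaves} \;\ge\; n!
\quad\Longrightarrow\quad
h \;\ge\; \log_2(n!) \;\ge\; \log_2\!\left(\left(\tfrac{n}{2}\right)^{n/2}\right)
= \tfrac{n}{2}\log_2\tfrac{n}{2} \;\in\; \Omega(n \log n),
```

using that the largest \(n/2\) factors of \(n!\) are each at least \(n/2\).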
Non-Comparison-Based Sorting¶
Assume keys are numbers in base \(R\) (\(R\): radix)
- So all digits are in \(\{0, \ldots,R-1\}\)
Assume all keys have the same number \(m\) of digits.
- Can achieve after padding with leading 0s
(Single-digit) bucket-sort¶
**bucket-sort(A, n, sort-key(·))**
A: array of size n
sort-key(·) : maps items of A to {0, . . . , R−1}
Initialize an array B[0...R − 1] of empty queues (buckets)
for i ← 0 to n−1 do
Append A[i] at end of B[sort-key(A[i])]
i ← 0
for j ← 0 to R − 1 do
while B[j] is non-empty do
move front element of B[j] to A[i++]
- In our example sort-key(A[i]) returns the last digit of A[i].
- bucket-sort is stable: equal items stay in original order.
- Run-time \(\Theta(n+R)\), auxiliary space \(\Theta(n+R)\)
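The pseudocode translates directly into Python (a sketch; `deque` buckets make the FIFO behaviour, and hence the stability, explicit):

```python
from collections import deque

def bucket_sort(A, R, sort_key):
    """Stable single-digit bucket-sort.

    sort_key maps each item of A to a bucket in {0, ..., R-1}.
    Theta(n + R) time and auxiliary space; items with equal keys
    keep their original relative order.
    """
    B = [deque() for _ in range(R)]    # R empty FIFO buckets
    for x in A:
        B[sort_key(x)].append(x)       # distribute, preserving order
    i = 0
    for j in range(R):
        while B[j]:
            A[i] = B[j].popleft()      # collect bucket by bucket
            i += 1
```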
Most-significant-digit(MSD)-radix-sort¶
**MSD-radix-sort(A, n, d ← 1)**
A: array of size n, contains m-digit radix-R numbers
if (d ≤ m and (n > 1))
bucket-sort(A, n,‘return dth digit of A[i]’)
ℓ ← 0 // find sub-arrays and recurse
for j ← 0 to R − 1
Let r ≥ ℓ − 1 be maximal s.t. A[ℓ..r] have dth digit j
MSD-radix-sort(A[ℓ..r],r−ℓ+1, d+1)
ℓ ← r + 1
- \(\Theta(m)\) levels of recursion in worst-case
- \(\Theta(n)\) subproblems on most levels in worst-case.
- \(\Theta(R+\text{size of sub-array})\) time for each bucket-sort call.
- Run-time \(\Theta(mnR)\) - slow. Many recursions and allocated arrays.
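A Python sketch of MSD-radix-sort; for clarity it recurses on slices rather than in place, and the helper `digit(x, d)` (returning the d-th most significant of m base-R digits) is an assumption about the key representation:

```python
def msd_radix_sort(A, m, R, d=1):
    """Sort m-digit radix-R numbers, most-significant digit first."""
    def digit(x, d):
        # d-th most significant digit of x (d = 1 is the leading digit)
        return (x // R ** (m - d)) % R
    if d <= m and len(A) > 1:
        # stable bucket-sort on digit d
        B = [[] for _ in range(R)]
        for x in A:
            B[digit(x, d)].append(x)
        A[:] = [x for bucket in B for x in bucket]
        # find maximal runs A[l..r] with equal d-th digit and recurse
        l = 0
        while l < len(A):
            r = l
            while r + 1 < len(A) and digit(A[r + 1], d) == digit(A[l], d):
                r += 1
            sub = A[l:r + 1]
            msd_radix_sort(sub, m, R, d + 1)
            A[l:r + 1] = sub
            l = r + 1
```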
Least-significant-digit(LSD)-radix-sort¶
**LSD-radix-sort(A, n)**
A: array of size n, contains m-digit radix-R numbers
for d ← least significant to most significant digit do
bucket-sort(A, n, ‘return dth digit of A[i]’)
Loop-invariant: \(A\) is sorted with respect to digits \(d, \ldots,m\) of each entry.
Time cost: \(\Theta(m(n+R))\)
Auxiliary space: \(\Theta(n+R)\)
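The whole algorithm is a few lines of Python (a sketch; here digit \(d = 0\) is the least significant, matching the loop direction in the pseudocode):

```python
def lsd_radix_sort(A, m, R):
    """Sort m-digit radix-R numbers, least-significant digit first.

    Each pass is a stable bucket-sort on one digit; after the pass
    for digit d, A is sorted w.r.t. digits d..m (the loop-invariant).
    Theta(m(n + R)) time, Theta(n + R) auxiliary space.
    """
    for d in range(m):                         # d = 0: least significant
        B = [[] for _ in range(R)]
        for x in A:
            B[(x // R ** d) % R].append(x)     # stable distribution
        A[:] = [x for bucket in B for x in bucket]
```

Stability of each pass is what makes the loop-invariant hold: ties on digit d keep the order established by the earlier, less significant digits.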