Scheduling

Thread Scheduling Overview

  • First Come First Serve (FCFS) runs jobs in the order they arrive. It’s simple and avoids starvation. Its preemptive variant, Round-Robin (RR), gives each job a fixed time slice and cycles through them, ensuring fair sharing of CPU time.
  • Shortest Job First (SJF) runs jobs with the shortest total execution time first, minimizing average turnaround time. However, long jobs may starve. Its preemptive variant, Shortest Remaining Time First (SRTF), always runs the job with the shortest time left to finish, possibly interrupting longer jobs.
  • In processor scheduling, the jobs to be scheduled are the threads.
    • The run times of threads are normally not known.
    • Sometimes, threads are not runnable, i.e. when they are blocked.
    • Threads may have different priorities.
  • The objective of the scheduler is normally to achieve a balance between
    • responsiveness: ensure that threads get to run regularly
    • fairness: share the processor so that every thread makes progress
    • efficiency: account for the fact that switching threads has a cost
  • Schedulers can use priorities to influence scheduling decisions. Priorities may be defined by the user, the scheduler, or a mix of both. There are two main approaches: (1) scheduling the highest-priority thread, which always runs the thread with the greatest priority value; and (2) weighted fair sharing, which allocates CPU time proportionally based on each thread’s priority. If a thread has priority \(p_i\), it should get a fraction \(\frac{p_i}{\sum_j p_j}\) of the CPU time.
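The turnaround-time difference between FCFS and SJF can be made concrete with a minimal sketch. The job names and run times below are made up, and all jobs are assumed to arrive at time 0:

```python
# Hypothetical jobs: (name, run_time); all assumed to arrive at time 0.
jobs = [("A", 8), ("B", 1), ("C", 2)]

def avg_turnaround(order):
    """Average turnaround time when jobs run back-to-back in `order`."""
    t, total = 0, 0
    for _, run in order:
        t += run       # this job finishes at time t
        total += t     # turnaround = finish time - arrival time (arrival = 0)
    return total / len(order)

fcfs = avg_turnaround(jobs)                               # arrival order
sjf = avg_turnaround(sorted(jobs, key=lambda j: j[1]))    # shortest first
```

With these numbers, FCFS gives an average turnaround of (8 + 9 + 11)/3 ≈ 9.3, while SJF gives (1 + 3 + 11)/3 = 5: the short jobs no longer wait behind the long one, but the long job finishes last either way.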

Multi-level Feedback Queues


Multi-level Feedback Queues (MLFQ) are a commonly used CPU scheduling algorithm that prioritizes responsiveness for interactive threads (e.g., those waiting for user input). The core idea is to dynamically adjust thread priorities based on behavior. Threads are organized in multiple round-robin queues with different priority levels: higher-priority queues have shorter time quanta and are selected first. Interactive threads tend to block often (e.g., for user input) and are moved to higher-priority queues when they wake up. Conversely, threads that use up their full quantum without blocking are demoted to lower-priority queues (larger quantum). To avoid starvation (where low-priority threads never run), all threads are periodically boosted back to the highest-priority queue. This system favors interactive tasks while ensuring all threads eventually get CPU time.

Note: each level has its own Run and Ready queue, but all levels share the Blocked queue.
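The demotion/boost mechanics described above can be sketched as follows. The number of levels, the quantum values, and the function names are all assumptions for illustration, not a real kernel API:

```python
from collections import deque

# Minimal MLFQ sketch (assumed parameters): 3 levels, level 0 is the
# highest priority and has the shortest time quantum.
QUANTA = [2, 4, 8]                    # time slice per level, in ticks
queues = [deque() for _ in QUANTA]

def admit(thread):
    queues[0].append(thread)          # new threads start at top priority

def on_quantum_expired(thread, level):
    # Used its full slice without blocking: demote one level (CPU-bound).
    new_level = min(level + 1, len(queues) - 1)
    queues[new_level].append(thread)
    return new_level

def on_wakeup(thread):
    # Blocked (e.g. on user input) and woke up: treat as interactive.
    queues[0].append(thread)

def boost_all():
    # Periodic starvation avoidance: move everyone back to the top queue.
    for level in range(1, len(queues)):
        while queues[level]:
            queues[0].append(queues[level].popleft())

def pick_next():
    # Highest-priority non-empty queue wins; round-robin within a level.
    for level, q in enumerate(queues):
        if q:
            return q.popleft(), level, QUANTA[level]
    return None
```

A thread that keeps exhausting its quantum drifts toward the 8-tick queue, while one that blocks for input keeps returning to the 2-tick queue; `boost_all()` is what keeps the bottom queue from starving.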

Linux Completely Fair Scheduler (CFS)

  • Each thread is assigned a weight
  • Let \(a_i\) be the actual amount of time that the scheduler has allowed thread \(T_i\) to run.

    Ideally we would expect:

    \[ a_1 \cdot \frac{\sum_j w_j}{w_1} = a_2 \cdot \frac{\sum_j w_j}{w_2} \quad \text{(e.g., } 1 \cdot \frac{1+4}{1} = 4 \cdot \frac{1+4}{4} \text{)} \]

    That is, the actual amount of run time divided by each thread's fair share is the same for all threads.

    For simplicity, we often factor out \(\sum_j w_j\), yielding:

    \[ \frac{a_1}{w_1} = \frac{a_2}{w_2} \]

    where \(w\) is the weight.

  • The thread with the smallest \(\frac{a_i}{w_i}\) ratio should run next to increase its share of the processor time

  • Completely Fair Scheduler (CFS) works using the concept of virtual run time to fairly schedule threads based on their weights (or priorities):
    • Each runnable thread has a virtual run time (VRT), which reflects how much CPU time it has effectively used, adjusted for its priority. It's computed as:

      \[ \text{VRT of } T_i = a_i \cdot \frac{\sum_j w_j}{w_i} \]

      where:

      • \(a_i\) is the actual run time of thread \(T_i\),
      • \(w_i\) is its weight,
      • \(\sum_j w_j\) is the sum of all thread weights.
    • Threads with higher weights (higher priority) have their virtual run time increase more slowly, meaning they get more CPU time; threads with lower weights (lower priority) see their VRT increase faster.
    • The scheduler always runs the thread with the lowest virtual run time, ensuring fairness over time.
    • When a thread becomes runnable (e.g. after I/O), its virtual run time is set between the min and max VRTs of the other runnable threads, to integrate it fairly.
    • MLFQ vs. CFS:
      • MLFQ uses different time quanta depending on the thread’s priority.
      • CFS uses a uniform time quantum but adjusts the rate at which VRT advances based on priority, achieving fairness via arithmetic rather than queue switching.
    • In CFS, every runnable thread gets the same amount of actual CPU time (the time slice, or quantum) when it runs, but virtual run time decides who runs next: threads with lower VRT are picked first, and higher-priority threads accumulate VRT more slowly, so they stay “behind” and get picked more often.
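The scheme above can be condensed into a toy simulation. This is a sketch, not the Linux implementation: the 1-unit slice and the weights are made up, and the constant factor \(\sum_j w_j\) is dropped, so each run advances VRT by slice/weight:

```python
import heapq

class Thread:
    """Toy runnable thread: weight stands in for priority."""
    def __init__(self, name, weight):
        self.name, self.weight, self.vrt = name, weight, 0.0
    def __lt__(self, other):           # heap orders by virtual run time
        return self.vrt < other.vrt

SLICE = 1.0  # every thread runs the same actual quantum
runnable = [Thread("hi", 4), Thread("lo", 1)]
heapq.heapify(runnable)

schedule = []
for _ in range(5):
    t = heapq.heappop(runnable)        # lowest VRT runs next
    schedule.append(t.name)
    t.vrt += SLICE / t.weight          # heavier threads accrue VRT slower
    heapq.heappush(runnable, t)
```

Over 5 slices the weight-4 thread runs 4 times and the weight-1 thread once, matching the proportional share \(\frac{p_i}{\sum_j p_j}\); picking the minimum VRT is the same as picking the smallest \(\frac{a_i}{w_i}\) ratio described above.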

Multicore Processors


  • Scalability and Cache Affinity:

    1. Contention and Scalability: Sharing a single ready queue across many cores causes contention since it’s a critical section requiring mutual exclusion. As the number of cores increases, this contention worsens. So, having separate queues per core improves scalability.
    2. Cache Affinity: When a thread runs on a core, its data is loaded into that core’s caches (like L1/L2). If the thread is moved to another core, the data has to be reloaded, which is inefficient. Keeping threads on the same core (honoring affinity) improves performance by reusing cached data.

    Conclusion: Per-core scheduling designs improve both scalability and performance due to reduced contention and better cache usage.

  • In per-core scheduling, each core has its own queue, which can lead to load imbalance: some cores may be idle while others are overloaded. This problem does not occur in shared-queue designs. To address it, load-balancing mechanisms are needed when the goal is speed (e.g., in servers). On the other hand, when the goal is energy efficiency (e.g., in smartphones), the system may deliberately leave some cores idle or off to save power.
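One common way to reconcile per-core queues with load balance is idle-time migration (work stealing): an idle core pulls work from a busier one. A minimal sketch, with a made-up core count and policy:

```python
from collections import deque

# Per-core ready queues; 4 cores is an arbitrary choice for illustration.
cores = [deque() for _ in range(4)]

def enqueue(core_id, thread):
    cores[core_id].append(thread)     # honor cache affinity: stay local

def pick_next(core_id):
    q = cores[core_id]
    if q:
        return q.popleft()            # fast path: run a local thread
    # Idle core: migrate one thread from the most loaded queue, trading
    # away cache affinity to fix the load imbalance.
    victim = max(cores, key=len)
    return victim.popleft() if victim else None
```

A real scheduler would also rate-limit migrations, since every steal gives up the cache-affinity benefit described above.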