
Threads

An Overview of Threads

What are threads?

We can think of a thread as a sequence of instructions.

image.png

Definition

A thread is a schedulable execution context.

A thread is not just code — it’s the code + the complete environment that allows it to run independently on a CPU core.

To execute the multiple tasks a process is responsible for at the same time, on multiple cores, so that the same program does different things simultaneously, we do the following:

image.png

  • Share what you can. Separate what you must.
  • Each thread has its own stack and register values (e.g. program counter, instruction register, stack pointer, frame pointer).
  • Threads in the same process share: the code, global variables, heap and file descriptors (i.e. open files) of the process.

Concurrency

Definition

Concurrency means allowing multiple processes or threads to run, or appearing to run, at the same time.

  • Preemptive multitasking (a.k.a. timesharing):
    • Multiple programs appear to run simultaneously by taking turns on the same hardware.
    • The operating system rapidly switches between tasks to ensure all make progress.
  • Parallelism:
    • Programs can run at the same time because multiple processors or cores are available.
    • This enables true simultaneous execution.

image.png

  • This arrangement increases processor utilization (the fraction of time the processor is busy versus idle waiting for I/O) by overlapping one process’s computation with another’s I/O wait.
  • Timesharing can also reduce latency (the time between launching a program and getting a response) compared to sequential execution.

Why Threads?

  • Threads are a lighter-weight abstraction than processes.
    • They use fewer system resources (e.g., processor time and memory) to create and manage.
  • Benefits of threads:
    • Allow a single process to use multiple processors or cores in parallel.
    • Enable a single process to overlap I/O and computation.
  • Responsiveness:
    • Different threads can handle different tasks (e.g., UI vs. background data loading) to improve responsiveness.
  • Priority:
    • Some threads (e.g., UI) can be given higher priority for better performance.
  • A single program can have:
    • Different threads for different roles:
      • E.g., a browser uses separate threads for running JavaScript, displaying HTML, and playing video.
    • Multiple threads for the same roles:
      • E.g., a web server may spawn a separate thread for each user accessing the site.
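A sketch of the thread-per-user web server mentioned above (`handle_client` and the listening socket are hypothetical; a real server would parse requests and send responses):

```c
#include <pthread.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical handler: in a real server this would read the request
 * from the socket and write back a response. */
void *handle_client(void *arg) {
    int client_fd = (int)(long)arg;   /* connection socket, passed by value */
    printf("serving client on fd %d\n", client_fd);
    return NULL;
}

/* Accept loop sketch: spawn one thread per incoming connection. */
void serve_forever(int listen_fd) {
    for (;;) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0) continue;
        pthread_t t;
        pthread_create(&t, NULL, handle_client, (void *)(long)client_fd);
        pthread_detach(t);            /* no join needed; thread cleans itself up */
    }
}
```

Every connection gets its own thread, so a slow client only blocks its own handler rather than the whole server.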

Context Switching and Preemption

Definition

The switch from one thread running to another is called a context switch.

  • It can happen when a thread voluntarily stops running, called yielding.
  • It can happen when a thread is waiting for a resource or I/O, called blocking, waiting, or sleeping.
  • Preemption is when a thread involuntarily stops running.
  • The kernel can preempt a thread whenever it gains control, for example on:
    • creation of a new thread or process, or exit of an existing one,
    • interrupts, system calls, and other exceptions.

False: Context switches have lower costs in terms of processor time than do function calls

  • Context switches do not have lower costs than function calls.
    • because they involve saving and restoring full CPU and memory state between threads or processes, not just calling a new piece of code.

Preemption

  • Timer Interrupts:
    • The kernel can preempt a thread using a timer interrupt.
    • With timesharing, each thread is given a small time slice (e.g., 30ms) to execute on the processor.
      • This time slice is called a scheduling quantum (or time quantum, time slice).
    • Timer interrupts can occur more frequently (e.g., every 10ms) to assign different quantum lengths for different priorities.
      • Timer interrupts help the OS enforce time-sharing by triggering regular checks.
    • When a thread's time quantum expires, the timer interrupt triggers a context switch.
  • Other Device Interrupts:
    • The kernel can also preempt a thread due to other device interrupts such as:
      • Disk I/O completion
      • Network packet arrival
      • Keyboard input
    • Or if a previously blocked process becomes runnable again.
  • Key Point:
    • The kernel tries to maintain high processor utilization.
    • So it performs context switches whenever a thread can no longer execute (due to blocking or quantum expiration).

Time Sharing

image.png

Threads run in user space, but whenever a system call or interrupt occurs, they temporarily enter kernel space.

  1. Process A runs in user space until its time quantum expires. (During the timer interrupt, the CPU switches to kernel mode (green) to handle it.)
  2. A timer interrupt sends control to the kernel.
  3. The OS begins a context switch:
    • It saves Process A’s state
    • It restores Process B’s state
  4. During this switch, Process B’s thread is being set up in kernel space — this includes:
    • Restoring registers
    • Setting the program counter
    • Re-mapping virtual memory (if it’s a different process)
  5. Only after this setup is complete, the CPU returns to user mode to run Process B’s user code.

Context Switch

  • Context switches are processor dependent.
  • Typical tasks during a context switch:
    • Save the program counter and general purpose registers (always required)
    • Save floating point or other special registers
    • Save status registers
    • Change virtual address translations
  • Non-negligible costs:
    • Can cause more cache misses
      • Reason: the new thread likely needs different data in memory and may not be in the cache
      • The CPU must go out to main memory (RAM) to fetch data.
      • This takes much longer than accessing cache (hundreds of cycles slower).
    • Saving/restoring floating point registers is costly
      • Optimization: only save them if the process uses them
    • May require flushing the address translation cache (TLB flush)
      • H/W Optimization: avoid flushing kernel’s own cache entries

Context switches are essential but expensive.

Thread States

image.png

A thread can be in one of the following states:

  • new: created but not yet running.
  • terminated: exited or encountered a fatal error.
  • running: currently executing (or will resume after returning from a system call).
  • ready: runnable; can run, but the kernel has selected another thread.
  • blocked: waiting (e.g., for I/O or an event) before continuing.

Context Switch Due to a Timer Interrupt

  • The context switch starts with a timer interrupt.
  • The interrupt stops a user thread in the middle of executing code.
  • The kernel routine trap_common saves the trapframe on the kernel stack and then calls trap_entry to determine the cause of the trap.
  • trap_entry determines that a T_IRQ_TIMER generated by the timer caused the interrupt.
  • First, trap_entry calls KTimer_Process to process any scheduled timer events.
    • Threads whose wait time is over are moved to the runnable queue.
  • In this way, timers also trigger the processing of kernel events before the next thread is scheduled.
  • After KTimer_Process returns, Sched_Scheduler is called to identify the next runnable thread in the ready queue.
  • It calls Sched_Switch to perform the switch.
  • Sched_Switch loads the address space for the new thread and then calls Thread_SwitchArch, which saves and restores the floating point registers and then calls switchstack.
  • switchstack saves and restores kernel thread state in a switchframe and switches from one kernel stack to another.
  • The switchframe contains the callee-saved registers of the kernel thread.
  • switchstack saves the state of the currently running thread and restores the state of the thread being switched to (the state saved when that thread was last switched out).
  • Switching processes is therefore a switch between kernel threads.

Trap Frame and Switch Frames

  • A trap frame contains:
    • All general purpose registers
    • Many status registers
    • Created during an interrupt or exception (e.g., system call)
    • Stores full register state because the system cannot predict when interrupts occur
  • A switchframe:
    • Contains only callee-saved registers
    • Used only during a context switch
    • Created by the Thread_SwitchArch function via switchstack
      • Saves the current thread’s callee-saved registers
      • Overwritten by the registers of the thread to be run next
    • Required even if a trapframe exists, due to possible kernel computation in between
      • Trap frame saves the user program’s state at the moment it was interrupted
      • But after that, the kernel keeps running for a bit — it runs the scheduler, chooses a new thread, and switches threads
      • During this time, kernel code uses its own registers — which may have changed since the trap
      • So: the kernel creates a switchframe to save its own current state before switching to another thread
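To make the size difference concrete, here is a hypothetical x86-64 layout (field names are illustrative, not taken from any real kernel): the trapframe must capture every register, while the switchframe needs only the callee-saved ones plus a resume address.

```c
#include <stdint.h>

/* Saved on every interrupt or exception: the full register state,
 * because the interrupted code could be using any register. */
struct trapframe {
    uint64_t rax, rbx, rcx, rdx, rsi, rdi, rbp;
    uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
    uint64_t rip, rflags, rsp;       /* status registers + return context */
};

/* Saved only at a context switch: the calling convention guarantees the
 * caller-saved registers are already dead at the call to switchstack,
 * so the callee-saved registers suffice. */
struct switchframe {
    uint64_t rbx, rbp, r12, r13, r14, r15;
    uint64_t rip;                    /* where to resume in the kernel */
};
```

The switchframe is much smaller, which is part of why a cooperative in-kernel switch is cheaper than saving a full trapframe.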

Note

image.png

The trapframe stores the state needed to return to user mode, while the switchframe stores the state needed to switch between kernel threads.

switchstack() is the low-level function that performs the register switch. It:

  • Saves Thread A’s callee-saved registers into its switchframe
  • Overwrites those registers with Thread B’s values from its switchframe
  • This prepares the CPU to continue running Thread B

For switchstack, the save and the restore are done in separate calls, on separate stacks, at different times.

image.png

Kernel stack 1 executes switchstack to save Thread 1’s state into its switchframe; when switching to kernel stack 2, the kernel restores its CPU state from Thread 2’s switchframe, and control flows back up (returns) through the same kernel functions that were active when Thread 2 was previously running. So kernel stack 2 unwinds in reverse order!

Scheduling

  • Scheduling determines which thread should run next.
  • It is implemented by a scheduler, a component of the kernel.
  • A common approach is preemptive round-robin scheduling:
    • Threads are stored in a ready queue.
    • On a context switch:
      • The running thread is placed at the end of the queue.
      • The first thread in the queue is dispatched to run.
    • Newly created or unblocked threads are also added to the end of the ready queue.
  • The actual execution order is nondeterministic.
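The round-robin policy above can be sketched as a tiny ready queue of thread IDs (a toy model, not a real scheduler):

```c
/* Toy round-robin ready queue over integer thread IDs (illustrative only). */
#define MAXQ 8
typedef struct { int ids[MAXQ]; int head, len; } readyq;

void rq_push(readyq *q, int tid) {              /* enqueue at the tail */
    q->ids[(q->head + q->len++) % MAXQ] = tid;
}

int rq_pop(readyq *q) {                         /* dequeue from the head */
    int tid = q->ids[q->head];
    q->head = (q->head + 1) % MAXQ;
    q->len--;
    return tid;
}

/* On a context switch: the preempted thread goes to the back of the
 * queue, and the thread at the front is dispatched next. */
int schedule_next(readyq *q, int running) {
    rq_push(q, running);
    return rq_pop(q);
}
```

With threads 1 and 2 queued and thread 0 running, repeated calls to `schedule_next` cycle through 1, 2, 0, 1, 2, 0, ... which is exactly the round-robin order.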

The Thread Interface and Implementations

POSIX thread API

  • int pthread_create(pthread_t *thr, pthread_attr_t *attr, void *(*fn)(void *), void *arg);
    • Create a new thread thr that runs function fn with arg as the argument.
    • Optional attributes can be passed using attr, or set to NULL for default.
  • void pthread_exit(void *return_value);
    • Destroy the current thread and return a pointer to its return value.
  • int pthread_join(pthread_t thread, void **return_value);
    • Wait for the thread to finish and retrieve its return value.
  • void pthread_yield();
    • Hint to the scheduler to run another thread if possible.

Thread Stacks and pthread_create

When pthread_create is executed, a new stack is created and register values are copied over into the newly created thread (with some changes, e.g. stack pointer and frame pointer).

The left stack is the same one as the bottom right stack. The upper right stack is new.

image.png

Kernel threads vs. user threads

  • Two types of threads:
    1. User threads: Created and managed by the user application, including scheduling decisions (which one will run next).
    2. Kernel threads: Created and managed by the kernel, including scheduling decisions (which one will run next).
  • System calls are expensive:
    • A system call might require ~100 assembly instructions.
    • A function call may only require a few assembly instructions.
  • Alternatives:
    • Thread functions like pthread_create can be implemented as:
      • A function call in user space (efficient).
      • A system call into kernel space (more costly).

Approach #1: User threads

image.png

  • Definition: Implemented in a user-space library (e.g., pthread_create), accessed via function calls.
  • Characteristics:
    • Many user threads per process, but only one kernel thread.
    • A queue of runnable user threads is managed in user space.
    • Only one user thread at a time is mapped to the single kernel thread and can execute.
  • Limitation:
    • Cannot utilize multiple processors or cores, making it less efficient for parallel execution.
    • Multi-threaded web server example:
      • A thread calls read to get data from a remote web browser.
      • The read call is made non-blocking (e.g., via O_NONBLOCK), so it returns immediately instead of blocking the whole process.
      • If no data is available:
        • The thread yields and another thread is scheduled.
        • A timer or idle loop later checks whether connection data has arrived.
    • A thread blocking on another may lead to deadlock.
      • This can block the entire process and halt progress.
  • Implementation:
    • Thread Creation:
      • pthread_create allocates a new stack each time it is called.
        • once for each user-level thread you want to create
    • Blocking:
      • Blocking = transition from running to blocked/waiting state.
      • A blocking system call blocks all threads.
      • Users must replace blocking system calls (e.g., read, write) with non-blocking versions (e.g., fcntl + O_NONBLOCK).
      • If a thread would block, switch to another runnable thread.
      • This improves CPU utilization by keeping the core busy.
    • Preemption with Timers:
      • The system uses a periodic timer signal (e.g., via setitimer).
      • Timer signals trigger context switches to other threads.
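A sketch of the non-blocking replacement described above, using fcntl with O_NONBLOCK (`try_read` is a hypothetical wrapper such a user-level library might use):

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put a file descriptor into non-blocking mode, as a user-level
 * threading library must do for every fd its threads touch. */
int make_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Read attempt that never blocks: returns bytes read, or 0 when the
 * call would have blocked (no data available yet). */
ssize_t try_read(int fd, void *buf, size_t n) {
    ssize_t r = read(fd, buf, n);
    if (r < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;   /* no data yet: the library would switch threads here */
    return r;
}
```

When `try_read` returns 0, the user-level scheduler marks the thread blocked and runs another runnable thread, keeping the single kernel thread busy.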

Idle is a state that a computer processor is in when it is not being used by any program.

A system call crosses the user–kernel boundary, which is expensive.

Approach #2: Kernel threads

image.png

  • There is a 1:1 correspondence between user threads and kernel threads.
  • Each thread is identified (e.g. by ThreadID), and every thread operation (create, exit, join, yield) must go through the kernel via a system call.
  • Start from the process abstraction in the kernel:
    • Avoid duplicating code, heap, and global data.
  • Linux’s clone syscall enables selective sharing (e.g. stack, memory).

Note

The process abstraction represents a running program

It provides the illusion to each program that:

  • It has its own private CPU
  • Its own memory
  • Its own files and resources

Even though it’s sharing all of those with other processes.

  • System resource usage:
    • Uses less system resources than creating a new process.
    • Still expensive due to system call overhead for every thread operation.
    • Thread operations can be 10–30× slower in the kernel.
  • Security impact:
    • SPECTRE/Meltdown mitigations further slow system calls.
    • These bugs allow unauthorized access to kernel address space.
    • System calls are intentionally slowed to mitigate such risks.
  • One-size-fits-all constraint:
    • Kernel threads must support all app demands.
    • User apps may suffer overhead for unused features (e.g. priorities).
  • Memory overhead:
    • General solution uses larger memory, e.g., 2MB stack per kernel thread.
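The per-thread stack size can be reduced when creating kernel-backed threads; a sketch using pthread attributes (256 KB here is an arbitrary choice, not a recommendation from the source):

```c
#include <pthread.h>

static void *worker(void *arg) { (void)arg; return NULL; }

/* Create a thread with a smaller stack than the (often multi-MB) default. */
int spawn_small_stack_thread(void) {
    pthread_attr_t attr;
    pthread_t t;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 256 * 1024);   /* 256 KB instead of default */
    int rc = pthread_create(&t, &attr, worker, NULL);
    pthread_attr_destroy(&attr);
    if (rc == 0) pthread_join(t, NULL);
    return rc;                                      /* 0 on success */
}
```

With thousands of threads, shrinking each stack from a multi-megabyte default to a few hundred kilobytes can save gigabytes of address space.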

Approach #3: User threads on kernel threads

image.png

  • Also known as n:m threading:
    • There are n user threads mapped onto m kernel threads.
  • User threads are implemented on top of kernel threads:
    • Allows multiple kernel-level threads per process.
    • Functions like thread_create, thread_exit remain user-space library calls.
  • Limitations:
    • Suffers from the same problems as n:1 threading:
      • Blocked threads
      • Deadlocks
    • Difficult to match kernel threads to available cores to maximize parallelism:
      • The kernel knows how many cores are available.
      • The kernel knows which kernel threads are blocked.
      • The kernel hides these details from applications for transparency.
      • The user-level thread scheduler may therefore assume a thread is running while its kernel thread is actually blocked.
    • The kernel lacks awareness of each thread's importance:
      • It might preempt a kernel thread that is performing a critical task.
  • Benefits:
    • Most context switches and scheduling can happen in user space, avoiding costly kernel transitions.
    • More efficient for thousands of threads with frequent switching.

I/O Concurrency: kernel threads

Highly Concurrent: n:m threads (many context switches required)