Threads¶
An Overview of Threads¶
What are threads?¶
We can think of a thread as a sequence of instructions.

Definition
A thread is a schedulable execution context.
A thread is not just code — it’s the code + the complete environment that allows it to run independently on a CPU core.
To execute the multiple tasks a process is responsible for at the same time, on multiple cores, so that the same program can do different things simultaneously, we do the following:

- Share what you can. Separate what you must.
- Each thread has its own stack and register values (e.g. program counter, instruction register, stack pointer, frame pointer).
- Threads in the same process share: the code, global variables, heap and file descriptors (i.e. open files) of the process.
Concurrency¶
Definition
Concurrency means allowing multiple processes or threads to run, or appearing to run, at the same time.
- Preemptive multitasking (a.k.a. timesharing):
- Multiple programs appear to run simultaneously by taking turns on the same hardware.
- The operating system rapidly switches between tasks to ensure all make progress.
- Parallelism:
- Programs can run at the same time because multiple processors or cores are available.
- This enables true simultaneous execution.

- This arrangement increases processor utilization (time spent doing useful work vs. idle time waiting for I/O) by overlapping one process's computation with another process's wait for I/O.
- Timesharing can reduce latency (the time between launching a program and getting a response) compared to sequential execution.
Why Threads?¶
- Threads are a lighter-weight abstraction than processes.
- They use less system resources (e.g., processor time and memory) to create and manage.
- Benefits of threads:
- Allow a single process to use multiple processors or cores in parallel.
- Enable a single process to overlap I/O and computation.
- Responsiveness:
- Different threads can handle different tasks (e.g., UI vs. background data loading) to improve responsiveness.
- Priority:
- Some threads (e.g., UI) can be given higher priority for better performance.
- A single program can have:
- Different threads for different roles:
- E.g., a browser uses separate threads for running JavaScript, displaying HTML, and playing video.
- Multiple threads for the same roles:
- E.g., a web server may spawn a separate thread for each user accessing the site.
Context Switching and Preemption¶
Definition
The switch from one thread running to another is called a context switch.
- It can happen when a thread voluntarily stops running, called yielding.
- It can happen when a thread is waiting for a resource or I/O, called blocking, waiting, or sleeping.
- Preemption is when a thread involuntarily stops running.
- The kernel can preempt a thread when it gains control.
- when a new thread or process is created, or an existing one exits,
- interrupts, system calls and other exceptions.
False: "Context switches have lower costs in terms of processor time than do function calls."
- Context switches do not have lower costs than function calls, because they involve saving and restoring the full CPU and memory state of a thread or process, not just jumping to a new piece of code.
Preemption¶
- Timer Interrupts:
- The kernel can preempt a thread using a timer interrupt.
- With timesharing, each thread is given a small time slice (e.g., 30ms) to execute on the processor.
- This time slice is called a scheduling quantum (or time quantum, time slice).
- Timer interrupts can occur more frequently (e.g., every 10ms) to assign different quantum lengths for different priorities.
- Timer interrupts help the OS enforce time-sharing by triggering regular checks.
- When a thread's time quantum expires, the timer interrupt triggers a context switch.
- Other Device Interrupts:
- The kernel can also preempt a thread due to other device interrupts such as:
- Disk I/O completion
- Network packet arrival
- Keyboard input
- Or if a previously blocked process becomes runnable again.
- Key Point:
- The kernel tries to maintain high processor utilization.
- So it performs context switches whenever a thread can no longer execute (due to blocking or quantum expiration).
Time Sharing¶

Threads run in user space, but whenever a system call or interrupt occurs, they temporarily enter kernel space.
- Process A runs in user space until its time quantum expires; during the timer interrupt, the CPU switches to kernel mode to handle it.
- A timer interrupt sends control to the kernel.
- The OS begins a context switch:
- It saves Process A’s state
- It restores Process B’s state
- During this switch, Process B’s thread is being set up in kernel space — this includes:
- Restoring registers
- Setting the program counter
- Re-mapping virtual memory (if it’s a different process)
- Only after this setup is complete does the CPU return to user mode to run Process B's user code.
Context Switch¶
- Context switches are processor dependent.
- Typical tasks during a context switch:
- Save the program counter and general purpose registers (always required)
- Save floating point or other special registers
- Save status registers
- Change virtual address translations
- Non-negligible costs:
- Can cause more cache misses
- Reason: the new thread likely needs different data in memory and may not be in the cache
- The CPU must go out to main memory (RAM) to fetch data.
- This takes much longer than accessing cache (hundreds of cycles slower).
- Saving/restoring floating point registers is costly
- Optimization: only save them if the process uses them
- May require flushing the address translation cache (TLB flush)
- H/W Optimization: avoid flushing kernel’s own cache entries
Context switches are essential but expensive.
Thread States¶

A thread can be in one of the following states:
- new: created but not yet running.
- terminated: exited or encountered a fatal error.
- running: currently executing (or will resume after returning from a system call).
- ready: runnable; can run, but the kernel has selected another thread.
- blocked: waiting (e.g., for I/O or an event) before continuing.
Context Switch Due to a Timer Interrupt¶
- The context switch starts with a timer interrupt.
- This event interrupts a user thread while executing code.
- The kernel routine `trap_common` saves the trapframe on the kernel stack and then calls `trap_entry` to determine the cause of the trap.
- `trap_entry` determines that a `T_IRQ_TIMER` generated from the timer caused the interrupt.
- First, `trap_entry` calls `KTimer_Process` to process any scheduled timer events.
- Threads whose wait time is over are moved to the runnable queue.
- Timers thus also trigger the processing of kernel events before the next thread is scheduled.
- After `KTimer_Process` returns, `Sched_Scheduler` is called to identify the next runnable thread in the ready queue.
- It calls `Sched_Switch` to perform the switch.
- `Sched_Switch` loads the new address space for the new thread and then calls `Thread_SwitchArch`, which saves and restores the floating-point registers and then calls `switchstack`.
- `switchstack` saves and restores kernel thread state in a `switchframe` and switches from one kernel stack to another.
- The `switchframe` contains the callee-saved registers of the kernel thread.
- `switchstack` saves the thread state of the currently running thread and restores the thread state of the previously running thread.
- Switching processes is a switch between kernel threads.
Trap Frame and Switch Frames¶
- A trap frame contains:
- All general purpose registers
- Many status registers
- Created during an interrupt or exception (e.g., system call)
- Stores full register state because the system cannot predict when interrupts occur
- A `switchframe`:
- Contains only callee-saved registers
- Used only during a context switch
- Created by the `Thread_SwitchArch` function via `switchstack`
- Saves the current thread's callee-saved registers
- Overwritten by the registers of the thread to be run next
- Required even if a trapframe exists, due to possible kernel computation in between
- Trap frame saves the user program’s state at the moment it was interrupted
- But after that, the kernel keeps running for a bit — it runs the scheduler, chooses a new thread, and switches threads
- During this time, kernel code uses its own registers — which may have changed since the trap
- So: the kernel creates a switchframe to save its own current state before switching to another thread
Note

The trapframe stores the state needed to return to user mode, while the switchframe stores the state needed to switch between kernel threads.
`switchstack()` is the low-level function that performs the register switch. It:
- Saves Thread A’s callee-saved registers into its switchframe
- Overwrites those registers with Thread B’s values from its switchframe
- This prepares the CPU to continue running Thread B
For `switchstack`, the save and the restore are done in separate calls, on separate stacks, at different times.

Thread 1 runs on kernel stack 1 to save its state into its switchframe; when switching to kernel stack 2, the kernel restores Thread 2's CPU state from Thread 2's switchframe, and control flows back up (returns) through the same kernel functions that were active when Thread 2 last ran. Thread 2 therefore resumes by unwinding its old call chain in reverse.
Scheduling¶
- Scheduling determines which thread should run next.
- It is implemented by a scheduler, a component of the kernel.
- A common approach is preemptive round-robin scheduling:
- Threads are stored in a ready queue.
- On a context switch:
- The running thread is placed at the end of the queue.
- The first thread in the queue is dispatched to run.
- Newly created or unblocked threads are also added to the end of the ready queue.
- The actual execution order is nondeterministic.
The Thread Interface and Implementations¶
POSIX thread API¶
- `int pthread_create(pthread_t *thr, pthread_attr_t *attr, void *(*fn)(void *), void *arg);`
- Create a new thread `thr` that runs function `fn` with `arg` as the argument.
- Optional attributes can be passed using `attr`, or set to `NULL` for defaults.
- `void pthread_exit(void *return_value);`
- Destroy the current thread and return a pointer to its return value.
- `int pthread_join(pthread_t thread, void **return_value);`
- Wait for the thread to finish and retrieve its return value.
- `void pthread_yield();`
- Hint to the scheduler to run another thread if possible.
Thread Stacks and pthread_create¶
When `pthread_create` is executed, a new stack is created and register values are copied over into the newly created thread (with some changes, e.g. stack pointer and frame pointer).
The left stack is the same one as the bottom right stack. The upper right stack is new.

Kernel threads vs. user threads¶
- Two types of threads:
- User threads: Created and managed by the user application, including scheduling decisions (which one will run next).
- Kernel threads: Created and managed by the kernel, including scheduling decisions (which one will run next).
- System calls are expensive:
- A system call might require ~100 assembly instructions.
- A function call may only require a few assembly instructions.
- Alternatives:
- Thread functions like `pthread_create` can be implemented as:
- A function call in user space (efficient).
- A system call into kernel space (more costly).
Approach #1: User threads¶

- Definition: Implemented in a user-space library (e.g., `pthread_create`), accessed via function calls.
- Many user threads per process, but only one kernel thread.
- A queue of runnable user threads is managed in user space.
- Only one user thread at a time is mapped to the single kernel thread and can execute.
- Limitation:
- Cannot utilize multiple processors or cores, making it less efficient for parallel execution.
- Multi-threaded web server example:
- A thread calls `read` to get data from a remote web browser.
- The non-blocking version of `read` returns immediately instead of putting the process to sleep.
- If no data is available:
- The thread yields and another thread is scheduled.
- A timer or idle cycle checks for new connection data.
- A thread blocking on another may lead to deadlock.
- This can block the entire process and halt progress.
- Implementation:
- Thread Creation:
- `pthread_create` allocates a new stack each time it is called, once for each user-level thread you want to create.
- Blocking:
- Blocking = transition from running to blocked/waiting state.
- A blocking system call blocks all threads.
- Users must replace blocking system calls (e.g., `read`, `write`) with non-blocking versions (e.g., `fcntl` + `O_NONBLOCK`).
- If a thread would block, switch to another runnable thread.
- This improves CPU utilization by keeping the core busy.
- Preemption with Timers:
- The system uses a periodic timer signal (e.g., via `setitimer`).
- Timer signals trigger context switches to other threads.
Idle is the state a processor is in when it is not being used by any program.
A system call crosses the user–kernel boundary, which is expensive.
Approach #2: Kernel threads¶

- There is a 1:1 correspondence between user threads and kernel threads.
- Each thread is identified (e.g. by ThreadID), and every thread operation (create, exit, join, yield) must go through the kernel via a system call.
- Start from the process abstraction in the kernel:
- Avoid duplicating code, heap, and global data.
- Linux’s `clone` syscall enables selective sharing (e.g. stack, memory).
Note
The process abstraction represents a running program
It provides the illusion to each program that:
- It has its own private CPU
- Its own memory
- Its own files and resources
Even though it’s sharing all of those with other processes.
- System resource usage:
- Uses less system resources than creating a new process.
- Still expensive due to system call overhead for every thread operation.
- Thread operations can be 10–30× slower in the kernel.
- Security impact:
- SPECTRE/Meltdown mitigations further slow system calls.
- These bugs allow unauthorized access to kernel address space.
- System calls are intentionally slowed to mitigate such risks.
- One-size-fits-all constraint:
- Kernel threads must support all app demands.
- User apps may suffer overhead for unused features (e.g. priorities).
- Memory overhead:
- General solution uses larger memory, e.g., 2MB stack per kernel thread.
Approach #3: User threads on kernel threads¶

- Also known as n:m threading:
- There are n user threads mapped onto m kernel threads.
- User threads are implemented on top of kernel threads:
- Allows multiple kernel-level threads per process.
- Functions like `thread_create`, `thread_exit` remain user-space library calls.
- Limitations:
- Suffers from the same problems as n:1 threading:
- Blocked threads
- Deadlocks
- Difficult to match kernel threads to available cores to maximize parallelism:
- Kernel knows how many cores are available.
- Kernel knows which kernel threads are blocked.
- Kernel hides these details from applications for transparency.
- User-level thread scheduler may assume a thread is running, while the actual kernel thread is blocked.
- Kernel lacks awareness of each thread's importance:
- Might preempt a kernel thread performing a critical task.
- Most context switches and scheduling can happen in user space — no need for costly kernel transitions.
- More efficient for thousands of threads with frequent switching.
- Suffers from same problems as n:1 threading:
- I/O concurrency: kernel threads
- Highly concurrent workloads: n:m threads (many context switches required)