Network Layer: Control Plane¶

Routing Protocols¶

Routing Protocols and Dijkstra's Link-State Algorithm¶

Routing Fundamentals¶

Goal: Determine "good" paths (routes) from a sending host to a receiving host through a network of routers.
Path: The sequence of routers a packet traverses.
Metrics for "Good": Usually defined as least "cost," "fastest," or "least congested".
Graph Abstraction: Networks are represented as graphs \(G = (N, E)\), where \(N\) is the set of routers and \(E\) is the set of links. Link cost (\(c_{a,b}\)) is defined by the network operator and can be constant (1), or inversely related to bandwidth/congestion.

Algorithm Classifications¶

Global ("Link State"): All routers have complete topology and link cost info. Examples include Dijkstra's algorithm.
Decentralized ("Distance Vector"): Routers only know costs to attached neighbors; paths are computed through an iterative exchange of info with neighbors.
Static vs. Dynamic: Static routes change slowly over time, while dynamic routes change quickly in response to periodic updates or link cost changes.

Dijkstra's Link-State Algorithm¶

Nature: Centralized and iterative. After \(k\) iterations, the least-cost paths to \(k\) destinations are known.
Key Notation:
- \(c_{x,y}\): Direct link cost from \(x\) to \(y\).
- \(D(v)\): Current estimate of the cost of the least-cost-path from source to \(v\).
- \(p(v)\): Predecessor node along the path from source to \(v\).
  - It stores which node you came from before reaching \(v\) in the shortest path.
- \(N'\): Set of nodes whose least-cost-path is definitively known.
The Algorithm Loop:
1. Initialization: Set \(N' = \{u\}\). For all neighbors \(v\) of \(u\), \(D(v) = c_{u,v}\); else \(\infty\).
2. Find: Find \(w\) not in \(N'\) with the minimum \(D(w)\).
3. Update: For all neighbors \(v\) of \(w\) not in \(N'\), update: \(D(v) = \min(D(v), D(w) + c_{w,v})\).
4. Repeat: Until all nodes are in \(N'\).

Resulting Example¶

We can construct least-cost-path tree by tracing predecessor nodes.
Ties can exist (can be broken arbitrarily)

Complexity and Discussion¶

Algorithm Complexity: With \(n\) nodes, the algorithm requires \(n(n+1)/2\) comparisons, resulting in \(O(n^2)\) complexity. More efficient implementations can reach \(O(n \log n)\).
Message Complexity: Each of the \(n\) routers must broadcast its link state info to all others. The overall message complexity is \(O(n^2)\).
Outputs: The algorithm produces a least-cost-path tree and a forwarding table for the source node.

Distance Vector (DV) Routing Algorithm¶

The distance vector stores the current best cost from this node to every destination. It contains one value per destination.

Core Mechanism¶

Bellman-Ford Equation: The algorithm is based on the Bellman-Ford (BF) equation (dynamic programming):

\[ D_x(y) = \min_v \{ c_{x,v} + D_v(y) \} \]

where \(D_x(y)\) is the least cost from \(x\) to \(y\), \(c_{x,v}\) is the direct cost to neighbor \(v\), and \(D_v(y)\) is \(v\)'s estimated cost to \(y\).

When \(x\) receives new DV estimate from any neighbor, it updates its own DV using B-F equation.
From time-to-time, each node sends its own distance vector estimate to neighbors. Under minor, natural conditions, the estimate \(D_x(y)\) converge to the actual least cost \(d_x(y)\).
Iterative & Asynchronous: Each node waits for a local link cost change or a DV update message from a neighbor, recomputes its DV estimates, and notifies neighbors only if its own DV to any destination changes.
- Trigger: DV changes
- Action: send updated DV
- Recipients: all neighbors (not only the ones affected).
Distributed & Self-stopping: each node notifies neighbors only when its DV changes
- Neighbors then notify their neighbors – only if necessary.
- No notification received, no actions taken!
Distributed Information Diffusion: Iterative communication and computation steps cause information about the network state to "diffuse" through the network over time (e.g., state at \(t=0\) may influence nodes 1 hop away at \(t=1\), and 4 hops away at \(t=4\)).

Each node maintains the cost of its direct links and a distance table, which stores the cost to each destination via each neighbor.
- For each destination \(d\) and each neighbor \(n\), the table stores \(\text{cost to reach } d \text{ via neighbor } n\)
From this it computes its distance vector.
If the distance vector changes, the node sends the updated DV to all neighbors.

The distance table stores costs via each neighbor, while the distance vector stores only the minimum cost to each destination and is shared with neighbors.

An advertisement means a routing update message that a node sends to its neighbors containing its distance vector.

Link Cost Changes and Challenges¶

When a path gets better, nodes quickly adopt it.

When a path gets worse, nodes may keep believing outdated low costs from each other.

"Good News Travels Fast": When a link cost decreases, the network quickly recalculates and converges to new least-cost paths.
"Bad News Travels Slow" (Count-to-Infinity): When a link cost increases significantly, nodes may iteratively update each other with slowly increasing costs, creating a routing loop that takes a long time to stabilize.
- What happens:
  
  When a link cost increases or a link fails, nodes may still believe outdated low-cost routes learned earlier from neighbors. Because Distance Vector routers only know neighbors’ advertised distances (not full paths), they may incorrectly assume the neighbor still has a valid route.
- Why it happens (loop of misinformation):
  
  Two nodes may use each other as the next hop to the same destination.
  
  Example:
  - Y thinks Z can reach X cheaply.
  - Z thinks Y can reach X cheaply. (That means Z’s cheapest route to X actually depends on Y, but Y thinks it has a valid route, so they repeatedly advertise to each other! Or, Z’s route to X depends on Y, and Y’s new route to X depends on Z.)
    
    When the real link to X becomes expensive, they keep updating each other with slightly larger costs (6, 7, 8, 9, …), creating a routing loop of increasing estimates.
- Count-to-infinity behavior:
  
  Each update increases the estimated distance by the link cost between the two nodes. The cost “counts upward” step-by-step as routers repeatedly recompute using each other’s advertisements.
- Why bad news travels slow:
  
  Distance Vector routing spreads information hop-by-hop and only knows neighbors’ estimates. When a route becomes worse, nodes cannot immediately detect that the neighbor’s route actually depends on them.
- When it stops:
  
  The process ends when a router finally discovers a better alternative path (e.g., a direct link with cost 50) or when the distance reaches a predefined maximum value (infinity) in protocols like RIP.

Bad news travel slow does NOT always happen. It occurs when a link cost increases or a link fails, and nodes learn routes to a destination through each other, creating a dependency loop.
- Bad news propagates quickly when a node has an independent alternative route, or protocol mechanisms prevent loops.

Poisoned Reverse¶

Poisoned reverse is a technique used in Distance Vector routing to reduce routing loops. If a node uses a particular neighbor as the next hop to reach a destination, it advertises the distance to that destination as infinity when sending its distance vector back to that neighbor. This tells the neighbor that the node should not be used as a route to that destination, preventing the neighbor from mistakenly routing packets through it. Without poisoned reverse, two nodes might believe the other still has a valid path and repeatedly advertise increasing costs to each other, causing the count-to-infinity problem. By advertising infinity, poisoned reverse immediately breaks this two-node loop because the neighbor will not consider that node as a possible route. However, poisoned reverse does not always completely eliminate routing loops, especially when more than two routers are involved in a cycle. In larger networks, incorrect routes may still propagate among multiple nodes before the system converges, though poisoned reverse still helps reduce the likelihood and duration of such loops.

Rule for poisoned reverse

For a destination \(D\):

If a router reaches \(D\) through neighbor \(N\) (i.e., next hop = \(N\)),
Then when sending its distance vector to \(N\), it advertises: \(\text{cost}(D) = \infty\)
When sending updates to other neighbors, it advertises the normal cost.

Comparison: Link State (LS) vs. Distance Vector (DV)¶

Feature	Link State (LS)	Distance Vector (DV)
Message Complexity	\(O(n^2)\) messages sent by \(n\) routers.	Exchange between neighbors; convergence time varies.
Speed of Convergence	\(O(n^2)\) algorithm; may have oscillations.	Convergence time varies; may have routing loops or count-to-infinity.
Robustness	A router can advertise an incorrect link cost (but it can only lie about its own local links); each router computes its own table.	A router can advertise an incorrect path cost; errors propagate through the network.

LS is still more robust than DV because errors don’t propagate recursively.

Intra-ISP routing: OSPF¶

Scalable Routing and Autonomous Systems (AS)¶

The Internet is too large to use a "flat" routing structure. Scaling to billions of destinations requires a hierarchical approach to manage routing table sizes and administrative autonomy.

Hierarchical Routing Structure¶

Routers are aggregated into regions known as Autonomous Systems (AS) or domains. This creates two distinct levels of routing:

Intra-AS (Intra-domain): Routing within the same AS. All routers in the AS run the same protocol, though different ASes can use different protocols.
Inter-AS (Inter-domain): Routing among different ASes. Gateway routers at the edge of an AS handle these connections.
Forwarding Tables: A router's forwarding table is configured by a combination of both intra- and inter-AS algorithms.
- Intra-AS routing determine entries for destinations within AS
- Inter-AS & intra-AS determine entries for external destinations
Suppose a router in AS1 receives a datagram destined outside of AS1, the router should forward the packet to gateway router in AS1, but which one?

Inter-domain routing must:
- Learn which destinations reachable through AS2, which through AS3
- Propagate this reachability info to all routers in AS1

Intra-AS Routing Protocols¶

Common protocols for routing within a single network include:

RIP (Routing Information Protocol): A classic Distance Vector (DV) protocol; no longer widely used.
EIGRP (Enhanced Interior Gateway Routing Protocol): A DV-based protocol formerly proprietary to Cisco.
OSPF (Open Shortest Path First): A widely used Link-State protocol.

OSPF (Open Shortest Path First) Details¶

Mechanism: Uses Dijkstra’s algorithm to compute the forwarding table. Each router has a full map of the AS topology.
Advertisements: Routers flood link-state advertisements directly over IP (rather than TCP/UDP) to all other routers in the AS.
Security: All OSPF messages are authenticated to prevent malicious intrusions.

Hierarchical OSPF¶

For very large ASes, OSPF can be further subdivided into a two-level hierarchy:

Local Areas: Link-state advertisements are flooded only within the specific area. Local routers only know the detailed topology of their own area.
Backbone: Connects the various local areas.
Area Border Routers: Summarize distances to destinations in their own area and advertise them to the backbone.
Boundary Routers: Act as the gateway to other Autonomous Systems.

Routing among ISPs: BGP¶

Internet Inter-AS Routing: BGP¶

The Border Gateway Protocol (BGP) is the de facto standard inter-domain routing protocol, often described as the "glue that holds the Internet together." It allows an Autonomous System (AS) to advertise its existence and the specific destination prefixes it can reach to the rest of the Internet.

Core Functions of BGP¶

BGP provides each AS with the mechanisms to:

eBGP (External BGP): Obtain subnet reachability information from neighboring ASes.
iBGP (Internal BGP): Propagate that reachability information to all internal routers within the AS.
Route Selection: Determine "good" routes to other networks by balancing reachability information with local network policy.

BGP Sessions and Messages¶

BGP operates through sessions where two routers (referred to as peers) exchange routing information over a semi-permanent TCP connection.

Advertising paths to different destination network prefixes

BGP Message Types¶

OPEN: Establishes a TCP connection with a remote BGP peer and authenticates the sender.
UPDATE: The primary message for sharing information; it advertises a new path or withdraws a previously advertised path.
KEEPALIVE: Maintains the connection in the absence of updates and acknowledges OPEN requests.
NOTIFICATION: Reports errors in previous messages or is used to signal the closing of a connection.

BGP Path Attributes and Routes¶

BGP is a Path Vector protocol. When a router advertises a prefix (destination being advertised), it includes specific Attributes, making the combination a "route."

Key Attributes¶

AS-PATH: A list of ASes through which the prefix advertisement has passed. This helps prevent routing loops and is used in path selection.
- How to prevent routing loops? If an AS sees its own AS number in the path, it will immediately reject it.
NEXT-HOP: Indicates the specific internal-AS router interface that serves as the next hop to reach the next AS in the path.
- The next router you should send the packet to in order to eventually reach the destination prefix.

Policy-Based Routing¶

Unlike interior protocols that focus purely on the shortest path, BGP is driven by Policy:

Import Policy: A gateway can use local policies to decide whether to accept or decline a path (e.g., never routing traffic through a specific competitor's AS).
Export Policy: An AS policy determines whether it will advertise its own paths to specific neighboring ASes.

Route Propagation and Advertisement¶

The process of route advertisement follows a logical flow through eBGP and iBGP:

External Learning: A gateway router (e.g., AS3 router 3a) advertises a path to a prefix (X) to a neighbor (AS2 router 2c) via eBGP.
Internal Distribution: The receiving gateway (2c) accepts the path based on policy and propagates it to all other routers within its own AS via iBGP.
External Re-advertisement: Based on AS2's policy, another gateway (router 2a) may then advertise the modified path (now including AS2 in the AS-PATH) to its neighbor (AS1 router 1c) via eBGP.

Handling Multiple Paths¶

A gateway router may learn multiple paths to the same destination. For example, AS1 might learn a path to prefix X through AS2, and a shorter path directly from AS3. It will use its local policy to choose the best path and then distribute that choice internally.

Integration with Intra-AS Routing¶

BGP works in tandem with internal protocols like OSPF to ensure end-to-end delivery.

A router (e.g., 1d in AS1) learns via iBGP that the path to prefix X goes through exit gateway 1c.
The router then consults its Intra-domain routing table (generated by OSPF) to find the best internal path/interface to reach gateway 1c.
The forwarding table is updated so that traffic destined for X is sent to the interface leading to 1c.

Comparison: Intra-AS vs. Inter-AS Routing¶

The requirements for routing change significantly depending on whether the traffic is staying within an organization or crossing the global internet.

Policy

Inter-AS: Policy is the dominant factor. Admins need total control over how traffic is routed and who is allowed to transit through their network.
Intra-AS: Since there is a single administration, policy is less of a concern.

Scale

Inter-AS: Hierarchical routing is essential to manage the massive size of the global routing table and reduce the volume of update traffic.
Intra-AS: Scales to the size of a single organization, allowing for more detailed maps.

Performance

Inter-AS: Performance (speed/latency) is secondary to policy and business relationships.
Intra-AS: Can focus entirely on performance and optimizing for the fastest paths.

BGP Route Selection¶

When a router learns multiple paths to the same destination AS, it must select the best route. This decision-making process follows a prioritized hierarchy:

Local Preference Value Attribute: This is a policy-driven decision set by the network administrator. It is the highest priority and can override all other factors.
Shortest AS-PATH: If local preferences are equal, the router chooses the path that traverses the fewest number of Autonomous Systems.
Closest NEXT-HOP Router: Known as Hot Potato Routing, the router chooses the path with the lowest internal cost.
Additional Criteria: Used as tie-breakers if the above three are identical.

Hot Potato Routing¶

The philosophy behind hot potato routing is simple: get the packet out of your own network as quickly as possible.

Mechanism: A router (e.g., 2d in AS2) might learn via iBGP that it can reach destination X through two different exit gateways (2a or 2c).
Decision: The router chooses the gateway with the least intra-domain cost (based on OSPF link weights).
Result: In the example, 2d chooses gateway 2a because the internal cost is lower, even if choosing 2a results in a longer path through the rest of the Internet (more AS hops). The router prioritizes reducing its own resource usage over global path efficiency.
- Note that we don’t worry about inter-domain cost here. But that’s the idea of hot potato!

Achieving Policy via Advertisements¶

BGP allows ISPs to enforce "real-world" business policies by controlling how they advertise routes. A primary goal for many ISPs is to avoid carrying transit traffic (traffic that neither starts nor ends in their own customer network) for which they receive no revenue.

Provider and Customer Dynamics¶

ISP Transit Policy: An ISP only wants to route traffic to or from its direct customer networks.
Advertisement Control: If provider B learns a path to customer network w from provider A, B will choose not to advertise that path to provider C.
Consequence: Because C never learns about the path through B, C is forced to find another route (e.g., directly through A). B successfully avoids carrying traffic between A and C, which would have consumed B's resources for no financial gain.

Dual-Homed Customers¶

Policy also applies to customer networks that are connected to multiple providers (dual-homed).

Scenario: Customer network x is attached to both provider B and provider C.
Enforcement: x does not want to act as a bridge between its two providers. Therefore, x will not advertise a route to B that goes through C, and vice versa. This ensures that B and C cannot use the customer's private network to exchange their own ISP traffic.

Internet Control Message Protocol¶

ICMP: Internet Control Message Protocol¶

The Internet Control Message Protocol (ICMP) is used by hosts and routers to communicate network-layer information. While it is architecturally part of the Network Layer, ICMP messages are actually carried as payload within standard IP datagrams.

Although ICMP is logically part of the network layer, its messages are not sent independently; they are encapsulated inside normal IP packets and carried as the payload of IP datagrams.

Core Functions and Structure¶

Error Reporting: ICMP notifies the source of issues like unreachable hosts, networks, ports, or protocols.
Echo Services: Used by tools like ping to test reachability via request/reply messages.
Message Format: An ICMP message contains a Type, a Code, and the first 8 bytes of the original IP datagram that caused the error (to help the sender identify which packet failed).

Common ICMP Types and Codes¶

Type	Code	Description
0	0	Echo reply (ping)
3	0	Destination network unreachable
3	1	Destination host unreachable
3	3	Destination port unreachable
8	0	Echo request (ping)
11	0	TTL (Time to Live) expired
12	0	Bad IP header

Traceroute and ICMP¶

Traceroute is a diagnostic tool that uses ICMP to map the path between a source and a destination.

How Traceroute Works¶

UDP Probing: The source sends sets of UDP segments to the destination. Each set has an incrementing Time to Live (TTL) value (starting at TTL=1, then TTL=2, etc.).
Intermediate Router Response: When the \(n^{th}\) set of datagrams arrives at the \(n^{th}\) router, the TTL expires (reaches 0). The router discards the datagram and sends an ICMP "TTL expired" message (Type 11, Code 0) back to the source.
Recording: The ICMP message identifies the router’s name and IP address. The source records the time elapsed to calculate Round Trip Times (RTTs).

Stopping Criteria¶

The process continues until the probes actually reach the destination host.

Because the source uses an unlikely port number for the UDP segment, the destination host will return an ICMP "port unreachable" message (Type 3, Code 3).
Once the source receives this specific "port unreachable" message, it knows it has reached the final destination and stops sending probes.

Both OSPF and ICMP run directly on top of IP. There is no transport layer (no TCP/UDP)