Why QoS is needed
Without QoS, a network treats all packets identically: a large file transfer competes equally with a VoIP call. When a link is congested, the router queues packets in arrival order (FIFO) and, once the queue is full, discards new arrivals (tail drop). For bulk data this is fine because TCP retransmits. Voice has a one-way delay budget of about 150 ms, so a retransmitted packet would arrive too late to play out; a dropped or late packet is simply gone, and the conversation degrades.
QoS solves this by classifying traffic, marking it with a priority indicator, and then treating high-priority traffic preferentially at each network device — providing more bandwidth, lower queuing delay, and earlier service.
There are three QoS models. Best-effort applies no QoS: all packets are treated equally. IntServ (Integrated Services) uses RSVP to signal per-flow reservations end-to-end; it is complex and does not scale, because every router on the path must hold state for every flow. DiffServ (Differentiated Services) marks traffic with a DSCP value per class at the network edge, and each device applies a per-hop behavior (PHB) based on that marking; it scales well and is the dominant model in use today.
Classification and marking: DSCP and CoS
Before a packet can be prioritized, it must be classified — identified as voice, video, signaling, or data — and marked with a value that downstream devices can read.
DSCP (Differentiated Services Code Point) is the 6-bit field in the IP header's ToS byte used to mark packets in DiffServ. DSCP values range 0–63. Key values: CS0 (0) = best-effort, AF11-AF43 = Assured Forwarding classes, EF (46) = Expedited Forwarding for voice, CS6 (48) = network control traffic.
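To make the bit layout concrete, here is a minimal Python sketch. It assumes the standard layout in which DSCP occupies the upper six bits of the ToS byte and the lower two bits carry ECN. The function names are illustrative and do not come from any library.

```python
# Code points named in the text above.
DSCP = {"CS0": 0, "EF": 46, "CS6": 48}

def dscp_to_tos(dscp: int, ecn: int = 0) -> int:
    """Pack a 6-bit DSCP value and a 2-bit ECN value into the 8-bit ToS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in 0..63")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN must be in 0..3")
    return (dscp << 2) | ecn

def tos_to_dscp(tos: int) -> int:
    """Extract the DSCP value (upper six bits) from a ToS byte."""
    return (tos >> 2) & 0x3F

ef_tos = dscp_to_tos(DSCP["EF"])   # 46 shifted left by two bits
print(hex(ef_tos), tos_to_dscp(ef_tos))
```

This is why a packet capture often shows voice traffic with a ToS byte of 0xB8 rather than the number 46: the DSCP value is shifted left by two bits.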
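Expedited Forwarding (EF, DSCP 46) is the marking for VoIP RTP streams. It requests the lowest latency, jitter, and loss treatment at each hop. Assured Forwarding (AF) classes provide guaranteed minimum bandwidth with drop precedence. In the name AFxy, x is the class (1 to 4) and y is the drop precedence (1 to 3). Within the same class, AF41 has a lower drop precedence than AF43, so under congestion AF43 packets are discarded first.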
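The AF naming scheme maps to DSCP values by simple arithmetic: the class occupies the top three bits and the drop precedence the next two, so AFxy has DSCP 8x + 2y. The following is a small sketch of that conversion; the helper names are hypothetical.

```python
def af_to_dscp(af_class: int, drop: int) -> int:
    """Return the DSCP value for AF<class><drop>, computed as 8*class + 2*drop."""
    if af_class not in (1, 2, 3, 4) or drop not in (1, 2, 3):
        raise ValueError("AF class must be 1..4 and drop precedence 1..3")
    return 8 * af_class + 2 * drop

def dscp_to_af(dscp: int) -> str:
    """Return the AF name for a DSCP value, or raise if it is not an AF code point."""
    af_class = dscp >> 3          # top three bits: class
    drop = (dscp >> 1) & 0b11     # next two bits: drop precedence
    if dscp & 1 or af_class not in (1, 2, 3, 4) or drop not in (1, 2, 3):
        raise ValueError(f"DSCP {dscp} is not an AF code point")
    return f"AF{af_class}{drop}"

print(af_to_dscp(4, 1), af_to_dscp(4, 3), dscp_to_af(34))
```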
CoS (Class of Service) is the 3-bit field in an 802.1Q VLAN tag used to mark Layer 2 Ethernet frames. CoS 5 = voice, CoS 3 = call signaling, CoS 0 = best-effort. CoS is only preserved when a trunk carries the VLAN tag; it's lost when frames are forwarded over access ports or Layer 3 hops. DSCP persists across Layer 3 boundaries.
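Because CoS has only three bits and DSCP has six, a device that translates between them has to collapse or expand values. The sketch below assumes the common default of taking the three most significant DSCP bits as the CoS value; real devices use a configurable mapping table, so treat this as an illustration only. CS3 (DSCP 24) is assumed here as the call-signaling marking.

```python
def dscp_to_cos(dscp: int) -> int:
    """Map DSCP to CoS by keeping the top three bits (assumed default mapping)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in 0..63")
    return dscp >> 3

def cos_to_dscp(cos: int) -> int:
    """Map CoS back to the class-selector DSCP value (CoS * 8)."""
    if not 0 <= cos <= 7:
        raise ValueError("CoS must be in 0..7")
    return cos << 3

for name, dscp in (("EF", 46), ("CS3", 24), ("CS0", 0)):
    print(name, dscp, dscp_to_cos(dscp))
```

Note that the round trip is lossy: EF (46) maps to CoS 5, but CoS 5 maps back to DSCP 40 (CS5), not 46.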
Trust boundaries
A trust boundary is the point in the network where QoS markings from upstream devices are either trusted or re-marked. Devices inside the trust boundary have their DSCP/CoS markings honored by downstream network devices. Devices outside have their markings ignored or overwritten.
The Cisco best-practice trust boundary sits at the IP phone: trust the phone's markings but not those of the PC connected through the phone. The switch port connected to a Cisco IP phone can be configured to trust CoS from the phone but re-mark (or ignore) CoS from the PC. This prevents users from marking their own traffic as EF to gain priority.
On access layer switches that use the `mls qos` command set, configure `mls qos trust dscp` on ports connecting to phones and servers that are authorized to mark traffic. On ports connecting to untrusted endpoints (user PCs), the switch re-marks all traffic to CS0 (DSCP 0) regardless of what the PC sends.
The trust boundary design matters because every device in the QoS path applies its PHB based on the DSCP marking. If a PC marks all its traffic as EF, it would receive voice-class treatment throughout the network — defeating the purpose of QoS entirely.
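The trust decision at an access port can be modeled in a few lines. This is a conceptual sketch, not switch code: the `Packet` type and the `apply_trust` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Packet:
    source: str   # illustrative label for the sending device
    dscp: int     # DSCP value the sender placed in the IP header

def apply_trust(packet: Packet, port_trusted: bool) -> Packet:
    """Honor the marking on a trusted port; re-mark to DSCP 0 on an untrusted one."""
    if port_trusted:
        return packet
    return replace(packet, dscp=0)

phone = apply_trust(Packet("ip-phone", 46), port_trusted=True)
pc = apply_trust(Packet("user-pc", 46), port_trusted=False)
print(phone.dscp, pc.dscp)
```

Both devices sent EF, but only the phone's marking survives the boundary, so only the phone's traffic receives voice-class treatment downstream.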
Per-Hop Behavior (PHB) and queuing
Per-Hop Behavior (PHB) is the forwarding treatment applied to a traffic class at each router or switch based on its DSCP marking. The key PHBs defined in DiffServ: Default PHB (CS0) = best-effort FIFO. Expedited Forwarding PHB (EF) = low latency, low jitter, low loss; served before all other queues. Assured Forwarding PHB (AF) = guaranteed minimum bandwidth with configurable drop precedence.
Queuing mechanisms implement the PHB on each device. FIFO (First In, First Out) is the default and provides no prioritization. Priority Queuing (PQ) has four queues (High, Medium, Normal, Low); a higher-priority queue is always serviced before any lower one, which risks starving the lower queues. Weighted Fair Queuing (WFQ) identifies flows automatically and shares bandwidth among them, weighting each flow by its IP precedence. Class-Based WFQ (CBWFQ) extends WFQ with configurable bandwidth guarantees per traffic class. Low Latency Queuing (LLQ) adds a strict priority queue to CBWFQ: voice goes to the strict priority queue for guaranteed service while other classes share CBWFQ guarantees.
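LLQ is the recommended queuing mechanism for networks carrying voice. The strict priority queue serves voice traffic before anything else, which keeps per-hop queuing delay and jitter low enough for VoIP to stay within its end-to-end delay budget. To avoid the starvation problem of PQ, the priority queue is policed to its configured bandwidth during congestion.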
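The scheduling idea behind LLQ can be sketched as a toy model: one strict priority queue that is always emptied first, plus class queues served by a weighted round robin that stands in for CBWFQ bandwidth guarantees. The class name, the per-round weights, and the omission of priority-queue policing are all simplifying assumptions; real devices schedule by bytes and bandwidth, not packet counts.

```python
from collections import deque

class ToyLLQ:
    """Toy LLQ: a strict priority queue plus weighted class queues."""

    def __init__(self, weights):
        # weights: class name -> packets served per round (must be positive)
        self.priority = deque()
        self.classes = {name: deque() for name in weights}
        self.weights = dict(weights)
        self.credits = dict(weights)

    def enqueue(self, packet, cls=None):
        """Queue a packet; cls=None means the strict priority (voice) queue."""
        if cls is None:
            self.priority.append(packet)
        else:
            self.classes[cls].append(packet)

    def dequeue(self):
        """Return the next packet to transmit, or None if all queues are empty."""
        if self.priority:                      # voice is always served first
            return self.priority.popleft()
        for _ in range(2):                     # second pass runs after a credit refill
            for name, queue in self.classes.items():
                if queue and self.credits[name] > 0:
                    self.credits[name] -= 1
                    return queue.popleft()
            self.credits = dict(self.weights)  # start a new round
        return None

sched = ToyLLQ({"video": 2, "data": 1})
for p in ("v1", "v2", "v3"):
    sched.enqueue(p, "video")
for p in ("d1", "d2"):
    sched.enqueue(p, "data")
sched.enqueue("rtp1")                          # voice arrives last
order = [sched.dequeue() for _ in range(6)]
print(order)
```

Although the voice packet arrives last, it is transmitted first; the remaining packets leave in a 2:1 video-to-data pattern.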
Policing and shaping
Policing and shaping both enforce a traffic rate limit, but they handle excess traffic differently.
Policing drops (or re-marks) packets that exceed the configured rate immediately, without buffering them. It is typically applied at ingress (incoming) to limit traffic entering the network, although it can also be configured at egress. Policing is abrupt: it causes TCP retransmissions and can cause audio/video glitches if applied to real-time traffic. ISPs use policing at their edge to enforce customer rate limits.
Shaping buffers excess packets and transmits them when the rate falls below the limit, smoothing out bursts. Shaping is applied at egress (outgoing) and is more TCP-friendly because it delays rather than drops. The trade-off: shaping introduces latency (buffering delay) and requires memory for the buffer queue.
For CCNA: policing = drop excess (ingress, no buffering). Shaping = delay excess (egress, buffering). Use policing to limit inbound traffic; use shaping to smooth outbound traffic to match a downstream rate limit.
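The difference is easy to see in a small discrete-time simulation. The sketch below is a deliberate simplification: it counts whole packets per time tick, and it omits the burst allowance that real token-bucket policers and shapers provide. The function names and the `buffer_size` parameter are assumptions made for illustration.

```python
def police(arrivals, rate):
    """Per tick, forward up to `rate` packets and drop the excess immediately."""
    sent = [min(a, rate) for a in arrivals]
    dropped = [a - s for a, s in zip(arrivals, sent)]
    return sent, dropped

def shape(arrivals, rate, buffer_size):
    """Per tick, forward up to `rate` packets and hold the excess in a buffer.

    Packets are dropped only when the backlog exceeds `buffer_size`.
    Returns per-tick sent and dropped counts plus the final backlog.
    """
    sent, dropped, backlog = [], [], 0
    for a in arrivals:
        backlog += a
        out = min(backlog, rate)
        backlog -= out
        overflow = max(0, backlog - buffer_size)
        backlog -= overflow
        sent.append(out)
        dropped.append(overflow)
    return sent, dropped, backlog

arrivals = [5, 0, 0, 3]          # a burst, two idle ticks, then a smaller burst
print(police(arrivals, rate=2))
print(shape(arrivals, rate=2, buffer_size=10))
```

With the same input, the policer discards four packets in total, while the shaper discards none and instead spreads the burst across the idle ticks, at the cost of delaying those packets.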