You've owned this ticket. The VPN's up. Pings are 18ms. Web works. Email works. Then someone tries to download a 50MB attachment and it hangs at 4%. The user calls again. "The internet is slow, but only on the VPN."
It's not bandwidth. It's not the WAN. It's not the user's WiFi. It's the headers. By the end of this article you'll have the math, the diagnosis, and the fix — in that order — and you'll never misdiagnose this class of problem again.
★ THE LIE THAT IS 1500 BYTES
1500 is the wire MTU of standard Ethernet. That's it. It's not a guarantee, it's not a target, and it absolutely doesn't survive a tunnel.
The 14-byte Ethernet header sits outside that 1500, not inside it. Inside live 20 bytes of IPv4 header plus 20 bytes of TCP header, leaving 1460 bytes of TCP payload — that's your TCP MSS on a vanilla LAN. Fine on the LAN. Not fine the moment you tunnel.
The myth: "we have GbE everywhere, 1500 is universal, MTU isn't a 2026 problem." The reality: every tunnel header has to fit inside the next-hop MTU, which is still 1500. Something has to give. Either the inner payload shrinks, or your packets fragment, or your packets vanish silently. Spoiler: in 2026, they vanish silently.
Every layer eats bytes. Every byte eaten comes off the inner TCP MSS. You can't grow 1500 bytes — you can only divide them up differently.
★ WHAT'S ACTUALLY EATING YOUR PAYLOAD
This is the section to bookmark. Real overhead numbers for the tunnel types you'll meet in the wild.
| TUNNEL TYPE | OVERHEAD | EFFECTIVE MTU | TCP MSS TO CLAMP |
|---|---|---|---|
| IPsec ESP tunnel mode (AES-GCM) | ~62 B | 1438 | 1398 |
| IPsec ESP + NAT-T (UDP 4500) | ~78 B | 1422 | 1382 |
| GRE | 24 B | 1476 | 1436 |
| GRE over IPsec | ~86 B | 1414 | 1374 |
| WireGuard (UDP) | 60 B | 1440 | 1400 |
| OpenVPN UDP (AES-GCM) | ~50 B | 1450 | 1410 |
| FortiClient SSL-VPN (TLS/TCP) | TCP-in-TCP | tunnel-mtu 1273 | use FGT setting |
| PPPoE (residential underlay) | −8 B | subtract 8 | subtract 8 |
The point isn't to memorize a magic number. It's to calculate yours. A remote worker on residential PPPoE connecting through IPsec NAT-T isn't getting 1500. They're getting 1422 minus 8 = 1414 effective MTU, and an MSS budget of 1374. Clamp that user to a generic "1380" you saw in a vendor doc and every full-size segment is still 6 bytes over budget: the black hole survives your fix. Overshoot the other way with a paranoid 1300 and you pay a 74-byte tax on every segment of every connection crossing the tunnel.
One constant to memorize now, because the ping test later in this article leans on it: 28 bytes, the 8-byte ICMP header plus the 20-byte IP header. You will use it weekly.
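If you'd rather not redo the arithmetic per tunnel, a throwaway shell helper does it. The mss_budget function below is hypothetical, not a standard tool; feed it the overhead column from the table:

```
# Hypothetical helper: sum header overheads, print effective MTU and MSS budget.
mss_budget() {
  local total=0 o
  for o in "$@"; do total=$((total + o)); done
  echo "effective MTU: $((1500 - total))  MSS budget: $((1500 - total - 40))"
}

mss_budget 78 8   # IPsec NAT-T (~78 B) over PPPoE (8 B) -> MTU 1414, MSS 1374
```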
★ WHY PMTUD IS SUPPOSED TO SAVE YOU
Path MTU Discovery is the protocol that's supposed to make this entire article unnecessary. Here's how it's supposed to work:
- Sender sets the DF (Don't Fragment) bit on every packet.
- Some router along the path has a smaller MTU than the packet.
- That router drops the packet and returns an ICMP Type 3 Code 4 — "Fragmentation Needed and DF Set" — back to the sender.
- Sender sees the ICMP, lowers its effective MTU for that destination, retransmits.
- Connection continues at the smaller size.
Elegant. Self-healing. End-to-end. Designed by people who knew what they were doing.
And dead. PMTUD is the protocol your security team killed. Five ways it dies in the wild:
1. Firewalls that drop ICMP wholesale. A blanket deny-all-ICMP rule somewhere in the path eats the Type 3 Code 4 before it ever reaches the sender.
2. Asymmetric routing. The ICMP unreachable comes back via a different path that drops it before it reaches the original sender.
3. NAT devices that don't translate the embedded inner IP header inside the ICMP payload — sender sees the ICMP and can't match it to any session, so ignores it.
4. ECMP / anycast. The ICMP error originates from a node the sender never targeted, so the response gets discarded as unsolicited.
5. Sender stacks that give up silently. Linux ships with net.ipv4.tcp_mtu_probing=0 by default, meaning the kernel doesn't even try black-hole detection.
End result: the sender keeps blasting 1500-byte packets with DF=1 into a tunnel that can only carry 1422. The router drops them. The ICMP that's supposed to tell the sender gets blocked, dropped, or ignored. The application stalls. There is no error. There is no retransmission feedback. Just a black hole.
If you ever capture a stalled VPN connection in Wireshark, you'll see this exact pattern: SYN and SYN-ACK negotiate MSS=1460, the first 1500-byte data segment goes out with DF set, and then... silence. The TCP retransmit timer fires, sends the same packet, gets the same silence. The user sees a hung browser. You see a crime scene.
★ THE ASYMMETRY TRAP
Here's the part that makes this maddening to diagnose if you don't already know what to look for.
TCP traffic is rarely symmetric. Server sends data → client; client sends ACKs → server. The data direction has full-size segments (1500 bytes). The ACK direction has tiny ones (~50 bytes). Only the big direction hits the MTU ceiling.
Server → client full-size data segments: slam into the tunnel ceiling and vanish.
Client → server small ACKs: well under the limit, fly through fine.
Result: "Downloads hang. Uploads are fine." Or vice versa, depending on direction. Or weirder: login pages work (small response), file fetches hang (big response). Teams chat works. Teams uploads hang. M365 web app loads. M365 attachment download dies at 4%.
This is why tickets get misdiagnosed for weeks. Ops blames the firewall, then the WAN, then the user's WiFi, then antivirus, then the laptop. Nobody ever blames the MTU because half the traffic is working perfectly and that doesn't fit anyone's mental model of "broken."
★ THE TEST THAT ACTUALLY PROVES IT
The ping-with-DF test, properly explained. Most blog posts get this wrong by skipping the math.
Why 1472? Because 1472 payload + 8 ICMP header + 20 IP header = 1500 — exactly fills standard Ethernet MTU. If 1472 succeeds, your path MTU is 1500. If it fails, walk the size down: 1430, 1400, 1380, 1360. The largest size that succeeds + 28 = your real path MTU. Subtract 40 (TCP+IP) to get your real-world TCP MSS.
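In practice, assuming a reachable host on the far side of the tunnel (the hostname below is a placeholder), the test looks like this:

```
# Linux: -M do sets DF; -s is the ICMP payload size (1472 + 28 = 1500)
ping -c 3 -M do -s 1472 vpn-gw.example.com

# Windows:  ping -f -l 1472 vpn-gw.example.com
# macOS:    ping -c 3 -D -s 1472 vpn-gw.example.com

# On failure, walk the payload down. Largest success + 28 = path MTU.
ping -c 3 -M do -s 1394 vpn-gw.example.com   # works? path MTU is at least 1422
```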
One caveat that catches people: this tests ICMP. Some paths treat ICMP differently than TCP — load balancers, CGNATs, and DPI middleboxes can pass ICMP at sizes they drop for TCP, or vice versa. So always confirm with a real TCP test:
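One way to do that, sketched with a placeholder host and assuming nping is available (it ships with nmap):

```
# DF-marked TCP probes: 1380 bytes of data + 40 bytes TCP/IP = 1420 on the wire.
# Probes that silently vanish while smaller ones return confirm the black hole.
nping --tcp -p 443 --df --data-length 1380 -c 3 vpn-gw.example.com

# A second non-ICMP datapoint: tracepath probes with UDP and reports
# the path MTU it discovers per hop.
tracepath -n vpn-gw.example.com
```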
★ THE FIX, IN ORDER OF WHERE TO APPLY IT
"Set MSS to 1360 and pray" is not a fix. Here's the actual decision tree, in priority order.
1. MSS CLAMPING AT THE TUNNEL INGRESS — RIGHT 90% OF THE TIME
This is the answer for TCP traffic. The tunnel-terminating device rewrites the MSS option in every TCP SYN that crosses it, forcing both endpoints to negotiate a smaller segment size that fits inside the tunnel. End hosts never know — they just stop sending oversized segments.
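On a Linux gateway this is the classic netfilter one-liner; on vendor gear it's a policy setting (the FortiGate version appears later in this article). The interface name is illustrative:

```
# Rewrite the MSS option on every forwarded SYN. --clamp-mss-to-pmtu derives
# the value from the outgoing route's MTU; use --set-mss for an explicit number.
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
  -j TCPMSS --clamp-mss-to-pmtu

# Explicit clamp for the IPsec NAT-T budget from the table:
# iptables -t mangle -A FORWARD -o ipsec0 -p tcp --tcp-flags SYN,RST SYN \
#   -j TCPMSS --set-mss 1382
```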
2. LOWER THE TUNNEL MTU ITSELF — REQUIRED FOR UDP / QUIC
MSS clamping only touches TCP SYNs. It does nothing for UDP traffic, which means it does nothing for QUIC, VoIP, large DNS responses, IPsec inside another tunnel, or anything else that doesn't use TCP. For those, you have to lower the tunnel interface MTU itself so the OS fragments at the right size.
Trade-off: every packet pays the tax, even ones that didn't need to. But it works for everything, not just TCP.
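A minimal sketch, assuming a WireGuard tunnel named wg0 (the principle is identical for any tunnel interface):

```
# One-off, from the shell:
ip link set dev wg0 mtu 1400

# Or persistently, in the WireGuard config:
#   [Interface]
#   MTU = 1400
```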
3. ALLOW ICMP TYPE 3 CODE 4 END-TO-END — THE CORRECT FIX NOBODY DOES
Whitelist ICMP Type 3 Code 4 on every firewall in the path. PMTUD comes back to life. The protocol works as designed.
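On a Linux firewall the rule is two lines; iptables even knows the type/code pair by name. The IPv6 equivalent (Packet Too Big) is not optional, because IPv6 routers never fragment:

```
iptables  -A FORWARD -p icmp   --icmp-type fragmentation-needed -j ACCEPT
ip6tables -A FORWARD -p icmpv6 --icmpv6-type packet-too-big     -j ACCEPT
```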
Why nobody does it: the security team won't approve it because "ICMP is bad." The 1990s threat model that produced that policy is three decades old, but it's load-bearing in every enterprise security policy template. So we clamp MSS and move on. Hard truth.
4. ENABLE BLACK-HOLE DETECTION ON THE HOST — LAST RESORT
The host detects "I keep retransmitting and getting nothing" and probabilistically lowers MSS until traffic flows. Slow, ugly, but works on hostile networks where you can't fix the path.
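On Linux this is Packetization Layer PMTUD (RFC 4821), one sysctl away:

```
# 1 = probe only after a black hole is suspected; 2 = always probe.
# tcp_base_mss is the MSS the probing starts from.
sysctl -w net.ipv4.tcp_mtu_probing=1
sysctl -w net.ipv4.tcp_base_mss=1024
```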
Don't set MTU=1280 globally "to be safe." That's the IPv6 minimum and a reasonable last-resort tunnel MTU, but globally it costs you ~15% throughput on every link that didn't need it.
Don't switch protocols and call it fixed. Swapping IPsec for L2TP, OpenVPN, or WireGuard doesn't make MTU go away. The headers are different sizes but the principle is identical. Same problem, different costume.
★ THE FORTINET-SPECIFIC TRAP
If you're on FortiGate — and a fair chunk of you reading this are — there are a few traps worth knowing about.
First: FortiGate's IPsec MSS clamping is configured per firewall policy, not per tunnel interface. That means if you've got three policies all referencing the same IPsec tunnel and you only set the MSS on one of them, traffic crossing the other two policies goes unclamped. Easy to miss when cloning policies. If you're running a FortiGate at home, this'll bite you the first time you split your policies for different VLANs.
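A sketch of the per-policy clamp, assuming the tcp-mss-sender / tcp-mss-receiver options exposed on recent FortiOS releases (verify on yours); the policy ID is illustrative:

```
config firewall policy
    edit 12                      # repeat for EVERY policy referencing the tunnel
        set tcp-mss-sender 1382
        set tcp-mss-receiver 1382
    next
end
```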
Second: FortiClient SSL-VPN has its own tunnel MTU, hidden under config vpn ssl settings as tunnel-mtu. Default is 1273. Yes, 1273. That's because SSL-VPN runs TCP-inside-TCP-inside-TLS, and Fortinet's default value is conservative for hostile transit networks. If your remote users complain about VPN being "slower than the internet itself," the default tunnel-mtu is one of three places to start.
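Adjusting it is one block of CLI:

```
config vpn ssl settings
    set tunnel-mtu 1300
end
```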
Third: roaming users. Hotel WiFi, captive portals, and cellular tethers all do their own MTU shenanigans, often without telling you. FortiClient's tunnel MTU is set per gateway, not adapted to the underlying network. Lower it deliberately for any tunnel that carries roaming users: 1300 is a reasonable target.
Sane starting values, consistent with the table above. For an IPsec NAT-T tunnel: MTU 1422 / MSS 1382.
With residential PPPoE underneath: MTU 1414 / MSS 1374.
For FortiClient SSL-VPN with roaming users: tunnel-mtu 1300.
Useful diagnostics on FortiGate that are worth keeping in your back pocket:
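Exact command output varies a little across FortiOS versions, so treat these as starting points:

```
# Negotiated IPsec tunnels, including the computed tunnel MTU:
diagnose vpn tunnel list

# Interface MTUs as the kernel sees them:
diagnose netlink interface list

# Watch for ICMP Type 3 Code 4 arriving (or not) during a stalled transfer:
diagnose sniffer packet any 'icmp[0]==3 and icmp[1]==4' 4
```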
★ UDP DOESN'T CARE — AND QUIC IS COMING
Here's the part of this article that ages everyone else's blog posts on this topic: MSS clamping is a TCP fix. It rewrites the MSS option in TCP SYN packets. UDP has no equivalent. The fix is invisible to UDP traffic.
For most of internet history that was fine. UDP was DNS, NTP, and a few games. None of those sent MTU-sized packets. The TCP-only fix was effectively the whole fix.
Then came QUIC. HTTP/3 runs over UDP. Google services run over QUIC. M365 increasingly runs over QUIC. Cloudflare runs over QUIC. By 2026, a meaningful percentage of any browser session is UDP packets riding on top of QUIC, sometimes with full-MTU datagrams.
Your TCP MSS clamping does nothing for any of it. The problem you "fixed" last quarter regresses the day Chrome upgrades the user. The right answer in 2026 is increasingly option 2 from the fix list — lower the tunnel MTU itself, so fragmentation works across all protocols. The TCP-only fix is a relic.
Workaround that some firewalls implement: block UDP/443 outbound. Chrome falls back to TCP, MSS clamping works again. It's ugly, it costs you the QUIC performance benefit, but it makes the symptom stop while you do the proper fix. Don't ship this as your answer — ship it as the bridge while you lower tunnel MTU correctly.
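On a Linux firewall the bridge looks like this — one rule, easy to remove when done:

```
# Stopgap: QUIC-capable browsers fall back to TCP, where the MSS clamp applies.
iptables -A FORWARD -p udp --dport 443 -j REJECT
```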
★ HOW TO VERIFY YOU ACTUALLY FIXED IT
Don't ship a fix you haven't proven. Three quick verification steps:
- Capture a fresh TCP handshake post-fix in Wireshark. Look at the SYN. The MSS option should now be your clamped value (e.g. 1380), not 1460. If it's still 1460, your clamp isn't taking effect.
- Run a real-world large-file test. Pull a known 100MB+ file across the tunnel. Throughput should stay linear instead of stalling at packet boundaries. If it stalls, you've fixed TCP but not UDP — go back to fix #2.
- Run mtr --tcp from the client through the tunnel. Watch where loss begins (or doesn't, post-fix). The hop pattern tells you whether the fix is at the right node in the path. Command sketches below.
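A minimal verification sketch, assuming a Linux client with tshark and mtr installed (interface and target are placeholders):

```
# 1. Live SYNs: tcp.options.mss_val is the negotiated MSS option.
#    Post-fix this should print your clamped value, not 1460.
tshark -i eth0 -Y 'tcp.flags.syn == 1' \
  -T fields -e ip.src -e ip.dst -e tcp.options.mss_val

# 2. TCP-mode mtr through the tunnel; watch where loss begins (or doesn't).
mtr --tcp --port 443 --report 198.51.100.10
```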
★ THE TAKEAWAY
The network was working all along. The protocol that's supposed to find this for you (PMTUD) has been quietly broken for two decades because every firewall admin overpruned ICMP. Until that changes industry-wide — and it won't — MSS clamping at every tunnel ingress is non-negotiable infrastructure hygiene, not optional tuning. In real enterprise networks, this is one of the first things any senior network engineer checks when "the VPN is slow" comes across the wire.
Bookmark the math table. Memorize the 28-byte trick. Understand the asymmetry trap so you can diagnose it on the first ticket instead of the fifth. And start planning the move from MSS clamping (TCP-only) to tunnel-MTU lowering (universal), because QUIC isn't coming — it's already here.
One line to take with you:
_If your VPN seems to work but doesn't, the answer is almost always the headers._