Why We Rewrote Our Auth Middleware in WASM (And You Probably Should Too)
A 600-line Rust crate, compiled to a 47KB WebAssembly module, now handles authentication for every request across 280 edge locations. Here is the architecture.
Cold starts are the silent tax of edge compute. Every millisecond spent bootstrapping a runtime is a millisecond not spent serving the user — and at the edge, where you might cold-start hundreds of times per second across PoPs, that tax compounds.
We rewrote our token validation, claim extraction, and rate-limit middleware as a single Rust crate compiled to WASM. The binary is 47KB after wasm-opt. Cold start is 0.8ms on Cloudflare Workers, 1.1ms on Fastly Compute.
The architecture is intentionally boring. One exported function: `verify(request) -> Decision`. The decision is a tagged union: Allow, Deny(reason), or Challenge(captcha_token). The host language wires it up with three lines.
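To make the shape of that interface concrete, here is a minimal host-side sketch in TypeScript. It is an illustration under stated assumptions, not the actual wiring: `AuthRequest`, `HostResponse`, `toResponse`, `handle`, and `stubVerify` are hypothetical names, the status codes are placeholders, and the real `verify` export would come from the compiled module rather than a stub.

```typescript
// Minimal request/response shapes for the sketch (hypothetical; a real host has its own types).
interface AuthRequest {
  path: string;
  headers: Record<string, string>;
}

interface HostResponse {
  status: number;
  body: string;
}

// The tagged union described above: Allow, Deny(reason), Challenge(captcha_token).
type Decision =
  | { kind: "allow" }
  | { kind: "deny"; reason: string }
  | { kind: "challenge"; captchaToken: string };

// Stand-in for the single exported function; the exact signature is an assumption.
type Verify = (request: AuthRequest) => Decision;

// Map each variant to an outcome. Returning null means "let the request through".
function toResponse(decision: Decision): HostResponse | null {
  switch (decision.kind) {
    case "allow":
      return null;
    case "deny":
      return { status: 403, body: decision.reason };
    case "challenge":
      return { status: 401, body: decision.captchaToken };
  }
}

// Host wiring: call verify, translate the decision, otherwise continue to the next handler.
function handle(
  request: AuthRequest,
  verify: Verify,
  next: (request: AuthRequest) => HostResponse,
): HostResponse {
  const decision = verify(request);
  const blocked = toResponse(decision);
  return blocked ?? next(request);
}

// Stub used only to exercise the wiring; it does not validate tokens.
const stubVerify: Verify = (req) =>
  req.headers["authorization"]
    ? { kind: "allow" }
    : { kind: "deny", reason: "missing token" };
```

Because `Decision` is a discriminated union, the compiler checks that every variant is handled in `toResponse`, which is the main reason to keep the decision type this small.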
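Key insight: a compiled WASM module is a self-contained artifact with no ambient access to the host; it can only touch what the host explicitly passes in. That makes it safe to cache aggressively, pin to a version, and roll back atomically. We deploy a new middleware version globally in 14 seconds. The previous PHP-based system took 11 minutes.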
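One way to picture version pinning and atomic rollback is a registry of immutable module versions plus a single "active" pointer. The sketch below is a hypothetical in-memory model, not our deployment system: `ModuleVersion`, `ModuleRegistry`, and the version labels are assumptions made for illustration.

```typescript
// Hypothetical model of a version-pinned module store; names are illustrative only.
interface ModuleVersion {
  version: string; // pinned label, for example a content hash
  bytes: Uint8Array; // compiled WASM bytes, never mutated after registration
}

class ModuleRegistry {
  private readonly versions = new Map<string, ModuleVersion>();
  private history: string[] = []; // activation order, newest last

  // Registering never overwrites: a version label always refers to the same bytes.
  register(mod: ModuleVersion): void {
    if (this.versions.has(mod.version)) {
      throw new Error(`version ${mod.version} is already registered`);
    }
    this.versions.set(mod.version, mod);
  }

  // Promotion is a single pointer update, so readers see the old or the new version, never a mix.
  promote(version: string): void {
    if (!this.versions.has(version)) {
      throw new Error(`unknown version ${version}`);
    }
    this.history.push(version);
  }

  // Rollback drops the newest activation and falls back to the one before it.
  rollback(): void {
    if (this.history.length < 2) {
      throw new Error("no previous version to roll back to");
    }
    this.history.pop();
  }

  active(): ModuleVersion {
    const current = this.history[this.history.length - 1];
    const mod = current === undefined ? undefined : this.versions.get(current);
    if (!mod) {
      throw new Error("no active version");
    }
    return mod;
  }
}

// Usage: promote v2, then roll back to v1 without touching the stored bytes.
const registry = new ModuleRegistry();
registry.register({ version: "v1", bytes: new Uint8Array([0]) });
registry.register({ version: "v2", bytes: new Uint8Array([1]) });
registry.promote("v1");
registry.promote("v2");
registry.rollback();
```

The design choice worth noting is that rollback never rebuilds or re-uploads anything; it only moves the pointer back to bytes that are already cached.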
Production deployment
We deployed this gradually behind an internal feature flag, mirroring 1% of traffic for 72 hours before promoting. The instrumentation surface is shipped as part of our open-source edge-trace crate.
```js
// Pseudocode: the actual wiring lives in the repo
const router = createRouter({
  classify: classifier.predict,
  speculate: speculator.draft,
  verify: verifier.confirm,
  windowSize: 8,
});
export default router.handle;
```
What we got wrong
Our first iteration over-trusted the speculator on long sequences. The fix was a sliding acceptance threshold that decays with prefix length — obvious in hindsight, not obvious during the on-call that surfaced it.
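A threshold that decays with prefix length can be sketched as follows. This is a guess at the shape of the fix, not the production code: it assumes the threshold is a tolerance on how far the verifier may disagree with the speculator, so a lower value means stricter acceptance. `ThresholdConfig`, `acceptanceThreshold`, `accept`, and the numbers in `exampleConfig` are all hypothetical.

```typescript
// Hypothetical parameters; the actual decay schedule is not given in this post.
interface ThresholdConfig {
  base: number; // tolerance at prefix length 0
  decay: number; // multiplicative decay per prefix element, between 0 and 1
  floor: number; // tolerance never drops below this value
}

// Tolerance shrinks geometrically as the prefix grows, then bottoms out at the floor.
function acceptanceThreshold(prefixLength: number, cfg: ThresholdConfig): number {
  return Math.max(cfg.floor, cfg.base * Math.pow(cfg.decay, prefixLength));
}

// Accept a speculated draft only if the verifier's disagreement score is within tolerance.
// "disagreement" is an assumed score where 0 means the verifier fully agrees.
function accept(disagreement: number, prefixLength: number, cfg: ThresholdConfig): boolean {
  return disagreement <= acceptanceThreshold(prefixLength, cfg);
}

// Illustrative values only.
const exampleConfig: ThresholdConfig = { base: 0.4, decay: 0.5, floor: 0.05 };

// The same disagreement score is tolerated early in a sequence and rejected later.
const acceptedEarly = accept(0.15, 0, exampleConfig);
const acceptedLate = accept(0.15, 2, exampleConfig);
```

With these example values the draft is accepted at prefix length 0 and rejected at prefix length 2, which is the behaviour the fix is after: the longer the speculated run, the less benefit of the doubt it gets.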
The bottleneck is rarely where you think it is. Measure first; optimize the thing that actually moves the bill.