[OPEN SOURCE / EDGE] 2026.05.21 · 9 min read

Why We Rewrote Our Auth Middleware in WASM (And You Probably Should Too)

A 600-line Rust crate, compiled to a 47KB WebAssembly module, now handles authentication for every request across 280 edge locations. Here is the architecture.

Marcus Kerr

Staff Engineer · InceptsLab

Cold starts are the silent tax of edge compute. Every millisecond spent bootstrapping a runtime is a millisecond not spent serving the user. At the edge, where you might cold-start hundreds of times per second across points of presence (PoPs), that tax compounds.

We rewrote our token validation, claim extraction, and rate-limit middleware as a single Rust crate compiled to WASM. The binary is 47KB after wasm-opt. Cold start is 0.8ms on Cloudflare Workers, 1.1ms on Fastly Compute.

The architecture is intentionally boring. One exported function: `verify(request) -> Decision`. The decision is a tagged union: Allow, Deny(reason), or Challenge(captcha_token). The host language wires it up with three lines.
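To make the shape of that contract concrete, here is a minimal sketch of the tagged union and the single entry point. Only `verify` and the three `Decision` variants come from the description above. The `Request` struct, its `suspicious` flag, the `token_is_valid` helper, and the lowercase `authorization` header key are hypothetical stand-ins, since the real crate's types are not shown here.

```rust
use std::collections::HashMap;

/// Outcome of the middleware for one request (the tagged union from the text).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Decision {
    Allow,
    /// Carries a human-readable reason.
    Deny(String),
    /// Carries a captcha token for the host to present.
    Challenge(String),
}

/// Hypothetical request view; an assumption made for this sketch.
pub struct Request {
    /// Header names are assumed to be lowercased by the host.
    pub headers: HashMap<String, String>,
    /// Assumed flag set by an upstream rate-limit or abuse check.
    pub suspicious: bool,
}

/// Placeholder for real validation (signature, expiry, claims).
fn token_is_valid(token: &str) -> bool {
    !token.is_empty()
}

/// The single exported entry point: one request in, one decision out.
pub fn verify(request: &Request) -> Decision {
    let Some(value) = request.headers.get("authorization") else {
        return Decision::Deny("missing authorization header".to_string());
    };
    let Some(token) = value.strip_prefix("Bearer ") else {
        return Decision::Deny("unsupported authorization scheme".to_string());
    };
    if !token_is_valid(token) {
        return Decision::Deny("invalid token".to_string());
    }
    if request.suspicious {
        // A real module would mint or forward a captcha token here.
        return Decision::Challenge("captcha-placeholder".to_string());
    }
    Decision::Allow
}
```

Because the result is a plain enum, the host only has to match on three cases, which is what keeps the glue code on the host side short.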

Key insight: a compiled WASM module is a self-contained, immutable artifact that runs sandboxed, with no ambient access to host state. That makes it safe to cache aggressively, pin to a version, and roll back atomically. We deploy a new middleware version globally in 14 seconds. The previous PHP-based system took 11 minutes.
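One way to picture version pinning and rollback is a registry of immutable module bytes with a single "active" pointer. The sketch below is an illustration under that assumption, not our deployment tooling: `ModuleRegistry` and its methods are hypothetical names, and a real edge platform would perform the pointer swap in its own control plane.

```rust
use std::collections::HashMap;

/// Hypothetical registry: published modules never change, only the pointer moves.
pub struct ModuleRegistry {
    modules: HashMap<String, Vec<u8>>,
    active: Option<String>,
    previous: Option<String>,
}

impl ModuleRegistry {
    pub fn new() -> Self {
        ModuleRegistry { modules: HashMap::new(), active: None, previous: None }
    }

    /// Store module bytes under a version. Refuses to overwrite, so a
    /// version string always maps to the same bytes (safe to cache).
    pub fn publish(&mut self, version: &str, bytes: Vec<u8>) -> bool {
        if self.modules.contains_key(version) {
            return false;
        }
        self.modules.insert(version.to_string(), bytes);
        true
    }

    /// Point traffic at a published version, remembering the old one.
    pub fn promote(&mut self, version: &str) -> bool {
        if !self.modules.contains_key(version) {
            return false;
        }
        self.previous = self.active.replace(version.to_string());
        true
    }

    /// Restore the previously active version in one step.
    pub fn rollback(&mut self) -> bool {
        match self.previous.take() {
            Some(prev) => {
                self.active = Some(prev);
                true
            }
            None => false,
        }
    }

    pub fn active(&self) -> Option<&str> {
        self.active.as_deref()
    }

    /// Bytes of the currently active module, if any.
    pub fn active_module(&self) -> Option<&[u8]> {
        let version = self.active.as_ref()?;
        self.modules.get(version).map(|bytes| bytes.as_slice())
    }
}
```

Since published bytes are never mutated, a rollback is only a pointer change. No rebuild or redeploy of the old version is needed.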

Production deployment

We rolled this out gradually behind an internal feature flag, mirroring 1% of traffic to the new module for 72 hours before promoting it. The instrumentation ships as part of our open-source edge-trace crate.

// Pseudocode — the actual wiring lives in the repo
const router = createRouter({
  classify: classifier.predict,
  speculate: speculator.draft,
  verify: verifier.confirm,
  windowSize: 8,
});

export default router.handle;
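For the 1% mirroring step described above, one common approach is deterministic sampling: hash a stable request identifier and mirror the request when the hash falls into a fixed bucket range. The sketch below assumes that approach. The `should_mirror` function, the choice of FNV-1a, and the basis-point unit are illustrative assumptions, not a description of the actual feature flag.

```rust
/// 64-bit FNV-1a, written out so the result is stable across platforms
/// and Rust releases (std's DefaultHasher makes no such promise).
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for byte in bytes {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Decide whether a request should be mirrored to the candidate module.
/// `basis_points` is the mirrored share in 1/100ths of a percent,
/// so 100 means 1% and 10_000 means everything.
pub fn should_mirror(request_id: &str, basis_points: u64) -> bool {
    let hash = fnv1a(request_id.as_bytes());
    // Fold the high half into the low half before bucketing.
    let bucket = (hash ^ (hash >> 32)) % 10_000;
    bucket < basis_points
}
```

Hashing the identifier instead of drawing a random number keeps the decision the same on every PoP for a given request, which makes mirrored results easier to compare. Raising the percentage only adds requests to the mirrored set and never removes any.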

What we got wrong

Our first iteration over-trusted the speculator on long sequences. The fix was a sliding acceptance threshold that decays with prefix length — obvious in hindsight, not obvious during the on-call that surfaced it.
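As a rough illustration of a threshold that decays with prefix length, the sketch below shrinks the tolerated mismatch geometrically as the prefix grows, so longer sequences face a stricter check. This is one interpretation of the fix, not the shipped logic: `AcceptancePolicy`, the mismatch score, the geometric decay, and the floor are all assumptions made for the example.

```rust
/// Hypothetical policy: how much disagreement between the speculator's
/// draft and the verifier is still tolerated at a given prefix length.
pub struct AcceptancePolicy {
    /// Tolerance at prefix length 0.
    pub base_tolerance: f64,
    /// Multiplier applied per prefix element; expected in (0, 1].
    pub decay: f64,
    /// Lower bound so the tolerance never reaches zero.
    pub floor: f64,
}

impl AcceptancePolicy {
    /// Tolerance shrinks geometrically with prefix length, down to the floor.
    pub fn tolerance(&self, prefix_len: usize) -> f64 {
        let decayed = self.base_tolerance * self.decay.powi(prefix_len as i32);
        decayed.max(self.floor)
    }

    /// Accept a draft only if its mismatch score is within the tolerance
    /// for this prefix length.
    pub fn accepts(&self, mismatch: f64, prefix_len: usize) -> bool {
        mismatch <= self.tolerance(prefix_len)
    }
}
```

With a base tolerance of 0.4 and a decay of 0.5, a draft with a mismatch score of 0.15 is accepted at prefix length 1 (tolerance 0.2) and rejected at prefix length 2 (tolerance 0.1).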

The bottleneck is rarely where you think it is. Measure first; optimize the thing that actually moves the bill.