Issue 042 · Live from the edge

Engineering at the Edge.

Long-form research, applied AI tooling, and open-source middleware from a collective of staff engineers shipping at the frontier. Published when there is something worth publishing.

42 deep-dives · 14 OSS projects · 9 contributing engineers · Updated 2026.05.30

Featured Intelligence


[AI / LLM ROUTING]

Speculative Decoding at the Gateway: Saving 41% on Inference Spend

We rewired our model router to draft tokens with a 1.3B speculator before the 70B verifier confirms. The economics changed overnight — and so did our P99 latency.

2026.05.28 · 12 min

[OPEN SOURCE / EDGE]

Why We Rewrote Our Auth Middleware in WASM (And You Probably Should Too)

A 600-line Rust crate, compiled to a 47KB WebAssembly module, now handles authentication for every request across 280 edge locations. Here is the architecture.

2026.05.21 · 9 min

[INFRASTRUCTURE / GPU]

KV-Cache Paging: How We Fit 3x More Concurrent Requests Per H100

PagedAttention is not just an optimization — it is a memory allocator for transformers. Here is what we learned shipping it to production.

2026.05.15 · 15 min

[TOOLING / DSL]

A Typed Prompt Compiler: Catching Hallucination Bugs at Build Time

We built a TypeScript-flavored DSL for prompts that statically verifies retrieved-context shape against expected JSON output. Bugs that used to ship to prod now fail CI.

2026.05.09 · 11 min

[FULL-STACK / RSC]