// TAG · AI

The lab.

Deep dives, project stories, and engineering notes - anything we build that's worth writing about.

Training-Free Vector Quantization: Lloyd-Max Codebooks on the Unit Sphere

deepdives

May 28, 2026 · 10 min

How KAI compresses 6 KB float32 vectors into ~768 bytes, without ever looking at your data, using the geometry of high-dimensional spheres and optimal scalar quantization theory.

Inside the SIMD Search Pipeline: How KAI Scans Millions of Vectors in Milliseconds

deepdives

May 27, 2026 · 9 min

A deep dive into nibble-split lookup tables, blocked memory layouts, and architecture-specific SIMD kernels: the techniques that make sub-millisecond vector search possible.

Mar 15, 2026 · 8 min

LLM inference in production: latency, caching, and the hidden costs

Running language models in production is an engineering problem long before it's a product problem. What we've learned about streaming, prompt caching, batching, the self-host-versus-API decision, and the costs nobody puts on the slide.
