// TAG · AI
The lab.
Deep dives, project stories, and engineering notes - anything we build that's worth writing about.

deepdives
· 10 min
Training-Free Vector Quantization: Lloyd-Max Codebooks on the Unit Sphere
How KAI compresses 6 KB float32 vectors into ~768 bytes - without ever looking at your data - using the geometry of high-dimensional spheres and optimal scalar quantization theory.

deepdives
· 9 min
Inside the SIMD Search Pipeline: How KAI Scans Millions of Vectors in Milliseconds
A deep dive into nibble-split lookup tables, blocked memory layouts, and architecture-specific SIMD kernels - the techniques that make sub-millisecond vector search possible.

ai
· 8 min
LLM inference in production: latency, caching, and the hidden costs
Running language models in production is an engineering problem long before it's a product problem. What we've learned about streaming, prompt caching, batching, the self-host-versus-API decision, and the costs nobody puts on the slide.