Writing

Notes on infrastructure, shipping software, and startups.

May 28, 2026
Why more GPUs is not enough for LLM inference
What I learned deploying and tuning large-model inference: KV cache, routing, and cache hierarchy matter as much as raw GPU count.