Writing — Dustin Deus

May 28, 2026

AI infrastructure

Why more GPUs is not enough for LLM inference

What I learned deploying and tuning large-model inference: KV cache, routing, and cache hierarchy matter as much as raw GPU count.