If the total exceeds the window, something must be truncated or summarized.

2.4 Latency & model choice: why bigger is not always better

Two latency numbers matter: time-to-first-token (TTFT) and tokens ...