OpenAI Halves Its Inference Costs With Software Alone

01 Jul 2026, 09:45

Project update

OpenAI engineers found a software optimization that cuts the cost of running its models by more than 50% — squeezing far more from existing GPUs, with no new hardware.

With the optimization applied, logged-out ChatGPT now runs on just a couple hundred Nvidia GPUs, a fraction of the compute it once needed.

The savings give OpenAI room to widen margins, raise ChatGPT usage limits, or ease API pricing, per The Information.