OpenAI launches GPT-4o mini, which will replace GPT-3.5 in ChatGPT

Lower-cost AI language model will be free for ChatGPT users.

Benj Edwards – Jul 18, 2024 3:44 PM | 44

Credit: Benj Edwards

Performance

Predictably, OpenAI says that GPT-4o mini performs well on an array of benchmarks like MMLU (undergraduate level knowledge) and HumanEval (coding), but the problem is that those benchmarks don't actually mean much, and few measure anything useful when it comes to actually using the model in practice. That's because the feel of quality from the output of a model has more to do with style and structure at times than raw factual or mathematical capability. This kind of subjective "vibemarking" is one of the most frustrating things in the AI space right now.

A graph by OpenAI shows GPT-4o mini outperforming GPT-4 Turbo on eight cherry-picked benchmarks. Credit: OpenAI

So we'll tell you this: OpenAI says the new model outperformed last year's GPT-4 Turbo on the LMSYS Chatbot Arena leaderboard, which measures user ratings after having compared the model to another one at random. But even that metric isn't as useful as once hoped in the AI community, because people have been noticing that even though mini's big brother (GPT-4o) regularly outperforms GPT-4 Turbo on Chatbot Arena, it tends to produce dramatically less useful outputs in general (they tend to be long-winded, for example, or perform tasks you didn't ask it to do).

The value of smaller language models

OpenAI isn't the first company to release a smaller version of an existing language model. It's a common practice in the AI industry from vendors such as Meta, Google, and Anthropic. These smaller language models are designed to perform simpler tasks at a lower cost, such as making lists, summarizing, or suggesting words instead of performing deep analysis.

Smaller models are typically aimed at API users, which pay a fixed price per token input and output to use the models in their own applications, but in this case, offering GPT-4o mini for free as part of ChatGPT would ostensibly save money for OpenAI as well.

OpenAI’s head of API product, Olivier Godement, told Bloomberg, "In our mission to enable the bleeding edge, to build the most powerful, useful applications, we of course want to continue doing the frontier models, pushing the envelope here. But we also want to have the best small models out there."

Smaller large language models (LLMs) usually have fewer parameters than larger models. Parameters are numerical stores of value in a neural network that store learned information. Having fewer parameters means an LLM has a smaller neural network, which typically limits the depth of an AI model's ability to make sense of context. Larger-parameter models are typically "deeper thinkers" by virtue of the larger number of connections between concepts stored in those numerical parameters.

However, to complicate things, there isn't always a direct correlation between parameter size and capability. The quality of training data, the efficiency of the model architecture, and the training process itself also impact a model's performance, as we've seen in more capable small models like Microsoft Phi-3 recently.

Fewer parameters mean fewer calculations required to run the model, which means either less powerful (and less expensive) GPUs or fewer calculations on existing hardware are necessary, leading to cheaper energy bills and a lower end cost to the user.

Staff Picks

LordByronII

I haven't tried GPT-4o mini yet, but from my experience comparing GPT-4 to GPT-4o, while 4o is much faster, I've found that 4 generally gives better answers. 4o has a tendency to ramble on and on, provide unnecessary details, and provide overly generalized advice.

For example, asking it something like "My program is crashing with this exception: <exception message>" will often cause it to provide general advice about troubleshooting generic crash issues and sometimes I'll get multiple pages of text. I often have to remind it that I gave it the exact crash message and only then does it go back and actually answer the original question.

I hope that GPT-4o mini doesn't exhibit the same behavior.

July 18, 2024 at 4:13 pm

OpenAI launches GPT-4o mini, which will replace GPT-3.5 in ChatGPT

Ars Video

Performance

The value of smaller language models

nproxy.org