Bigger Isn’t Better, Or Why FLLMs Matter
In October 2023, we argued that the future of AI would be in “focused large language models” (FLLMs). These are purpose-built language models that target a specific industry, set of languages, or task and that are correspondingly smaller than the large language models (LLMs) being created by OpenAI, Google, Meta, and others.
Those massive models – GPT-3 already weighed in at 175 billion parameters, and GPT-4o is widely believed to be far larger – are like Swiss Army knives: They are prepared to handle almost any task, from creating a haiku to drawing pictures of cats. How well they do these tasks is another question. And their power comes at a significant cost in terms of resources. When figures like the aptly named Leopold Aschenbrenner – his last name means “ash burner” – call for hundreds of new power plants to fuel the AI future, all while we are facing calls to reduce carbon output, their vision might literally involve boiling the oceans and turning the earth to ash.
But are these capabilities and costs needed for AI to be useful? I would argue that they are not. Using GPT-4o (or a future GPT-∞) to update the formality of a German translation, write a marketing release in Hindi, or revise a translation into Albanian is a bit like taking your US$1.9 million Fenyr SuperSport down to the corner grocery store to buy some potatoes for your supper: You can do it, but you would probably be better off walking or taking a bicycle.
So it is with using LLMs. These large models are trained to answer and do anything and everything, which means a typical request touches only a tiny fraction of their capability. Even worse, because they are so big, they may well have been trained on the data a task requires, but that knowledge can get lost amid masses of irrelevant material. This lesson has now been borne out by Microsoft researchers with their Phi-3 series of models, which they describe as “small.” These models are designed to run on limited hardware – potentially including on-device in cars, cameras, and sensors with low or no bandwidth – yet they have outperformed much larger models on the Massive Multitask Language Understanding (MMLU) benchmark. The goal is not to replace LLMs, but to provide an array of smaller options for particular purposes.
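To make the contrast concrete, here is a minimal sketch of what running one of these small models locally for a translation-related request might look like. It assumes the Hugging Face transformers library and the publicly released microsoft/Phi-3-mini-4k-instruct checkpoint; it is an illustration of the approach, not a production setup.

```python
# Minimal sketch: a small instruction-tuned model handling a translation-style
# request on modest hardware. Assumes the Hugging Face transformers library
# and the microsoft/Phi-3-mini-4k-instruct checkpoint.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",
    device_map="auto",  # uses a GPU if present, otherwise falls back to CPU
)

messages = [
    {"role": "user",
     "content": "Rewrite this German sentence in a formal register: "
                "'Danke für deine Bestellung!'"},
]

result = generator(messages, max_new_tokens=60, return_full_text=False)
print(result[0]["generated_text"])
```

Whether a model of this size matches a frontier LLM on a given language pair is an empirical question, but the hardware footprint of the two approaches is not remotely comparable.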
In this, they very much resemble the FLLMs we called for. FLLMs would be orders of magnitude smaller than the all-purpose LLMs because they would not need all of the parameters irrelevant to their tasks. They could thus be much more efficient and, for a given hardware configuration, much faster. In this sense, they would be much more like neural MT, but would retain the flexibility that prompt mechanisms give to LLMs today. They might not be able to create a (bizarre) sci-fi reinterpretation of Van Gogh’s Starry Night or compose a (truly awful) sonnet in honor of King Charles, but for typical translation-related tasks, they could perform as well as (or better than) today’s state-of-the-art LLMs.
For the time being, approaches that adapt LLMs through low-rank adaptation (LoRA) and retrieval-augmented generation (RAG) are effective in constraining their output, but they do not solve the resource problem. These same methods will continue to apply even with a shift to FLLMs, because they – particularly RAG – offer the same advantages, if not more so, for the smaller models. With companies such as Unbabel and Smartling promoting their own LLM approaches, it seems only a matter of time before commercial translation-oriented FLLMs become a common offering.
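As one illustration of why RAG carries over so naturally to translation work, the sketch below grounds a prompt in translation-memory matches retrieved by embedding similarity. It assumes the sentence-transformers package; the embedding model name, the tiny in-memory translation memory, and the final model call are placeholders rather than any vendor’s actual pipeline.

```python
# Illustrative RAG sketch: retrieve the closest translation-memory entries
# and fold them into the prompt before sending it to a (hypothetical) FLLM.
# Assumes the sentence-transformers package; data and names are placeholders.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

translation_memory = [
    ("Thanks for your order!", "Vielen Dank für Ihre Bestellung!"),
    ("Your package has shipped.", "Ihr Paket wurde versandt."),
    ("Contact support for help.", "Wenden Sie sich an den Support."),
]

def build_prompt(source_text: str, top_k: int = 2) -> str:
    """Prepend the most similar TM entries to the translation request."""
    tm_sources = [src for src, _ in translation_memory]
    scores = util.cos_sim(
        embedder.encode(source_text), embedder.encode(tm_sources)
    )[0]
    best = scores.argsort(descending=True)[:top_k].tolist()
    examples = "\n".join(
        f"{translation_memory[i][0]} -> {translation_memory[i][1]}" for i in best
    )
    return (
        "Use these approved translations as reference:\n"
        f"{examples}\n\n"
        f"Translate into German: {source_text}"
    )

prompt = build_prompt("Thank you for the order.")
print(prompt)  # this prompt would then go to the FLLM rather than a frontier LLM
```

The retrieval step, not the model’s size, supplies the domain grounding, which is exactly why the technique works at least as well for a small, focused model as it does for a giant general-purpose one.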
However, building their own FLLMs is still beyond the reach of most enterprises and LSPs. Nevertheless, Microsoft’s success with Phi-3 demonstrates the potential this approach offers. Increasing numbers of LLMs now list translation as a feature, and the need and opportunity for more efficient and targeted FLLMs will only grow over time. FLLMs may not be ubiquitous yet, but their advantages in terms of ecological benefit, speed, and suitability will push more and more applications their way. And along the way, they will deliver the benefits of responsive machine translation.