Sakana AI's CycleQD outperforms traditional fine-tuning methods for multi-skill language models

Researchers at Sakana AI have developed a resource-efficient framework that can create hundreds of language models specializing in different tasks. Called CycleQD, the technique uses evolutionary algorithms to combine the skills of different models without the need for expensive and slow training processes.

CycleQD can create swarms of task-specific agents that offer a more sustainable alternative to the current paradigm of increasing model size.

Rethinking model training

Large language models (LLMs) have shown remarkable capabilities across a wide range of tasks. However, training LLMs to master multiple skills remains a challenge. When fine-tuning models, engineers must balance data from different skills and ensure that one skill doesn't dominate the others. Current approaches often address this by training ever-larger models, which drives up computational and resource demands.

"We believe rather than aiming to develop a single large model to perform well on all tasks, population-based approaches to evolve a diverse swarm of niche models may offer an alternative, more sustainable path to scaling up the development of AI agents with advanced capabilities," the Sakana researchers write in a blog post.

To create populations of models, the researchers took inspiration from quality diversity (QD), an evolutionary computing paradigm that focuses on discovering a diverse set of solutions from an initial population sample. QD aims to create specimens with various "behavior characteristics" (BCs), which represent different skill domains. It achieves this through evolutionary algorithms (EAs) that select parent examples and apply crossover and mutation operations to create new samples.
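In code, the core of a QD search (in the spirit of MAP-Elites, one of the best-known QD algorithms) looks roughly like the sketch below. The function names and the archive structure are illustrative assumptions, not Sakana AI's implementation:

```python
# Minimal quality-diversity loop in the spirit of MAP-Elites.
# evaluate, crossover and mutate are user-supplied placeholders;
# none of these names come from Sakana AI's code.
import random

def qd_search(initial_population, evaluate, crossover, mutate, generations=1000):
    """evaluate(candidate) -> (quality, bc), where quality is a float and
    bc is a hashable, discretized behavior characteristic."""
    archive = {}  # maps each behavior-characteristic cell to its best (quality, candidate)

    def try_insert(candidate):
        quality, bc = evaluate(candidate)
        if bc not in archive or quality > archive[bc][0]:
            archive[bc] = (quality, candidate)

    # Seed the archive; assumes the initial population fills at least two cells.
    for candidate in initial_population:
        try_insert(candidate)

    for _ in range(generations):
        # Pick two elite parents from distinct cells of the archive.
        (_, p1), (_, p2) = random.sample(list(archive.values()), 2)
        # New samples are created by crossover followed by mutation.
        try_insert(mutate(crossover(p1, p2)))

    return archive  # a diverse set of elites, one per behavior cell
```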

CycleQD incorporates QD into the post-training pipeline of LLMs to help them learn new, complex skills. CycleQD is useful when you have multiple small models that have been fine-tuned for very specific skills, such as coding or performing database and operating system operations, and you want to create new variants that have different combinations of those skills.
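To make "combining those skills" concrete: crossover between two fine-tuned LLMs can be realized as model merging, for example simple per-parameter weight interpolation. The sketch below shows that common merging baseline, assumed here for illustration rather than taken from CycleQD's actual operator; the model names are hypothetical:

```python
# Illustrative crossover via weight interpolation (a standard model-merging
# baseline; not necessarily the exact operator CycleQD uses).
# Parents are PyTorch-style state dicts of float tensors with identical
# keys and shapes, i.e. two fine-tunes of the same base architecture.

def merge_crossover(parent_a, parent_b, alpha=0.5):
    """Blend two state dicts: child = alpha * parent_a + (1 - alpha) * parent_b."""
    return {
        name: alpha * tensor + (1.0 - alpha) * parent_b[name]
        for name, tensor in parent_a.items()
    }

# Hypothetical usage: cross a coding specialist with a database specialist.
# child_weights = merge_crossover(coder_model.state_dict(),
#                                 db_model.state_dict(), alpha=0.6)
```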

In the CycleQD framework, each of these skills is considered a behavior characteristic or a quality that the next generation of models is optimized for. In each generation, the algorithm focuses on one specific skill as its quality metric while using the other skills as BCs.

"This ensures every skill gets its moment in the spotlight, allowing the LLMs to grow more balanced and capable overall," the researchers explain.

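As a sketch of that rotation, the scoring logic for one candidate model could look like this, where the skill list and score dictionary are illustrative stand-ins for whatever benchmark suites are actually used:

```python
# Illustrative cycling of the quality objective, as described above:
# each generation, one skill is the quality metric and the remaining
# skills become the behavior characteristics. Skill names are made up.
SKILLS = ["coding", "database_ops", "os_ops"]

def quality_and_bcs(scores, generation):
    """scores: dict mapping skill name -> benchmark score for one model."""
    target = SKILLS[generation % len(SKILLS)]  # skill "in the spotlight"
    quality = scores[target]                   # optimized this generation
    # The other skills are binned so they can index cells in the QD archive.
    bcs = tuple(round(scores[skill], 1) for skill in SKILLS if skill != target)
    return quality, bcs
```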