Sakana AI Unveils TreeQuest: Transforming Multi-Model Collaboration in AI

In a notable development, Sakana AI, a Japanese artificial intelligence laboratory, has introduced a technique known as Multi-LLM AB-MCTS, which orchestrates multiple large language models (LLMs) to collaboratively tackle complex tasks. The approach is designed to enhance performance, enabling teams of models to outperform individual LLMs by roughly 30%. The announcement was made in a presentation on July 3, 2025, which highlighted the significance of collective intelligence in AI development.
The Multi-LLM AB-MCTS technique employs the principles of Monte Carlo Tree Search (MCTS) to coordinate a dynamic, strategic collaboration among several LLMs. According to Takuya Akiba, a research scientist at Sakana AI and co-author of the study, the framework lets AI models perform trial-and-error more effectively by balancing two fundamental search strategies: "searching deeper," which refines existing solutions, and "searching wider," which generates new ones. This dual approach allows the system to decide at each step whether to refine a promising candidate or attempt a fresh one, depending on how the search is progressing.
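To make the "wider versus deeper" decision concrete, the sketch below shows a minimal adaptive search loop over several LLMs, written in Python. It is an illustration only, not Sakana AI's implementation: the Node structure, the call_llm and evaluate placeholders, and the coin-flip rule for choosing between widening and deepening are assumptions made for the example, whereas the real AB-MCTS uses a principled probabilistic policy to make that choice and to select which model to call.
```python
"""Minimal sketch of an adaptive "wider vs. deeper" search over multiple LLMs.

Illustration only: the Node structure, call_llm/evaluate placeholders, and the
coin-flip branching rule are assumptions, not Sakana AI's AB-MCTS implementation.
"""

import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """One candidate solution in the search tree."""
    answer: str
    score: float                       # external score in [0, 1], e.g. tests passed
    model: str                         # which LLM produced this candidate
    children: list = field(default_factory=list)


def call_llm(model: str, prompt: str) -> str:
    """Placeholder for a real API call to the named model (assumption)."""
    return f"<answer from {model} to: {prompt[:40]}>"


def evaluate(answer: str) -> float:
    """Placeholder scorer; in practice, run unit tests or a verifier (assumption)."""
    return random.random()


def adaptive_search(task: str, models: list, budget: int) -> Node:
    """Spend `budget` LLM calls, choosing at each step to widen or deepen."""
    all_nodes = []                     # every candidate generated so far
    for _ in range(budget):
        # Real AB-MCTS makes this choice (and the model choice) with a
        # probabilistic policy; a coin flip stands in for it here.
        widen = not all_nodes or random.random() < 0.5
        model = random.choice(models)
        if widen:                      # "search wider": start a fresh attempt
            answer = call_llm(model, f"Solve: {task}")
            node = Node(answer, evaluate(answer), model)
        else:                          # "search deeper": refine the best attempt so far
            parent = max(all_nodes, key=lambda n: n.score)
            answer = call_llm(model, f"Improve this attempt:\n{parent.answer}")
            node = Node(answer, evaluate(answer), model)
            parent.children.append(node)
        all_nodes.append(node)
    return max(all_nodes, key=lambda n: n.score)


if __name__ == "__main__":
    winner = adaptive_search("example task", ["model-a", "model-b"], budget=10)
    print(winner.model, winner.score)
```
The design point the sketch preserves is that the inference budget is spent one call at a time, and each call can either open a new branch or extend the most promising existing one.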
The technique relies on inference-time scaling, also known as test-time scaling, which differs from traditional training-time scaling focused on increasing model size and dataset volume. By allocating additional computational resources after training, the Multi-LLM AB-MCTS framework enhances the performance of LLMs, allowing them to generate longer, more detailed responses and solutions. The method has drawn attention in the AI community for its potential to produce more robust AI systems that dynamically leverage the strengths of different models.
Sakana AI's implementation of Multi-LLM AB-MCTS marks a significant advance for the field. The collective-intelligence approach was tested against the ARC-AGI-2 benchmark, which evaluates human-like visual reasoning capabilities. The results were promising: the ensemble of models produced correct solutions for over 30% of the 120 test problems, a notable improvement over the performance of any individual model. The collective approach not only identifies which LLMs are most effective for a given task but also adapts its model selection during the problem-solving process, improving overall efficiency.
The innovative nature of this technique lies in its ability to mitigate common issues such as hallucination, where AI models produce incorrect or nonsensical outputs. By leveraging an ensemble of models with varying strengths, including those less prone to error, the Multi-LLM AB-MCTS framework aims to provide a balanced solution that retains powerful logical capabilities while ensuring groundedness, a key consideration for enterprise applications.
Looking ahead, Sakana AI has released the underlying algorithm as an open-source framework named TreeQuest, designed for use across a range of business contexts. The framework offers a flexible API that lets developers customize scoring and search logic to their specific needs. As Akiba noted, while the application of AB-MCTS is still in its early stages, initial findings indicate substantial potential across numerous domains, including algorithmic coding and improving the performance of machine learning models.
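As a rough idea of what plugging custom scoring into such a framework could look like, here is a hypothetical usage sketch. The names below (run_search, make_generator, score_fn, run_tests) are invented for illustration and are not the actual TreeQuest API; the real interface should be taken from the project's repository.
```python
"""Hypothetical usage sketch of a TreeQuest-style API.

The names below (run_search, make_generator, score_fn, run_tests) are invented
for illustration and are NOT the actual TreeQuest interface; consult the
open-source repository for the real API.
"""

from typing import Callable, Dict


def run_tests(candidate: str) -> tuple:
    """Stub test runner so the sketch is self-contained (assumption)."""
    return 1, 2                                  # (tests passed, tests total)


def score_fn(candidate: str) -> float:
    """Custom scoring hook, e.g. the fraction of unit tests a code candidate passes."""
    passed, total = run_tests(candidate)
    return passed / total


def make_generator(model_name: str) -> Callable[[str], str]:
    """Wrap an LLM call so the search loop can request candidates from this model."""
    def generate(prompt: str) -> str:
        return f"<{model_name} output for: {prompt[:30]}>"  # stand-in for a real API call
    return generate


def run_search(generators: Dict[str, Callable[[str], str]],
               scorer: Callable[[str], float],
               task: str,
               budget: int) -> str:
    """Toy search loop: the real framework would allocate the budget adaptively."""
    names = list(generators)
    best, best_score = "", float("-inf")
    for step in range(budget):
        candidate = generators[names[step % len(names)]](task)
        score = scorer(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


if __name__ == "__main__":
    gens = {m: make_generator(m) for m in ["model-a", "model-b", "model-c"]}
    print(run_search(gens, score_fn, task="write a sorting function", budget=6))
```
The intended takeaway is the integration shape the article describes: developers supply candidate generators and a domain-specific scorer, and the framework handles how the inference budget is spent.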
The implications of this technology extend beyond academic interest; it presents significant opportunities for enterprises seeking to enhance their AI capabilities. By harnessing the power of collaborative intelligence among LLMs, businesses can address complex challenges more effectively, paving the way for a new generation of AI applications that promise to be both powerful and reliable. The release of TreeQuest could indeed catalyze a transformative shift in how AI systems are developed and deployed in real-world scenarios.