Hugging Face Hackathon Explores Multi-Model Agent Systems with Small Models for Complex Simulations

Hugging Face’s “Build Small Hackathon” recently showcased a novel approach to multi-agent systems, demonstrating how diverse small language models (SLMs) from different research labs can be integrated to create sophisticated simulations. The project, dubbed “Thousand Token Wood v2,” evolved from a simple sandbox into an interactive finance drama, challenging players to manipulate an emergent economy powered by distinct AI agents. This development offers valuable insights for Indian AI developers and researchers exploring efficient and specialized AI applications.
The core innovation lies in assigning each AI agent within the simulation a different small model from various labs. This contrasts with the common practice of using a single model with multiple prompts for agent interactions. For Indian teams, this could mean more specialized and cost-effective AI solutions, potentially reducing reliance on large, monolithic models for complex tasks.
Key facts:
| Feature | Description |
|---|---|
| Project Name | Thousand Token Wood v2 |
| Core Concept | Multi-model agent simulation for an emergent economy |
| Models Used | gpt-oss-20b (OpenAI), MiniCPM3-4B (OpenBMB), Nemotron-Mini-4B (NVIDIA), fine-tuned Qwen 0.5B |
| Player Role | Shadow financier influencing agents |
The Multi-Model Advantage
The “Thousand Token Wood v2” simulation features five agents driven by small models from four labs: OpenAI’s gpt-oss-20b, OpenBMB’s MiniCPM3-4B, NVIDIA’s Nemotron-Mini-4B, and a custom fine-tuned Qwen 0.5B. This heterogeneity is what produces genuinely diverse participant behavior within the simulated market. As the project report notes, “A market is interesting when the participants genuinely differ, and four labs’ models trained on different data with different post-training are about as different as small models get.” For Indian startups and developers building agent-based systems, this suggests that a mix of specialized SLMs could yield more dynamic and realistic outcomes than running every agent on the same model.
Engineering for Diversity
A significant technical hurdle in integrating multiple models is the “serving layer.” The hackathon team found that the main friction point was not in the models themselves but in ensuring their outputs could be reliably processed. They developed a tolerant JSON parse-and-repair layer that every model’s output flows through. This layer handles varying tokenizers and formatting quirks, preventing simulation crashes. For Indian enterprises adopting or developing multi-model AI systems, this highlights the importance of robust data parsing and integration layers, which can significantly reduce development time and improve system stability. Building such a layer once can make adding new models a configuration task rather than a major refactor.
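As a rough illustration of what a tolerant parse-and-repair layer can look like, the sketch below strips code fences, isolates the outermost JSON object, closes braces left open by truncated output, and removes trailing commas. The function names, the specific repairs, and the `{"action": "hold"}` fallback are assumptions made for this example; the hackathon write-up does not spell out the team's actual rules.

```python
import json
import re
from typing import Optional

FENCE = "`" * 3  # markdown code fence that models often wrap JSON in
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)
TRAILING_COMMA_RE = re.compile(r",\s*([}\]])")


def extract_candidate(text: str) -> str:
    """Pull the most likely JSON object out of raw model output."""
    fenced = FENCE_RE.search(text)
    if fenced:
        text = fenced.group(1)
    start = text.find("{")
    end = text.rfind("}")
    if start == -1:
        return ""
    if end < start:
        # No closing brace (truncated output): keep everything after "{".
        return text[start:]
    return text[start:end + 1]


def repair(candidate: str) -> str:
    """Apply cheap repairs. Naive: does not account for braces or commas inside strings."""
    missing = candidate.count("{") - candidate.count("}")
    if missing > 0:
        candidate += "}" * missing
    return TRAILING_COMMA_RE.sub(r"\1", candidate)


def parse_action(text: str, default: Optional[dict] = None) -> dict:
    """Return a dict for any model output, falling back to a safe default."""
    fallback = default if default is not None else {"action": "hold"}
    candidate = extract_candidate(text)
    if not candidate:
        return dict(fallback)
    # Try the raw candidate first, then the repaired version.
    for attempt in (candidate, repair(candidate)):
        try:
            parsed = json.loads(attempt)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return dict(fallback)
```

Because every model's output passes through the same `parse_action` call, a malformed reply degrades to a no-op action instead of crashing the simulation loop, and a newly added model needs no parser changes of its own.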
Security and Secret Information
A critical aspect of “Thousand Token Wood v2” is the player’s ability to whisper insider tips to agents. To make this a genuine game mechanic, the truthfulness of a tip must be hidden from the agents. This is achieved by storing the “truth flag” off-prompt and stripping it from public records. A dedicated security test scans every agent’s full prompt each turn for banned tokens, ensuring that secret information does not leak. This has strong implications for Indian businesses creating AI agents that handle sensitive data or operate in scenarios requiring information asymmetry. The lesson is clear: secrecy has to be enforced in the data flow and verified by tests, not delegated to a prompt instruction asking the model to keep quiet. This is crucial for applications in finance, legal tech, or other sensitive domains.
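The pattern can be sketched in a few lines: the secret flag lives in a store that the prompt builder never sees, and a per-turn check scans the finished prompts for banned tokens. Everything named here (`Tip`, `TipVault`, `BANNED_TOKENS`, the prompt wording) is hypothetical and chosen for illustration; the project's real field names and banned-token list are not given in the article.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence

# Hypothetical identifiers that must never appear in an agent's prompt.
BANNED_TOKENS = ("is_true", "truth_flag")


@dataclass(frozen=True)
class Tip:
    tip_id: int
    text: str      # what the agent is allowed to read
    is_true: bool  # secret: must never reach a prompt


@dataclass
class TipVault:
    """Keeps truth flags off-prompt; only the public record leaves the vault."""
    _truth: Dict[int, bool] = field(default_factory=dict)

    def store(self, tip: Tip) -> dict:
        self._truth[tip.tip_id] = tip.is_true
        # Public record deliberately omits the truth flag.
        return {"tip_id": tip.tip_id, "text": tip.text}

    def resolve(self, tip_id: int) -> bool:
        """Used by the game engine at settlement time, never by prompt code."""
        return self._truth[tip_id]


def build_prompt(agent_name: str, public_tips: List[dict]) -> str:
    lines = [f"You are {agent_name}. Whispered tips this turn:"]
    lines += [f"- {tip['text']}" for tip in public_tips]
    return "\n".join(lines)


def find_leaks(prompt: str, banned: Sequence[str] = BANNED_TOKENS) -> List[str]:
    """Return every banned token found in the prompt (case-insensitive)."""
    lowered = prompt.lower()
    return [token for token in banned if token.lower() in lowered]


def assert_no_leak(prompts: Dict[str, str]) -> None:
    """Run once per turn over every agent's full prompt."""
    for agent, prompt in prompts.items():
        leaks = find_leaks(prompt)
        if leaks:
            raise AssertionError(f"{agent} prompt leaks banned tokens: {leaks}")
```

A token scan of this kind only catches leaks of known identifiers. It complements the structural separation in `TipVault` and does not replace it.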
Managing Agent Memory and Relationships
The agents in “Thousand Token Wood v2” maintain persistent relationships with the player and each other, influencing their behavior. To prevent prompt inflation, where raw historical data overwhelms small models, the system never puts full history into the prompt. Instead, the model sees a one-line, bucketed summary of sentiment (e.g., “you feel warmly toward Oona, wary of the Patron”), capped to the few strongest feelings derived from integer sentiment values. This “bounded summary” approach is highly relevant for Indian developers building AI assistants or customer service bots that require memory and context without performance degradation. It demonstrates that making agents feel “alive” through persistent memory can be achieved efficiently without complex architectural changes, as long as the prompt is managed effectively.
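A minimal sketch of the bounded summary, assuming sentiment is stored as one integer per counterpart: the thresholds, the bucket labels other than the two quoted above, the cap of two, and the character name Bram are all invented for the example.

```python
from typing import Dict

MAX_FEELINGS = 2  # assumed cap on how many feelings reach the prompt


def bucket(score: int) -> str:
    """Map an integer sentiment to a coarse phrase; thresholds are illustrative."""
    if score >= 5:
        return "warmly toward"
    if score >= 2:
        return "friendly toward"
    if score <= -5:
        return "hostile toward"
    if score <= -2:
        return "wary of"
    return ""  # near-neutral feelings are omitted entirely


def summarize(sentiment: Dict[str, int], cap: int = MAX_FEELINGS) -> str:
    """Build a one-line summary from the strongest feelings only."""
    # Strongest feelings first; ties are broken by name for determinism.
    strongest = sorted(sentiment.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
    phrases = [f"{bucket(score)} {name}" for name, score in strongest if bucket(score)]
    phrases = phrases[:cap]
    if not phrases:
        return "you have no strong feelings about anyone"
    return "you feel " + ", ".join(phrases)


print(summarize({"Oona": 6, "the Patron": -3, "Bram": 1}))
# you feel warmly toward Oona, wary of the Patron
```

The prompt cost stays roughly constant no matter how many turns have passed, because the integers absorb the history and only the capped one-line summary is ever rendered.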
Implications for Indian AI Teams
This Hugging Face hackathon project offers several practical takeaways for Indian AI teams. First, it reinforces the idea that small models, when structured and prompted effectively, can achieve significant complexity and functionality. This makes AI development more accessible and potentially less resource-intensive, aligning with the “frugal innovation” mindset often seen in the Indian tech ecosystem. Second, the emphasis on robust serving layers and security protocols for information handling is paramount for any production-grade AI system, especially given India’s evolving data privacy regulations. Finally, the intelligent management of agent memory through bounded summaries provides a blueprint for scalable and efficient multi-agent systems, applicable across various sectors from customer service to financial modeling.
Source: Hugging Face Blog, https://huggingface.co/blog/build-small-hackathon/thousand-token-wood-sim-v2