
Count Anything AI Model Aims to Revolutionize Object Counting Across Diverse Images

AI News India/Jun 13, 2026/3 min read


Researchers from Tsinghua University and other institutions have introduced “Count Anything,” an artificial intelligence model designed to count objects across a wide range of image types using only a text prompt. The work addresses a long-standing challenge in AI: reliably counting diverse objects, from microscopic cell samples to large crowds, within a single unified system. According to the team's own tests, the model's average counting error is less than half that of the best competing system.

Key facts

Feature	Description
Model Name	Count Anything
Developers	Tsinghua University and partner institutions
Core Capability	Counts objects in many image types from a text prompt
Performance	Average error of about nine objects per queried category, less than half that of the best competing system (team's own tests)
Dataset	CLOC (about 220,000 images, 619 categories, 15 million labeled objects)

Overcoming Specialization Barriers

Historically, counting objects in images has required highly specialized AI systems tailored for specific tasks, such as counting cells in medical scans or vehicles in satellite imagery. These systems often struggle when presented with image types outside their training domain. “Count Anything” aims to provide a universal solution, capable of processing images as varied as medical tissue samples, agricultural scenes, everyday photographs, and drone imagery. This generalized approach could streamline applications across various sectors, from healthcare to urban planning.

How Count Anything Works

The model integrates two distinct counting methods: one for larger, clearly visible objects that uses bounding boxes, and another for small, densely packed objects that marks each target with a dot. The two sets of predictions are then merged. To prevent double-counting, a simple rule keeps the higher-confidence prediction when both methods identify the same object. The system builds upon Meta’s pretrained SAM3 model, adding small adapter components for the counting task rather than retraining the entire model.
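The merge step can be illustrated with a minimal sketch. It is not the authors' code. The article does not say how the model decides that a box and a dot refer to the same object, so the sketch assumes a dot matches a box when it falls inside that box, and it pairs each dot with at most one box. The names `Box`, `Dot`, `contains`, and `merge_predictions` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A bounding-box prediction with a confidence score."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float


@dataclass(frozen=True)
class Dot:
    """A point prediction with a confidence score."""
    x: float
    y: float
    score: float


def contains(box: Box, dot: Dot) -> bool:
    # Assumption: a dot inside a box refers to the same object as the box.
    return box.x1 <= dot.x <= box.x2 and box.y1 <= dot.y <= box.y2


def merge_predictions(boxes: list[Box], dots: list[Dot]) -> list:
    """Merge both prediction sets, keeping one prediction per matched pair."""
    matched_boxes: set[int] = set()
    kept: list = []
    # Handle the most confident dots first so they get first pick of a box.
    for dot in sorted(dots, key=lambda d: d.score, reverse=True):
        candidates = [
            i for i, box in enumerate(boxes)
            if i not in matched_boxes and contains(box, dot)
        ]
        if not candidates:
            kept.append(dot)  # no overlapping box: the dot counts on its own
            continue
        best = max(candidates, key=lambda i: boxes[i].score)
        matched_boxes.add(best)
        # Same object found twice: keep only the higher-confidence prediction.
        kept.append(dot if dot.score > boxes[best].score else boxes[best])
    # Boxes that no dot matched count on their own.
    kept.extend(box for i, box in enumerate(boxes) if i not in matched_boxes)
    return kept


if __name__ == "__main__":
    boxes = [Box(0, 0, 10, 10, 0.9), Box(20, 20, 30, 30, 0.4)]
    dots = [Dot(5, 5, 0.6), Dot(25, 25, 0.8), Dot(50, 50, 0.7)]
    print(len(merge_predictions(boxes, dots)))  # 3
```

In this made-up example, two of the three dots fall inside a box, so five raw predictions collapse to a count of three. A real system would likely need a more careful matching rule for dense scenes, where many small objects can sit inside one large box.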

The CLOC Dataset

A crucial part of the “Count Anything” project is the new CLOC dataset. Recognizing the lack of a comprehensive public dataset for text-guided counting, the researchers compiled and cleaned existing datasets. The resulting CLOC dataset is described as the largest of its kind, containing approximately 220,000 images, 619 categories, and 15 million labeled objects across six diverse domains. This extensive dataset was vital for training the model to achieve its broad applicability.

Performance and Limitations

In comparative tests conducted by the development team, “Count Anything” significantly outperformed competing systems such as CountGD, CLIP-Count, and Grounding DINO. On average, the model miscounts by about nine objects per queried category, less than half the error of the best alternative model. In pure crowd counting it remains competitive but does not fully match highly specialized systems built for that domain. The researchers acknowledge limitations, particularly with ambiguous or highly specialized terms and in extremely dense scenes where occlusion makes distinguishing objects difficult. The code for “Count Anything” is publicly available on GitHub.
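A figure like "about nine objects per queried category" reads as a mean absolute error, although the article does not name the exact metric, so that reading is an assumption. The sketch below shows how such an average would be computed. The function name `mean_absolute_error` and the sample counts are made up for illustration and are not results from the paper.

```python
def mean_absolute_error(predicted: list[int], actual: list[int]) -> float:
    """Average absolute gap between predicted and true counts."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


if __name__ == "__main__":
    # Illustrative counts for three queried categories (not real results).
    predicted = [12, 95, 240]
    actual = [10, 100, 260]
    print(mean_absolute_error(predicted, actual))  # 9.0
```

Here the per-category misses are 2, 5, and 20 objects, which average to 9. An average of this kind can hide large misses on dense images behind small ones on sparse images, which fits the limitations the researchers describe for crowded scenes.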

Implications for Indian Tech and Startups

For the Indian AI and tech ecosystem, “Count Anything” presents opportunities for innovation in areas requiring precise object enumeration. Startups in sectors like agriculture (crop yield estimation), healthcare (cell counting in diagnostics), and smart cities (traffic analysis, crowd management) could potentially integrate this general-purpose counting AI to build more versatile and efficient solutions. Its ability to handle diverse image types with a single model could reduce development costs and accelerate deployment of AI-powered analytics tools.

Source: The Decoder, https://the-decoder.com/new-ai-model-called-count-anything-does-exactly-what-it-says-and-thats-harder-than-it-sounds/
