Google’s Gemini-SQL2 Achieves 80.04% Accuracy on BIRD Text-to-SQL Benchmark

AI News India/Jun 13, 2026/3 min read

Google Research has unveiled Gemini-SQL2, a new text-to-SQL capability that leverages Gemini 3.1 Pro to translate natural language questions into executable SQL queries. The system achieved an 80.04% execution accuracy score on the BIRD Text-to-SQL Leaderboard in the Single Model track, indicating its ability to generate SQL that not only runs but also returns correct results. This marks a significant step in making database interactions more accessible through natural language.

Performance on the BIRD Benchmark

The BIRD benchmark, or BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation, is an industry standard for assessing text-to-SQL performance. Unlike older benchmarks, BIRD includes 12,751 question-SQL pairs across 95 databases from 37 professional domains, incorporating “dirty values” and requiring external knowledge grounding. The benchmark’s execution accuracy (EX) metric ensures that generated SQL not only appears valid but also successfully executes and produces results matching the gold standard query. Gemini-SQL2’s score of 80.04% places it above Google’s previous top entry, Gemini-SQL, on the Single Trained Model Track, which restricts the use of preprocessing or agentic frameworks to measure the core model’s capability.
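The execution accuracy idea described above can be illustrated with a minimal sketch. It assumes a simplified rule: a prediction counts as correct when it runs and returns the same set of rows as the gold query. The toy `orders` table, the query pairs, and the helper names `execution_match` and `execution_accuracy` are hypothetical and are not taken from the official BIRD evaluation code.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True when the predicted query runs and yields the same set of rows as the gold query."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # SQL that fails to execute counts as incorrect
    gold_rows = conn.execute(gold_sql).fetchall()
    return set(predicted_rows) == set(gold_rows)


def execution_accuracy(conn, pairs):
    """Percentage of (predicted, gold) pairs whose result sets match."""
    hits = sum(execution_match(conn, predicted, gold) for predicted, gold in pairs)
    return 100.0 * hits / len(pairs)


# Toy in-memory database, purely illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, city TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'Pune', 120.0), (2, 'Delhi', 80.0), (3, 'Pune', 40.0);
""")

pairs = [
    # Different text, same result: counted as correct.
    ("SELECT city FROM orders WHERE amount > 100",
     "SELECT city FROM orders WHERE amount > 100.0"),
    # Runs, but returns a different count: incorrect.
    ("SELECT COUNT(*) FROM orders",
     "SELECT COUNT(*) FROM orders WHERE city = 'Pune'"),
    # Misspelled column, fails to execute: incorrect.
    ("SELECT citty FROM orders",
     "SELECT city FROM orders"),
    # Same aggregate over the same rows: correct.
    ("SELECT SUM(amount) FROM orders WHERE city='Pune'",
     "SELECT SUM(amount) FROM orders WHERE city = 'Pune'"),
]

score = execution_accuracy(conn, pairs)
print(f"Execution accuracy: {score:.2f}%")
```

Comparing result sets rather than SQL text is what lets two differently written queries both count as correct, and it is also why a query that merely looks valid earns no credit.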

Key facts

Feature	Detail
Product	Gemini-SQL2
Powering Model	Gemini 3.1 Pro
Benchmark	BIRD Text-to-SQL Leaderboard (Single Model)
Execution Accuracy	80.04%

Implications for Data Services

Google’s announcement on X highlighted the difficulty of generating accurate SQL from natural language due to “data subtlety & complex business contexts.” The improved SQL understanding offered by Gemini-SQL2 is expected to “elevate natural language skills across Google’s data services.” While Google has not yet confirmed specific product integrations, potential targets include existing Gemini-based SQL generation features in BigQuery Studio, AlloyDB AI, and Cloud SQL Studio. This development could streamline data analysis for users in India, particularly for businesses and developers working with large datasets who need to extract insights without deep SQL expertise.

Comparison with Human Performance and Competitors

Google’s internal benchmarks indicate human performance on the BIRD task at 92.96% accuracy, leaving a 12.92-point gap from Gemini-SQL2’s 80.04%. A chart shared by Google shows Gemini-SQL2 leading eight named competitors, with Google now holding the top two named positions on the leaderboard with Gemini-SQL2 and Gemini-SQL. The chart also reveals that several specialized 32B SQL models outperform some general frontier models, emphasizing the importance of specialized training for this task.

Availability and Implementation Pattern

As of the announcement, Gemini-SQL2 is a capability and not yet available as a standalone model string or API. However, developers can implement a schema-grounded pattern using current Gemini models via the google-genai SDK, with the possibility to swap in a Gemini-SQL2 ID once it becomes available. This implementation involves providing the database schema and a natural language question to the model, which then returns an executable SQLite query. For production systems, Google recommends adding execution verification to run the returned SQL, catch errors, and retry with error messages, mirroring the BIRD benchmark’s approach to execution accuracy.
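The schema-grounded pattern with execution verification might look roughly like the sketch below. It is an illustration under stated assumptions, not Google's implementation: the prompt wording, the helper names, the retry limit, and the `gemini-2.5-flash` model ID are placeholders chosen here. The model call is passed in as a plain function so the verify-and-retry loop can be exercised with a scripted stand-in and no API key.

```python
import sqlite3
from typing import Callable, Optional

FENCE = "`" * 3  # markdown code fence that models often wrap SQL in


def read_schema(conn: sqlite3.Connection) -> str:
    """Collect the CREATE TABLE statements that ground the prompt."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def build_prompt(schema: str, question: str,
                 failed_sql: Optional[str] = None, error: Optional[str] = None) -> str:
    """Prompt with schema and question, plus the last error when retrying."""
    prompt = (
        "Return one SQLite query that answers the question. Reply with SQL only.\n\n"
        f"Schema:\n{schema}\n\nQuestion: {question}\n"
    )
    if error:
        prompt += f"\nPrevious attempt:\n{failed_sql}\nIt failed with: {error}\nFix the query.\n"
    return prompt


def strip_fences(text: str) -> str:
    """Drop a leading and trailing markdown fence line if the model added them."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].startswith(FENCE):
        lines = lines[:-1]
    return "\n".join(lines).strip()


def generate_verified_sql(conn: sqlite3.Connection, question: str,
                          generate: Callable[[str], str], max_attempts: int = 3):
    """Ask for SQL, execute it, and retry with the error message on failure."""
    schema = read_schema(conn)
    failed_sql, error = None, None
    for _ in range(max_attempts):
        sql = strip_fences(generate(build_prompt(schema, question, failed_sql, error)))
        try:
            # A production system would run this on a read-only connection.
            rows = conn.execute(sql).fetchall()
            return sql, rows
        except sqlite3.Error as exc:
            failed_sql, error = sql, str(exc)
    raise RuntimeError(f"No executable SQL after {max_attempts} attempts: {error}")


def gemini_generate(prompt: str, model: str = "gemini-2.5-flash") -> str:
    """Assumed wiring to the google-genai SDK; the model ID is a placeholder."""
    from google import genai  # pip install google-genai; API key read from the environment
    client = genai.Client()
    return client.models.generate_content(model=model, contents=prompt).text or ""


# Demo with a scripted stand-in for the model: first reply is broken, second is fixed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, city TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'Pune', 120.0), (2, 'Delhi', 80.0), (3, 'Pune', 40.0);
""")
replies = iter([
    "SELECT SUM(amt) FROM orders",
    f"{FENCE}sql\nSELECT SUM(amount) FROM orders\n{FENCE}",
])
sql, rows = generate_verified_sql(
    conn, "What is the total order amount?", lambda prompt: next(replies)
)
print(sql, rows)
```

Passing `gemini_generate` in place of the scripted lambda would send the same prompts to a live model. Because the model ID is an ordinary parameter, a Gemini-SQL2 identifier could be substituted if Google publishes one.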

Source: MarkTechPost (https://www.marktechpost.com/2026/06/12/google-releases-gemini-sql2-gemini-3-1-pro-text-to-sql-scores-80-04-on-bird-single-model-leaderboard/)
