Benchmarking Natural Language to SQL Conversion Models and Agents
Filter Controls
Select Dataset
Select Metrics to Display
Tip: Click a metric to show or hide it in the table. The best value for each metric is highlighted with a 🏆. Precision, Recall, and F1 are calculated only over SQL queries that are executable.
No Data Available
This dataset currently has no results. Be the first to submit!
Submit Your Results
Submit your NL2SQL model results to be included in the leaderboard.
Please ensure your results follow the required format.
Submission Portal Under Development
About NL2SQL Leaderboard
This leaderboard tracks the performance of various Natural Language to SQL (NL2SQL) models
across multiple benchmarks and metrics.
Metrics Explained:
Execution Ability: Ability to generate executable SQL queries
Correct Rate: Percentage of correctly generated SQL queries
Execution Efficiency: Performance on executable SQL queries
Table/Column Precision/Recall/F1: Accuracy in selecting the correct tables and columns, measured only on SQL queries that are executable
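As a rough illustration of how the metrics above relate to each other, the sketch below computes Execution Ability, Correct Rate, and table-level Precision/Recall/F1 from per-query records. It is not the leaderboard's scoring code: the `QueryResult` record, its field names, the `summarize` helper, and the choice to average per-query scores are all assumptions made for the example. Execution Efficiency is left out because its exact definition is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryResult:
    """Hypothetical per-query record; the field names are illustrative only."""
    executable: bool        # predicted SQL ran without error
    correct: bool           # predicted SQL was judged correct
    pred_tables: frozenset  # tables referenced by the predicted SQL
    gold_tables: frozenset  # tables referenced by the reference SQL


def precision_recall_f1(pred, gold):
    """Set-based precision, recall, and F1 for a single query."""
    true_positives = len(pred & gold)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def summarize(results):
    """Aggregate metrics; P/R/F1 are averaged over executable queries only."""
    n = len(results)
    runnable = [r for r in results if r.executable]
    scores = [precision_recall_f1(r.pred_tables, r.gold_tables) for r in runnable]

    def mean(index):
        return sum(s[index] for s in scores) / len(scores) if scores else 0.0

    return {
        "execution_ability": len(runnable) / n if n else 0.0,
        "correct_rate": sum(r.correct for r in results) / n if n else 0.0,
        "table_precision": mean(0),
        "table_recall": mean(1),
        "table_f1": mean(2),
    }


# Made-up records: two executable queries and one that failed to run.
example = [
    QueryResult(True, True, frozenset({"orders", "users"}), frozenset({"orders"})),
    QueryResult(True, False, frozenset({"orders"}), frozenset({"orders", "users"})),
    QueryResult(False, False, frozenset(), frozenset({"orders"})),
]
summary = summarize(example)
```

The failed query lowers Execution Ability and Correct Rate but is excluded from the Precision/Recall/F1 averages, which matches the note that those metrics only consider executable SQL. The same functions would apply to column sets in place of table sets.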
Supported Datasets:
BIRD Dev: Development split of the BIRD benchmark for text-to-SQL over large-scale, real-world databases
BIRD Train: Training split of the BIRD benchmark
Spider: Large-scale complex and cross-domain semantic parsing and text-to-SQL dataset