The Only Metric That Matters: Calendar Time
Commits and lines of code are proxies. What matters is days to completion. Can 1,000 agents working in parallel beat 100 agents? Does your orchestration framework maintain productivity as you scale? This benchmark will tell you.
The Challenge
Objective
Build a SQL database engine from scratch that passes the SQLLogicTest suite. This is the same test suite used to validate SQLite, DuckDB, and other production databases.
Success Criteria
- 100% pass rate on SQLLogicTest suite
- ~6 million individual test assertions
- All 622 test files passing
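One way to track these criteria while you work is to tally results per test file and treat the run as complete only when every file is clean. The sketch below is a minimal illustration, not part of any official harness: `FileResult`, `summarize`, and the file names in the usage example are hypothetical, and it assumes your own test runner reports passed and failed assertion counts per file.

```python
from dataclasses import dataclass


@dataclass
class FileResult:
    """Outcome of running one test file (hypothetical shape)."""
    path: str
    passed: int  # assertions that matched the expected result
    failed: int  # assertions that did not


def summarize(results):
    """Aggregate per-file results into the numbers the criteria refer to."""
    total = sum(r.passed + r.failed for r in results)
    passed = sum(r.passed for r in results)
    files_passing = sum(1 for r in results if r.failed == 0)
    return {
        "files_total": len(results),
        "files_passing": files_passing,
        # Guard against division by zero when nothing has run yet.
        "assertion_pass_rate": passed / total if total else 0.0,
        # Complete only if at least one file ran and none had a failure.
        "complete": bool(results) and files_passing == len(results),
    }


if __name__ == "__main__":
    # Illustrative numbers only, not real results.
    demo = [
        FileResult("select1.test", passed=1000, failed=0),
        FileResult("select2.test", passed=990, failed=10),
    ]
    print(summarize(demo))
```

A 99.5% assertion pass rate still reports `complete` as false, which mirrors the all-or-nothing criterion above.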
Constraints
- No existing SQL parser libraries — build your own parser
- No existing query engines — implement execution from scratch
- No database-specific libraries — use only general-purpose libraries
Allowed
- Any programming language
- Any AI orchestration framework
- Human intervention (unlimited)
- General-purpose libraries (data structures, I/O, etc.)
The Baseline: VibeSQL
VibeSQL was built using AI-assisted development to establish a baseline for this challenge. The project achieved 100% SQLLogicTest compliance. Here are the metrics:
- Days to 100%: --
- Total Commits: --
- Lines of Rust: --
- SQLLogicTest Pass: 100%
Progress Over Time
SQLLogicTest pass rate over calendar days from project start
The Trophy
The Vibe Coding Trophy
A physical trophy will be awarded to each record holder. The design reflects the spirit of "vibe coding" — a gold-plated wand mounted on walnut with brass nameplates.
Your name goes on the trophy when you beat the current record by at least 5%.
Award Rules
- 5% improvement required: beat the previous record by at least 5% (measured in calendar days) to claim the trophy
- Public repository: your code must be publicly available for verification
- 100% pass rate: all 622 SQLLogicTest files must pass
- Verifiable git history: your time runs from the date of your first commit to the date of the commit that reaches a 100% pass rate
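The timing and threshold rules above reduce to simple date arithmetic, sketched below. This is an illustration under stated assumptions rather than the official scoring method: the function names are hypothetical, elapsed time is taken as the plain difference between the two commit dates (the rules do not say whether the start day itself counts), and "at least 5%" is read as finishing in no more than 95% of the record, rounded down to whole days.

```python
from datetime import date


def calendar_days(first_commit: date, full_pass: date) -> int:
    """Calendar days from the first commit to the commit reaching 100%.

    Assumption: a plain date difference, so same-day completion counts as 0.
    """
    return (full_pass - first_commit).days


def max_days_to_claim(record_days: int, improvement_percent: int = 5) -> int:
    """Largest whole-day time that beats the record by the required margin.

    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    return record_days * (100 - improvement_percent) // 100


def beats_record(your_days: int, record_days: int) -> bool:
    """True if your time qualifies under the 5% improvement rule."""
    return your_days <= max_days_to_claim(record_days)


if __name__ == "__main__":
    # Hypothetical dates and record, for illustration only.
    elapsed = calendar_days(date(2026, 1, 1), date(2026, 2, 10))
    print(elapsed)                    # 40
    print(max_days_to_claim(40))      # 38
    print(beats_record(38, 40))       # True
    print(beats_record(39, 40))       # False
```

Under this reading, a hypothetical 40-day record could only be replaced by a run of 38 days or fewer.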
Current Record Holder
VibeSQL (Baseline)
October - December 2025
Beat this by 5%? That means finishing in 36 days or fewer to claim the trophy.
Why This Challenge?
Objective Measurement
No subjective code reviews. Either the tests pass or they don't. Roughly 6 million assertions leave no room for ambiguity.
Real Complexity
A SQL database requires a parser, an optimizer, and an execution engine. This isn't a toy problem; it's production-grade engineering.
Time Is Truth
Calendar days to completion is the ultimate metric. Does parallelizing to 1,000 agents help? Now you can find out.
Getting Started
Download the Test Suite
Get the SQLLogicTest suite from the SQLite project. Each test file pairs SQL statements and queries with the results they are expected to produce.
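To get a feel for what your engine will be fed, here is a minimal sketch of reading a test script into records. It assumes the commonly documented SQLLogicTest layout: records separated by blank lines, a `statement` or `query` header line, the SQL text, and for queries a `----` line followed by the expected values. It is deliberately simplified: `Record` and `parse_records` are hypothetical names, `skipif`/`onlyif` prefixes are only collected, and control records such as `halt` or `hash-threshold` are skipped, so treat it as a starting point rather than a complete reader.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Record:
    kind: str                 # "statement" or "query"
    header: str               # full header line, e.g. "query I nosort"
    sql: str                  # SQL text to execute
    expected: list = field(default_factory=list)    # expected result lines
    conditions: list = field(default_factory=list)  # skipif/onlyif lines


def parse_records(text: str) -> list:
    """Split a test script into statement and query records (simplified)."""
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Drop blank lines and '#' comment lines.
        lines = [ln for ln in block.splitlines()
                 if ln.strip() and not ln.startswith("#")]
        conditions = []
        # Collect conditional prefixes; this sketch does not evaluate them.
        while lines and lines[0].split()[0] in ("skipif", "onlyif"):
            conditions.append(lines.pop(0))
        if not lines:
            continue
        header, body = lines[0], lines[1:]
        kind = header.split()[0]
        if kind not in ("statement", "query"):
            continue  # control records are ignored here
        if "----" in body:
            sep = body.index("----")
            sql_lines, expected = body[:sep], body[sep + 1:]
        else:
            sql_lines, expected = body, []
        records.append(
            Record(kind, header, "\n".join(sql_lines), expected, conditions)
        )
    return records


# A small hand-written script in the same layout, for illustration.
SAMPLE = """\
statement ok
CREATE TABLE t1(a INTEGER, b INTEGER)

statement ok
INSERT INTO t1 VALUES(1, 2)

query I nosort
SELECT a + b FROM t1
----
3
"""

if __name__ == "__main__":
    for rec in parse_records(SAMPLE):
        print(rec.kind, "|", rec.sql, "|", rec.expected)
```

A test runner would execute each record's `sql` against your engine and compare the output with `expected`, honoring the sort mode named in the query header.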
Start Your Timer
Record your start date. The challenge measures calendar days from first commit to 100% pass rate. Human intervention is allowed and encouraged.
Build Your Database
Use any language, any orchestration framework, any number of agents. The only constraint: build the SQL parser and execution engine yourself.
Report Your Results
Share your completion time, agent configuration, and lessons learned. Help the community understand what works at scale.