
USABench
The Definitive Government Data Analysis Benchmark for LLMs
Comprehensive Framework for Public Data
Open-source benchmark specifically designed for AI evaluations on government data
Extensible, Open Standard
Designed to be used across any public dataset
Transparent Methodology
Reproducible model comparison with function-calling evaluation
Community Driven
Supporting ongoing evaluation contributions from researchers and developers
Performance Leaderboard
Model rankings with tier-specific breakdowns and comprehensive performance metrics
Methodology Transparency
Comprehensive evaluation framework with reproducible protocols and validation measures
Evaluation Framework Overview
- Ragas integration with function calling evaluation methodology
- 4-component binary scoring: Function Selection, Parameter Accuracy, Execution Success, Result Accuracy
- Direct LiteLLM integration without framework dependencies
- Real-time API execution with BLS and BEA endpoints
- Comprehensive error analysis and debugging protocols
Government Data Source Integration
- Federal agency API integration spanning OMB, BLS, and BEA datasets
- Cross-source analytical capability requirements
- Data quality assurance and validation frameworks
- Standardized dataset repository with 459 unified records
- Multi-temporal coverage (2014-2024) ensuring data relevance
Complexity Tier Definitions
- Easy (30%): Basic data retrieval and simple aggregations
- Medium (50%): Multi-table joins and statistical analysis
- Hard (20%): Complex temporal analysis and cross-source synthesis
- Geographic, demographic, and sectoral analysis requirements
- Real-world analytical scenario representation
Performance Measurement
- Binary accuracy scoring with execution validation
- Statistical significance verification protocols
- Comparative analysis across model architectures
- Historical performance tracking and trend analysis
- Comprehensive benchmark integrity assurance
Data Source Foundation
Real government economic data from authoritative federal agencies
OMB
Office of Management and Budget
Federal budget data, economic forecasts
2014-2024 coverage
BLS
Bureau of Labor Statistics
Employment Cost Index, CPI, Productivity
2014-2024 coverage
BEA
Bureau of Economic Analysis
GDP by industry, regional personal income
2023-2024 coverage
Community Contribution Process
Join the evaluation ecosystem and contribute to AI progress in government data analysis
Model Evaluation Execution
Use the provided SDK and standardized protocols to execute evaluations on your model using our comprehensive benchmark suite
Result Validation
Performance results undergo validation and statistical significance verification to ensure benchmark integrity
Community Review
Submitted results are reviewed by the community with model documentation and technical specifications
Leaderboard Integration
Approved submissions are integrated into the leaderboard following community review and approval processes
Getting Started
Repository Access & SDK
git clone https://github.com/usabench/usabenchpip install litellm sqlparse pydantic numpy pandaspython3 -m USABench --model your-model --evaluation-type mixedTechnical support and community engagement available through GitHub Issues and community forums
Strategic Positioning & Impact
Establishing the industry standard for government data analysis AI evaluation
Industry Standard Establishment
USABench establishes an authoritative evaluation framework for systems using LLMs to access public datasets, providing systematic capability assessment tools for the AI research community. The benchmark addresses critical gaps in specialized domain evaluation while maintaining accessibility for independent research execution.
Community Ecosystem Development
This project provides a transparent methodology, reproducible evaluation protocols, and community support in an effort to futher the conversation around AI and government data analysis. Regular model submissions and performance updates maintain benchmark relevance and establish ongoing capability measurement standards.
Government Data Analysis Advancement
USABench accelerates AI development in critical government data domains, supporting enhanced analytical capabilities across federal datasets. The benchmark enables systematic progress measurement and competitive development across model architectures and approaches.
USAFacts Sponsorship
USAFacts, a nonpartisan organization dedicated to government transparency through data, led the development of USABench and continues to review submissions. Note: USAFacts does not endorse any specific model, organization, or political party, and provides no warranty or guarantee regarding the accuracy or reliability of the benchmark or underlying data and code. See disclaimers in GitHub.
Join the USABench Community
Be part of establishing the definitive standard for AI evaluation on government data analysis