Moda - Monitoring for AI Agents

GatewayBench v1

Moda Labs · 2026 · MIT

A synthetic benchmark dataset for evaluating LLM gateway systems and their routing decisions. It provides 2,000 test cases with ground-truth labels across four task types: tool selection, retrieval, chat, and stress testing.
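As a rough illustration of how ground-truth labels can be used to score routing decisions per task type, here is a minimal sketch. The record fields (`task_type`, `prompt`, `expected`), the route names, and the `keyword_router` function are all hypothetical and chosen for illustration only; they are not the dataset's actual schema or an official evaluation script.

```python
from collections import defaultdict


def accuracy_by_task_type(cases, route):
    """Return {task_type: fraction of cases where route(case) equals the label}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for case in cases:
        task = case["task_type"]
        total[task] += 1
        if route(case) == case["expected"]:
            correct[task] += 1
    return {task: correct[task] / total[task] for task in total}


# Toy records in an assumed schema (NOT the real dataset format).
toy_cases = [
    {"task_type": "tool_selection", "prompt": "Book a flight to Oslo", "expected": "tools"},
    {"task_type": "retrieval", "prompt": "What does the refund policy say?", "expected": "rag"},
    {"task_type": "chat", "prompt": "Tell me a joke", "expected": "chat"},
    {"task_type": "chat", "prompt": "Find the policy doc about jokes", "expected": "chat"},
]


def keyword_router(case):
    """Hypothetical stand-in for a gateway's routing decision."""
    text = case["prompt"].lower()
    if "book" in text:
        return "tools"
    if "policy" in text:
        return "rag"
    return "chat"


scores = accuracy_by_task_type(toy_cases, keyword_router)
for task, score in scores.items():
    print(f"{task}: {score:.2f}")
```

Reporting accuracy per task type rather than as a single number makes it easier to see where a router fails, for example a keyword rule that misroutes chat prompts that happen to mention a document.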

2,000 Examples · 4 Task Types · 10+ Domains

Avg Tools

LLM · Benchmark · Routing · Tool-Calling · Synthetic · Evaluation
