Shypd.ai

Terminal-Bench

Terminal-Bench is an open-source benchmark for testing AI agents in real terminal environments. It evaluates LLM-based agents on complex tasks and provides a reproducible task suite and execution harness for real-world evaluation.

At a glance

Pricing: Open source
Free tier: Yes
API: No
Skill level: Technical
