Terminal-Bench
Terminal-Bench is an open-source benchmark for testing AI agents in real terminal environments. It evaluates LLMs on complex tasks and provides a reproducible task suite and an execution harness for real-world evaluation.