Harbor
Harbor is an open-source framework for evaluating and optimizing AI agents and language models. It enables users to build and share benchmarks and environments for conducting experiments in parallel.
About
Harbor is an open-source framework for evaluating and optimizing AI agents and language models. Developed by the creators of Terminal-Bench, it can be used to assess agents such as Claude Code and OpenHands. Users can build and share their own benchmarks and environments. Harbor runs experiments in parallel across thousands of environments through providers such as Daytona and Modal, and it can generate rollouts for reinforcement learning optimization.
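To make the idea of parallel experiments concrete, here is a minimal, generic Python sketch of running many evaluation trials concurrently with a cap on how many run at once. It does not use Harbor's API: `run_trial`, `run_benchmark`, `mean_reward`, and the `max_concurrent` parameter are hypothetical names chosen for illustration, and the trial itself is a seeded random stand-in rather than a real agent rollout.

```python
import asyncio
import random


async def run_trial(task_id: str, seed: int) -> dict:
    """Hypothetical stand-in for one agent rollout in an isolated environment."""
    rng = random.Random(seed)
    # A real trial would await a sandboxed environment here; we only yield control.
    await asyncio.sleep(0)
    reward = 1.0 if rng.random() < 0.5 else 0.0
    return {"task_id": task_id, "reward": reward}


async def run_benchmark(task_ids, max_concurrent=4):
    """Run one trial per task, with at most `max_concurrent` trials in flight."""
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(index, task_id):
        async with sem:
            return await run_trial(task_id, seed=index)

    # gather returns results in the same order as the submitted tasks.
    return await asyncio.gather(
        *(bounded(i, t) for i, t in enumerate(task_ids))
    )


def mean_reward(results):
    """Average reward over all trials, or 0.0 when there are no trials."""
    return sum(r["reward"] for r in results) / len(results) if results else 0.0


if __name__ == "__main__":
    tasks = [f"task-{i}" for i in range(8)]
    results = asyncio.run(run_benchmark(tasks))
    print(f"mean reward: {mean_reward(results):.2f}")
```

The semaphore is the design choice worth noting: it bounds how many environments are active at once, which is the usual way to scale to many trials without exhausting a provider's capacity.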
Capabilities
Pricing & Plans
Open Source
Free
FAQs