SWE-bench is an evaluation benchmark for AI agents on real-world software engineering tasks. It provides leaderboards and datasets to compare various AI models and agents.
About
SWE-bench is an evaluation benchmark that measures how well AI agents solve real-world software engineering tasks. It hosts official leaderboards for AI models and agents, including mini-SWE-agent, and offers several subsets: SWE-bench Verified, Multilingual, Lite, and Multimodal. Researchers and developers can use it to compare AI capabilities in code generation, problem-solving, and task resolution across programming languages and visual contexts. The platform also provides SWE-smith for training custom models and a CLI that simplifies running evaluations.
Pricing & Plans
Likely Free
Free