OSWorld
OSWorld is an open-source benchmark for evaluating multimodal AI agents on open-ended tasks in real computer environments. It provides a standardized framework for testing agents and comparing their performance across tasks.