Human-Eval
Human-eval is an evaluation harness for large language models trained on code. It provides a framework for running and testing untrusted model-generated code in order to assess the models' code generation capabilities.
About
Human-eval is an evaluation harness for assessing large language models (LLMs) trained on code. It provides a framework for running and testing untrusted model-generated code, so researchers and developers can evaluate a model's code generation capabilities. It includes functionality for generating samples, evaluating functional correctness, and reporting results such as pass@k metrics. Because the code it runs is untrusted, users must explicitly enable execution, and that execution should take place inside a robust security sandbox. It is a useful resource for developing and benchmarking code-generating models.
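To make the pass@k metric concrete, here is a minimal sketch of how it is commonly estimated: generate n samples per problem, count the c samples that pass the tests, and compute 1 - C(n-c, k) / C(n, k), then average over problems. The function name `pass_at_k`, the `results` dictionary, and the task names are illustrative assumptions, not the harness's own API, and the description above does not state which estimator the tool uses.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples generated for the problem
    c: number of those samples that passed the tests
    k: the k in pass@k (must satisfy 1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical per-problem results: task name -> (samples, passing samples).
results = {
    "task/0": (10, 3),
    "task/1": (10, 0),
    "task/2": (10, 10),
}

# The benchmark-level score is the mean of the per-problem estimates.
mean_pass_at_1 = sum(pass_at_k(n, c, 1) for n, c in results.values()) / len(results)
print(f"pass@1 = {mean_pass_at_1:.4f}")
```

Using exact integer binomials from the standard library keeps the sketch dependency-free; for large n, a numerically stable product form is a common alternative.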
Capabilities
Pricing & Plans
Open Source
Free