HaluEval
HaluEval is an open-source benchmark for evaluating hallucination in Large Language Models. It provides a large-scale dataset and code for generating, evaluating, and analyzing LLM responses.
About
HaluEval is an open-source benchmark for evaluating hallucination in Large Language Models (LLMs). Its dataset contains 35,000 samples: 5,000 human-annotated general user queries and 30,000 task-specific examples covering question answering, knowledge-grounded dialogue, and text summarization. The repository provides code for three purposes: generating hallucinated samples, evaluating how well an LLM recognizes hallucinations, and analyzing the types of content LLMs tend to hallucinate. Researchers and developers working on the reliability and factual consistency of LLMs can use it as a standardized way to identify and compare hallucination tendencies across models.
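To make the recognition task concrete, here is a minimal Python sketch of how such an evaluation might be scored. It is not HaluEval's own code: the Sample schema, the recognition_accuracy function, and the keyword_judge stand-in are all hypothetical names chosen for illustration, and a real run would replace the toy judge with a call to the model under test.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Sample:
    # Hypothetical schema for illustration; not HaluEval's actual file format.
    text: str
    is_hallucinated: bool


def recognition_accuracy(
    samples: Iterable[Sample], judge: Callable[[str], bool]
) -> float:
    """Return the fraction of samples where the judge's verdict matches the label."""
    total = 0
    correct = 0
    for sample in samples:
        total += 1
        if judge(sample.text) == sample.is_hallucinated:
            correct += 1
    if total == 0:
        raise ValueError("no samples to score")
    return correct / total


def keyword_judge(text: str) -> bool:
    # Toy stand-in for an LLM call: flags any text that mentions the moon.
    return "moon" in text.lower()


samples = [
    Sample("The Eiffel Tower is in Paris.", False),
    Sample("The Eiffel Tower is on the Moon.", True),
    Sample("Water boils at 100 degrees Celsius at sea level.", False),
    Sample("The Pacific Ocean is the smallest ocean.", True),
]

accuracy = recognition_accuracy(samples, keyword_judge)
print(accuracy)  # 0.75: the toy judge misses the last hallucinated sample
```

Passing the judge in as a plain callable keeps the scoring loop independent of any particular model or API client, so the same function can compare several models on the same labeled samples.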
Capabilities
Pricing & Plans
Open Source
Free
FAQs