Show HN: Open-source model and scorecard for measuring hallucinations in LLMs https://ift.tt/KURkDOL
Hi all! This morning, we released a new Apache 2.0 licensed model on HuggingFace for detecting hallucinations in retrieval augmented generation (RAG) systems. What we've found is that even when given a "simple" instruction like "summarize the following news article," every LLM available today hallucinates to some extent, making up details that never existed in the source article -- and some of them quite a bit. As a RAG provider and proponents of ethical AI, we want to see LLMs get better at this.

We've published an open source model, a blog post more thoroughly describing our methodology (with some specific examples of these summarization hallucinations), and a GitHub repository containing our evaluation of the most popular generative LLMs available today. Links to all of them are referenced in the blog, but for the technical audience here, the most interesting additional links might be:

- https://ift.tt/TwtcWFM...
- https://ift.tt/s1xEI9k

By releasing these under a truly open source license and detailing the methodology, we hope to make it viable for anyone to quantitatively measure and improve the generative LLMs they're publishing.

https://ift.tt/HyUh0BA

November 7, 2023 at 12:41AM
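For readers who want a feel for how such a detector is used, below is a minimal sketch of scoring (source, summary) pairs with a cross-encoder-style factual-consistency model from HuggingFace. The model ID and the 0.5 flagging threshold are assumptions for illustration -- neither is stated in the post -- so consult the linked model page for the actual usage.

```python
# A minimal sketch, assuming the released detector is a cross-encoder-style
# factual-consistency model on HuggingFace (the model ID below is an
# assumption, not stated in the post). Requires: pip install sentence-transformers
from sentence_transformers import CrossEncoder

model = CrossEncoder("vectara/hallucination_evaluation_model")

# Each pair is (source document, model-generated summary).
pairs = [
    ["The city council approved the budget on Tuesday after a brief debate.",
     "The city council approved the budget on Tuesday."],
    ["The city council approved the budget on Tuesday after a brief debate.",
     "The city council unanimously rejected the budget on Friday."],
]

# predict() returns one consistency score per pair: values near 1.0 suggest
# the summary is supported by the source; values near 0.0 suggest hallucination.
scores = model.predict(pairs)

# An illustrative scorecard-style metric: the fraction of summaries flagged
# as hallucinated under an assumed 0.5 threshold (the threshold choice is
# ours, not the post's methodology).
hallucination_rate = sum(float(s) < 0.5 for s in scores) / len(scores)
print(scores, hallucination_rate)
```

Aggregating this per-pair score over a fixed set of source articles is one straightforward way to produce the kind of per-model scorecard the post describes.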