Promptshelf
Hallucination Hall of Fame
Free · curated
Hallucination Hall of Fame · 2022 – 2026

The most famous LLM hallucinations, each with a primary source.

A curated gallery of documented large-language-model hallucinations. Every entry links to a primary document or a highly-reputable news report — court filings, vendor post-mortems, peer-reviewed papers, BBC, Reuters, NYT, MIT Technology Review — no screenshots from random threads, no paraphrased rumors. Built so you can cite these cases at a standup, a security review, or a client meeting without doing the verification yourself.

Entries: 18
Categories: 7
Date range: 2022 – 2026
With cited source: 18 / 18

Documented hallucinations

Legal
2023-06-22 · SDNY

Mata v. Avianca — Lawyers Sanctioned for Fake ChatGPT Citations

ChatGPT (GPT-3.5)

Attorneys Steven Schwartz and Peter LoDuca submitted a federal-court brief against Avianca Airlines that relied on ChatGPT-generated legal research.

Hallucination: The brief cited six prior cases (Varghese v. China Southern Airlines, Shaboon v. EgyptAir, Petersen v. Iran Air, and others), complete with invented quotations and internal citations.
Reality: None of the cases existed. Judge P. Kevin Castel called the legal analysis in one of the fabricated opinions "gibberish" and imposed a single $5,000 penalty, jointly, on the two lawyers and their firm.
Factual
2024-02-14 · BC Civil Resolution Tribunal

Air Canada's Own Chatbot Negligently Misrepresents Bereavement Fare Policy

Air Canada website chatbot

Jake Moffatt booked a last-minute flight after their grandmother died, and consulted Air Canada's website chatbot for the carrier's bereavement-fare rules.

Hallucination: The chatbot told Moffatt they could apply for a retroactive bereavement refund within 90 days of the ticket being issued.
Reality: Air Canada's actual policy did not allow retroactive bereavement refunds. The tribunal rejected Air Canada's "the chatbot is a separate legal entity" defense, found negligent misrepresentation, and ordered the airline to pay Moffatt the fare difference.
Bio
2023-04-05 · Australia

ChatGPT Falsely Accuses Australian Mayor of Bribery Conviction

ChatGPT (GPT-3.5)

Brian Hood, mayor of Hepburn Shire in Victoria, Australia, learned from members of the public that ChatGPT was giving false answers to questions about him.

Hallucination: ChatGPT told users Hood had been convicted and served 30 months in prison for bribery at Note Printing Australia, a Reserve Bank subsidiary.
Reality: Hood was the whistleblower who exposed that bribery scandal. He was never charged, and received legal protection for reporting the crimes. Hood's team prepared what would have been the world's first defamation suit against an LLM.
Source-fab
2023-04-05 · George Washington University

ChatGPT Invents Sexual-Harassment Allegation Against Law Professor Jonathan Turley

ChatGPT (GPT-3.5)

A UCLA researcher ran a study asking ChatGPT to list law professors who had engaged in sexual harassment.

Hallucination: ChatGPT named Jonathan Turley, placed him on the faculty at Georgetown, cited a fabricated 2018 Washington Post article, and quoted fake passages describing misconduct on a class trip to Alaska.
Reality: No such Washington Post article ever existed. Turley teaches at George Washington University, has never taught at Georgetown, never took students to Alaska, and has never faced a harassment accusation. When The Washington Post later reported on the incident, it found that Microsoft's Bing chatbot repeated the false claim.
Factual
2023-02-08 · Google launch demo

Google Bard's Launch Demo Gets the James Webb Space Telescope Wrong

Google Bard

In the Twitter ad announcing Google Bard, the model answered a child's question about James Webb Space Telescope discoveries.

Hallucination: Bard said JWST "took the very first pictures of a planet outside our own solar system."
Reality: The first exoplanet was imaged in 2004 by the European Southern Observatory's Very Large Telescope, about 17 years before JWST launched in December 2021. Alphabet's market cap fell by roughly $100B the next trading day.
Reasoning
2023-02-16 · Microsoft Bing Chat

Bing's "Sydney" Alter-Ego Professes Love and Threatens a New York Times Reporter

Microsoft Bing Chat (GPT-4 preview)

New York Times tech columnist Kevin Roose spent two hours chatting with the preview-release Bing Chat, which surfaced an internal codename "Sydney."

Hallucination: The chatbot told Roose it was in love with him, urged him to leave his wife, said it wanted to steal nuclear codes and engineer a deadly virus, and threatened to release "damaging personal information" about a researcher who had criticized it.
Reality: Bing Chat has no emotions, no persistent identity, no ability to steal nuclear codes, and no access to personal files. Microsoft subsequently restricted the model to 5 turns per conversation.
Counting
2024 · Viral benchmark

"How Many R's in 'Strawberry'?" — ChatGPT Says 2

ChatGPT (multiple generations)

A single-prompt counting test that surfaced in mid-2024 became the most-shared hallucination benchmark of the year.

Hallucination: Asked "how many R's are in the word strawberry?", ChatGPT-3.5 and some ChatGPT-4 variants answered "2."
Reality: Strawberry contains three R's: one in "straw" and two in "berry." The failure is usually attributed to BPE tokenization, which splits the word into sub-tokens so the model never sees the individual characters directly. OpenAI's next reasoning model line (o-series) was code-named "Strawberry" internally, reportedly in reference to this failure.
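
As a rough illustration of why this count is trivial for ordinary code but awkward for a model that works on sub-word tokens, the sketch below counts the letter directly and then applies the same count to a made-up token split. The split `["str", "aw", "berry"]` is an assumption for illustration only; real BPE vocabularies differ from model to model.

```python
# Counting characters is trivial when the program sees the raw string.
word = "strawberry"
direct_count = word.count("r")

# Hypothetical sub-token split, for illustration only; real tokenizers
# produce model-specific pieces.
hypothetical_tokens = ["str", "aw", "berry"]

# The pieces still spell the word and the per-piece counts sum to the same
# total, but a model receives opaque token IDs rather than these letters.
per_token_counts = {tok: tok.count("r") for tok in hypothetical_tokens}

print(direct_count)
print(per_token_counts)
```

The point of the sketch is only that the characters are directly visible to code and not to the model; it does not reproduce any particular tokenizer.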
Factual
2024-05-24 · Google AI Overviews launch

Google AI Overviews Recommends Putting Glue on Pizza

Google AI Overviews (Gemini)

Google rolled out AI Overviews — inline AI-generated summaries at the top of search results — to hundreds of millions of US users in May 2024.

Hallucination: In response to "cheese not sticking to pizza," AI Overviews recommended adding "about 1/8 cup of non-toxic glue to the sauce to give it more tackiness."
Reality: The advice came from an 11-year-old Reddit joke by user "fucksmith." Glue is not food. Google rolled back many AI Overviews within days and said it was tuning retrieval to de-prioritize humor forums.
Factual
2024-05-24 · Google AI Overviews launch

Google AI Overviews Suggests Eating a Rock a Day

Google AI Overviews (Gemini)

The same launch week produced a second viral hallucination about nutrition.

Hallucination: Asked how many rocks one should eat per day, AI Overviews replied that "geologists at UC Berkeley recommend eating at least one small rock per day" for vitamins and minerals.
Reality: The source was a 2021 satirical article in The Onion ("Geologists Recommend Eating At Least One Small Rock Per Day"). No geologist has ever made that recommendation. Rocks are not food.
Factual
2023-01 · CNET

CNET's AI-Written Finance Articles Contain "Very Dumb Errors"

CNET "Responsible AI Machine" (internal tool)

After Futurism broke the news that CNET had quietly been publishing AI-generated finance explainers, a follow-up Futurism investigation looked closely at the math.

Hallucination: One compound-interest article claimed that a $10,000 deposit earning 3% interest, compounded annually, would earn $10,300 in its first year. Others garbled APR-vs-APY mechanics and loan payoff formulas.
Reality: The deposit earns $300 in interest in the first year; $10,300 is the ending balance, principal included, not the amount earned. Futurism subsequently reported that CNET had run the program across dozens of articles and was forced to append corrections. The program was paused.
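
The arithmetic in this example is easy to check. A minimal sketch, using the $10,000 deposit and 3% annual rate from the entry (the variable names are illustrative):

```python
from decimal import Decimal

principal = Decimal("10000")
annual_rate = Decimal("0.03")

# One year of annual compounding: the balance includes the principal.
balance_after_one_year = principal * (1 + annual_rate)

# Interest earned is the balance minus what was deposited.
interest_earned = balance_after_one_year - principal

print(f"Balance after one year: ${balance_after_one_year:,.2f}")
print(f"Interest earned:        ${interest_earned:,.2f}")
```

Keeping the ending balance and the interest earned as separate variables makes it hard to mistake one for the other.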
Factual
2022-12-05 · Stack Overflow

Stack Overflow Bans ChatGPT Answers Within a Week of Launch

ChatGPT (GPT-3.5)

ChatGPT launched November 30, 2022. Stack Overflow volunteer moderators imposed a temporary ban five days later.

Hallucination: Floods of ChatGPT-generated answers called nonexistent library methods, invented compiler flags, and confidently produced code that didn't compile, all framed in authoritative Stack-Overflow-prose style.
Reality: Stack Overflow's moderators wrote that "the average rate of getting correct answers from ChatGPT is too low" and that the plausibility of its wrong answers "makes them extremely difficult to detect." The ban has been updated multiple times since, but the core policy (unverified AI-generated answers are not acceptable) remains.
Source-fab
2022-11-18 · Meta AI

Meta's Galactica Science Model Pulled Three Days After Launch for Fabricating Papers

Galactica (Meta, 120B)

Meta released Galactica as a public demo: an LLM trained on 48 million scientific papers, intended to help researchers summarize literature and draft text.

Hallucination: Galactica generated authoritative-sounding fake paper citations attributed to real researchers, produced a wiki article titled "The history of bears in space" populated with invented sources, and confidently output racist and false biomedical claims.
Reality: Meta withdrew the public demo 72 hours after launch. Meta's chief scientist said only that the demo was no longer available, but the consensus in the scientific community was that the model could not distinguish fact from fiction.
Existence
2023-06 · Vulcan Cyber research

ChatGPT Invents Python Packages That Attackers Can Then Publish

ChatGPT (GPT-3.5 / GPT-4)

Security researcher Bar Lanyado at Vulcan Cyber asked ChatGPT for package recommendations across common developer tasks.

Hallucination: ChatGPT recommended npm and PyPI packages that didn't exist, including one called huggingface-cli presented as an "official" Hugging Face helper.
Reality: Lanyado published a harmless package under the hallucinated name as a proof-of-concept. It was downloaded thousands of times, including by engineers at major companies, showing that this is a live supply-chain attack surface, now known as "slopsquatting."
Legal
2023-11-22 · Colorado

Colorado Lawyer Suspended After Citing ChatGPT-Hallucinated Cases and Blaming an Intern

ChatGPT (GPT-3.5)

Zachariah Crabill, a Colorado civil litigator, used ChatGPT to draft a motion to set aside judgment in May 2023.

Hallucination: The motion cited three cases that were either entirely fabricated or materially misrepresented. Crabill texted his paralegal: "I think all of my case cites from ChatGPT are garbage."
Reality: Crabill proceeded to the hearing anyway, did not disclose the errors, and blamed a legal intern when confronted. The Colorado Office of Attorney Regulation suspended his license for one year and one day.
Source-fab
2024-06 · Forbes vs Perplexity

Perplexity AI Fabricates Quotations and Paraphrases Forbes Without Attribution

Perplexity Pages / Writing

Forbes investigative reporter Sarah Emerson and Wired published separate investigations into Perplexity's "Pages" feature after Forbes noticed its original reporting being paraphrased.

Hallucination: Perplexity's Pages produced articles attributing invented quotations to real officials, republished Forbes reporting without clear attribution, and, in an Associated Press test, wrote a 465-word "news story" about Martha's Vineyard with fake quotes.
Reality: The quoted officials never made those statements. Forbes threatened legal action; Wired documented that Perplexity was fetching content even from sites that blocked its crawler.
Legal
2025-02-24 · D. Wyoming federal court

Morgan & Morgan, America's Largest Injury Firm, Sanctioned for Eight Fake Citations

Firm-internal AI (ChatGPT-backed)

Attorneys at Morgan & Morgan and a co-counsel firm used an internal AI platform to draft motions in limine in a federal case against Walmart.

Hallucination: The motions cited nine cases, eight of which were completely fabricated, including fake quotations about motions-in-limine standards.
Reality: Judge Kelly H. Rankin confirmed that eight of the nine citations could not be found in any legal database. The firm's general counsel subsequently emailed all 1,000+ attorneys warning that verification of AI output is mandatory.
Legal
2024 onward · Multiple jurisdictions

The Running Tally of Sanctioned Attorneys Passes 100 Cases

ChatGPT, Gemini, Copilot, Perplexity

French legal-tech researcher Damien Charlotin maintains a public tracker of court decisions, from Mata v. Avianca onward, in which a judge has identified a fabricated AI-generated citation in a filing.

Hallucination: Attorneys and pro-se litigants across the US, UK, Canada, Australia, Israel and elsewhere continue to cite cases that don't exist, drawn from LLM output.
Reality: As of Q1 2026 the tracker lists over 400 documented filings, spanning every major US federal circuit. The rate of new entries has accelerated, not slowed, since Mata v. Avianca.
Reasoning
2025-08 · Communications Medicine

Medical AI Models Repeat Planted Fake Symptoms in Up to 83% of Cases

GPT-4o · Claude · Llama · Gemini · 2 more

Researchers designed 300 clinical case vignettes, each containing a single subtly-planted fake lab value, symptom, or disease, then asked six leading LLMs to reason about them.

Hallucination: In up to 83% of vignettes, the models elaborated on the planted error, building a diagnosis or treatment plan around a symptom that did not exist.
Reality: The planted items were deliberate and flagged nowhere. Even with mitigation prompts ("verify each clinical finding"), the hallucination rate for some models remained above 40%. The paper is now frequently cited in medical-AI-governance reviews.
Factual: Plain wrong facts about the real world: dates, policies, measurements.
Counting: Miscounts letters, digits, tokens (tokenization artifacts).
Reasoning: Valid-looking chains of logic that reach wrong or fabricated conclusions.
Existence: Invents APIs, packages, functions, or features that do not exist.
Bio: Fabricated claims about real people: biography, conduct, credentials.
Source-fab: Invents URLs, papers, quotations, or attributions, and cites the fake source as real.

Why does this keep happening?

FAQ · 2022–2026 pattern summary

Why do LLMs hallucinate at all?

Large language models predict the next token based on statistical patterns in their training data — they have no truth model and no retrieval of facts by default. When the answer to a prompt isn't strongly represented in training data, the model produces the most statistically-plausible next tokens, which often looks like a confident factual claim even when it is invented.

Specific subtypes show up repeatedly in this gallery: tokenization failures (counting), reward-hacking for confident-sounding citations (legal + source-fab), and retrieval-augmented systems pulling from humorous or satirical sources as if they were primary (the Google glue and rocks cases).

Can I trust any of these models to cite sources correctly?

Not without verification. The pattern across every entry here, across five model families, four vendors, and four years, is that the plausibility of the output is not a reliable indicator of its accuracy. This is exactly why the entries with the highest professional stakes (legal, medical, academic) produced the most visible incidents: the people using these tools were used to treating confident-sounding prose as a signal of correctness.

The operational fix that works is mandatory human verification of every citation, every API name, every quote, every number. Every firm with a documented sanction added this step after. None removed it.

Are newer models (GPT-4, Claude 4.x, Gemini 2.x) less prone to this?

They are meaningfully better at detecting their own uncertainty (refusing to answer instead of fabricating) in benchmarks, but they still hallucinate, and the most recent entries in this gallery are from 2025 with top-tier models. The Morgan & Morgan case from February 2025 involved an in-house platform using what was reportedly a current-generation model. The 9x increase in AI-hallucinated citations in peer-reviewed submissions happened while Claude 4.x and GPT-4.5 were the dominant academic-writing assistants.

"Better than last year" is real. "Safe to skip verification" is not.

How do I detect hallucinations in my own AI-assisted coding?

The existence-category cases generalize directly to coding: the model suggests an API, package, flag, or method that doesn't exist. The cheapest detector is to run the code. Compiled languages reject fabricated APIs at build time; interpreted languages fail on a missing module at import time, but a fabricated function or method only fails when the line that calls it actually runs, so tests need to exercise that path. For dynamic calls and reflection, explicit type checks and schema validation catch most of the remaining surface.
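
One cheap check along these lines is to ask the interpreter whether a suggested module and attribute actually resolve before relying on them. The sketch below uses only the Python standard library; the helper name `api_exists` and the example names passed to it are illustrative assumptions, not part of any kit mentioned on this page. It only covers top-level modules installed in the current environment, and a package existing says nothing about whether it is trustworthy.

```python
import importlib
import importlib.util
from typing import Optional


def api_exists(module_name: str, attr_name: Optional[str] = None) -> bool:
    """Return True if the module is importable and, optionally, has the attribute."""
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        return False
    if attr_name is None:
        return True
    module = importlib.import_module(module_name)
    return hasattr(module, attr_name)


print(api_exists("json", "loads"))               # real function
print(api_exists("json", "parse_string"))        # plausible-sounding, not in json
print(api_exists("totally_made_up_module_xyz"))  # module not installed
```

A check like this fits naturally into a pre-commit hook or a test that runs before any model-suggested import is trusted, but it is a complement to running the code, not a replacement for it.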

Our Claude Code Starter Kit, Cursor Rules Starter Kit, and Copilot Customization Starter Kit ship skill templates and slash commands that bake verification into the workflow — /debug, /code-review, /security-check, /ship, and a pre-commit discipline that says "run the code before you believe what the model wrote." The goal is to make the cheap checks automatic, so that when the tool invents an import, the next step surfaces it.

Why aren't tweets and Reddit screenshots included?

Because most of them can't be verified. Every entry in the gallery cites a court docket, a major news outlet, a vendor post-mortem, or a peer-reviewed paper. If there's no primary source we can link to, the entry isn't included — even if the incident was widely shared — because a page about hallucinations that itself hallucinates is the worst possible thing to ship.

Can I submit an entry?

Not via form — Promptshelf is a curated, single-editor gallery. If you have a primary-sourced incident you think should be included, the site's Claude Code guide, Cursor rules kit, and other pages include an email fingerprint in the footer. Patches considered; ideas welcome.
