Mata v. Avianca — Lawyers Sanctioned for Fake ChatGPT Citations
ChatGPT (GPT-3.5)
Attorneys Steven Schwartz and Peter LoDuca submitted a federal-court brief against Avianca Airlines that relied on ChatGPT-generated legal research.
A curated gallery of documented large-language-model hallucinations. Every entry links to a primary document or a highly reputable news report (court filings, vendor post-mortems, peer-reviewed papers, BBC, Reuters, NYT, MIT Technology Review), with no screenshots from random threads and no paraphrased rumors. Built so you can cite these cases at a standup, a security review, or a client meeting without doing the verification yourself.
Jake Moffatt booked a last-minute flight after their grandmother died, and consulted Air Canada's website chatbot for the carrier's bereavement-fare rules.
Brian Hood, mayor of Hepburn Shire in Victoria, Australia, learned from constituents what ChatGPT was saying in response to questions about him.
A UCLA researcher ran a study asking ChatGPT to list law professors who had engaged in sexual harassment.
In the Twitter ad announcing Google Bard, the model answered a child's question about James Webb Space Telescope discoveries.
New York Times tech columnist Kevin Roose spent two hours chatting with the preview-release Bing Chat, which surfaced an internal codename "Sydney."
A single-prompt counting test that surfaced in mid-2024 became the most-shared hallucination benchmark of the year.
Google rolled out AI Overviews — inline AI-generated summaries at the top of search results — to hundreds of millions of US users in May 2024.
The same launch week produced a second viral hallucination about nutrition.
After Futurism broke that CNET had quietly been publishing AI-generated finance explainers, a follow-up Futurism investigation looked closely at the math.
ChatGPT launched November 30, 2022. Stack Overflow volunteer moderators imposed a temporary ban five days later.
Meta released Galactica as a public demo: an LLM trained on 48 million scientific papers, intended to help researchers summarize literature and draft text.
Security researcher Bar Lanyado at Vulcan Cyber asked ChatGPT for package recommendations across common developer tasks.
Zachariah Crabill, a Colorado civil litigator, used ChatGPT to draft a motion to set aside judgment in May 2023.
Forbes investigative reporter Sarah Emerson and Wired published separate investigations into Perplexity's "Pages" feature after Forbes noticed its original reporting being paraphrased.
Attorneys at Morgan & Morgan and a co-counsel firm used an internal AI platform to draft motions in limine in a federal case against Walmart.
Since Mata v. Avianca, French legal-tech researcher Damien Charlotin has maintained a public tracker of every court filing in which a judge has identified a fake AI-generated citation.
Researchers designed 300 clinical case vignettes, each containing a single subtly planted fake lab value, symptom, or disease, then asked six leading LLMs to reason about them.
FAQ · 2022–2026 pattern summary
Large language models predict the next token based on statistical patterns in their training data. By default they have no truth model and no retrieval of facts. When the answer to a prompt isn't strongly represented in training data, the model produces the most statistically plausible next tokens, which often look like a confident factual claim even when the claim is invented.
Specific subtypes show up repeatedly in this gallery: tokenization failures (counting), reward-hacking for confident-sounding citations (the legal and source-fabrication cases), and retrieval-augmented systems pulling from humorous or satirical sources as if they were primary (the Google glue and rocks cases).
LLM output should not be trusted without verification. The pattern across every entry here (five model families, four vendors, four years) is that the plausibility of an output is not correlated with its accuracy. This is exactly why the entries with the highest professional stakes (legal, medical, academic) produced the most visible incidents: the people using these tools were used to treating confident-sounding prose as signal.
The operational fix that works is mandatory human verification of every citation, every API name, every quote, and every number. Every firm with a documented sanction added this step afterward. None removed it.
Newer models are meaningfully better in benchmarks at detecting their own uncertainty (refusing to answer instead of fabricating), but they still hallucinate, and the most recent entries in this gallery are from 2025 and involve top-tier models. The Morgan & Morgan case from February 2025 involved an in-house platform using what was reportedly a current-generation model. The 9x increase in AI-hallucinated citations in peer-reviewed submissions happened while Claude 4.x and GPT-4.5 were the dominant academic-writing assistants.
"Better than last year" is real. "Safe to skip verification" is not.
The existence-category cases generalize directly to coding: the model suggests an API, package, flag, or method that doesn't exist. The cheapest detector is to run the code. Statically compiled languages reject fabricated APIs at build time; interpreted languages catch a missing package on import, but a missing function or method only when the call actually executes. For dynamic calls and reflection, explicit type checks and schema validation catch most of the remaining surface.
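As a rough illustration of that existence check, here is a minimal Python sketch that tests whether a suggested module and attribute actually resolve before any generated code is trusted. The helper name `symbol_exists` is hypothetical and not part of any kit or library; it uses only the standard-library `importlib`. Note that it imports the module to inspect it, so it should only be pointed at packages you already consider safe to load.

```python
import importlib
import importlib.util


def symbol_exists(module_name: str, dotted_attr: str = "") -> bool:
    """Return True if the module is importable and the dotted attribute path resolves on it.

    Hypothetical helper for illustration only.
    """
    try:
        # find_spec returns None for a missing top-level module and raises
        # ModuleNotFoundError when a parent package of a dotted name is missing.
        if importlib.util.find_spec(module_name) is None:
            return False
        if not dotted_attr:
            return True
        obj = importlib.import_module(module_name)
    except (ImportError, ValueError):
        return False

    # Walk the attribute path one segment at a time, e.g. "path.join" on os.
    for part in dotted_attr.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True


if __name__ == "__main__":
    print(symbol_exists("json", "loads"))      # real function in the standard library
    print(symbol_exists("json", "load_fast"))  # made-up name used as a stand-in for a fabricated API
```

A check like this only confirms that a name exists. It says nothing about whether the function does what the model claimed, so it complements running the code and reading the documentation rather than replacing them.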
Our Claude Code Starter Kit, Cursor Rules Starter Kit, and Copilot Customization Starter Kit ship skill templates and slash commands that bake verification into the workflow — /debug, /code-review, /security-check, /ship, and a pre-commit discipline that says "run the code before you believe what the model wrote." The goal is to make the cheap checks automatic, so that when the tool invents an import, the next step surfaces it.
Most widely shared hallucination stories are left out because they can't be verified. Every entry in the gallery cites a court docket, a major news outlet, a vendor post-mortem, or a peer-reviewed paper. If there's no primary source we can link to, the entry isn't included, even if the incident was widely shared, because a page about hallucinations that itself hallucinates is the worst possible thing to ship.
There is no submission form: Promptshelf is a curated, single-editor gallery. If you have a primary-sourced incident you think should be included, the site's Claude Code guide, Cursor rules kit, and other pages include an email fingerprint in the footer. Patches considered; ideas welcome.