---
title: Why AI Makes Things Up (And How to Catch It)
description: "AI hallucinations explained simply: what they are, why models confidently invent facts, and practical ways to spot false information before it causes problems."
date: January 20, 2026
author: Robert Soares
category: ai-fundamentals
---

You ask ChatGPT for a citation. It gives you one. Complete with author name, journal title, publication year, even a page number. Looks perfect.

One problem: that paper doesn't exist. Never did.

This is an AI hallucination. The model didn't find a wrong answer. It invented one. And it did so with total confidence, no hesitation, no disclaimer.

If you use AI tools for work, this matters. Because hallucinations aren't rare edge cases. They're a fundamental quirk of how these systems work. Understanding why they happen is the first step toward not getting burned by them.

## What Is an AI Hallucination?

An AI hallucination is when a language model generates information that sounds plausible but is factually wrong or completely fabricated. The term comes from the idea that the model is "seeing" something that isn't there.

This includes:

- Fake citations to papers that don't exist
- Made-up statistics with no source
- Invented quotes attributed to real people
- Confident answers about topics the model knows nothing about
- Details that contradict the source material you provided

The tricky part isn't that AI makes mistakes. All tools make mistakes. The tricky part is that hallucinations look exactly like accurate responses. There's no red flag, no uncertainty marker, no "I'm not sure about this" disclaimer. Just confident-sounding text that happens to be wrong.

## Why Does This Happen?

Here's the thing most people don't realize: language models aren't looking things up. They're predicting what words should come next based on patterns they learned during training.

Think of it like autocomplete on your phone. When you type "Nice to," your phone might suggest "meet you." It's not consulting a dictionary or checking grammar rules. It's just predicting what word typically follows "Nice to" based on millions of text messages.

LLMs do the same thing, just at a much larger scale. They've been trained on billions of documents, and they're constantly predicting: given everything before this point, what word is most likely to come next?

This works surprisingly well most of the time. But it creates a fundamental problem: the model has no concept of "true" or "false." It only knows "probable" or "improbable."

When you ask for a citation, the model predicts what a citation should look like. Author name format, journal name format, year in parentheses. It generates something that matches the pattern of a citation. Whether that citation actually exists is a completely separate question the model can't answer.

### The Training Incentive Problem

Recent research has identified something interesting: [hallucinations may be partly caused by how models are trained](https://www.lakera.ai/blog/guide-to-hallucinations-in-large-language-models). Training benchmarks tend to reward confident answers and penalize uncertainty. If a model says "I don't know," it often gets marked wrong. So models learn to guess.

Like a student who knows leaving a test question blank guarantees zero points, LLMs learn that a confident guess is better than admitting uncertainty. Even when that guess is pure invention.

This isn't a flaw in a specific model. It's baked into how these systems are evaluated and trained.
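To make that incentive concrete, here's a toy back-of-the-envelope calculation. It isn't taken from the cited research or any real benchmark's scoring code; the function name, the 60% "knows the answer" rate, and the 10% lucky-guess rate are all illustrative assumptions. It just shows that under simple right-or-wrong scoring, always guessing beats admitting uncertainty.

```python
# Toy arithmetic, not a real benchmark: with 0/1 accuracy scoring, an unsure
# model earns more by guessing than by answering "I don't know."

def expected_score(p_known: float, guesses_when_unsure: bool, p_lucky_guess: float = 0.1) -> float:
    """Expected score when the model knows the answer with probability p_known
    and otherwise either guesses (sometimes getting lucky) or abstains (earning nothing)."""
    score = p_known * 1.0  # questions it genuinely knows earn full credit
    if guesses_when_unsure:
        score += (1 - p_known) * p_lucky_guess  # fabricated answers occasionally happen to be right
    return score

print(expected_score(0.6, guesses_when_unsure=True))   # 0.64, so guessing wins the leaderboard
print(expected_score(0.6, guesses_when_unsure=False))  # 0.60, honesty scores lower
```

As long as evaluations score this way, training keeps nudging models toward confident invention rather than honest uncertainty.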
## How Often Does This Happen?

Hallucination rates vary wildly depending on the model and the type of question.

According to [Vectara's April 2025 Hallucination Leaderboard](https://www.allaboutai.com/resources/ai-statistics/ai-hallucinations/), the best models now hallucinate less than 1% of the time on factual questions. Google's Gemini-2.0-Flash-001 leads at 0.7%. Several other models, including OpenAI's o3-mini-high, hover around 0.8-0.9%.

That sounds pretty good. But context matters. When models are asked about specialized topics, accuracy drops. The same research shows legal information has a 6.4% hallucination rate even among top-tier models. Medical and healthcare questions average 4.3%. Financial data sits around 2.1%.

And those are the best models. Older or smaller models hallucinate far more often. Some still hit rates above 25%.

Here's another wrinkle: [OpenAI's newest reasoning models actually hallucinate more frequently on certain question types](https://research.aimultiple.com/ai-hallucination/). The o3 model hallucinates 33% of the time on person-specific questions, double the rate of its predecessor. More sophisticated reasoning doesn't automatically mean better accuracy.

## The Business Reality

If you're using AI for anything beyond casual experimentation, hallucinations have real consequences.

[According to Deloitte research](https://research.aimultiple.com/ai-hallucination/), 77% of businesses report concerns about AI hallucinations. That concern appears justified. In 2024, 47% of enterprise AI users admitted to making at least one major business decision based on hallucinated content.

The legal field offers a stark example. [According to Harvard Law School's Digital Law Review](https://www.allaboutai.com/resources/ai-statistics/ai-hallucinations/), 83% of legal professionals have encountered fabricated case law when using AI for legal research. Fake case citations that look completely legitimate.

You might remember the lawyer who submitted a brief citing cases that didn't exist. ChatGPT had invented them. The lawyer didn't check. The judge was not amused.

That's an extreme example. But smaller hallucinations happen constantly. A slightly wrong statistic in a report. A misattributed quote in a presentation. An invented feature in a competitor analysis. These might not make the news, but they erode trust and credibility over time.

## How to Spot Hallucinations

There's no foolproof method. But several techniques improve your odds.

### Ask the Same Question Multiple Ways

If you get inconsistent answers when you rephrase a question, something's wrong. A fact is a fact regardless of how you ask about it. But a hallucination is a probability calculation that might come out differently each time.

Try asking your question, then asking it again with different wording. If the model contradicts itself, treat both answers as suspect.

This technique has been formalized in research as "semantic entropy." [According to studies on hallucination detection](https://www.lakera.ai/blog/guide-to-hallucinations-in-large-language-models), asking a model to generate multiple answers to the same question and comparing them can identify confabulation about 79% of the time. High variation means low confidence in the answer.
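If you want to automate that spot check, here's a minimal sketch. The `ask` callable and `consistency_score` function are hypothetical, the 0.7 threshold is arbitrary, and plain string similarity is only a crude stand-in for the meaning-level comparison real semantic-entropy methods use.

```python
# Rough self-consistency check: sample the same question several times and
# measure how much the answers agree. Low agreement suggests the model is guessing.
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable

def consistency_score(ask: Callable[[str], str], prompt: str, samples: int = 5) -> float:
    """Average pairwise similarity across repeated answers to the same prompt."""
    answers = [ask(prompt) for _ in range(samples)]
    sims = [
        SequenceMatcher(None, a.lower(), b.lower()).ratio()
        for a, b in combinations(answers, 2)
    ]
    return sum(sims) / len(sims)

# Usage with whatever LLM client you already have (hypothetical example):
# score = consistency_score(my_llm_client.ask, "Who wrote the 2019 paper on topic X?")
# if score < 0.7:  # arbitrary threshold; tune it for your use case
#     flag_for_manual_verification()
```

Keep in mind that high agreement doesn't prove the answer is correct. It only tells you the model isn't contradicting itself.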
### Verify Specific Claims

Names, dates, statistics, and citations are the most common hallucination targets. They're also the easiest to verify.

If the model gives you a citation, search for it. If it mentions a statistic, find the original source. If it quotes someone, confirm the quote exists.

This sounds tedious. It is. But it's also the difference between sharing accurate information and passing along fiction.

### Watch for Suspicious Confidence

Real experts hedge. They say "approximately" and "according to" and "in most cases." They acknowledge exceptions and limitations.

Models trained to be confident don't always do this. If an AI gives you a detailed, specific answer to a question that should be uncertain or contested, be suspicious. That certainty might be the model predicting what a confident answer looks like rather than actually knowing the answer.

### Be Extra Careful in Specialized Domains

General knowledge questions tend to be safer. The training data contains many discussions of common topics, so patterns are well-established.

Specialized domains are riskier. Legal citations, medical information, recent events, niche technical details. The training data is thinner, so the model has less to work with: fewer solid patterns to match, more outright guessing. More hallucination.

### Check Your Own Context

Hallucinations increase when you give the model confusing or contradictory input. If you provide a document and ask questions about it, make sure the document is clear and complete. Garbage in, garbage out.

This also applies to how you phrase prompts. Vague questions get vague answers. Specific questions are easier to verify.

## How to Reduce Hallucinations

Beyond detection, there are ways to make hallucinations less likely in the first place.

### Use Retrieval-Augmented Generation (RAG)

RAG systems give the model access to a knowledge base during response generation. Instead of relying entirely on training data, the model can look up current information.

This helps significantly. [Research shows RAG techniques can reduce hallucinations by 71% when implemented properly](https://www.allaboutai.com/resources/ai-statistics/ai-hallucinations/). The key phrase is "implemented properly." A poorly configured RAG system can actually make things worse by retrieving irrelevant or outdated information.
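At its core, the pattern is "retrieve relevant text, then ground the prompt in it." Here's a deliberately minimal sketch of that shape; the function names are made up for illustration, and the keyword-overlap retriever stands in for the embedding search and vector store a production system would use.

```python
# Minimal RAG shape: find the most relevant snippets, then instruct the model
# to answer only from them. Keyword overlap is a stand-in for embedding search.

def retrieve(question: str, documents: list[str], top_k: int = 3) -> list[str]:
    """Rank documents by crude word overlap with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(documents, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return ranked[:top_k]

def build_grounded_prompt(question: str, documents: list[str]) -> str:
    """Assemble a prompt that tells the model to stick to the retrieved context."""
    context = "\n\n".join(retrieve(question, documents))
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# The assembled prompt goes to whatever model you already use.
docs = ["Refunds are issued within 30 days of purchase.", "Standard shipping takes 5 business days."]
print(build_grounded_prompt("What is the refund window?", docs))
```

Notice that the "answer only from the context" instruction does half the work; retrieval quality does the other half, which is exactly why a badly configured pipeline can make hallucinations worse instead of better.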
### Choose the Right Model for the Task

Not all models hallucinate equally. For tasks where accuracy matters, pick a model with lower hallucination rates on the specific type of content you need.

Some models are also better at declining to answer rather than guessing. That's a feature, not a bug. "I don't know" is infinitely more useful than a confident wrong answer.

### Give Better Context

The more relevant context you provide, the less the model needs to predict from general patterns. If you're asking about a specific document, include the document. If you're asking about recent events, provide sources.

This doesn't eliminate hallucinations. But it shifts the balance from "guess based on training data" toward "synthesize from provided information."

### Ask for Sources

Simply asking "What's your source for that?" can help. It won't stop hallucinations. But it sometimes prompts the model to acknowledge uncertainty or reveal when it's extrapolating beyond what it actually knows.

Don't trust the sources it gives you. Verify them. But the request itself can change the model's behavior.

### Keep Humans in the Loop

The most reliable hallucination detector remains human verification. [According to IBM](https://www.ibm.com/think/topics/ai-hallucinations), human oversight serves as "the final backstop measure" to catch errors before deployment.

This doesn't mean reviewing every single output. It means having verification steps for high-stakes content. Citations get checked. Statistics get traced to sources. Claims about people or companies get confirmed.

76% of enterprises now include human-in-the-loop processes specifically to catch hallucinations before information reaches customers or influences decisions.

## The Progress Being Made

Hallucination rates have dropped dramatically. [Data from Vectara shows the best models improved from around 21.8% hallucination rates in 2021 to 0.7% in 2025](https://www.allaboutai.com/resources/ai-statistics/ai-hallucinations/). That's a 96% reduction in four years.

Techniques are advancing too. Models trained with calibration-aware rewards, which penalize overconfidence rather than just wrong answers, show promise. Targeted fine-tuning on difficult scenarios helps. Span-level verification in RAG systems catches more unsupported claims.

But fundamental limitations remain. [Mathematical research suggests hallucinations are inevitable under current architectures](https://www.allaboutai.com/resources/ai-statistics/ai-hallucinations/). Language models simply can't learn all possible computable functions. They'll always be predicting patterns, not accessing truth.

That's not a reason to avoid AI tools. It's a reason to use them wisely.

## The Practical Takeaway

AI hallucinations aren't bugs that will eventually be fixed. They're features of how prediction-based systems work. The good models hallucinate less. But they still hallucinate.

For marketing, sales, and business use:

- **Treat AI output as a first draft, not final copy.** Assume it needs verification.
- **Verify specific claims before sharing them.** Citations, statistics, quotes, names.
- **Use AI for what it's good at.** Brainstorming, drafting, summarizing, variations. Not authoritative facts.
- **Keep humans in the loop for anything public-facing.** A quick check catches most problems.
- **Match the tool to the stakes.** Low-stakes tasks can tolerate some risk. High-stakes tasks need verification.

The models will keep improving. Hallucination rates will keep dropping. But the underlying mechanism won't change. These systems predict probable text, not true text.

Understanding that distinction is what separates productive AI use from getting caught citing papers that don't exist.