Federico De Ponte

Founder, OpenDraft

8 min read
Opinion

Citation Hallucination Isn't a Bug — It's a Design Failure

Most AI writing tools fail at research for the same reason: they treat research drafting as a text generation problem. It isn't.


The Structural Problem

Research drafting is a pipeline problem — literature discovery, filtering, synthesis, structuring, and citation verification. Prompting a single large language model to do all of this end-to-end guarantees hallucinations, especially at the citation layer.
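
To make the pipeline framing concrete, here's a rough sketch of those stages as separate steps, each with its own inputs and outputs. The names and signatures are illustrative, not OpenDraft's actual code:

```python
# Illustrative sketch of research drafting as a pipeline, not OpenDraft's code.
# Each stage has distinct inputs and outputs; none of them is
# "generate the whole draft, citations included, in one pass".
from dataclasses import dataclass


@dataclass(frozen=True)
class PaperRef:
    """Identifying metadata for one paper. Metadata only, no prose."""
    title: str
    doi: str
    year: int


def discover(query: str) -> list[PaperRef]:
    """Literature discovery: query scholarly indexes for candidate papers."""
    ...

def filter_relevant(candidates: list[PaperRef], topic: str) -> list[PaperRef]:
    """Filtering: keep only papers that actually bear on the topic."""
    ...

def synthesize(papers: list[PaperRef]) -> str:
    """Synthesis: summarize what the retrieved papers say."""
    ...

def structure(synthesis: str) -> list[str]:
    """Structuring: arrange the synthesis into sections."""
    ...

def verify_citations(sections: list[str], papers: list[PaperRef]) -> list[str]:
    """Citation verification: reject any reference not backed by `papers`."""
    ...
```

Collapsing all five stages into one generation call erases the boundaries where errors could have been caught.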

Here's why:

When you ask an LLM to "cite sources," it has two options:

  1. Admit it doesn't have access to a research database (unhelpful)
  2. Generate plausible-looking citations from its training data (hallucination)

Most models choose option 2, not because they're poorly prompted, but because they have no other choice.

The uncomfortable truth

Citation hallucination is not a prompt engineering issue. It's a systems design issue.

Why "Better Prompts" Won't Fix This

The industry response to hallucination has been prompt engineering: "Be more specific." "Add constraints." "Use system prompts."

This misses the point.

You cannot prompt your way out of a capability gap. An LLM without database access cannot verify citations. An LLM generating text cannot simultaneously search external sources. The architecture prevents it.

Asking for "better prompts" is asking for better guardrails on a system that lacks the fundamental capability to do the job.

One Possible Alternative: Separation of Concerns

OpenDraft explores a different approach: separating the research workflow into specialized agents, each constrained by source availability and verification rules.

  • Agents that search cannot write — Scout agents query Semantic Scholar, CrossRef, and arXiv. They return metadata, not prose.
  • Agents that write cannot invent sources — Scribe agents draft text, but can only reference papers already retrieved and verified.
  • Citations must resolve before appearing — Every reference is checked against real bibliographic databases. If a reference doesn't resolve, it doesn't appear (a minimal version of this check is sketched below).
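
That third rule is just an existence check. Here's a minimal sketch of what it can look like, using CrossRef's public REST API; the function names are illustrative, and real code would need retries, rate limiting, and fallbacks to other indexes for papers without DOIs:

```python
# Minimal existence check against CrossRef's public REST API.
# Illustrative only: production code needs retries, rate limiting, and
# fallbacks to other indexes (Semantic Scholar, arXiv) for DOI-less papers.
import requests


def doi_resolves(doi: str) -> bool:
    """Return True only if CrossRef actually knows this DOI."""
    resp = requests.get(
        f"https://api.crossref.org/works/{doi}",
        # CrossRef asks polite clients to identify themselves.
        headers={"User-Agent": "citation-check/0.1 (mailto:you@example.org)"},
        timeout=10,
    )
    return resp.status_code == 200


def keep_verified(dois: list[str]) -> list[str]:
    """Filter a reference list down to DOIs that resolve."""
    return [d for d in dois if doi_resolves(d)]
```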

This architectural constraint makes hallucination structurally difficult, not just discouraged.
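
In code, the separation is a structural constraint rather than an instruction. Here's a minimal sketch; the class names are illustrative, not OpenDraft's actual implementation:

```python
# Illustrative sketch of the Scout/Scribe split, not OpenDraft's classes.
from dataclasses import dataclass


@dataclass(frozen=True)
class PaperRef:
    """What a Scout returns: metadata only, never prose."""
    title: str
    doi: str
    year: int


class Scout:
    """Searches scholarly indexes and returns metadata. It never writes."""
    def search(self, query: str) -> list[PaperRef]:
        ...  # query Semantic Scholar / CrossRef / arXiv here


class Scribe:
    """Drafts text. Its citation universe is fixed before it writes a word."""
    def __init__(self, verified: list[PaperRef]):
        self._by_doi = {p.doi: p for p in verified}

    def cite(self, doi: str) -> str:
        # An unverified DOI is a hard error, not an invitation to guess.
        paper = self._by_doi.get(doi)
        if paper is None:
            raise KeyError(f"{doi} was never retrieved and verified")
        return f"({paper.title}, {paper.year})"
```

Under a constraint like this, a hallucinated citation surfaces as an exception at draft time instead of a plausible-looking reference in the output.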

What This Doesn't Solve

Let me be clear about the limitations:

  • This does not solve research. Understanding, interpretation, and original contribution still require human researchers.
  • This does not replace reading papers. AI synthesis is lossy. Critical reading is irreplaceable.
  • This introduces new failure modes. Missed papers (gaps in database coverage), weak synthesis (the limits of AI summarization), and over-reliance on citation counts as a quality signal.

Multi-agent separation trades one set of problems for another.

But pretending that better prompts will fix structural flaws is not a serious position.

We Want to Be Proven Wrong

This framing might be incorrect. Multi-agent separation might be the wrong abstraction. There might be better architectural patterns we haven't considered.

If you think this approach is flawed, we want to know why — publicly.

OpenDraft is fully open source. Criticism, forks, and counter-examples are welcome.

Join the debate

If you disagree with this framing, open an issue explaining why. The best contributions often start as criticism.


About the Author: Federico De Ponte is the developer of OpenDraft. This essay reflects a design philosophy, not a claim of correctness. Disagreement is welcome.