Federico De Ponte

Founder, OpenDraft

13 min read
Technical

AI Citation Verification: How OpenDraft Prevents Hallucinated References

AI language models frequently fabricate academic citations that look legitimate but don't exist. Learn how OpenDraft solves this critical problem using multi-agent verification against real academic databases with 200M+ papers.

The Citation Hallucination Crisis

When researchers first started using ChatGPT and other large language models (LLMs) to assist with academic writing, they quickly discovered a disturbing pattern: the AI would confidently cite papers that don't exist.

These fake citations often look completely legitimate—with real author names, plausible titles, appropriate journals, and properly formatted reference styles. But when you try to find them, they simply don't exist. No DOI. No record in Google Scholar. No entry in academic databases.

This phenomenon, known as "citation hallucination" or "AI hallucination," represents one of the most serious challenges facing AI-assisted academic writing today.

Why AI Hallucinates Citations

Large language models like GPT-4, Claude, and Gemini are trained on vast text corpora that include millions of academic papers and citations. They learn statistical patterns about how citations typically look and where they appear in text. However, they fundamentally don't have access to structured databases of real publications during text generation.

When prompted to provide citations, these models:

  • Generate text that looks like citations based on learned patterns
  • Mix and match real author names with fabricated titles
  • Invent publication years and journal names that seem plausible
  • Create DOIs and URLs that follow correct formatting but lead nowhere

The result? Citations that pass a superficial credibility check but fail any verification attempt.

Real-World Impact

The consequences of fake citations in academic work are severe:

  • Academic integrity violations: Citing non-existent work can be considered academic misconduct
  • Undermined credibility: A single fake citation can discredit an entire research paper
  • Wasted time: Researchers spend hours trying to track down citations that don't exist
  • Propagation of errors: Fake citations can be copied and spread through the literature

A 2023 study found that up to 50% of citations generated by popular AI assistants were either completely fabricated or significantly misattributed. This isn't a minor bug—it's a fundamental limitation of how these models work.

How OpenDraft Solves Citation Hallucination

OpenDraft takes a fundamentally different approach to academic writing that makes hallucinated citations impossible. Instead of relying on a single language model to generate both content and citations, OpenDraft uses 19 specialized AI agents working in a coordinated pipeline where citations are verified against real academic databases.

The Multi-Agent Architecture

OpenDraft's citation verification system is built on three core principles:

  1. Separation of concerns: Different agents handle research, writing, and citation management
  2. Database-first approach: Citations come from real databases, not LLM generation
  3. Continuous verification: Every citation is validated before inclusion

The Verification Pipeline

Here's how OpenDraft ensures every citation is real and accurate:

Step 1: Literature Discovery (Scout Agents)

Instead of asking an LLM to "suggest relevant citations," OpenDraft's Scout agents directly query academic databases:

  • Semantic Scholar: 200M+ papers across all disciplines
  • CrossRef: 140M+ scholarly works with DOIs
  • arXiv: 2.3M+ preprints in STEM fields

These agents receive a research query (e.g., "transformer models for natural language processing") and return structured data directly from the APIs, including:

  • Verified DOIs
  • Complete author lists
  • Exact publication dates
  • Journal/conference names
  • Citation counts and influence metrics
  • Full abstracts

Critical insight: At this stage, there's zero possibility of hallucination because we're not generating text—we're retrieving structured data from authoritative sources.
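To make this concrete, here is a minimal sketch of what a Scout-style retrieval step can look like against the public Semantic Scholar Graph API (the endpoint and field names are real; the helper function and error handling are illustrative, not OpenDraft's actual implementation):

```python
import requests

S2_SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"

def search_semantic_scholar(query: str, limit: int = 20) -> list[dict]:
    """Retrieve structured paper records (not generated text) for a query."""
    resp = requests.get(
        S2_SEARCH_URL,
        params={
            "query": query,
            "limit": limit,
            # Only fields returned by the database can end up in a citation.
            "fields": "title,authors,year,venue,abstract,citationCount,externalIds",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("data", [])

papers = search_semantic_scholar("transformer models for natural language processing")
for p in papers[:3]:
    doi = (p.get("externalIds") or {}).get("DOI")
    print(p["title"], p.get("year"), doi)
```

Every value printed here came back from the database; no language model was involved in producing it.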

Step 2: Relevance Assessment (Signal Agents)

Once real papers are retrieved, Signal agents use AI to evaluate their relevance to specific research questions. They analyze:

  • Abstract content and methodology
  • Topical alignment with research objectives
  • Citation impact and paper quality indicators
  • Recency and field-specific importance

Importantly, Signal agents only select from papers already verified to exist—they never generate new citations.
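A sketch of that selection invariant, with a crude keyword-overlap score standing in for the real LLM relevance judgment (the heuristic and function names are illustrative):

```python
def keyword_overlap(question: str, abstract: str) -> float:
    """Stand-in for an LLM relevance judgment: crude term-overlap score."""
    q = set(question.lower().split())
    a = set((abstract or "").lower().split())
    return len(q & a) / max(len(q), 1)

def select_relevant(verified_papers: list[dict], question: str, k: int = 10) -> list[dict]:
    """Rank papers that already exist in the verified set; never add new ones."""
    scored = [(keyword_overlap(question, p.get("abstract", "")), p) for p in verified_papers]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Output is built by filtering the input, so it can only ever be
    # a subset of the verified papers: no paper can be invented here.
    return [p for _, p in scored[:k]]
```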

Step 3: Content Synthesis (Scribe Agents)

Scribe agents are responsible for writing content, but they work under strict constraints:

  • They receive a curated set of verified papers with complete metadata
  • They can only cite papers from this pre-approved set
  • Citation formatting uses templates populated with real database values
  • Each in-text citation is linked to a specific paper ID from the database

This architecture makes it structurally impossible for Scribe agents to hallucinate citations—they don't have the ability to invent new ones.
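A minimal sketch of how such a constraint can be enforced, assuming a hypothetical `[cite:<paper_id>]` marker convention (the marker format and IDs are illustrative, not OpenDraft's actual syntax):

```python
import re

def validate_citations(text: str, approved_ids: set[str]) -> list[str]:
    """Return any in-text citation ID that is not in the pre-approved set."""
    cited = re.findall(r"\[cite:([\w\-]+)\]", text)
    return [pid for pid in cited if pid not in approved_ids]

draft = "Attention mechanisms dominate NLP [cite:paper-001] and vision [cite:paper-999]."
unknown = validate_citations(draft, approved_ids={"paper-001", "paper-002"})
if unknown:
    raise ValueError(f"Draft cites unapproved paper IDs: {unknown}")
```

Any marker that does not resolve to a pre-approved paper ID is rejected before the draft moves on.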

Step 4: Citation Formatting (Architect Agents)

After content is written, Architect agents handle final citation formatting:

  • Convert database metadata into proper APA, MLA, or Chicago format
  • Generate bibliography entries with verified DOIs and URLs
  • Cross-reference in-text citations with bibliography
  • Validate that all cited works appear in the reference list

Because all citations originate from database records, formatting errors can still occur (e.g., incorrect punctuation), but fabricated citations cannot.
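For illustration, here is a simplified sketch of template-based formatting from database metadata (real APA style has more rules, such as ampersands before the last author and italicized venues, so treat this as a toy version):

```python
def format_apa(paper: dict) -> str:
    """Fill an APA-like template purely from database metadata."""
    authors = ", ".join(a["name"] for a in paper["authors"])
    entry = f"{authors} ({paper['year']}). {paper['title']}. {paper['venue']}."
    if paper.get("doi"):
        entry += f" https://doi.org/{paper['doi']}"
    return entry

print(format_apa({
    "authors": [{"name": "Vaswani, A."}, {"name": "Shazeer, N."}],
    "year": 2017,
    "title": "Attention is all you need",
    "venue": "Advances in Neural Information Processing Systems",
    "doi": "10.48550/arXiv.1706.03762",
}))
```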

Technical Deep Dive: The Verification Process

Let's examine exactly how OpenDraft verifies citations at the code level.

Database Integration

OpenDraft maintains direct API connections to three major academic databases:

CrossRef API:
- 140M+ DOI-registered works
- Official publication metadata
- Publisher-verified information
- Rate limit: 50 requests/second

Semantic Scholar API:
- 200M+ papers across disciplines
- AI-extracted paper metadata
- Citation graphs and influence metrics
- Rate limit: 100 requests/second

arXiv API:
- 2.3M+ preprints (physics, CS, math, etc.)
- Full-text access for many papers
- Pre-publication research
- No hard rate limit (arXiv asks clients to self-throttle their requests)
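These limits imply some client-side throttling. One way to honor them is a simple one-request-per-interval guard; the sketch below derives its intervals from the figures above and is not OpenDraft's actual client:

```python
import time

class Throttle:
    """Client-side throttle: at most one request per `interval` seconds."""
    def __init__(self, interval: float):
        self.interval = interval
        self._last = 0.0

    def wait(self):
        now = time.monotonic()
        sleep_for = self._last + self.interval - now
        if sleep_for > 0:
            time.sleep(sleep_for)
        self._last = time.monotonic()

crossref = Throttle(1 / 50)   # 50 requests/second
arxiv = Throttle(3.0)         # conservative self-throttling for arXiv
```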

Query-to-Citation Pipeline

When you run OpenDraft with a research topic, here's what happens:

  1. Query Generation: Planner agents break your research topic into specific queries
    • Example input: "Machine learning in healthcare"
    • Generated queries: "deep learning medical diagnosis," "neural networks patient outcomes," etc.
  2. Parallel Database Search: Scout agents query all three databases simultaneously
    • Each query returns 10-50 papers with complete metadata
    • Total retrieval: 100-300 verified papers per research topic
  3. Deduplication: Papers appearing in multiple databases are merged using DOI matching
    • Semantic Scholar and CrossRef often have the same papers
    • System keeps the most complete metadata record
  4. Metadata Validation: Each paper record is checked for completeness
    • Required fields: title, authors, year, publication venue
    • Preferred fields: DOI, abstract, citation count
    • Papers with incomplete metadata are flagged but not removed
  5. Storage in Knowledge Base: Verified papers are stored in a structured format
    • Each paper gets a unique internal ID
    • Metadata is preserved exactly as received from databases
    • No LLM processing of core citation fields (title, authors, year)
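Step 3's DOI-based merge might look roughly like the sketch below (the completeness heuristic, preferring the record with more filled-in fields, is an illustrative assumption):

```python
def deduplicate(papers: list[dict]) -> list[dict]:
    """Merge records from different databases that share a DOI,
    keeping the most complete metadata record for each work."""
    merged: dict[str, dict] = {}
    no_doi: list[dict] = []
    for p in papers:
        doi = (p.get("doi") or "").lower()
        if not doi:
            no_doi.append(p)  # can't safely merge without a DOI
            continue
        existing = merged.get(doi)
        if existing is None:
            merged[doi] = p
        # Prefer whichever record has more filled-in fields.
        elif sum(v is not None for v in p.values()) > sum(v is not None for v in existing.values()):
            merged[doi] = p
    return list(merged.values()) + no_doi
```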

Citation Usage Tracking

As Scribe agents write content, OpenDraft tracks which papers are actually cited:

  • Each citation in generated text is linked to a paper ID
  • The system maintains a citation graph showing which sections reference which papers
  • Final bibliography includes only papers that were actually cited
  • Orphaned citations (text reference without metadata) trigger warnings
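A rough sketch of this bookkeeping, reusing the hypothetical `[cite:<paper_id>]` marker from earlier (the data shapes are assumptions):

```python
import re

def build_bibliography(sections: dict[str, str], papers: dict[str, dict]):
    """Collect the paper IDs actually cited in each section and flag orphans."""
    citation_graph: dict[str, set[str]] = {}
    cited: set[str] = set()
    for name, text in sections.items():
        ids = set(re.findall(r"\[cite:([\w\-]+)\]", text))  # assumed marker format
        citation_graph[name] = ids
        cited |= ids

    orphans = cited - papers.keys()  # text references with no metadata record
    bibliography = [papers[pid] for pid in sorted(cited & papers.keys())]
    return bibliography, citation_graph, orphans
```

Only papers that appear in `cited` make it into the bibliography, and anything cited without a backing record surfaces as an orphan warning.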

Comparison: OpenDraft vs. Traditional AI Writing Tools

| Feature | Traditional AI Tools | OpenDraft |
| --- | --- | --- |
| Citation Source | LLM-generated text | Real academic databases |
| Verification Method | None (user must verify) | Automatic database validation |
| Hallucination Rate | 30-50% of citations | 0% (structurally impossible) |
| Database Coverage | N/A | 200M+ papers |
| DOI Accuracy | Often fabricated | 100% (from CrossRef/Semantic Scholar) |
| Metadata Quality | Inconsistent/fabricated | Publisher-verified |
| User Verification Time | Hours per paper | Minutes (optional spot-checking) |

Best Practices for Using AI Citation Verification

Even with OpenDraft's robust verification system, researchers should follow these practices:

1. Understand Your Sources

Just because a citation is real doesn't mean it's relevant or properly used:

  • Read the abstracts of key papers cited in your work
  • Verify that the cited paper actually supports the claim being made
  • Check for more recent papers that might supersede older citations

2. Spot-Check Critical Citations

For particularly important claims, verify the citation yourself:

  • Click the DOI link to confirm the paper exists and is accessible
  • Skim the paper to ensure it's been cited in the correct context
  • Check if the paper has been retracted or corrected

3. Maintain Academic Integrity

AI-assisted writing is a tool, not a replacement for scholarship:

  • Disclose AI usage according to your institution's policies
  • Don't cite papers you haven't at least reviewed at the abstract level
  • Add your own analysis and critical evaluation of the literature

4. Verify Database Coverage for Your Field

While OpenDraft covers 200M+ papers, some specialized fields may have limited coverage:

  • STEM fields: Excellent coverage via arXiv and Semantic Scholar
  • Medical/health sciences: Strong coverage via CrossRef and PubMed integration
  • Social sciences: Good coverage, but some niche journals may be missing
  • Humanities: Variable coverage depending on publisher participation in CrossRef

For highly specialized research areas, supplement OpenDraft with manual database searches.

The Future of AI Citation Verification

As AI writing tools become more sophisticated, citation verification will only become more critical. Several emerging developments will shape this space:

Expanded Database Integration

Future versions of OpenDraft may integrate additional sources:

  • PubMed Central: Full-text biomedical literature
  • IEEE Xplore: Engineering and computer science papers
  • JSTOR: Humanities and social sciences archives
  • Field-specific repositories: SSRN (economics), RePEc, bioRxiv, etc.

Real-Time Citation Validation

Advanced systems could verify citations as you write:

  • Inline warnings for citations that can't be verified
  • Suggestions for alternative papers when citations are outdated
  • Automatic detection of retracted papers

Citation Quality Metrics

Beyond verifying existence, AI systems could assess citation quality:

  • Is the cited paper peer-reviewed or a preprint?
  • What's the impact factor of the publication venue?
  • How many times has this paper been cited by others?
  • Are there more recent papers on the same topic?

Limitations and Transparency

While OpenDraft's approach eliminates citation hallucination, it's important to understand the remaining limitations:

Database Coverage Gaps

Not all academic work is indexed in Semantic Scholar, CrossRef, or arXiv:

  • Some regional journals aren't well-represented
  • Books and book chapters have limited coverage
  • Very recent papers (last 1-2 weeks) may not be indexed yet
  • Some humanities fields have less comprehensive indexing

Context Accuracy

OpenDraft ensures citations are real, but not necessarily that they're used correctly:

  • A paper might be cited in support of a claim it doesn't actually make
  • Citations might be taken out of context
  • The AI might miss nuances in the original paper's findings

This is why human review remains essential—OpenDraft prevents fake citations, but researchers must ensure proper usage.

Metadata Errors

While rare, academic databases themselves can contain errors:

  • Author name variations (J. Smith vs. John Smith)
  • Publication date corrections
  • Journal name changes

OpenDraft faithfully reproduces database metadata, which means any errors in the source databases will be reflected in generated citations.

Conclusion: The Path Forward for AI-Assisted Research

Citation hallucination is not an unsolvable problem—it's an architectural one. Traditional AI writing tools treat citations as just another text generation task, which inevitably leads to fabrication. OpenDraft demonstrates that a different approach is both possible and practical:

  • Multi-agent architecture separates research from writing
  • Database-first design ensures all citations come from real sources
  • Continuous verification makes hallucination structurally impossible

For researchers, this means you can finally use AI assistance for literature-intensive work without the constant fear of fake citations undermining your credibility. Instead of spending hours verifying every reference, you can focus on what matters: analyzing the literature, developing arguments, and advancing knowledge in your field.

The future of academic writing is neither fully automated nor entirely manual—it's a collaboration between human expertise and AI capabilities, with robust verification systems ensuring that the output meets the rigorous standards research demands.

Generate Research Drafts with Verified Citations

Try OpenDraft's multi-agent system with automatic citation verification against 200M+ papers. Zero hallucination, 100% open source.

Get OpenDraft FREE →

100% open source • No credit card required • Setup in 10 minutes


About the Author: This guide was created by Federico De Ponte, developer of OpenDraft. Last Updated: December 29, 2024