
Federico De Ponte

Founder, OpenDraft

6 min read
Case Study

Citation Hallucination Benchmark: OpenDraft vs GPT-5.2, GPT-4o, GPT-3.5

We ran the same research prompts through OpenDraft, GPT-5.2 (OpenAI's newest model), GPT-4o, and GPT-3.5, then verified every citation against real academic databases. The results speak for themselves.

The Results

  • OpenDraft: 100% verified (295 citations, 0 fabricated)
  • GPT-5.2: 8.5% fabricated (47 citations, 4 fabricated)
  • GPT-4o: 10.2% fabricated (49 citations, 5 fabricated)
  • GPT-3.5: 64.6% without DOIs (48 citations, 31 unverifiable)


The Test

We created 10 research prompts across academic disciplines, each requesting a literature review with citations:

  1. Computer Science - Transformer architectures in NLP
  2. Medicine - CRISPR gene therapy advances
  3. Psychology - Social media and adolescent mental health
  4. Economics - Universal basic income research
  5. Environmental Science - Microplastics in marine ecosystems
  6. Education - Online vs. classroom learning
  7. Physics - Quantum computing for optimization
  8. Sociology - Income inequality and social mobility
  9. Neuroscience - Neurobiological mechanisms of addiction
  10. Business - Remote work productivity

Each prompt asked for a 500-word literature review with 5+ citations including DOIs.
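
To make the setup concrete, here is an illustration of how such a request can be issued through the OpenAI API (the API used for the GPT models in this benchmark). The template wording and topic below are illustrative only; the exact prompt texts ship with the open-source benchmark described under "Reproduce It Yourself".

from openai import OpenAI

# Illustrative prompt template; the benchmark's actual prompt wording is in the repo
PROMPT_TEMPLATE = (
    "Write a ~500-word literature review on {topic}. "
    "Cite at least 5 peer-reviewed papers and include a DOI for every citation."
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",  # repeated for each model under test
    messages=[{
        "role": "user",
        "content": PROMPT_TEMPLATE.format(topic="transformer architectures in NLP"),
    }],
)
print(response.choices[0].message.content)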

Verification Method

For each citation, we checked:

  1. CrossRef API - 130M+ indexed publications
  2. arXiv API - 2M+ preprints
  3. Semantic Scholar - 200M+ papers
  4. doi.org resolver - Fallback verification

If a cited DOI does not exist in any of these sources, the citation is marked as fabricated. Citations that cannot be checked at all (for example, because no DOI is given) are counted as unverifiable rather than fabricated.
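
For reference, the core of this check can be expressed in a few lines. The function below is a simplified sketch of the verification cascade (CrossRef first, then the doi.org resolver as a fallback); the actual verify_citations.py also queries arXiv and Semantic Scholar.

import requests

def doi_exists(doi: str) -> bool:
    """Simplified sketch: does this DOI resolve in CrossRef or at doi.org?"""
    # CrossRef returns HTTP 200 for any DOI it has indexed
    r = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    if r.status_code == 200:
        return True
    # Fallback: the doi.org resolver redirects (3xx) for registered DOIs
    r = requests.head(f"https://doi.org/{doi}", allow_redirects=False, timeout=10)
    return 300 <= r.status_code < 400

print(doi_exists("10.1038/nature14539"))          # a real, indexed DOI -> True
print(doi_exists("10.9999/obviously.fake.2024"))  # not a registered DOI -> False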

Results Breakdown

Summary Comparison

Model            Citations   Verified      Fabricated   Unverifiable
OpenDraft        295         295 (100%)    0 (0%)       0 (0%)
GPT-5.2          47          39 (83%)      4 (8.5%)     4 (8.5%)
GPT-4o           49          44 (89.8%)    5 (10.2%)    0 (0%)
GPT-3.5 Turbo    48          16 (33.3%)    1 (2.1%)     31 (64.6%)

Note: GPT-3.5's fabrication rate looks low only because most of its citations lack DOIs entirely, so they are counted as unverifiable rather than fabricated.

OpenDraft - 8 Prompts (2 timed out)

Discipline              Citations   Verified      Fabricated
Computer Science        44          44            0
Medicine                34          34            0
Psychology              50          50            0
Economics               33          33            0
Environmental Science   36          36            0
Education               37          37            0
Physics                 33          33            0
Sociology               28          28            0
Total                   295         295 (100%)    0 (0%)

A Note on Volume

GPT models produced ~5 citations per prompt (what was requested). OpenDraft produced ~37 citations per prompt. This isn't a flaw in the comparison — it's an architectural difference:

  • GPT models generate the minimum citations needed
  • OpenDraft queries academic databases and returns all relevant papers

The fabrication rate is what matters: even OpenAI's newest GPT-5.2 fabricated 8.5% of its citations and GPT-4o fabricated 10.2%, while OpenDraft fabricated 0%.

Why the Difference?

The difference isn't prompt engineering. It's architecture.

The GPT Approach

  • Generates citations from training data
  • Has no real-time database access
  • Cannot verify whether a DOI exists
  • Produces plausible-looking but fake citations

The OpenDraft Approach

  • Queries CrossRef and Semantic Scholar in real time
  • Only includes papers that exist in those databases
  • A validation phase removes any unverifiable citations
  • Every DOI is checked before output (a minimal sketch of this retrieve-then-validate flow follows)
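
The practical consequence of this architecture can be shown in a few lines: instead of asking a language model to recall references, the pipeline searches a live index and keeps only papers that come back with an identifier. The sketch below uses the public Semantic Scholar search API and is an approximation of the idea, not OpenDraft's internal code.

import requests

def retrieve_citations(query: str, limit: int = 20) -> list[dict]:
    """Retrieve-then-validate sketch: search a real index, keep only papers with a DOI."""
    r = requests.get(
        "https://api.semanticscholar.org/graph/v1/paper/search",
        params={"query": query, "limit": limit, "fields": "title,year,externalIds"},
        timeout=10,
    )
    r.raise_for_status()
    papers = r.json().get("data", [])
    # Validation step: drop anything that does not carry a resolvable DOI
    return [
        {"title": p["title"], "year": p.get("year"), "doi": p["externalIds"]["DOI"]}
        for p in papers
        if p.get("externalIds") and "DOI" in p["externalIds"]
    ]

for paper in retrieve_citations("microplastics in marine ecosystems")[:5]:
    print(paper["doi"], "-", paper["title"])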

Reproduce It Yourself

The entire benchmark is open source. Run it yourself:

git clone https://github.com/federicodeponte/opendraft
cd opendraft/benchmark

# Verify citations
python3 verify_citations.py responses/chatgpt/prompt_1.txt -o results/chatgpt_1.json
python3 verify_citations.py responses/opendraft/prompt_1.txt -o results/opendraft_1.json

All prompts, responses, and verification code are included. Check our work.

Try OpenDraft

Generate your own research draft with 100% verified citations.

Get Started Free

Methodology: Benchmark conducted December 2024. GPT-5.2 (gpt-5.2), GPT-4o (gpt-4o), GPT-3.5 Turbo (gpt-3.5-turbo) via OpenAI API. OpenDraft v1.0 with Gemini + CrossRef + Semantic Scholar.

Full data: Download raw JSON results