## Paper 6: Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing
**Authors:** Kohavi, Tang, Xu
**Year:** 2020
**Venue:** Cambridge University Press (Book/Publication)
**DOI:** [10.1017/9781108653985](https://doi.org/10.1017/9781108653985)

### Research Question
How can organizations utilize the scientific method (A/B testing) to make data-driven decisions that are statistically trustworthy, scalable, and innovative?

### Methodology
- **Design:** Empirical / Best Practices Guide
- **Approach:** Draws on practical experience at Google, LinkedIn, and Microsoft, companies that each run more than 20,000 controlled experiments per year.
- **Key Concepts:** Overall Evaluation Criterion (OEC), Twyman's Law, Marginal Cost of Experimentation.

### Key Findings
1. **The Difficulty of Trust:** "Getting numbers is easy; getting numbers you can trust is hard." The book emphasizes that most data interpretations are flawed because the assumptions behind them are violated.
2. **Twyman's Law:** "Any figure that looks interesting or different is usually wrong." This acts as a check against false positives: a surprisingly large effect should trigger a search for bugs or data errors before it is treated as a win.
3. **Scalability:** To innovate effectively, the marginal cost of running an experiment must be lowered to near zero.
4. **OEC (Overall Evaluation Criterion):** Organizations must define a single, composite metric that aligns with long-term goals to avoid optimizing for short-term vanity metrics.
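
Findings 1 and 2 can be made concrete with a small sketch. The code below is an illustration, not code from the book: it assumes a simple two-variant test on a conversion metric, checks whether the traffic split itself looks broken (a sample ratio mismatch check, one common guardrail in the spirit of Twyman's Law), and only then computes a p-value. The function names and the 0.001 mismatch threshold are assumptions chosen for this example.

```python
import math


def srm_p_value(control_users: int, treatment_users: int,
                expected_ratio: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for the traffic split."""
    total = control_users + treatment_users
    expected_control = total * expected_ratio
    expected_treatment = total * (1 - expected_ratio)
    chi2 = ((control_users - expected_control) ** 2 / expected_control
            + (treatment_users - expected_treatment) ** 2 / expected_treatment)
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def two_proportion_p_value(conv_c: int, n_c: int, conv_t: int, n_t: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_c + conv_t) / (n_c + n_t)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = (conv_t / n_t - conv_c / n_c) / std_err
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2))


def trustworthy_p_value(conv_c: int, n_c: int, conv_t: int, n_t: int,
                        srm_threshold: float = 0.001):
    """Return the test p-value, or None if the split itself looks broken.

    srm_threshold is an assumed cutoff for this sketch, not a prescribed value.
    """
    if srm_p_value(n_c, n_t) < srm_threshold:
        return None  # Do not interpret the result; investigate the assignment first.
    return two_proportion_p_value(conv_c, n_c, conv_t, n_t)
```

For example, `trustworthy_p_value(2500, 50_000, 2700, 48_000)` returns `None`: a 50,000 versus 48,000 split is very unlikely under an intended 50/50 assignment, so the apparent lift is not interpreted until the imbalance is explained.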

### Implications
This book is the methodological "bible" for the UX strategies mentioned in Paper 4. It suggests that the "personalization" in Paper 2 and the "innovation" in Paper 5 must be rigorously tested. Without the frameworks Kohavi et al. describe, banks risk implementing "innovations" that degrade user value.

### Limitations
- **Tech-Giant Bias:** The methods are derived from Google, LinkedIn, and Microsoft, which have massive traffic volumes. Smaller FinTech startups or B2B banking apps with lower traffic may lack the sample sizes needed to detect small effects at conventional significance thresholds, so applying these methods directly is challenging.
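
The low-traffic limitation can be illustrated with a standard sample-size approximation for comparing two conversion rates. This is a minimal sketch under stated assumptions, not a calculation from the book: the function name, the 5% significance level, the 80% power target, and the example rates are all hypothetical choices.

```python
import math
from statistics import NormalDist


def users_per_variant(baseline_rate: float, relative_lift: float,
                      alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per variant for a two-sided two-proportion test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical example: 5% baseline conversion, hoping to detect a 5% relative lift.
needed = users_per_variant(0.05, 0.05)
```

Under these assumptions the requirement is on the order of 120,000 users per variant. A product with a few thousand weekly users would need many months to reach that, whereas a high-traffic site could collect it quickly. Larger effects need far fewer users, so low-traffic teams are in practice limited to testing bolder changes.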

### Notable Citations
- **Twyman's Law:** A key concept for data validity.

### Relevance to Your Research
**Score:** ⭐⭐⭐⭐⭐ (5/5)
**Why:** Provides the *scientific rigor* required for the research. It bridges the gap between "having an idea" (Paper 5) and "proving it works" (Paper 7).

---