The Factor Zoo & Replication Crisis

More than 400 factors have been published, and most are likely false discoveries.

Multiple testing problem, Data snooping, P-hacking in finance, Replication crisis, Anomaly decay

Typical IS Sharpe: 0.3 – 0.9 (varies widely)

Typical OOS Sharpe: −0.1 – 0.3 (post-publication average)

Capacity: Small-cap

Signal decay: ~12m half-life, high turnover

Overview

Cochrane (2011) raised the question of which factors matter in his AFA Presidential Address, describing a "zoo of new factors"; the "factor zoo" label stuck. Harvey, Liu, and Zhu (2016) catalogued 316 published factors as of 2012 and argued that the conventional significance threshold (t > 2.0) is far too permissive given the scale of multiple testing. They proposed that a new factor should clear a t-statistic hurdle of about 3.0, based on multiple-testing adjustments such as the Bonferroni correction. McLean and Pontiff (2016) showed that anomaly returns decay by 58% after publication, which is consistent with both investors learning from published research and in-sample data-mining bias. Hou, Xue, and Zhang (2020) attempted to replicate 452 anomalies and, with microcaps controlled for, found that 65% failed to clear the conventional |t| ≥ 1.96 hurdle; the failure rate rose to 82% under a multiple-testing hurdle of 2.78.
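To illustrate why the significance hurdle rises with the number of factors tested, here is a minimal sketch of a plain Bonferroni adjustment using a normal approximation. The function name `bonferroni_t_hurdle` and the 5% family-wise error rate are illustrative assumptions. It is not a reproduction of any paper's procedure, and a plain Bonferroni correction over hundreds of tests gives a stricter cutoff than 3.0.

```python
from statistics import NormalDist


def bonferroni_t_hurdle(num_tests: int, alpha: float = 0.05) -> float:
    """Two-sided critical value after a Bonferroni correction.

    Assumes test statistics are approximately standard normal under the
    null, which is reasonable for long return samples.
    """
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    # Bonferroni: split the family-wise error rate evenly across all tests.
    per_test_alpha = alpha / num_tests
    # Two-sided test: half of the per-test error rate sits in each tail.
    return NormalDist().inv_cdf(1.0 - per_test_alpha / 2.0)


if __name__ == "__main__":
    for m in (1, 10, 100, 316):
        print(f"{m:>4} tests -> |t| hurdle {bonferroni_t_hurdle(m):.2f}")
```

With a single test the function returns the familiar 1.96. The hurdle grows slowly, roughly with the square root of the logarithm of the number of tests, so even very large factor counts raise it by a few units rather than by orders of magnitude.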

Economic Intuition

The problem is fundamental to any empirical science that runs many regressions on the same dataset. Given enough variables and enough researchers, some combination will look significant by chance. In finance, the situation is especially acute because: (1) financial data is relatively short (70 years of reliable US data), (2) researchers share the same datasets (CRSP, Compustat), (3) publication bias favors positive results, and (4) t-statistics are often artificially inflated by data-mining procedures that were not fully disclosed. The result is that many "discoveries" reflect sample-specific noise rather than genuine risk premia.
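The chance-discovery argument can be made concrete with a small simulation. The sketch below generates factors whose true mean return is zero and counts how many still clear a t-statistic hurdle. The function names, the 3% monthly volatility, and the 840-month (70-year) sample length are illustrative assumptions, not figures from the studies cited.

```python
import random
from statistics import mean, stdev


def t_stat(returns: list[float]) -> float:
    """t-statistic for the null hypothesis that the mean return is zero."""
    n = len(returns)
    return mean(returns) / (stdev(returns) / n ** 0.5)


def count_false_discoveries(
    num_factors: int = 400,
    num_months: int = 840,   # assumed: about 70 years of monthly data
    hurdle: float = 2.0,
    seed: int = 0,
) -> int:
    """Count pure-noise factors whose |t| exceeds the hurdle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_factors):
        # Every simulated factor has zero true premium by construction.
        returns = [rng.gauss(0.0, 0.03) for _ in range(num_months)]
        if abs(t_stat(returns)) > hurdle:
            hits += 1
    return hits


if __name__ == "__main__":
    for h in (2.0, 3.0):
        n = count_false_discoveries(hurdle=h)
        print(f"hurdle {h}: {n} of 400 noise factors look significant")
```

Under these assumptions, about 4.5% of pure-noise factors are expected to pass a hurdle of 2.0, which is roughly 18 out of 400, even though none of them has a real premium. Raising the hurdle to 3.0 cuts the expected count to about one.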

Out-of-Sample Evidence

Weak OOS survival

This is the core theme of ConvexPi. The platform is built around one question: does your strategy survive out-of-sample? The factor zoo literature shows that most do not. The right mental model: treat every in-sample result as a hypothesis, not a fact. The OOS Sharpe ratio on fresh data is the only credible evidence of real alpha. Strategies that use more parameters, more indicators, and longer lookback periods are more susceptible to the multiple testing problem — even if each individual test looks conservative. Simplicity is a form of robustness.
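As a rough illustration of the in-sample versus out-of-sample comparison described above, the sketch below splits a monthly excess-return series at a fixed date index and computes an annualized Sharpe ratio on each side. The function names and the toy return series are hypothetical and are not part of any platform API. The split point must be chosen before looking at the later data for the second number to count as out-of-sample.

```python
from statistics import mean, stdev


def annualized_sharpe(monthly_excess_returns: list[float]) -> float:
    """Annualized Sharpe ratio from monthly excess returns."""
    if len(monthly_excess_returns) < 2:
        raise ValueError("need at least two observations")
    vol = stdev(monthly_excess_returns)
    if vol == 0:
        raise ValueError("returns have zero volatility")
    # Scale the monthly ratio by sqrt(12) to annualize.
    return mean(monthly_excess_returns) / vol * 12 ** 0.5


def is_oos_sharpe(returns: list[float], split_index: int) -> tuple[float, float]:
    """Sharpe before the split (in-sample) and from the split on (out-of-sample)."""
    return (
        annualized_sharpe(returns[:split_index]),
        annualized_sharpe(returns[split_index:]),
    )


if __name__ == "__main__":
    # Toy data: strong returns in the fitting period, weak ones afterwards.
    in_sample = [0.02, 0.01, 0.03, 0.00] * 6
    out_of_sample = [0.01, -0.01, 0.00, -0.02] * 6
    is_sr, oos_sr = is_oos_sharpe(in_sample + out_of_sample, len(in_sample))
    print(f"IS Sharpe: {is_sr:.2f}, OOS Sharpe: {oos_sr:.2f}")
```

A large gap between the two numbers is the pattern the factor zoo literature warns about: the in-sample figure reflects whatever was selected for, while the out-of-sample figure reflects what remains.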