Tested in the open.
We took our models and ran them on five years of races they had never seen during training. Here is what happened, with no cherry-picking.
The methodology, in plain English
Most analysis products report numbers from races their model was already trained on. Those numbers always look good. We don't do that.
Every January, we freeze our model on the data available through December of the prior year, then run it on every race in the new year and record the results. No re-training. No "we noticed it was missing on track X so we patched it." Just the model, the races, and the outcomes.
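For readers who prefer code to prose, the freeze-then-test routine described above can be sketched roughly as follows. This is a minimal illustration, not Derbee's actual pipeline: the `Race` record, the `fit_most_common_winner` toy model, and the `walk_forward` helper are all hypothetical names invented for this example. The only point is that each year's model is fitted once, on earlier years only, and then scored on that year untouched.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class Race:
    """Hypothetical race record: the year it was run and the winning post position."""
    year: int
    winner: int


Predictor = Callable[[Race], int]


def fit_most_common_winner(train: Sequence[Race]) -> Predictor:
    """Toy stand-in for a real model: always pick the post position
    that won most often in the training races (requires non-empty train)."""
    most_common = Counter(r.winner for r in train).most_common(1)[0][0]
    return lambda race: most_common


def walk_forward(
    races: Sequence[Race],
    fit: Callable[[Sequence[Race]], Predictor],
    test_years: Sequence[int],
) -> Dict[int, Tuple[int, int]]:
    """For each test year, fit on strictly earlier years, then score that year.

    Returns {year: (correct_picks, races_run)}. The model is fitted once per
    year and never updated while that year's races are being scored.
    """
    results: Dict[int, Tuple[int, int]] = {}
    for year in test_years:
        train = [r for r in races if r.year < year]   # data through prior December
        model = fit(train)                            # frozen from here on
        test = [r for r in races if r.year == year]   # races the model never saw
        hits = sum(1 for r in test if model(r) == r.winner)
        results[year] = (hits, len(test))
    return results


# Made-up data, purely to show the shape of the output.
races = [
    Race(2020, 1), Race(2020, 1), Race(2020, 2),
    Race(2021, 1), Race(2021, 2),
    Race(2022, 2), Race(2022, 2),
]
results = walk_forward(races, fit_most_common_winner, [2021, 2022])
```

The filter `r.year < year` is what keeps the test honest: nothing from the year being scored can leak into the model that scores it.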
The numbers above are the aggregate across five of those frozen years (2021-2025). The top-pick model identifies the winner 6.0 percentage points more often than the public favorite. The best-value model returns +21% on the picks it flagged (with a 95% confidence-interval lower bound of +17%).
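To make the two kinds of figure concrete, here is a rough sketch of how an edge in percentage points and a return with a confidence-interval lower bound could be computed. It rests on assumptions that this page does not state: returns are treated as profit per one-unit flat stake, and the lower bound comes from a percentile bootstrap. The function names and the sample figures are illustrative only and are not Derbee's data or method.

```python
import random
from statistics import mean
from typing import Sequence


def edge_in_percentage_points(model_hits: int, favorite_hits: int, n_races: int) -> float:
    """Difference between two win rates, expressed in percentage points."""
    return 100 * (model_hits - favorite_hits) / n_races


def roi(profits: Sequence[float]) -> float:
    """Average profit per one-unit stake (assumed flat staking).
    A losing pick contributes -1.0; a winner contributes its net payout."""
    return mean(profits)


def bootstrap_lower_bound(
    profits: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> float:
    """Lower end of a (1 - alpha) percentile-bootstrap interval for the ROI.

    Resamples the picks with replacement, recomputes the ROI each time,
    and returns the alpha/2 quantile of those resampled ROIs.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(profits)
    resampled = sorted(mean(rng.choices(profits, k=n)) for _ in range(n_resamples))
    return resampled[int((alpha / 2) * n_resamples)]


# Illustrative numbers only: 360 vs 300 winners out of 1,000 races.
edge = edge_in_percentage_points(360, 300, 1000)

# Illustrative per-pick profits on one-unit stakes.
sample_profits = [1.5, -1.0, -1.0, 2.0, -1.0, 0.5, -1.0, 3.0]
sample_roi = roi(sample_profits)
sample_lower = bootstrap_lower_bound(sample_profits)
```

Reporting the lower bound alongside the headline return shows how much of the figure would survive an unlucky draw of the same picks.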
The kill list
Four models we shipped or planned to ship were retired in April 2026 when their fresh-data results stopped clearing our thresholds. We don't quietly fold them away; we publish their results and the date we shut them down. The full list will appear on this page in v1.1; until then, email us for the current version.
Try Derbee for 30 days, free.
Picks, a coach who knows your record, and a research workspace — all for one fair price.
Start 30-day free trial →