Tested in the open.

We took our models and ran them on 5 years of races they had never seen during training. Here is what happened, with no cherry-picking.

On 5 years of races our model never saw during training
+6.0 pts
More accurate than the public favorite
+21%
Theoretical return on best-value picks
5/5
Test years where the model held up

The methodology, in plain English

Most analysis products report numbers from races their model already learned on. Those numbers always look good. We don't do that.

Every January, we freeze our model on the data available through December of the prior year, then run it on every race in the new year and record the results. No re-training. No "we noticed it was missing on track X so we patched it." Just the model, the races, and the outcomes.
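For readers who want to see the shape of that procedure, here is a minimal sketch of a freeze-then-test loop. It is an illustration under stated assumptions, not our production code: the `Race` record, the `walk_forward` function, and the toy `train_most_common_winner` trainer are all hypothetical names invented for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class Race:
    year: int
    winner: int  # hypothetical field: post position of the winning runner


Model = Callable[[Race], int]


def train_most_common_winner(history: Sequence[Race]) -> Model:
    """Toy stand-in for training: always pick the post position that won most often."""
    counts = Counter(r.winner for r in history)
    best = counts.most_common(1)[0][0] if counts else 1
    return lambda race: best


def walk_forward(
    races: Sequence[Race],
    train: Callable[[Sequence[Race]], Model],
    test_years: Sequence[int],
) -> Dict[int, Tuple[int, int]]:
    """For each test year, train only on earlier years, then score the new year."""
    results = {}
    for year in test_years:
        # Freeze: the model sees nothing from `year` or later.
        history = [r for r in races if r.year < year]
        model = train(history)
        # Score every race in the new year with the frozen model, no patching.
        fresh = [r for r in races if r.year == year]
        hits = sum(1 for r in fresh if model(r) == r.winner)
        results[year] = (hits, len(fresh))
    return results
```

The point of the structure is the filter `r.year < year`: a model scored on a given year is never allowed to learn from that year.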

The numbers above are the aggregate across those five frozen years (2021-2025). The top-pick model identifies the winner 6.0 percentage points more often than the public favorite does. The best-value model shows a theoretical return of +21% on the picks it flagged (95% confidence-interval lower bound: +17%).
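As a rough guide to how figures of this kind can be computed, the sketch below shows one way to get a percentage-point edge and a return with a lower confidence bound. It rests on assumptions: flat one-unit stakes, a normal-approximation interval (the page does not state which interval method was used), and hypothetical function names. The numbers in the usage lines are invented for illustration and are not our results.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def edge_in_points(model_hits: int, favorite_hits: int, n_races: int) -> float:
    """Difference in win rate, in percentage points, over the same set of races."""
    return 100.0 * (model_hits - favorite_hits) / n_races


def roi_with_lower_bound(net_returns: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Mean net return per one-unit stake, plus a normal-approximation lower bound.

    net_returns holds one value per flagged pick: -1.0 for a loss,
    or the net payout for a win. Assumes at least two picks.
    """
    roi = mean(net_returns)
    standard_error = stdev(net_returns) / math.sqrt(len(net_returns))
    return roi, roi - z * standard_error


# Invented example data, not real results.
print(edge_in_points(model_hits=380, favorite_hits=350, n_races=1000))
print(roi_with_lower_bound([1.5, -1.0, -1.0, 2.5, -1.0, -1.0, 3.0, -1.0]))
```

With only a handful of picks the lower bound sits far below the point estimate; it tightens as the number of flagged picks grows, which is why the bound is worth reporting alongside the headline return.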

The kill list

Four models we shipped or planned to ship were retired in April 2026 when their fresh-data results stopped clearing our thresholds. We don't quietly fold them away; we publish their results and the date we shut them down. Email us for the current full list; we will publish it on this page in v1.1.

Try Derbee for 30 days, free.

Picks, a coach who knows your record, and a research workspace — all for one fair price.

Start 30-day free trial →
5 years of public test results · 30-day free trial, full refund · Cancel any time