Phase 1 Launch - deeper models and increased prizes!

:rocket: Phase 1 of the White-Box Estimation Challenge is LIVE

The Warm-Up has concluded. The Private Leaderboard will be announced soon.

Phase 1 opens at 00:00 UTC on 18 June 2026 — and it arrives with 50,000 USD in new Phase 1 prizes, bringing the total challenge prize pool to 150,000+ USD, a 4× deeper model, and a FLOP budget scaled to match.

Phase 1 has its own leaderboard, its own evaluator environment, and a substantially harder estimation target. If you placed well in the Warm-Up, you will have your work cut out for you to hold your position. :fire:

TL;DR

  • 50,000 USD in new Phase 1 prizes — bringing the challenge total to 150,000+ USD
  • The target MLP gets 4× deeper: 256×8 → 256×32
  • The FLOP budget grows 4× to match: 6.8 × 10¹⁰ → 2.72 × 10¹¹
  • New Phase 1 datasets are available on Hugging Face under the v1-phase1 revision
  • Even with the larger budget, submissions must use roughly 15,000× less compute than the sampling budget used for the reference targets
  • New evaluator environment: flopscope v0.8.0rc1 + whestbench v0.12.0rc0
  • For details on the evaluator and cost model changes, please see the dedicated thread: Flopscope v0.8.0 Release Candidate Available for Testing

:warning: Existing participants must re-accept the updated rules before Phase 1 submissions will be accepted.


What changed from the Warm-Up?

| Category | Warm-Up | Phase 1 |
| --- | --- | --- |
| Target architecture | 256 × 8 | 256 × 32 |
| FLOP budget | 6.8 × 10¹⁰ | 2.72 × 10¹¹ |
| Dataset revision | v1-warmup | v1-phase1 |
| Leaderboard | Warm-Up leaderboard | Phase 1 leaderboard |
| New prize pool | | 50,000 USD |

:rotating_light: Rules updated

Accept the updated rules before you submit — Phase 1 submissions will not go through without it.

If you joined during the Warm-Up, head back to the challenge page and accept the latest version of the official rules, paying close attention to eligibility requirements, prize conditions, and submission requirements.


:trophy: 1. A bigger prize pool

Phase 1 adds 50,000 USD in new prizes:

| Phase 1 Prize | Amount |
| --- | --- |
| :1st_place_medal: 1st Place (score-based) | 25,000 USD |
| :2nd_place_medal: 2nd Place (score-based) | 10,000 USD |
| :3rd_place_medal: 3rd Place (score-based) | 5,000 USD |
| :brain: Algorithmic Contribution Prize | 10,000 USD |
| Total Phase 1 Prizes | 50,000 USD |

Together with the previously announced Phase 2 awards, the challenge now offers 150,000+ USD in total prizes:

| Prize Category | Phase 1 | Phase 2 |
| --- | --- | --- |
| :1st_place_medal: 1st Place (score-based) | 25,000 USD | 50,000 USD |
| :2nd_place_medal: 2nd Place (score-based) | 10,000 USD | 20,000 USD |
| :3rd_place_medal: 3rd Place (score-based) | 5,000 USD | 10,000 USD |
| :brain: Algorithmic Contribution Prize | 10,000 USD | 20,000 USD |
| Phase Total | 50,000 USD | 100,000 USD |

Total Challenge Prize Pool: 150,000+ USD

To be eligible for an Algorithmic Contribution Prize, each prize-eligible participant must submit a written description of their solution detailed enough for an independent practitioner to understand and reproduce the material aspects of the prize-determining results.

Full prize and eligibility details are on the challenge page and in the official rules.


:brain: 2. A deeper model

The Warm-Up targeted an MLP of width 256 with 8 layers. Phase 1 keeps the same width but makes the network 4× deeper:

Warm-Up:  256 width × 8 layers
Phase 1:  256 width × 32 layers

More depth means more structure to recover and a genuinely tougher estimation problem. This is the jump that defines Phase 1.
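
To get a feel for the scale of the change, the weight counts can be sketched as follows. This is only a rough illustration: it assumes every layer is a square 256 × 256 weight matrix with no bias terms, which this post does not state, and the helper name `dense_weight_count` is hypothetical.

```python
# Illustrative only: assumes each layer is a dense width x width matrix
# with no biases. The actual parameterization is defined by the challenge.

def dense_weight_count(width: int, depth: int) -> int:
    """Number of weights in `depth` square layers of size width x width."""
    return depth * width * width

warmup_weights = dense_weight_count(256, 8)    # Warm-Up: 256 width x 8 layers
phase1_weights = dense_weight_count(256, 32)   # Phase 1: 256 width x 32 layers

print(f"Warm-Up weights: {warmup_weights:,}")
print(f"Phase 1 weights: {phase1_weights:,}")
print(f"Ratio: {phase1_weights // warmup_weights}x")
```

Under that assumption, the number of weights to reason about grows by the same 4× factor as the depth.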


:zap: 3. A larger FLOP budget

The FLOP budget grows 4× to match the added depth:

Warm-Up:  6.8 × 10¹⁰ FLOPs / MLP
Phase 1:  2.72 × 10¹¹ FLOPs / MLP
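
As a quick sanity check on these numbers, the arithmetic can be sketched in a few lines. The two budgets and the ~15,000× figure come from this announcement; the forward-pass cost (2 FLOPs per multiply-add, matrix products only) is an assumption made for illustration, and the flopscope cost model may count operations differently.

```python
# Budgets are taken from the announcement; everything else is an assumption.
WARMUP_BUDGET = 68_000_000_000     # 6.8 x 10^10 FLOPs per MLP
PHASE1_BUDGET = 272_000_000_000    # 2.72 x 10^11 FLOPs per MLP
WIDTH = 256

def dense_forward_flops(width: int, depth: int) -> int:
    """Assumed cost of one forward pass on one input: 2 FLOPs per
    multiply-add, matrix products only (hypothetical cost model)."""
    return depth * 2 * width * width

budget_ratio = PHASE1_BUDGET / WARMUP_BUDGET

# Budget expressed as "forward-pass equivalents" under the assumed cost.
warmup_passes = WARMUP_BUDGET / dense_forward_flops(WIDTH, 8)
phase1_passes = PHASE1_BUDGET / dense_forward_flops(WIDTH, 32)

# "~15,000x smaller" implies a reference sampling budget of roughly:
implied_sampling_budget = PHASE1_BUDGET * 15_000

print(f"Budget ratio: {budget_ratio:.1f}x")
print(f"Forward-pass equivalents, Warm-Up: {warmup_passes:,.0f}")
print(f"Forward-pass equivalents, Phase 1: {phase1_passes:,.0f}")
print(f"Implied sampling budget: {implied_sampling_budget:.2e} FLOPs")
```

Under these assumptions, both phases allow the same number of forward-pass equivalents per MLP (roughly 65,000), which is one way to read "scaled to match": the budget per layer is unchanged, while the network to estimate is four times deeper.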

:gear: 4. Updated evaluator environment

Phase 1 runs on a new, independent evaluation environment with its own leaderboard:

flopscope v0.8.0rc1
whestbench v0.12.0rc0

What’s new under the hood:

  • More consistent FLOP accounting
  • Unified contraction costing across matrix operations
  • Fairer residual wall-time accounting
  • Packaging data through flops.Module in your submissions

For the full breakdown of evaluator behavior, migration guidance, and cost model changes, see the dedicated thread: Flopscope v0.8.0 Release Candidate Available for Testing


:package: 5. Updated Phase 1 datasets available

The public datasets for Phase 1 are now available on Hugging Face:

Dataset repository: aicrowd/arc-whestbench-public-2026

Phase 1 revision: v1-phase1

This revision contains:

| Split | Number of MLPs | Architecture |
| --- | --- | --- |
| Full split | 1,000 MLPs | 256 width × 32 layers |
| Mini split | 100 MLPs | 256 width × 32 layers |

When downloading or referencing the dataset programmatically, make sure to use the v1-phase1 revision explicitly.

For example:

hf://aicrowd/arc-whestbench-public-2026@v1-phase1

This is the Phase 1 dataset revision and should be used instead of any locally cached Warm-Up dataset revision.
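
One way to pin the revision in code is sketched below. The repository id and revision name come from this post; using the `huggingface_hub` package, treating the repository as a dataset repo, and the helper names `phase1_download_kwargs` and `download_phase1` are assumptions for illustration, not the official starter-kit workflow.

```python
# Repo id and revision come from the announcement. Treating the repository
# as a Hugging Face *dataset* repo and using huggingface_hub are assumptions.
REPO_ID = "aicrowd/arc-whestbench-public-2026"
PHASE1_REVISION = "v1-phase1"

def phase1_download_kwargs(local_dir=None):
    """Build keyword arguments that pin a download to the Phase 1 revision."""
    kwargs = {
        "repo_id": REPO_ID,
        "repo_type": "dataset",
        "revision": PHASE1_REVISION,  # explicit, so a cached Warm-Up copy is not reused
    }
    if local_dir is not None:
        kwargs["local_dir"] = local_dir
    return kwargs

def download_phase1(local_dir=None):
    """Download the pinned snapshot and return its local path.
    Requires `pip install huggingface_hub` and network access."""
    from huggingface_hub import snapshot_download  # imported lazily
    return snapshot_download(**phase1_download_kwargs(local_dir))

if __name__ == "__main__":
    print(phase1_download_kwargs())
    # Uncomment to actually fetch the data:
    # print(download_phase1())
```

Passing the revision explicitly on every call, rather than relying on a default branch, keeps local experiments aligned with the 256 × 32 Phase 1 targets.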


:white_check_mark: Submit early with confidence

Phase 1 launches on the current flopscope v0.8.0rc1 and whestbench v0.12.0rc0 evaluator environment.

If community feedback leads to major changes before the final evaluator release, every Phase 1 submission made before the change will be re-evaluated on the updated cost model so that leaderboard scores stay comparable.


:speech_balloon: Send us feedback

Please share any issues you find with the updated cost model, flopscope, whestbench, or the starter kit. Your feedback is extremely valuable and may also count toward the community contribution prizes of 500–5,000 USD each.


1,000 new Phase 1 models. 4× deeper networks. 50,000 USD in additional cash prizes.

Phase 1 is now live. Good luck. :rocket:

All the best!