Phase 1 of the White-Box Estimation Challenge is LIVE
The Warm-Up has concluded. The Private Leaderboard will be announced soon.
Phase 1 opens at 00:00 UTC on 18 June 2026. It brings 50,000 USD in new Phase 1 prizes (raising the total challenge prize pool to 150,000+ USD), a 4× deeper model, and a FLOP budget scaled to match.
Phase 1 has its own leaderboard, its own evaluator environment, and a substantially harder estimation target. If you placed well in the Warm-Up, you have your work cut out to maintain your position.
TL;DR
- 50,000 USD in new Phase 1 prizes — bringing the challenge total to 150,000+ USD
- The target MLP gets 4× deeper: 256×8 → 256×32
- The FLOP budget grows 4× to match: 6.8 × 10¹⁰ → 2.72 × 10¹¹
- New Phase 1 datasets are available on Hugging Face under the v1-phase1 revision
- Even with the larger budget, submissions must stay ~15,000× smaller than the sampling budget used for the reference targets
- New evaluator environment: flopscope v0.8.0rc1 + whestbench v0.12.0rc0
- For details on the evaluator and cost model changes, please see the dedicated thread: Flopscope v0.8.0 Release Candidate Available for Testing
Existing participants must re-accept the updated rules before Phase 1 submissions will be accepted.
What changed from the Warm-Up?
| Category | Warm-Up | Phase 1 |
|---|---|---|
| Target architecture | 256 × 8 | 256 × 32 |
| FLOP budget | 6.8 × 10¹⁰ | 2.72 × 10¹¹ |
| Dataset revision | v1-warmup | v1-phase1 |
| Leaderboard | Warm-Up leaderboard | Phase 1 leaderboard |
| New prize pool | — | 50,000 USD |
Rules updated
Accept the updated rules before you submit — Phase 1 submissions will not go through without it.
If you joined during the Warm-Up, head back to the challenge page and accept the latest version of the official rules, paying close attention to eligibility requirements, prize conditions, and submission requirements.
1. A bigger prize pool
Phase 1 adds 50,000 USD in new prizes:
| Phase 1 Prize | Amount |
|---|---|
| | 25,000 USD |
| | 10,000 USD |
| | 5,000 USD |
| | 10,000 USD |
| Total Phase 1 Prizes | 50,000 USD |
Together with the previously announced Phase 2 awards, the challenge now offers 150,000+ USD in total prizes:
| Prize Category | Phase 1 | Phase 2 |
|---|---|---|
| | 25,000 USD | 50,000 USD |
| | 10,000 USD | 20,000 USD |
| | 5,000 USD | 10,000 USD |
| | 10,000 USD | 20,000 USD |
| Phase Total | 50,000 USD | 100,000 USD |
Total Challenge Prize Pool: 150,000+ USD
To be eligible for an Algorithmic Contribution Prize, each prize-eligible participant must submit a written description of their solution detailed enough for an independent practitioner to understand and reproduce the material aspects of the prize-determining results.
Full prize and eligibility details are on the challenge page and in the official rules.
2. A deeper model
The Warm-Up targeted an MLP with 256 width and 8 layers. Phase 1 keeps the same width but increases the depth by 4×:
Warm-Up: 256 width × 8 layers
Phase 1: 256 width × 32 layers
More depth means more structure to recover and a genuinely tougher estimation problem. This is the jump that defines Phase 1.
3. A larger FLOP budget
The FLOP budget grows to:
Warm-Up: 6.8 × 10¹⁰ FLOPs / MLP
Phase 1: 2.72 × 10¹¹ FLOPs / MLP
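The scaling can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in this post; the implied sampling budget is derived from the "~15,000×" statement and is not an official number.

```python
# Figures quoted in the announcement.
WARMUP_LAYERS, PHASE1_LAYERS = 8, 32
WARMUP_BUDGET = 6.8e10    # FLOPs per MLP
PHASE1_BUDGET = 2.72e11   # FLOPs per MLP

# Depth and budget both grow by the same factor.
depth_ratio = PHASE1_LAYERS / WARMUP_LAYERS
budget_ratio = PHASE1_BUDGET / WARMUP_BUDGET

# So the budget available per layer is unchanged between phases.
warmup_per_layer = WARMUP_BUDGET / WARMUP_LAYERS
phase1_per_layer = PHASE1_BUDGET / PHASE1_LAYERS

# "~15,000x smaller than the sampling budget" implies a reference
# sampling budget of roughly this size (derived, approximate).
implied_sampling_budget = 15_000 * PHASE1_BUDGET

print(f"depth ratio:  {depth_ratio:.2f}")
print(f"budget ratio: {budget_ratio:.2f}")
print(f"per-layer budget: {warmup_per_layer:.3e} (Warm-Up), {phase1_per_layer:.3e} (Phase 1)")
print(f"implied sampling budget: ~{implied_sampling_budget:.2e} FLOPs per MLP")
```

In other words, the per-layer budget stays at about 8.5 × 10⁹ FLOPs, so an approach whose cost grows linearly with depth fits the Phase 1 budget about as tightly as it fit the Warm-Up budget.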
4. Updated evaluator environment
Phase 1 runs on a new, independent evaluation environment with its own leaderboard:
flopscope v0.8.0rc1
whestbench v0.12.0rc0
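Before submitting, it may help to confirm that your local environment matches these versions. The following is a minimal sketch, assuming both tools are installed as Python distributions named flopscope and whestbench; the distribution names and the helper check_environment are assumptions for illustration, not part of either tool.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions quoted in the announcement. The distribution names are assumed
# to match the tool names; adjust them if the packages are published differently.
EXPECTED = {"flopscope": "0.8.0rc1", "whestbench": "0.12.0rc0"}


def check_environment(expected=EXPECTED, lookup=version):
    """Return {name: (installed_or_None, expected)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None  # distribution is not installed
        if found != wanted:
            mismatches[name] = (found, wanted)
    return mismatches


if __name__ == "__main__":
    problems = check_environment()
    for name, (found, wanted) in problems.items():
        print(f"{name}: installed {found}, Phase 1 evaluator uses {wanted}")
    if not problems:
        print("Local versions match the Phase 1 evaluator environment.")
```

An empty result means the local versions match the ones listed above; anything else is reported with both the installed and the expected version.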
What’s new under the hood:
- More consistent FLOP accounting
- Unified contraction costing across matrix operations
- Fairer residual wall-time accounting
- Packaging data through flops.Module in your submissions
For the full breakdown of evaluator behavior, migration guidance, and cost model changes, see the dedicated flopscope v0.8.0 release thread:
5. Updated Phase 1 datasets available
The public datasets for Phase 1 are now available on Hugging Face:
Dataset repository:
Phase 1 revision: v1-phase1
This revision contains:
| Split | Number of MLPs | Architecture |
|---|---|---|
| Full split | 1,000 MLPs | 256 width × 32 layers |
| Mini split | 100 MLPs | 256 width × 32 layers |
When downloading or referencing the dataset programmatically, make sure to use the v1-phase1 revision explicitly.
For example:
hf://aicrowd/arc-whestbench-public-2026@v1-phase1
This is the Phase 1 dataset revision and should be used instead of any locally cached Warm-Up dataset revision.
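One way to pin the revision programmatically is sketched below. It assumes the huggingface_hub package is installed and that the repository is hosted as a Hugging Face dataset repo; the helper names parse_hf_reference and download_phase1 are hypothetical and not part of the starter kit.

```python
def parse_hf_reference(uri):
    """Split 'hf://<owner>/<repo>@<revision>' into (repo_id, revision)."""
    prefix = "hf://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an hf:// reference: {uri!r}")
    repo_id, sep, revision = uri[len(prefix):].partition("@")
    if not sep or not revision:
        # Refuse to continue rather than silently use a default revision.
        raise ValueError(f"no revision pinned in {uri!r}")
    return repo_id, revision


def download_phase1(uri="hf://aicrowd/arc-whestbench-public-2026@v1-phase1"):
    """Download the pinned snapshot and return its local directory."""
    # Imported lazily so the parser works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    repo_id, revision = parse_hf_reference(uri)
    # repo_type="dataset" is an assumption about how the repo is hosted.
    return snapshot_download(repo_id=repo_id, repo_type="dataset", revision=revision)
```

Calling download_phase1() requires network access. Passing the revision explicitly means a cached Warm-Up snapshot is not picked up by accident.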
Submit early with confidence
Phase 1 launches on the current flopscope v0.8.0rc1 and whestbench v0.12.0rc0 evaluator environment.
If community feedback leads to major changes before the final evaluator release, every Phase 1 submission made before the change will be re-evaluated on the updated cost model, so that leaderboard scores stay comparable.
Send us feedback
Please share any issues you find with the updated cost model, flopscope, whestbench, or the starter kit. Your feedback is extremely valuable and may also count toward the community contribution prizes of 500–5,000 USD each.
- General feedback / discussion: start a new topic on the challenge forum: AIcrowd | ARC White-Box Estimation Challenge 2026 | Discussions
- Feedback / discussion on the cost model: use the dedicated flopscope v0.8.0 release thread: Flopscope v0.8.0 Release Candidate Available!
- Reproducible bugs: open an issue on flopscope, whestbench, or the starter kit.
- Security issues: email arc-whestbench@aicrowd.com privately rather than opening a public issue.
1,000 new Phase 1 models. 4× deeper networks. 50,000 USD in additional cash prizes.
Phase 1 is now live. Good luck.
All the best!