Phase 1 Launch - deeper models, and increased prizes!

mohanty · June 17, 2026, 6:35pm

Phase 1 of the White-Box Estimation Challenge is LIVE

The Warm-Up has concluded. The Private Leaderboard will be announced soon.

Phase 1 opens at 00:00 UTC on 18 June 2026 — and it arrives with 50,000 USD in new Phase 1 prizes, bringing the total challenge prize pool to 150,000+ USD, a 4× deeper model, and a FLOP budget scaled to match.

Phase 1 has its own leaderboard, its own evaluator environment, and a substantially harder estimation target. If you placed well in the Warm-Up, you have your work cut out to maintain your position.

TL;DR

50,000 USD in new Phase 1 prizes — bringing the challenge total to 150,000+ USD

The target MLP gets 4× deeper: 256×8 → 256×32

The FLOP budget grows 4× to match: 6.8 × 10¹⁰ → 2.72 × 10¹¹

New Phase 1 datasets are available on Hugging Face under the v1-phase1 revision

Even with the larger budget, submissions must use ~15,000× less compute than the sampling budget used for the reference targets

New evaluator environment: flopscope v0.8.0rc1 + whestbench v0.12.0rc0

For details on the evaluator and cost model changes, please see the dedicated thread: Flopscope v0.8.0 Release Candidate Available for Testing

Existing participants must re-accept the updated rules before Phase 1 submissions will be accepted.

What changed from the Warm-Up?

Category	Warm-Up	Phase 1
Target architecture	`256 × 8`	`256 × 32`
FLOP budget	`6.8 × 10¹⁰`	`2.72 × 10¹¹`
Dataset revision	`v1-warmup`	`v1-phase1`
Leaderboard	Warm-Up leaderboard	Phase 1 leaderboard
New prize pool	—	50,000 USD

Rules updated

Accept the updated rules before you submit — Phase 1 submissions will not go through without it.

If you joined during the Warm-Up, head back to the challenge page and accept the latest version of the official rules, paying close attention to eligibility requirements, prize conditions, and submission requirements.

1. A bigger prize pool

Phase 1 adds 50,000 USD in new prizes:

Phase 1 Prize	Amount
1st Place (score-based)	25,000 USD
2nd Place (score-based)	10,000 USD
3rd Place (score-based)	5,000 USD
Algorithmic Contribution Prize	10,000 USD
Total Phase 1 Prizes	50,000 USD

Together with the previously announced Phase 2 awards, the challenge now offers 150,000+ USD in total prizes:

Prize Category	Phase 1	Phase 2
1st Place (score-based)	25,000 USD	50,000 USD
2nd Place (score-based)	10,000 USD	20,000 USD
3rd Place (score-based)	5,000 USD	10,000 USD
Algorithmic Contribution Prize	10,000 USD	20,000 USD
Phase Total	50,000 USD	100,000 USD

Total Challenge Prize Pool: 150,000+ USD

To be eligible for an Algorithmic Contribution Prize, each prize-eligible participant must submit a written description of their solution detailed enough for an independent practitioner to understand and reproduce the material aspects of the prize-determining results.

Full prize and eligibility details are on the challenge page and in the official rules.

2. A deeper model

The Warm-Up targeted an MLP of width 256 with 8 layers. Phase 1 keeps the same width but makes the model 4× deeper:

Warm-Up:  256 width × 8 layers
Phase 1:  256 width × 32 layers

More depth means more structure to recover and a genuinely tougher estimation problem. This is the jump that defines Phase 1.

3. A larger FLOP budget

The FLOP budget grows to:

Warm-Up:  6.8 × 10¹⁰ FLOPs / MLP
Phase 1:  2.72 × 10¹¹ FLOPs / MLP
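As a rough sanity check on how the budget scales with depth, the sketch below estimates the cost of one forward pass through each target and how many such passes would fit in the budget. It assumes square 256 × 256 dense layers and counts 2 FLOPs per multiply-add, ignoring biases and activations. These are illustrative assumptions, not the flopscope cost model, so treat the absolute numbers as approximate. The helper names are hypothetical.

```python
# Rough sketch: how the FLOP budget relates to depth.
# Assumptions (NOT the official cost model): each layer is a dense
# 256 x 256 matmul, one multiply-add counts as 2 FLOPs, and biases and
# activations are ignored.

WIDTH = 256


def forward_flops(width: int, layers: int) -> int:
    """Approximate FLOPs for one input vector through `layers` dense layers."""
    return 2 * width * width * layers


def passes_within_budget(budget: float, width: int, layers: int) -> float:
    """How many single-input forward passes would fit inside `budget`."""
    return budget / forward_flops(width, layers)


warmup = passes_within_budget(6.8e10, WIDTH, 8)
phase1 = passes_within_budget(2.72e11, WIDTH, 32)

print(f"Warm-Up: ~{warmup:,.0f} forward passes per MLP")
print(f"Phase 1: ~{phase1:,.0f} forward passes per MLP")
```

Under these assumptions both phases allow about the same number of forward passes per MLP, since the budget and the per-pass cost each grow 4×.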

4. Updated evaluator environment

Phase 1 runs on a new, independent evaluation environment with its own leaderboard:

flopscope v0.8.0rc1
whestbench v0.12.0rc0

What’s new under the hood:

More consistent FLOP accounting
Unified contraction costing across matrix operations
Fairer residual wall-time accounting
Packaging data through flops.Module in your submissions

For the full breakdown of evaluator behavior, migration guidance, and cost model changes, see the dedicated flopscope v0.8.0 release thread.

5. Updated Phase 1 datasets available

The public datasets for Phase 1 are now available on Hugging Face:

Dataset repository:

Phase 1 revision:
v1-phase1

This revision contains:

Split	Number of MLPs	Architecture
Full split	1,000 MLPs	`256 width × 32 layers`
Mini split	100 MLPs	`256 width × 32 layers`

When downloading or referencing the dataset programmatically, make sure to use the v1-phase1 revision explicitly.

For example:

hf://aicrowd/arc-whestbench-public-2026@v1-phase1

This is the Phase 1 dataset revision and should be used instead of any locally cached Warm-Up dataset revision.
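One way to pin the revision in code is sketched below. It splits the reference above into a repository ID and a revision, then passes both to `snapshot_download` from the `huggingface_hub` package. Treating the repository as a dataset repo (`repo_type="dataset"`) is an assumption based on this announcement, and the helper names `parse_hf_ref` and `download_phase1` are hypothetical. The starter kit may offer its own loader.

```python
# Minimal sketch: pin the Phase 1 dataset revision explicitly.
# Assumptions: the repo is a Hugging Face *dataset* repo, and the
# reference has the form hf://<owner>/<name>@<revision> as in the post.

PHASE1_REF = "hf://aicrowd/arc-whestbench-public-2026@v1-phase1"


def parse_hf_ref(ref: str) -> tuple[str, str]:
    """Split 'hf://owner/name@revision' into (repo_id, revision)."""
    prefix = "hf://"
    if not ref.startswith(prefix):
        raise ValueError(f"not an hf:// reference: {ref!r}")
    repo_id, sep, revision = ref[len(prefix):].partition("@")
    if not sep or not revision:
        # Fail loudly instead of silently falling back to a default branch.
        raise ValueError(f"no revision given in {ref!r}")
    return repo_id, revision


def download_phase1(ref: str = PHASE1_REF) -> str:
    """Download the pinned revision and return the local snapshot path."""
    # Imported lazily so the parser above works without the package installed.
    from huggingface_hub import snapshot_download

    repo_id, revision = parse_hf_ref(ref)
    return snapshot_download(
        repo_id=repo_id, repo_type="dataset", revision=revision
    )


print(parse_hf_ref(PHASE1_REF))
```

Calling `download_phase1()` needs network access. Passing the revision explicitly means the download resolves to the Phase 1 snapshot rather than to whichever revision was fetched earlier.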

Submit early with confidence

Phase 1 launches on the current flopscope v0.8.0rc1 and whestbench v0.12.0rc0 evaluator environment.

If community feedback leads to major changes before the final evaluator release, every Phase 1 submission made before the change will be re-evaluated on the updated cost model so that leaderboard scores stay comparable.

Send us feedback

Please share any issues you find with the updated cost model, flopscope, whestbench, or the starter kit. Your feedback is extremely valuable and may also count toward the community contribution prizes of 500–5,000 USD each.

General feedback / discussion: start a new topic on the challenge forum:
AIcrowd | ARC White-Box Estimation Challenge 2026 | Discussions
Feedback / discussion on the cost model: use the dedicated flopscope v0.8.0 release thread:
Flopscope v0.8.0 Release Candidate Available!
Reproducible bugs: open an issue on flopscope, whestbench, or the starter kit.
Security issues: email arc-whestbench@aicrowd.com privately rather than opening a public issue.

1,000 new Phase 1 models. 4× deeper networks. 50,000 USD in additional cash prizes.

Phase 1 is now live. Good luck.

All the best!
