💬 Feedback & Suggestions

We are constantly trying to improve this challenge for you and would appreciate any feedback you might have! :raised_hands:

Please reply to this thread with your suggestions and feedback on making the challenge better for you!

  • What have been your major pain points so far?
  • What would you like to see improved?

All The Best!

Is the smoke test flop budget lower than the full submission budget? I’m seeing submissions run out of budget in the smoke test even though they are under the cap locally, so I suspect it may be half what it should be.

Good catch, @jamespayor - confirmed and fixed. The smoke test was running under a stale hardcoded budget (1e10) instead of the real grading budget (6.8e10), so estimators spending anywhere between the two were getting wrongly rejected. The fix is live now (smoke uses the exact same per-MLP budget as grading), and we have re-graded the submissions that were affected, so they should show up scored. Thanks for flagging it!

Affected submissions that were re-evaluated: 309660, 309713, 309714, 309723, 309724, 309738, 309742, 309744, 309745.
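For anyone checking whether an older rejection fell into this window, here is a minimal sketch of the comparison described above. The two budget values are the ones quoted in this thread; the function names and the inclusive `<=` check are assumptions for illustration, not the grader's actual code.

```python
# Budgets as quoted in this thread.
STALE_SMOKE_BUDGET = 1e10   # hardcoded value the smoke test used before the fix
GRADING_BUDGET = 6.8e10     # real per-MLP grading budget


def within_budget(flops_used: float, budget: float) -> bool:
    """Hypothetical check: assumes the limit is inclusive."""
    return flops_used <= budget


def was_wrongly_rejected(flops_used: float) -> bool:
    """True if the stale smoke budget rejected a run that grading would accept."""
    over_stale = not within_budget(flops_used, STALE_SMOKE_BUDGET)
    under_real = within_budget(flops_used, GRADING_BUDGET)
    return over_stale and under_real


if __name__ == "__main__":
    for flops in (5e9, 3e10, 7e10):
        print(f"{flops:.1e} flops -> wrongly rejected: {was_wrongly_rejected(flops)}")
```

A run at 3e10 flops sits between the two values, so it would have failed the old smoke test while still being valid for grading.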

Nice, thank you for the fix and rerunning! :slight_smile:

Returning here: it now seems that submissions (mine at least) are getting a much higher walltime penalty than before. Here’s the same code submitted yesterday (AIcrowd | ARC White-Box Estimation Challenge 2026 | Submissions #309886) vs. today (AIcrowd | ARC White-Box Estimation Challenge 2026 | Submissions #309977): same flops and walltimes, but a much larger penalty.

If this is an intended change, I’d appreciate clarification. I could be wrong, but I didn’t think my code was spending much time on anything other than dispatching meaty flopscope ops, yet I’m now paying a large effective compute penalty.


Hm, also FYI: on the submissions page, the “final layer MSE” column seems to be populated with all-layers MSE scores: AIcrowd | ARC White-Box Estimation Challenge 2026 | Submissions


It seems there has been a rejudge since then, but the current leaderboard is still using pre-rejudge numbers.


Would it be possible to publish the exact flopscope distribution that the grader uses?

I’ve hit some rejections due to the grader’s runtime environment differing from my local env. The starter kit’s pyproject constraint is flopscope>=0.5.0; this resolves to 0.5.0 in the lockfile, but the installed wheel is 0.5.0+np2.2.6, which apparently has a richer API than whatever the grader’s build uses, leading to grader errors like AttributeError: module 'flopscope' has no attribute 'as_symmetric'.

(I also noticed that plain numpy isn’t importable in the grader env—is that by design?)
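Until the grader's exact build is published, a local pre-submission check along these lines might help catch this kind of mismatch. It is only a sketch: it uses the standard library's `importlib.metadata` to report the installed version string, and `hasattr` to test for names a submission relies on. The `REQUIRED` list is a hypothetical example, and the check only reflects the local environment, so it cannot confirm what the grader actually ships.

```python
import importlib
from importlib import metadata


def installed_version(dist_name):
    """Return the installed distribution's version string, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def missing_attributes(module_name, required):
    """Return the names in `required` that the importable module lacks."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module is not importable at all, so every required name is missing.
        return list(required)
    return [name for name in required if not hasattr(module, name)]


if __name__ == "__main__":
    # Hypothetical list: the flopscope names your own submission depends on.
    REQUIRED = ["as_symmetric"]
    print("flopscope version:", installed_version("flopscope"))
    print("missing from flopscope:", missing_attributes("flopscope", REQUIRED))
    # Empty list if numpy imports locally; the probed name exists in numpy.
    print("numpy importable:", missing_attributes("numpy", ["ndarray"]) == [])
```

Running this inside an environment pinned to a plain `flopscope==0.5.0` install, if one is available, would be one way to see whether a name such as `as_symmetric` exists only in the `0.5.0+np2.2.6` wheel.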
