Hi everyone,
We have published the first release candidate of flopscope, v0.8.0rc1, to PyPI, paired with whestbench v0.12.0rc0.
This is the version the evaluators will use for Phase 1. We are sharing the release candidate now so you can build against it, check your estimators, and send feedback before the final v0.8.0 release.
TL;DR
- Install the release candidate:
  pip install --pre "flopscope>=0.8.0rc1" "whestbench>=0.12.0rc0"
- This is the Phase 1 evaluator version.
- The cost model is now more consistent: you are billed for computation on values, not data movement.
- Contraction costs are unified across matmul, dot, inner, outer, tensordot, vdot, einsum, and relevant linalg operations.
- Residual wall-time accounting is fairer: framework overhead is not charged to your estimator.
- Weight packing is now pickle-free through flops.Module.
- Warm-Up evaluators are unchanged and remain on flopscope v0.5.0 / whestbench v0.10.0.
What’s New in flopscope v0.8.0
1. Computation vs data logistics
flopscope now clarifies one core principle:
You are charged for computation on values, not for moving data around.
Arithmetic, reductions, matrix multiplies, transcendentals, and FFTs cost FLOPs.
Copying, reshaping, stacking, concatenating, slicing, and gathering are free.
Matrix operations dominate many estimator budgets and are now counted in full, so please re-check your estimator against the Phase 1 budget.
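To make the principle concrete, here is a toy ledger in plain NumPy. It is only a sketch: `ToyLedger` is a hypothetical class, not part of flopscope, and the per-operation counts (2·m·k·n for a matrix multiply, one FLOP per output element for an add) are assumed conventions that may differ from the actual rules in cost-model.md.

```python
import numpy as np


class ToyLedger:
    """Hypothetical FLOP ledger for illustration; not flopscope's implementation."""

    def __init__(self):
        self.flops = 0

    def matmul(self, a, b):
        m, k = a.shape
        k2, n = b.shape
        assert k == k2, "inner dimensions must match"
        # Assumed convention: one multiply and one add per contracted term.
        self.flops += 2 * m * k * n
        return a @ b

    def add(self, a, b):
        out = a + b
        # Assumed convention: one FLOP per output element.
        self.flops += out.size
        return out

    def reshape(self, a, shape):
        return a.reshape(shape)  # data movement only: nothing charged

    def concatenate(self, arrays, axis=0):
        return np.concatenate(arrays, axis=axis)  # data movement only


ledger = ToyLedger()
x = np.ones((4, 8))
w = np.ones((8, 3))

stacked = ledger.concatenate([x, x])        # shape (8, 8), free
stacked = ledger.reshape(stacked, (8, 8))   # free
flops_after_movement = ledger.flops

y = ledger.matmul(stacked, w)               # charged 2 * 8 * 8 * 3
z = ledger.add(y, y)                        # charged 8 * 3
total_flops = ledger.flops

print(flops_after_movement, total_flops)
```

For real numbers, rely on budget.summary() rather than a hand-rolled count like this one.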
2. One cost engine for contractions
matmul, dot, inner, outer, tensordot, vdot, einsum, and relevant linalg contractions now share the same symmetry-aware machinery.
This resolves cases where operations such as fnp.tensordot could previously be undercounted. These operations are now all billed through the same einsum_cost machinery.
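One way to see why a single cost engine is natural: the same contraction can be spelled several ways. The sketch below uses plain NumPy (not flopscope.numpy) to show four spellings producing the same result; we assume, without having verified it here, that the fnp equivalents behave the same way.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

# Four spellings of the same contraction over the shared axis of length 4.
via_matmul = np.matmul(A, B)
via_dot = np.dot(A, B)
via_tensordot = np.tensordot(A, B, axes=1)
via_einsum = np.einsum("ij,jk->ik", A, B)

all_agree = all(
    np.allclose(via_matmul, other)
    for other in (via_dot, via_tensordot, via_einsum)
)
print(all_agree)
```

Since the arithmetic is identical, it is reasonable for all four spellings to be billed identically, whichever one you happen to write.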
3. Fairer residual wall-time accounting
Your score accounts for both FLOPs and residual wall time, which is time spent outside tracked operations.
We re-audited what counts as participant residual time. Framework plumbing, including data transport between the flopscope client and server and array unpacking, is now attributed to flopscope overhead rather than your residual wall time.
In short: you are charged residual time for your own code, not for evaluator plumbing.
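As a rough mental model, you can picture the attribution as a simple subtraction. The function below is a hypothetical sketch, not the evaluator's code: the name `residual_wall_time` and its three inputs are assumptions made for illustration.

```python
def residual_wall_time(total_s, tracked_ops_s, framework_overhead_s):
    """Hypothetical split of wall time; not the evaluator's actual accounting.

    total_s: wall time of the whole predict call
    tracked_ops_s: time spent inside tracked (FLOP-billed) operations
    framework_overhead_s: transport and unpacking time attributed to flopscope
    """
    residual = total_s - tracked_ops_s - framework_overhead_s
    return max(residual, 0.0)  # clamp so timer jitter never yields a negative value


# Example: 1.00 s total, 0.60 s in tracked ops, 0.25 s of framework plumbing
# leaves roughly 0.15 s attributed to the participant.
participant_residual = residual_wall_time(1.00, 0.60, 0.25)
print(f"{participant_residual:.2f}")
```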
4. Pickle-free weight packing and clearer errors
You can now bundle data with your submission through flops.Module. Loading this data is free: it costs 0 FLOPs and is not counted in residual wall time.
The new release also provides clearer errors when an operation is not available in the grading environment.
The full per-operation cost rules are documented in the cost-model.md reference. You can also use budget.summary() to inspect where your own FLOPs are going.
Packing Data Into Your Submission
Define your model as a flops.Module. Array attributes are discovered automatically, and save writes them with a small JSON config instead of using pickle.
# model.py
import flopscope as flops
import flopscope.numpy as fnp

class Linear(flops.Module):
    def __init__(self, n_in, n_out):
        self.W = fnp.zeros((n_out, n_in))  # array state: auto-discovered
        self.b = fnp.zeros(n_out)

    def config(self):  # non-array config, used to rebuild
        return {"n_in": self.W.shape[1], "n_out": self.W.shape[0]}

    def __call__(self, x):
        return fnp.einsum("oi,i->o", self.W, x) + self.b

if __name__ == "__main__":
    model = Linear(8, 4)
    # ...
    model.save("model.npz")  # named arrays + JSON config, no pickle
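If you are curious what "named arrays + JSON config, no pickle" can look like in practice, here is a minimal sketch in plain NumPy. It is not flopscope's actual file format: the helpers `save_state` and `load_state` and the `__config__` key are hypothetical names chosen for this example.

```python
import io
import json

import numpy as np


def save_state(fileobj, arrays, config):
    """Write named arrays plus a JSON config string into one .npz archive."""
    # The config is stored as a 0-d unicode array, so no object dtype
    # (and therefore no pickle) is involved.
    np.savez(fileobj, __config__=np.array(json.dumps(config)), **arrays)


def load_state(fileobj):
    """Read the archive back with pickle support explicitly disabled."""
    with np.load(fileobj, allow_pickle=False) as data:
        config = json.loads(data["__config__"].item())
        arrays = {k: data[k] for k in data.files if k != "__config__"}
    return arrays, config


# Round trip through an in-memory buffer.
buf = io.BytesIO()
save_state(buf, {"W": np.ones((4, 8)), "b": np.zeros(4)}, {"n_in": 8, "n_out": 4})
buf.seek(0)
arrays, config = load_state(buf)
print(config, sorted(arrays))
```

Loading with allow_pickle=False means a tampered archive cannot trigger arbitrary code execution on load, which is the main reason to avoid pickle for submitted weights.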
Load the saved module once in your estimator’s setup():
# estimator.py
from pathlib import Path
from whestbench import BaseEstimator
from model import Linear

class Estimator(BaseEstimator):
    def setup(self, ctx):  # runs once
        self.model = Linear.from_file(Path(__file__).parent / "model.npz")  # free load

    def predict(self, mlp, budget):
        ...  # use self.model
The whest CLI bundles everything in your estimator folder, up to 50 MB / 50 files, so model.py and model.npz travel with estimator.py and load for free at grading time.
whest login # once, with your AIcrowd API key
whest submit --estimator estimator.py --watch # packages, uploads, and follows the submission until it is scored
Please iterate locally first:
whest run --estimator estimator.py
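Before submitting, it can help to check your estimator folder against the 50 MB / 50 files limit. The script below is a sketch under assumptions: we do not know exactly how the whest CLI counts files (for example whether subfolders or caches are included), so it counts every file recursively and treats 1 MB as 1,000,000 bytes to stay on the conservative side. The helper names are hypothetical.

```python
import tempfile
from pathlib import Path

MAX_FILES = 50
MAX_BYTES = 50 * 1000 * 1000  # assumption: conservative reading of "50 MB"


def bundle_stats(folder):
    """Return (file count, total bytes) for every file under folder."""
    files = [p for p in Path(folder).rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


def within_limits(n_files, total_bytes):
    return n_files <= MAX_FILES and total_bytes <= MAX_BYTES


# Demo on a throwaway folder; point bundle_stats at your own estimator folder instead.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "estimator.py").write_bytes(b"# estimator code\n")
    (Path(tmp) / "model.npz").write_bytes(b"\x00" * 1024)
    demo_files, demo_bytes = bundle_stats(tmp)

print(demo_files, demo_bytes, within_limits(demo_files, demo_bytes))
```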
The full estimator and submission walkthrough is available in the starter kit. The examples/12_save_load_mlp.py example includes a multi-layer flops.Module.
Try the Release Candidate
Install the release candidate:
pip install --pre "flopscope>=0.8.0rc1" "whestbench>=0.12.0rc0"
pip show flopscope whestbench # expect 0.8.0rc1 / 0.12.0rc0
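If you prefer to check the installed versions from Python (for example in a notebook or a CI step), a small sketch using only the standard library follows. The helper name `installed_version` is ours; it reports "not installed" instead of raising when a package is missing.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


report = {name: installed_version(name) for name in ("flopscope", "whestbench")}
for name, found in report.items():
    print(f"{name}: {found or 'not installed'}")
```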
If you are using the starter kit with uv, pin the candidate with:
uv add "flopscope>=0.8.0rc1" "whestbench>=0.12.0rc0"
Then check your estimator against the budget:
import flopscope as flops

# assumes `estimator` and `mlp` are already constructed
with flops.BudgetContext(flop_budget=68_000_000_000) as budget:
    estimator.predict(mlp, budget=68_000_000_000)
    print(f"FLOPs used: {budget.flops_used:,}")
    print(budget.summary())  # per-operation breakdown
What Happens if the Cost Model Changes Again?
Phase 1 runs on this release candidate, and it is a candidate for a reason.
If community feedback leads to major changes in the final v0.8.0 release, we will re-evaluate every Phase 1 submission received up to that release on the final cost model. You can submit now without worrying that an early submission will be disadvantaged by a later evaluator change.
Send Us Feedback
Please share any issues you find with the updated cost model, flopscope in general, whestbench, or the starter kit. Your feedback is extremely valuable and also counts towards the community contribution prizes of 500-5000 USD each!
- Feedback / Discussion: please start a new topic on the challenge forum at: AIcrowd | ARC White-Box Estimation Challenge 2026 | Discussions
- Feedback / Discussion on the Cost Model: please use the dedicated thread: Flopscope v0.8.0 Release Candidate Available!
- Reproducible bugs: open an issue on flopscope, whestbench, or the starter kit. PRs are welcome as well.
- Security issues: email arc-whestbench@aicrowd.com privately rather than opening a public issue.
To avoid any confusion: the Warm-Up evaluators are unchanged and stay on flopscope v0.5.0 / whestbench v0.10.0.
This release candidate is the evaluator version planned for Phase 1, which launches at 00:00 UTC on 18 June 2026 with an independent evaluation environment and a separate leaderboard.
Stay tuned for the official Phase 1 launch announcement, which will cover the updated target architecture, budget changes, leaderboard details, and prize structure.
All the best!