Announcement: Clarifications on Prompts, Fine-tuning, Tool Usage, and Hidden Test Cases

This announcement clarifies the rules on prompts, fine-tuning datasets, tool usage, and hidden test cases. Some points below update or correct what was discussed at the Town Hall.

1) Prompts and Fine-tuning Datasets

There are no restrictions on prompts or fine-tuning datasets.

  • Based on participant feedback, we recognize that some prompt-related constraints discussed during the Town Hall caused confusion and appeared inconsistent with the original competition rules.
  • To address this, we confirm that no additional constraints are imposed on prompt design or fine-tuning datasets.
  • Participants are free to design prompts and fine-tune their models in any manner they choose.

2) Tool Call Usage

Only a calculator tool is allowed.

  • Tool calls are restricted to a calculator.
  • External tools, such as web search, external APIs, or other third-party services, are not permitted.
  • RAG (retrieval-augmented generation) and internal memory mechanisms are considered part of the agent’s internal architecture and are not classified as tool calls. Their use is allowed.
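
As an illustration of the calculator-only rule, the sketch below shows one way an agent could guard its own tool dispatch. The names `calculate`, `ALLOWED_TOOLS`, and `dispatch` are hypothetical and are not part of any competition API. The calculator interface used in evaluation is not specified here, so treat the arithmetic evaluator as a stand-in.

```python
import ast
import operator

# Arithmetic operators the stand-in calculator accepts (an assumption).
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def calculate(expression: str):
    """Evaluate a plain arithmetic expression without using eval()."""

    def ev(node):
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")

    return ev(ast.parse(expression, mode="eval").body)


# Hypothetical allow-list: the calculator is the only registered tool.
ALLOWED_TOOLS = {"calculator": calculate}


def dispatch(tool_name: str, argument: str):
    """Run a tool call only if the tool is on the allow-list."""
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not permitted: {tool_name}")
    return ALLOWED_TOOLS[tool_name](argument)
```

With this guard, `dispatch("calculator", "2 * (3 + 4)")` evaluates the expression, while a call such as `dispatch("web_search", "...")` raises `PermissionError`. Retrieval and memory components sit outside this dispatcher because they are not classified as tool calls.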

3) Hidden Test Cases and Game-specific Details

This section clarifies how hidden test cases may differ from the live evaluation environments.

2048

  • The board size may be extended to an arbitrary N×M grid.
  • This corrects an earlier response from the Town Hall Q&A. We appreciate your understanding.
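
Because the board may be any N×M grid, agent-side move simulation should not hard-code a 4×4 board. The sketch below is a minimal, size-agnostic move function that assumes the usual 2048 merge rule (each tile merges at most once per move) and a board stored as a list of rows with 0 for empty cells. The function names and board representation are illustrative assumptions, not the environment's actual interface.

```python
from typing import List

Board = List[List[int]]  # list of rows; 0 marks an empty cell


def merge_row_left(row: List[int]) -> List[int]:
    """Slide one row to the left, merging each pair of equal neighbours once."""
    tiles = [v for v in row if v != 0]
    merged = []
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)
            i += 2  # both tiles are consumed by the merge
        else:
            merged.append(tiles[i])
            i += 1
    # Pad with empty cells so the row keeps its original length.
    return merged + [0] * (len(row) - len(merged))


def move(board: Board, direction: str) -> Board:
    """Return the board after one move; works for any rectangular size."""
    if direction == "left":
        return [merge_row_left(r) for r in board]
    if direction == "right":
        return [merge_row_left(r[::-1])[::-1] for r in board]
    # Vertical moves: work on columns, then transpose back to rows.
    columns = [list(c) for c in zip(*board)]
    if direction == "up":
        moved = [merge_row_left(c) for c in columns]
    elif direction == "down":
        moved = [merge_row_left(c[::-1])[::-1] for c in columns]
    else:
        raise ValueError(f"unknown direction: {direction}")
    return [list(r) for r in zip(*moved)]
```

For example, on the 2×3 board `[[2, 2, 4], [0, 4, 4]]`, `move(board, "left")` gives `[[4, 4, 0], [8, 0, 0]]` and `move(board, "up")` gives `[[2, 2, 8], [0, 4, 0]]`. Nothing in the code depends on the number of rows or columns.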

Super Mario

  • The map layout may change in hidden test cases.

StarCraft II (SC2)

  • One or more of the following may change in hidden test cases:

    • bot_race
    • bot_difficulty
    • bot_build
    • map_idx
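
To check robustness against these changes locally, one option is to sweep over combinations of the four parameters. The sketch below only enumerates configurations. The parameter names come from the list above, but every value shown is a hypothetical placeholder, so substitute the values your environment actually accepts.

```python
from itertools import product

# Placeholder values only (assumptions); the valid options are defined
# by the SC2 environment, not by this sketch.
SWEEP = {
    "bot_race": ["race_a", "race_b"],
    "bot_difficulty": ["difficulty_a", "difficulty_b"],
    "bot_build": ["build_a"],
    "map_idx": [0, 1],
}


def iter_configs(sweep):
    """Yield one dict per combination of the listed parameter values."""
    keys = list(sweep)
    for values in product(*(sweep[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(SWEEP))
```

Each element of `configs` is a dictionary with one value per parameter, which can be passed to whatever launcher you use for local evaluation.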

Pokémon

  • The existing seven milestones will not change.

  • However, the following aspects may vary:

    • Map state coordinates
    • Names of Pokémon, NPCs, and skills
  • Step definition

    • One agent action corresponds to one step.
    • The maximum number of steps remains 200.
  • Action constraints

    • Defining new high-level actions beyond the provided predefined functions is not allowed.
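
Since one agent action counts as one step and the limit is 200, an agent may want to track its remaining budget explicitly. The sketch below is one minimal way to do so. `StepBudget` is a hypothetical helper, not part of the provided environment.

```python
MAX_STEPS = 200  # step limit stated in this announcement


class StepBudget:
    """Counts agent actions, treating each action as one step."""

    def __init__(self, max_steps: int = MAX_STEPS):
        self.max_steps = max_steps
        self.used = 0

    @property
    def remaining(self) -> int:
        return self.max_steps - self.used

    def consume(self) -> None:
        """Record one action; raise once the budget is exhausted."""
        if self.used >= self.max_steps:
            raise RuntimeError("step budget exhausted")
        self.used += 1
```

Calling `consume()` once before each action keeps `remaining` in sync with the environment's step count, so a planner can prioritize the remaining milestones as the budget runs down.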

We hope these clarifications reduce ambiguity and help participants focus on building robust and generalizable agents.
For further questions, please use the discussion channels.

Thank you for your participation.