Recently, while having an LLM play the Mario game, I observed that LLMs struggle with physical simulation, especially when non-periodic screen data is conveyed through text.
To address these limitations, I considered using Python code to compute specific status recognitions (e.g., distance calculations or collision detection) and then pass refined data to the LLM. This hybrid approach could simplify problem-solving by leveraging Python’s speed and precision for mathematical computations, while the LLM focuses on strategic decisions.
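As a concrete illustration of this division of labor, here is a minimal sketch: Python computes distances and collision flags, then condenses them into a short text summary for the LLM prompt. All names (`summarize_state`, the `enemies` dict, the collision radius) are hypothetical, not part of any competition API.

```python
import math

def distance(a, b):
    """Euclidean distance between two (x, y) pixel coordinates."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def will_collide(player, obstacle, radius=24.0):
    """Simple circle-overlap collision check (radius is illustrative)."""
    return distance(player, obstacle) < radius

def summarize_state(player, enemies):
    """Turn raw coordinates into a compact text summary for the LLM prompt."""
    lines = []
    for name, pos in enemies.items():
        d = distance(player, pos)
        flag = " (collision imminent)" if will_collide(player, pos) else ""
        lines.append(f"{name}: {d:.1f}px away{flag}")
    return "\n".join(lines)
```

With `player = (0, 0)` and `enemies = {"goomba": (30, 40)}`, this yields `"goomba: 50.0px away"`, so the LLM receives a precise, pre-digested observation instead of raw screen text.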
I wonder whether this role distribution would improve the qualitative scores in the competition, or whether handling all tasks solely through the LLM, even those it struggles with, is more beneficial.
(1) Outsourcing tasks that the LLM struggles with vs. (2) solving everything with the LLM
And I believe direction (2) is more suitable for this contest.
I share this confusion. This is an agent competition, and logically, agents are supposed to be able to use tools. But that could easily turn the competition into a mere tool-engineering event.
I hope the organizers will provide an answer.
I have a similar concern.
Some tools or rule-based components are probably unavoidable, especially for handling environment details. But I think it’s important to be clear about their role.
Using tools is fine as long as the LLM is the one making decisions and choosing when to use them. What feels less appropriate is a rule-based system that already decides everything and only uses the LLM as a wrapper.
If the LLM drives the decision-making and tools just provide information, it still feels like an agent.
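The pattern described above can be sketched as a simple loop: the model chooses a tool by name, a dispatcher runs it, and the result is fed back so the model still makes the final decision. The `fake_llm` stub below stands in for a real model call; all names and the `measure_gap` tool are illustrative assumptions, not part of any real framework.

```python
# Tools only supply information; they never choose the action themselves.
TOOLS = {
    "measure_gap": lambda state: f"gap width is {state['gap']}px",
}

def fake_llm(observation, tool_result=None):
    """Stand-in for a real model call: first requests a tool, then acts
    on the information that tool returned."""
    if tool_result is None:
        return {"action": "use_tool", "tool": "measure_gap"}
    return {"action": "jump" if "gap" in tool_result else "walk"}

def agent_step(state):
    decision = fake_llm(state)
    if decision["action"] == "use_tool":
        info = TOOLS[decision["tool"]](state)          # tool provides data
        decision = fake_llm(state, tool_result=info)   # LLM still decides
    return decision["action"]
```

The key design point is that the dispatcher never short-circuits the model: the tool's output goes back into the LLM, which keeps the agent, not the rule-based code, in charge of the decision.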
Solving everything with the LLM alone falls short in some areas (e.g., mathematics).
I believe developing an agent that plays games well using only an LLM is a poor approach.
As I understand it, their ORAK research paper applies LLMs to games to explore their potential as capable players, and I think this contest is an extension of that research.
So I believe they do not want us to use external tools beyond the LLM itself.
However, since there is no information about this criterion, I hope the contest organizers will clarify the intended direction.
Hi all,
Thank you for your enthusiasm—and for the great question.
As you correctly understood, since this is a competition for LLM agents, the use of tools is not prohibited under the rules. However, tool usage may be considered cheating if it explicitly exploits a known solution to the games—for example, using web search to retrieve established solutions.
We recognize that there may be some ambiguity around specific aspects of the competition rules. To address this, we are planning to host a town hall session in the coming days to provide clarification and answer questions.
We hope this helps, and thank you for your engagement.