Rule 9 states that:
The Agent submitted in the Entry will be evaluated against the applicable Flatland Environment generated using N random seeds unavailable to participants during the Challenge (“Seeds”). To be clear, the same N Seeds will be used to evaluate all Entries submitted in each Round, with the understanding the N Seeds in Round 1 may not be the N Seeds applied in Round 2. The Entry will be ranked on the Leaderboard based on the highest average score reached by the Agent across all Seeds (“Average Score”).
Does this mean that our agents will be evaluated on an unknown world, or do only the start and target positions change?
The agents will be evaluated on a “secret” test set with different environment dimensions and numbers of agents. This is done to ensure that the solutions generalize well. We will, however, release a set of generated “Levels” that are similar to the ones used for evaluation.
This will happen soon so stay tuned.
Thanks for your answer; may I try to reformulate it more specifically?
What I understand is:
- Our code generates a (random) environment
- We train our agents on this environment
- We submit our code
- You replace the random environment with a secret test environment
- You run the submitted code to determine the score on the test environment
Is that more or less how it will work?
Thanks and best wishes
@marcoliver_gewaltig: Yes, that is correct. In steps 4 and 5, we will run your submitted code against a series of test environments of varying difficulty, and your overall score will be computed from your code's cumulative performance across all of these test environments. More details about this should be released by this weekend at the latest.
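The scoring described above can be sketched in a few lines. This is a minimal illustration, not the organizers' actual evaluation code: `run_episode` is a hypothetical stand-in for building a Flatland environment from one secret seed, running the submitted agent in it, and returning the episode score; the real seeds and scoring function are unknown to participants.

```python
def run_episode(env_seed):
    # Hypothetical stand-in for the organizers' per-environment evaluation:
    # build the environment from `env_seed`, run the agent, return its score.
    # A deterministic fake score is used here so the sketch is runnable.
    return (env_seed * 37 % 100) / 100.0

def average_score(seeds):
    """Average the agent's score over the N secret seeds -- the
    'Average Score' used for the leaderboard ranking per Rule 9."""
    scores = [run_episode(s) for s in seeds]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    secret_seeds = [101, 202, 303]  # hypothetical seeds for one round
    print(average_score(secret_seeds))
```

The key point the sketch captures is that the same N seeds are applied to every submitted entry within a round, so the ranking compares agents on identical environments, while a later round may use a different set of seeds.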