Rule 9 states that:
The Agent submitted in the Entry will be evaluated against the applicable Flatland Environment generated using N random seeds unavailable to participants during the Challenge (“Seeds”). To be clear, the same N Seeds will be used to evaluate all Entries submitted in each Round, with the understanding the N Seeds in Round 1 may not be the N Seeds applied in Round 2. The Entry will be ranked on the Leaderboard based on the highest average score reached by the Agent across all Seeds (“Average Score”).
Does this mean that our agents will be evaluated on an unknown world, or do only the start and target positions change?
The agents will be evaluated on a “secret” test set with different environment dimensions and numbers of agents. This is done to ensure that the solutions generalize well. We will, however, release a set of generated “Levels” that are similar to the ones used for evaluation.
This will happen soon so stay tuned.
Thanks for your answer; may I try to reformulate it more specifically?
What I understand is:
- Our code generates a (random) environment
- We train our agents on this environment
- We submit our code
- You replace the random environment with a secret test environment
- You run the submitted code to determine the score on the test environment
Is that more or less how it will work?
Thanks and best wishes
@marcoliver_gewaltig: Yes, that is correct. In steps 4 and 5, we will run your submitted code against a series of test environments of varying difficulty, and your overall score will be computed from your code's cumulative performance across all of these test environments. More details about this should be released by this weekend at the latest.
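The scoring described above can be sketched in a few lines. This is a minimal illustration, not the organizers' actual evaluation code: `run_episode` is a hypothetical stand-in for building a Flatland environment from one secret seed, running the submitted agent in it, and returning the episode score; the real seeds and scoring function are unknown to participants.

```python
def run_episode(env_seed):
    # Hypothetical stand-in for the organizers' per-environment evaluation:
    # build the environment from `env_seed`, run the agent, return its score.
    # A deterministic fake score is used here so the sketch is runnable.
    return (env_seed * 37 % 100) / 100.0

def average_score(seeds):
    """Average the agent's score over the N secret seeds -- the
    'Average Score' used for the leaderboard ranking per Rule 9."""
    scores = [run_episode(s) for s in seeds]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    secret_seeds = [101, 202, 303]  # hypothetical seeds for one round
    print(average_score(secret_seeds))
```

The key point the sketch captures is that the same N seeds are applied to every submitted entry within a round, so the ranking compares agents on identical environments, while a later round may use a different set of seeds.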