Evaluation process

RomanChernenko · December 13, 2019, 10:59pm

Hello,

I have a few questions about the evaluation process.

Can you please confirm that our solutions are always evaluated on the same test samples? At the moment it looks like the test sequence is shuffled, at least.
How are maps chosen for the visualization video? I can see that different teams have different maps in their videos.
I have seen something strange in the score progress during evaluation. The score always starts at a low value and then increases quickly, and at the end I always see a big jump. For example, I had a score of 91.6% after 248 simulations, but the final score was 92%. Such a large jump at the end should not be possible. It looks like the scoring algorithm divides the sum of done agents by N+1, where N is the number of finished simulations.
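
The N+1 guess can be checked against the numbers above. This is only a sketch of the arithmetic, not the actual Flatland scoring code: it assumes the displayed score is the summed per-simulation completion fraction divided by N+1, and the names `n_finished`, `displayed` and `running_score` are made up for illustration.

```python
def running_score(fractions, divisor_offset=0):
    """Mean completion fraction over the finished simulations.

    divisor_offset=1 reproduces the hypothesised N+1 divisor
    (an assumption, not confirmed Flatland behaviour).
    """
    return sum(fractions) / (len(fractions) + divisor_offset)


# Numbers quoted in the post.
n_finished = 248   # simulations finished so far
displayed = 0.916  # score shown at that point

# If displayed = total / (N + 1), recover the total and divide by N instead.
total = displayed * (n_finished + 1)
corrected = total / n_finished

print(f"corrected running score: {corrected * 100:.2f}%")
```

Under that assumption the running score after 248 simulations would already be about 91.97%, which is close to the final 92% and would not need a jump at the end.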

shivam · December 17, 2019, 1:47pm

Yes, the solutions are evaluated on the same test samples, but the samples are shuffled: https://gitlab.aicrowd.com/flatland/flatland/blob/master/flatland/evaluators/service.py#L89
Video generation is done on a subset of all environments, and that subset remains the same for all evaluations. Is it possible that when you opened the leaderboard, the videos didn’t all start playing at the same time, which led to this impression?
This is the place where the Flatland library generates the score, and the N+1 issue might not be the reason. I will let @mlerik investigate and comment on it.