Timeout in submission

antoinep · July 24, 2020, 9:29am

Hello, yesterday I uploaded a submission using an RL model trained with the multi-agent example script (multi_agent.py). The evaluation started and I could see it progressing, but it was very slow. In the end, the evaluation exited with a timeout error (“Timeout : Evaluation took too long.”)

When I run the local evaluator with redis, I see that my controller, for example on a test with 200 agents, makes its decisions in 0.1s while the environment update takes 0.60s. So the model doesn’t seem to be the problem here? What can I do about that?

Thanks

MasterScrat · July 24, 2020, 9:36am

Hello @antoinep, indeed the environment is slow, which is a problem for many submissions, especially the RL ones.

We are working on different solutions and will make sure this is handled better in Round 2.

For now, the most efficient solution would be to “pick your battles”. If your solution is too slow to solve all 400 episodes, you can choose to only solve some of them.

While there’s no way to “skip” episodes, what you can do is perform “no-ops” during some of the episodes. If you perform steps with no actions for the whole episode (i.e. env.step({})), you will very quickly reach the end of that episode. Of course you will get a score of -1.0 for this episode, but this will allow you to finish the evaluation in time.

For example, you could start by only using your RL policy for environments with 50 agents or fewer (you can see the environment configurations here). For all other environments, you just perform no-ops until they’re over. If you see your solution is fast enough this way, then you can tackle more environments, e.g. up to 80 agents.
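As a rough sketch of that idea (not the official starter-kit code): the helper below returns an empty action dict for episodes above a size threshold and otherwise queries the policy once per agent. The names select_actions, policy and MAX_AGENTS are hypothetical, and it assumes observations arrive as a dict keyed by agent handle and that you read the agent count once the environment has been created.

```python
from typing import Any, Callable, Dict

# Threshold from the example above; tune it to your own time budget.
MAX_AGENTS = 50


def select_actions(
    observations: Dict[int, Any],
    num_agents: int,
    policy: Callable[[Any], int],
    max_agents: int = MAX_AGENTS,
) -> Dict[int, int]:
    """Return per-agent actions, or an empty dict (a no-op step) for large episodes."""
    if num_agents > max_agents:
        # Empty action dict: the episode advances without querying the policy.
        return {}
    # Small enough episode: ask the (hypothetical) policy for each agent's action.
    return {handle: policy(obs) for handle, obs in observations.items()}
```

In the evaluation loop you would pass the returned dict to the step call at every step until the episode reports it is done.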

There are other ways to speed up your policy, e.g. running the inference in parallel, keeping a cache of {state -> action}, etc., but skipping some episodes will let you make a successful submission more easily in any case.
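The {state -> action} cache could look something like the sketch below. It is only an illustration: CachedPolicy and key_fn are hypothetical names, it assumes your policy is deterministic at inference time (otherwise cached actions would differ from fresh ones), and it assumes you can turn an observation into a hashable key, for instance a tuple or the bytes of an array.

```python
from typing import Any, Callable, Dict, Hashable


class CachedPolicy:
    """Wrap a deterministic policy and reuse actions for observations seen before."""

    def __init__(self, policy: Callable[[Any], int],
                 key_fn: Callable[[Any], Hashable] = tuple) -> None:
        self.policy = policy
        self.key_fn = key_fn  # converts an observation into a hashable cache key
        self.cache: Dict[Hashable, int] = {}

    def __call__(self, obs: Any) -> int:
        key = self.key_fn(obs)
        if key not in self.cache:
            # Cache miss: run the (slow) inference once and remember the result.
            self.cache[key] = self.policy(obs)
        return self.cache[key]
```

Whether this helps depends on how often identical observations recur, so it is worth measuring the hit rate before relying on it.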

jiaxun_cui · July 26, 2020, 2:51pm

Hey,
I am wondering how to get the number of agents before env_create?

MasterScrat · July 26, 2020, 10:36pm

Hey @jiaxun_cui, this is not currently possible.