Anyone else having trouble with Plunder rollout timeout?

xiaocheng_tang · September 27, 2020, 6:58am

I don't quite understand having such a restrictive time limit in place for rollouts. For one thing, longer rollouts usually correlate with a better policy in those games, so, other things being equal, this time limit seems to penalize better agents…

tim_whitaker · September 27, 2020, 3:09pm

Yes! Same problem here. All of my submissions this round have had evaluation timeouts. Glad to know I’m not the only one. Would love a more relaxed time limit.

dipam_chakraborty · September 28, 2020, 1:10am

I think the rollout setup is pretty suboptimal as well. I guess it's using the same number of parallel envs and workers as training, but typically a lot of memory goes to training, and that memory is pretty much unused for inference in the current setup. The inference code should be allowed to be optimized for 8 workers and as many parallel envs as maximize throughput.

mohanty · September 28, 2020, 5:22am

Hi everyone,

We acknowledge this issue, and after discussing it internally, we will post an update later today.

But overall, we would be okay with relaxing (a little bit) the timeouts for the rollouts in this and subsequent rounds.

Cheers,
Mohanty

xiaocheng_tang · September 29, 2020, 4:01am

Hi Mohanty, thanks for the follow-up. It is great to hear that the rollout timeouts will be relaxed. Looking forward to finally being able to finish Plunder during rollouts!

jurgisp · September 29, 2020, 12:29pm

Same problem here! And sometimes on starpilot too.

jurgisp · September 29, 2020, 3:34pm

@mohanty - alternatively, you could decrease the number of episodes during evaluation to 500. I think that would still be ok in terms of accuracy.

dipam_chakraborty · September 29, 2020, 5:07pm

In my testing, actually, the results have high variance at 1000 runs, but it reduces at 5000 … and I still think rollouts could be much faster with more optimized code for parallel envs.
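As a rough illustration of the trade-off between episode count and score variance discussed here, the sketch below compares the standard error of the mean score at 500, 1000, and 5000 episodes. It assumes episode returns are independent and identically distributed, which may not hold exactly for these environments, and the function name `relative_standard_error` is made up for this example.

```python
import math

def relative_standard_error(n_episodes, baseline_episodes=1000):
    """Standard error of the mean score, relative to the baseline episode count.

    Assumes i.i.d. episode returns, so the standard error scales as 1/sqrt(n).
    """
    return math.sqrt(baseline_episodes / n_episodes)

# Compare the proposed 500, the current 1000, and the tested 5000 episodes.
for n in (500, 1000, 5000):
    print(n, round(relative_standard_error(n), 3))
```

Under that assumption, halving the evaluation to 500 episodes increases the noise in the mean score by about 41%, while 5000 episodes cuts it to less than half of what it is at 1000.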

xiaocheng_tang · September 29, 2020, 5:56pm

Hi @mohanty, it has been two days; any updates on this? In my case it has been quite frustrating, with empty training logs (I tagged the issues a few times but got no response) and no rollout scores, even though training and most of the rollouts complete successfully except for one or two games (usually Plunder and sometimes hovercraft). I have no clue how well the successfully finished games are doing in rollouts (not sure why the score for each rollout is not disclosed). And for the games that time out, I cannot tell how many episodes out of 1000 were completed, which could be useful for debugging.

jyotish · September 29, 2020, 6:26pm

Hello @xiaocheng_tang

The score for the submission will be calculated only if all the rollouts succeed. Even if one of them fails, the score won’t be reported.

As for per-env scores, the mean reward you get during rollouts is generally close to the mean reward obtained at the end of training (unless some reward shaping is done during training).

We can’t expose the logs for rollouts, however, we can provide the number of episodes finished in the given time limit.

We will increase the time limit for rollouts. We will make an announcement as soon as the new limits are live.

graa · October 30, 2020, 4:19pm

@jyotish Did I miss the announcement on the new time limit for rollouts? We noticed that the better the agent plays Plunder, the longer the average episode length. It may even exceed 1K. When that happens, the rollouts fail.

jyotish · October 30, 2020, 5:39pm

Hello @graa

I was under the impression that the post was made, but it looks like I forgot to post it. The new time limit for rollouts is 5700s (previously 1800s), and this change has been active since September 30th.

Yes, you are right about the agent’s behaviour in plunder. The same would be the case for bigfish, where a better agent will take a longer time to complete the episode.

graa · October 30, 2020, 8:11pm

Can one hope that the limit will be increased even more for these last days? At least for the final evaluation.

jurgisp · November 1, 2020, 8:40am

Hi, @jyotish,

Our best policy on plunder takes 900 steps on average per episode (which is a lot, but allowed by the environment). If you are running rollouts for 1000 episodes, that’s 900000 timesteps that have to be run by the rollout worker. If the timeout is 5700s, that means the agent has to run at least 157 ts/s (900000/5700) to finish in time.
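The arithmetic above can be checked with a small sketch. It assumes all episodes are run sequentially by a single rollout worker with no per-episode overhead, and the helper name `required_throughput` is made up for this example.

```python
def required_throughput(mean_steps_per_episode, episodes, timeout_s):
    """Minimum timesteps per second needed to finish all episodes in time."""
    total_steps = mean_steps_per_episode * episodes
    return total_steps / timeout_s

# 900 steps per episode on average, 1000 episodes, at the new and old limits.
new_limit = required_throughput(900, 1000, 5700)
old_limit = required_throughput(900, 1000, 1800)
print(f"{new_limit:.1f} ts/s at 5700s, {old_limit:.1f} ts/s at 1800s")
# prints "157.9 ts/s at 5700s, 500.0 ts/s at 1800s"
```

The exact quotient is about 157.9 ts/s, which the figure quoted above rounds down to 157. At the previous 1800s limit the same policy would have needed 500 ts/s.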

Requiring that the model run at no less than 157 ts/s on a single CPU is a pretty tight constraint, which appears nowhere in the original rules. This would effectively constrain the size of the model, and it also discourages the use of PyTorch, which is slower on a single CPU worker (even in the baseline implementation) than TensorFlow.

We believe this constraint should be removed for the final evaluation, and there should be effectively no time limit for the rollouts.