I don't quite understand having such a restrictive time limit in place for rollouts. For one thing, a longer rollout usually correlates with a better policy in these games, other things being equal, so this time limit seems to penalize the better agents…
Yes! Same problem here. All of my submissions this round have had evaluation timeouts. Glad to know I’m not the only one. Would love a more relaxed time limit.
I think the rollout setup is pretty suboptimal as well. I guess it's using the same number of parallel envs and workers as training, but a lot of the memory typically goes to training and is essentially unused for inference with the current setup. The inference code should be allowed to be optimized for 8 workers and however many parallel envs maximize throughput.
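To illustrate what I mean, here is a rough sketch of a rollout loop with a large vectorized env per worker (the env creation follows the gym3-style Procgen API; `NUM_ENVS` is an arbitrary value to tune, and the random action sample is just a placeholder for one batched forward pass of your policy):

```python
import gym3
from procgen import ProcgenGym3Env

NUM_ENVS = 256      # tune per worker until throughput stops improving
NUM_STEPS = 10_000  # just enough steps to measure throughput

# One vectorized env per worker: the policy can be queried with a
# (NUM_ENVS, 64, 64, 3) observation batch each step instead of one frame.
env = ProcgenGym3Env(num=NUM_ENVS, env_name="plunder")

for _ in range(NUM_STEPS):
    rew, obs, first = env.observe()
    # Placeholder: replace the random sample with a single batched
    # forward pass of your policy over obs["rgb"].
    actions = gym3.types_np.sample(env.ac_space, bshape=(env.num,))
    env.act(actions)
```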
Hi everyone,
We acknowledge this issue, and after discussing it internally we will post an update later today.
Overall, we would be okay with relaxing the rollout timeouts a little bit in this and subsequent rounds.
Cheers,
Mohanty
Hi Mohanty, thanks for the follow-up. It's great to hear that the rollout timeouts will be relaxed. I look forward to finally being able to finish Plunder during rollouts!
Same problem here! And sometimes on starpilot too.
@mohanty - alternatively, you could decrease the number of episodes during evaluation to 500. I think that would still be ok in terms of accuracy.
Actually, in my testing there is high variance in the results at 1000 runs, and it only comes down at 5000 … and I still think rollouts could be much faster with more optimized code for the parallel envs.
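As a back-of-the-envelope check (assuming roughly independent episode returns; the per-episode standard deviation below is made up purely for illustration), the noise in the mean reward shrinks like 1/sqrt(number of episodes), so 500 episodes is noticeably noisier than 1000, and 5000 roughly halves the noise of 1000:

```python
import numpy as np

reward_std = 10.0  # hypothetical per-episode reward standard deviation
for n_episodes in (500, 1000, 5000):
    stderr = reward_std / np.sqrt(n_episodes)
    print(f"{n_episodes:>4} episodes -> standard error of the mean ~ {stderr:.2f}")
```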
Hi @mohanty, it has been two days; any updates on this? In my case it has been quite frustrating: empty training logs (I tagged the issues a few times but got no response) and no rollout scores, even though training and most of the rollouts complete successfully except for one or two games (usually Plunder and sometimes hovercraft). I have no clue how well the successfully finished games are doing in the rollouts (not sure why the score for each rollout is not disclosed). And for the games that time out, I cannot tell how many episodes out of 1000 were completed, which would be useful for debugging.
Hello @xiaocheng_tang
The score for a submission is calculated only if all the rollouts succeed. If even one of them fails, the score won't be reported.
As for per-env scores: the mean reward you get during rollouts is generally close to the mean reward obtained at the end of training (unless some reward shaping is done during training).
We can't expose the logs for rollouts; however, we can provide the number of episodes finished in the given time limit.
We will increase the time limit for rollouts. We will make an announcement as soon as the new limits are live.
@jyotish Did I miss the announcement on the new time limit for rollouts? We noticed that the better the agent playing Plunder, the longer the average episode length; it may even exceed 1K steps. When that happens, the rollouts fail.
Hello @graa
I was under the impression that the post was made, but it looks like I forgot to post it. The new time limit for rollouts is 5700s (previously 1800s), and this change has been active since September 30th.
Yes, you are right about the agent's behaviour in Plunder. The same would be the case for Bigfish, where a better agent will take longer to complete an episode.
Can one hope that the limit will be increased even more for these last days? At least for the final evaluation.
Hi, @jyotish,
Our best policy on Plunder takes 900 steps per episode on average (which is a lot, but allowed by the environment). If you are running rollouts for 1000 episodes, that's 900,000 timesteps that have to be run by the rollout worker. With a 5700s timeout, the agent has to run at roughly 158 ts/s (900,000 / 5,700) to finish in time.
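Spelling out the arithmetic (just the numbers above, nothing else assumed):

```python
avg_episode_steps = 900   # average episode length of our best Plunder policy
num_episodes = 1000       # episodes per rollout
timeout_seconds = 5700    # current rollout time limit

total_steps = avg_episode_steps * num_episodes        # 900,000 timesteps
required_throughput = total_steps / timeout_seconds   # ~157.9 steps per second
print(f"Required throughput: {required_throughput:.1f} ts/s")
```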
Requiring that the model run at roughly 158 ts/s on a single CPU is a pretty tight constraint, and one that appears nowhere in the original rules. It effectively constrains the size of the model, and it also discourages the use of PyTorch, which is slower than TensorFlow on a single CPU worker (even in the baseline implementation).
We believe this constraint should be removed for the final evaluation, and there should be effectively no time limit for the rollouts.