How many episodes can an agent play within the 4-day training period?
I am asking because some RL methods need many episodes to improve the
policy. We are trying to estimate how many samples can be generated for
Q-learning (e.g., DQN), policy gradient methods, or SARSA.
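
For reference, here is the rough estimate we are doing. It is a minimal sketch that assumes a constant environment step rate and ignores time spent on network updates. The function name and all numbers (100 steps/s, 1000 steps per episode) are hypothetical placeholders, not the challenge's actual limits.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def estimate_budget(days, steps_per_second, steps_per_episode, num_envs=1):
    # Wall-clock seconds available for training.
    total_seconds = days * SECONDS_PER_DAY
    # Environment transitions (samples) collected across all parallel copies,
    # assuming a constant step rate and no time lost to learner updates.
    total_steps = int(total_seconds * steps_per_second * num_envs)
    # Count complete episodes only; a partial final episode is dropped.
    total_episodes = total_steps // steps_per_episode
    return total_steps, total_episodes

# Hypothetical numbers, not the challenge's real limits.
steps, episodes = estimate_budget(days=4, steps_per_second=100, steps_per_episode=1000)
print(steps, episodes)
```

The raw sample budget is the same whichever algorithm we use; what changes is how many of those samples DQN, policy gradient, or SARSA needs to improve the policy. That is why the actual step rate and episode length matter to us.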
Thanks for organizing this amazing challenge!