One of the rules of the competition is to train “with only 8,000,000 samples”. How are samples counted if I train using only the human dataset? And how are they counted if both the human dataset and the game engine are used during training?
In this competition the limit applies to samples drawn from the environment; you can use the human samples as much and as often as you want.
The motivation is this: it is far more feasible to collect data from a bunch of humans driving cars than to run ε-greedy random exploration for some RL algorithm with a real car. Hence we limit the number of samples you get with the "real car" (the environment) to 8,000,000, and encourage you to use the dataset as much as possible.
Hope this helps!
I have a question: if a single agent interaction uses more than one state, how will the server count the samples?
For example, the Gym step wrapper in the baselines code (below) may consume 4 states per agent step. Does that mean we can only take 8,000,000 / 4 = 2,000,000 agent steps during training? Please confirm. Thanks.
    def step(self, action):
        """Repeat action, sum reward, and max over last observations."""
        total_reward = 0.0
        done = None
        for i in range(self._skip):
            obs, reward, done, info = self.env.step(action)
            if i == self._skip - 2:
                self._obs_buffer[0] = obs
            if i == self._skip - 1:
                self._obs_buffer[1] = obs
            total_reward += reward
            if done:
                break
        # Note that the observation on the done=True frame
        # doesn't matter
        max_frame = self._obs_buffer.max(axis=0)
        return max_frame, total_reward, done, info
I also found this code in the baselines:
    # calculate corresponding `steps` and `eval_interval` according to frameskip
    # = 1440 episodes if we count an episode as 6000 frames,
    # = 1080 episodes if we count an episode as 8000 frames.
    maximum_frames = 8640000
    if args.frame_skip is None:
        steps = maximum_frames
        eval_interval = 6000 * 100  # (approx.) every 100 episodes (counts "1 episode = 6000 steps")
    else:
        steps = maximum_frames // args.frame_skip
        eval_interval = 6000 * 100 // args.frame_skip  # (approx.) every 100 episodes (counts "1 episode = 6000 steps")
Every call to env.step() is counted as a step of the environment. It’s completely up to you how to use those steps!
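If you want to keep track of the budget locally, one option is to count calls yourself. The following is only a minimal sketch, not the competition's actual logging code: `DummyEnv` and `SampleBudget` are hypothetical names, and the stand-in environment assumes the old Gym API where `step()` returns a 4-tuple.

```python
class DummyEnv:
    """Hypothetical stand-in for the real environment (old Gym 4-tuple API)."""

    def reset(self):
        return 0

    def step(self, action):
        # observation, reward, done, info
        return 0, 0.0, False, {}


class SampleBudget:
    """Counts every call to step() and refuses calls past the limit."""

    def __init__(self, env, max_samples):
        self.env = env
        self.max_samples = max_samples
        self.samples_used = 0

    def reset(self):
        # Resets are not counted here; only step() calls are.
        return self.env.reset()

    def step(self, action):
        if self.samples_used >= self.max_samples:
            raise RuntimeError("sample budget exhausted")
        self.samples_used += 1
        return self.env.step(action)


env = SampleBudget(DummyEnv(), max_samples=8_000_000)
env.reset()
for _ in range(10):
    env.step(action=0)
print(env.samples_used)
```

Any further wrappers (frame skip, frame stacking, and so on) would go on top of this one, so that the counter sees every call that reaches the core environment.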
Let me ask another question about MINERL_TRAINING_MAX_INSTANCES = 5 in competition_submission_starter_template. I’m not quite sure how INSTANCE is defined.
Does it mean that at most 5 gym envs can be run in parallel while training?
Or that at most 5 parallel trainers can perform updates independently and somehow merge gradients periodically?
Or max 5 processes (regardless of doing what kind of job) can be deployed while training?
Thank you for all your effort.
Let me ask you one more time for clarification. If we use “FrameSkip” wrapper as provided in the baseline code, one env.step() of the wrapped env corresponds to four env.step() of the core env. In this case, does it count as four samples or just one sample?
Thanks again for your attention on this.
As you said, since frame-skipping is not built into the environment, you will need to call env.step() multiple times. We are not able to discriminate between wrappers that call env.step() four times in a row and code that calls it once at a time; all calls to env.step() are logged and count toward the global step limit.
The motivation for not including frame skipping is that we are seeking general solutions that transfer well to other challenges in RL. As such, we don’t want to punish teams for not selecting the correct number of frame-skips for their models.
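To make the counting rule concrete, here is a small sketch under stated assumptions: `CountingEnv` and this `FrameSkip` are hypothetical, simplified classes (not the baseline's wrapper), using the old Gym 4-tuple `step()` API. It shows that one step of a wrapped env with a skip of 4 reaches the core env four times, so it uses four samples.

```python
SAMPLE_LIMIT = 8_000_000  # environment sample limit from the competition rules


class CountingEnv:
    """Hypothetical core env that records how often step() is called."""

    def __init__(self):
        self.calls = 0

    def step(self, action):
        self.calls += 1
        return 0, 1.0, False, {}  # observation, reward, done, info


class FrameSkip:
    """Simplified frame-skip wrapper: repeat the action `skip` times."""

    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break  # an early episode end uses fewer core steps
        return obs, total_reward, done, info


core = CountingEnv()
wrapped = FrameSkip(core, skip=4)
wrapped.step(0)
print(core.calls)  # core env.step() calls used by one wrapped step

# Approximate number of agent decisions the budget allows at this skip.
agent_steps = SAMPLE_LIMIT // wrapped.skip
print(agent_steps)
```

With a skip of 4 this works out to roughly 2,000,000 agent decisions, matching the 8,000,000 / 4 estimate in the question; episodes that end in the middle of a skip use slightly fewer core steps.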