Sample efficiency and human dataset


One of the rules of the competition is to train “with only 8,000,000 samples”. How are samples counted if I train using only the human dataset? And how are they counted if both the human dataset and the game engine are used during training?


In this competition, the limit applies to samples drawn from the environment; you can use as many of the human samples, as many times, as you want.

The motivation is this: it is far more feasible to collect data from a bunch of humans driving cars than to run ε-greedy random exploration for some RL algorithm with a real car. Hence we limit the number of samples you get with the real car (here, the environment) to 8,000,000, and encourage you to use the dataset as much as possible.
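As a rough illustration of that accounting, here is a minimal sketch. `SampleBudget`, `charge_env_step`, and the toy dataset are hypothetical names made up for this example, not part of the competition code. Only environment interactions are charged against the limit, while repeated passes over the human data cost nothing:

```python
class SampleBudget:
    """Hypothetical counter: only environment interactions are charged."""

    def __init__(self, limit=8_000_000):
        self.limit = limit
        self.used = 0

    def charge_env_step(self):
        if self.used >= self.limit:
            raise RuntimeError("environment sample budget exhausted")
        self.used += 1


budget = SampleBudget()

# Stand-in for the human dataset; reading it repeatedly is free.
human_dataset = [("obs", "action")] * 1000
dataset_samples_seen = 0
for epoch in range(3):
    for _sample in human_dataset:
        dataset_samples_seen += 1  # not charged to the budget

# Stand-in for environment interaction; every step is charged.
for _ in range(500):
    budget.charge_env_step()
```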

Hope this helps!


I have a question: if we use more than one state in a single interaction, how will the server count the samples?

E.g., in the baselines code for Gym's step (below), the agent may consume 4 states in one step. Does that mean we can only use 8,000,000/4 = 2,000,000 steps in the training stage? Please help confirm this, thanks.

    def step(self, action):
        """Repeat action, sum reward, and max over last observations."""
        total_reward = 0.0
        done = None
        for i in range(self._skip):
            obs, reward, done, info = self.env.step(action)
            if i == self._skip - 2: self._obs_buffer[0] = obs
            if i == self._skip - 1: self._obs_buffer[1] = obs
            total_reward += reward
            if done:
                break
        # Note that the observation on the done=True frame
        # doesn't matter
        max_frame = self._obs_buffer.max(axis=0)

        return max_frame, total_reward, done, info

I also found this code in the baselines:

    # calculate corresponding `steps` and `eval_interval` according to frameskip
    # = 1440 episodes if we count an episode as 6000 frames,
    # = 1080 episodes if we count an episode as 8000 frames.
    maximum_frames = 8640000
    if args.frame_skip is None:
        steps = maximum_frames
        eval_interval = 6000 * 100  # (approx.) every 100 episode (counts "1 episode = 6000 steps")
    else:
        steps = maximum_frames // args.frame_skip
        eval_interval = 6000 * 100 // args.frame_skip  # (approx.) every 100 episode (counts "1 episode = 6000 steps")

Every call to env.step() is counted as a step of the environment. It’s completely up to you how to use those steps!
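To make the counting concrete, here is a small sketch under stated assumptions: `CountingEnv` and `FrameSkip` are hypothetical stand-ins written for this example (not the baselines' wrapper or the competition server). With a frame skip of 4, each agent decision uses 4 environment steps, so 8,000,000 environment steps allow 2,000,000 agent decisions:

```python
class CountingEnv:
    """Hypothetical stand-in environment that counts calls to step()."""

    def __init__(self):
        self.step_calls = 0

    def step(self, action):
        self.step_calls += 1
        # Dummy transition: observation, reward, done, info
        return 0, 1.0, False, {}


class FrameSkip:
    """Repeat each agent action `skip` times, as a frame-skip wrapper would."""

    def __init__(self, env, skip):
        self.env = env
        self.skip = skip

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info


base = CountingEnv()
agent_env = FrameSkip(base, skip=4)

agent_decisions = 1000
for _ in range(agent_decisions):
    agent_env.step(action=0)

env_steps_used = base.step_calls      # 4 per agent decision in this toy setup
max_agent_decisions = 8_000_000 // 4  # decisions that fit in the budget
```

In this toy run, 1,000 agent decisions consume 4,000 environment steps. Episodes that end mid-skip would use slightly fewer.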


Let me ask another question about MINERL_TRAINING_MAX_INSTANCES = 5 in competition_submission_starter_template. I’m not quite sure how INSTANCE is defined.
Does it mean that at most 5 gym envs can run in parallel during training?
Or that at most 5 parallel trainers can perform updates independently and somehow merge gradients periodically?
Or that at most 5 processes (regardless of what kind of job they do) can be deployed during training?