Selecting seeds during training

Leckofunny · June 25, 2020, 8:46am

Hi!

How will you enforce the use of 200 training seeds once an agent is submitted?
I’m planning a submission that has some logic to sample certain seeds for each environment.
As far as I can tell from how Procgen is implemented, I’d have to close the environment and instantiate it again to apply the designated seed.

Leckofunny · July 1, 2020, 7:28am

Any info about this @mohanty ?

mohanty · July 1, 2020, 10:32am

Hi @Leckofunny,

In the warm-up round and round-1, we will not be evaluating generalization, so during the training phase your code will have access to all the levels.

In round-2, we will restrict access to only 200 levels during the training phase by enforcing num_levels=200 on every env instantiation.
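As a rough illustration of what such a restriction could look like (a sketch only, not the organizers' actual mechanism), the helper below overrides whatever `num_levels` a submission requests. The function name `enforce_level_budget` and the config keys are assumptions made for this example; `num_levels` and `start_level` are the Procgen options discussed in this thread.

```python
def enforce_level_budget(env_config, num_levels=200):
    """Return a copy of env_config with num_levels forced to the budget.

    Hypothetical helper: the real evaluation code may work differently.
    """
    restricted = dict(env_config)  # copy so the caller's dict is untouched
    restricted["num_levels"] = num_levels
    return restricted


# In Procgen, num_levels=0 requests an unlimited number of levels.
requested = {"env_name": "coinrun", "num_levels": 0, "start_level": 0}
restricted = enforce_level_budget(requested)

# With Procgen installed, the restricted config could be applied like this:
#   import gym
#   env = gym.make(
#       "procgen:procgen-coinrun-v0",
#       num_levels=restricted["num_levels"],
#       start_level=restricted["start_level"],
#   )
```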

Does this answer your question?

Cheers,
Mohanty

Leckofunny · July 1, 2020, 1:22pm

@mohanty
I’d like to explicitly set a distinct seed for each worker during training, because I’ve got a concept for sampling seeds.
The implementation would probably look similar to this:
https://docs.ray.io/en/master/rllib-training.html#curriculum-learning

As far as I know, the Procgen environment has to be closed and instantiated again to apply a distinct seed (num_levels = 1, start_level = my_desired_seed), because I cannot enforce a new seed during the reset() call.

So I assume that 200 seeds will be sampled uniformly and it will not be possible to inject my logic to alter the sampling strategy of the 200 seeds.
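The close-and-recreate pattern described above could be sketched roughly as follows. `DummyEnv` is a stand-in so the snippet runs without Procgen, and `SeededEnvManager` and `reset_to_seed` are hypothetical names; with the real library, the factory would presumably be a function that calls `gym.make` with the same two keyword arguments.

```python
class DummyEnv:
    """Stand-in for a Procgen env; it only records its constructor arguments."""

    def __init__(self, num_levels, start_level):
        self.num_levels = num_levels
        self.start_level = start_level
        self.closed = False

    def reset(self):
        return self.start_level  # placeholder observation

    def close(self):
        self.closed = True


class SeededEnvManager:
    """Recreates the env whenever a specific level seed is requested."""

    def __init__(self, make_env):
        self._make_env = make_env  # callable accepting num_levels, start_level
        self.env = None

    def reset_to_seed(self, seed):
        if self.env is not None:
            self.env.close()  # release the old env before building a new one
        # num_levels=1 with start_level=seed pins the env to a single level.
        self.env = self._make_env(num_levels=1, start_level=seed)
        return self.env.reset()


manager = SeededEnvManager(DummyEnv)
first_obs = manager.reset_to_seed(17)
first_env = manager.env
second_obs = manager.reset_to_seed(42)
```

Recreating the env on every seed change is likely to be slow, so in practice one would probably only switch seeds at episode boundaries.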

Leckofunny · July 5, 2020, 5:07pm

I guess my assumption is correct, since nobody has contradicted it.
It is a pity that Curriculum Learning cannot be done during this challenge.

tim_whitaker · July 5, 2020, 8:47pm

Agreed. I wanted to try curriculum learning (and a few other ideas) but basically wrote them off as not possible since we can’t change any of the environment wrapper code. Would appreciate it if anyone found a workaround.

jyotish · July 6, 2020, 6:37am

Hello @Leckofunny

This should be possible. We do support callbacks, so approach 2 mentioned in the docs you linked is definitely doable. For the first approach, did you try passing it as a custom algorithm? According to this line,

https://github.com/ray-project/ray/blob/master/python/ray/tune/tune.py#L106


    resume=False,
    queue_trials=False,
    reuse_actors=False,
    trial_executor=None,
    raise_on_failed_trial=True,
    return_trials=False,
    ray_auto_init=True):
"""Executes training.

Args:
    run_or_experiment (function | class | str | :class:`Experiment`): If
        function|class|str, this is the algorithm or model to train.
        This may refer to the name of a built-on algorithm
        (e.g. RLLib's DQN or PPO), a user-defined trainable
        function or class, or the string identifier of a
        trainable function or class registered in the tune registry.
        If Experiment, then Tune will execute training based on
        Experiment.spec. If you want to pass in a Python lambda, you
        will need to first register the function:
        ``tune.register_trainable("lambda_id", lambda x: ...)``. You can
        then use ``tune.run("lambda_id")``.

You should be able to pass the train function from approach 1 as a custom algorithm. Can you give it a try? If that doesn’t work, you should be able to extend the existing PPOTrainer class as a custom algorithm.

For the re-init part, can you try running env.close() followed by env.__init__() with parameters from the current env? I’m not sure if this is really the right way. I’ll get back to you if I find a better solution.
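A minimal sketch of that close-then-`__init__` idea, using a toy class so it runs without Procgen. `ReinitializableEnv` and `reinit_in_place` are made-up names, and whether a real Procgen env tolerates being re-initialized this way is untested here. The point of the pattern is that the object identity stays the same, so anything holding a reference to the env keeps working.

```python
class ReinitializableEnv:
    """Toy env that remembers the keyword arguments it was built with."""

    def __init__(self, **config):
        self.config = dict(config)
        self.closed = False

    def close(self):
        self.closed = True


def reinit_in_place(env, **overrides):
    """Close env, then call __init__ again with its old parameters plus overrides."""
    params = dict(env.config)  # parameters from the current env
    params.update(overrides)   # e.g. a new start_level
    env.close()
    env.__init__(**params)     # same object, fresh state
    return env


env = ReinitializableEnv(num_levels=200, start_level=0)
same_env = reinit_in_place(env, num_levels=1, start_level=42)
```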

tim_whitaker · July 8, 2020, 11:33pm

Just to add a bit more discussion here: I thought a little more about curriculum learning, and perhaps it’s a bit against the spirit of the competition. For round 1 it doesn’t matter, but when we get into the final round where num_levels=200, curriculum learning just seems like a way to skirt that rule by having more levels to work with. It would only stay within the rule if you’re careful to allot x levels to the easy distribution and 200-x levels to the hard distribution. Just something to keep in mind if anyone else wants to explore this idea.

Leckofunny · July 9, 2020, 8:41am

My auto curriculum algorithm just alters the way seeds are sampled to provide much more useful data to the agent and hence improve sample efficiency. In my opinion, having more than 100 or 200 seeds doesn’t even help.
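To make the idea of changing only the sampling distribution concrete, here is a small sketch that draws seeds from a fixed pool of 200 with non-uniform weights. It is not the algorithm described above; the weighting rule (lower return means higher weight) and the names `SeedSampler`, `update`, and `sample` are assumptions chosen for illustration.

```python
import random


class SeedSampler:
    """Samples seeds from a fixed pool, favoring seeds with low recent returns."""

    def __init__(self, seeds, rng=None):
        self.seeds = list(seeds)
        self.weights = {s: 1.0 for s in self.seeds}  # start out uniform
        self.rng = rng or random.Random()

    def update(self, seed, episode_return, max_return):
        # Assumed rule: the lower the return, the higher the sampling weight.
        # The 0.05 floor keeps every seed reachable.
        difficulty = 1.0 - episode_return / max_return
        self.weights[seed] = max(0.05, difficulty)

    def sample(self):
        weights = [self.weights[s] for s in self.seeds]
        return self.rng.choices(self.seeds, weights=weights, k=1)[0]


sampler = SeedSampler(range(200), rng=random.Random(0))
sampler.update(3, episode_return=10.0, max_return=10.0)  # solved: weight drops
sampler.update(7, episode_return=0.0, max_return=10.0)   # failed: weight stays high
draws = [sampler.sample() for _ in range(1000)]
```

The pool never grows beyond the 200 seeds it was created with; only the probability of drawing each seed changes.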