Selecting seeds during training

Hi!

How are you enforcing the use of 200 training seeds once a submission is made?
I’m planning a submission that includes some logic to sample certain seeds for each environment.
And as far as Procgen is implemented, I’d have to close and re-instantiate the environment to apply the designated seed.

Any info about this @mohanty ?

Hi @Leckofunny,

So in the warm-up round and round-1, we will not be evaluating generalization. So during the training phase your code will have access to all the levels.

In round-2, we will be restricting access to only 200 levels during the training phase by enforcing num_levels=200 during all env instantiation.

Does this answer your question?

Cheers,
Mohanty

@mohanty
I’d like to explicitly set a distinct seed for each worker during training, because I’ve got a concept for sampling seeds.
The implementation would probably look similar to this:
https://docs.ray.io/en/master/rllib-training.html#curriculum-learning
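For concreteness, the seed-sampling concept could look something like the sketch below: a sampler that keeps a running mean return per level and preferentially samples levels where the agent currently does worst. All class and method names here are hypothetical and not part of Procgen or RLlib; it is only meant to show the kind of logic that would need to hook into env instantiation.

```python
import math
import random

class SeedSampler:
    """Hypothetical curriculum sampler (not a Procgen/RLlib API):
    tracks a running mean return per seed and more often samples
    the seeds where the agent currently performs worst."""

    def __init__(self, seeds, temperature=1.0):
        self.seeds = list(seeds)
        self.temperature = temperature
        self.returns = {s: 0.0 for s in self.seeds}  # EMA of returns per seed

    def record(self, seed, episode_return, alpha=0.1):
        # Exponential moving average of episode returns on this seed.
        self.returns[seed] += alpha * (episode_return - self.returns[seed])

    def sample(self):
        # Softmax over negative returns: lower return -> higher weight.
        weights = [math.exp(-self.returns[s] / self.temperature)
                   for s in self.seeds]
        return random.choices(self.seeds, weights=weights, k=1)[0]

sampler = SeedSampler(range(200))
sampler.record(seed=0, episode_return=10.0)
next_seed = sampler.sample()  # a seed in 0..199, biased away from seed 0
```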

As far as I know, the Procgen environment has to be closed and re-instantiated to apply a distinct seed (num_levels=1, start_level=my_desired_seed), because I cannot enforce a new seed during the reset() call.
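A minimal sketch of that re-instantiation pattern, under the assumption of a Procgen-style constructor taking num_levels and start_level. A stub class stands in for the real env so the example is self-contained; the gym.make call in the docstring is only indicative.

```python
class FakeProcgenEnv:
    """Stub standing in for a real Procgen env so this sketch runs
    without the procgen package. With the real package the equivalent
    construction would be roughly:
        gym.make("procgen:procgen-coinrun-v0",
                 num_levels=1, start_level=seed)
    (env id chosen for illustration)."""

    def __init__(self, num_levels=200, start_level=0):
        self.num_levels = num_levels
        self.start_level = start_level
        self.closed = False

    def close(self):
        self.closed = True  # a real env would free native resources here

def reseed(env, env_cls, seed):
    """Close the old env and build a fresh one pinned to a single level."""
    env.close()
    return env_cls(num_levels=1, start_level=seed)

env = FakeProcgenEnv()
env = reseed(env, FakeProcgenEnv, seed=42)  # now locked to level 42
```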

So I assume that the 200 seeds will be sampled uniformly, and that it will not be possible to inject my own logic to alter the sampling strategy over those 200 seeds.

I guess my assumption is correct, since nobody has contradicted it.
It is a pity that curriculum learning cannot be done during this challenge.

Agreed. I wanted to try curriculum learning (and a few other ideas) but basically wrote them off as not possible since we can’t change any of the environment wrapper code. Would appreciate it if anyone found a workaround.

Hello @Leckofunny

This should be possible. We do support callbacks, so approach 2 mentioned there is definitely doable. For the first approach, did you try passing it as a custom algorithm? According to this line,

you should be able to pass the train function from approach 1 as a custom algorithm. Can you give it a try? If that doesn’t work, you should be able to extend the existing PPOTrainer class as a custom algorithm.

For the re-init part, can you try running env.close() followed by env.__init__() with the parameters from the current env? I’m not sure this is really the right way; I’ll get back in case I find a better solution.
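That close-then-__init__ pattern might look like the sketch below. A toy class stands in for the real Procgen env; whether re-calling __init__ is actually safe on Procgen’s native backend is exactly the open question above, so treat this as a pattern illustration only.

```python
class ToyEnv:
    """Toy stand-in for a Procgen env. Re-calling __init__ on a real
    env with native (C++) state may or may not release resources
    correctly, which is the caveat raised in the thread."""

    def __init__(self, num_levels=200, start_level=0):
        self.num_levels = num_levels
        self.start_level = start_level

    def close(self):
        pass  # a real env would release its native resources here

env = ToyEnv()
# Capture the desired parameters, then re-init in place so any wrappers
# holding a reference to `env` keep pointing at the same object.
params = dict(num_levels=1, start_level=7)
env.close()
env.__init__(**params)
```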

Just to add a bit more discussion here. I thought a little more about curriculum learning, and perhaps it’s a bit against the spirit of the competition. For round 1 it doesn’t matter, but once we get to the final round where num_levels=200, curriculum learning just seems like a way to skirt that rule by having more levels to work with. This would only be fair if you’re careful to allot only x levels to the easy distribution and 200 - x levels to the hard distribution. Just something to keep in mind if anyone else wants to explore this idea.
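Under that interpretation, a fair curriculum would have to stay inside the fixed 200-level budget. A trivial sketch of such a split (the numbers are illustrative, and how one would decide which concrete levels count as easy versus hard is outside its scope):

```python
def split_level_budget(total=200, easy_count=50):
    """Allot `easy_count` levels to an 'easy' block and the remainder
    to a 'hard' block, so the combined curriculum never exceeds the
    `total` (e.g. 200-level) budget. Illustrative only."""
    levels = list(range(total))
    return levels[:easy_count], levels[easy_count:]

easy, hard = split_level_budget()
# len(easy) + len(hard) == 200: the budget is respected by construction.
```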
