How are you enforcing the usage of 200 training seeds once submitted?
I’m planning a submission that has some logic to sample certain seeds for each environment.
As far as I can tell from how Procgen is implemented, the environment has to be closed and instantiated again to apply a distinct seed (num_levels=1, start_level=my_desired_seed), because I cannot enforce a new seed during the reset() call.
So I assume that 200 seeds will be sampled uniformly and it will not be possible to inject my logic to alter the sampling strategy of the 200 seeds.
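For anyone who wants to experiment with the close-and-re-create step described above, here is a minimal sketch. `ReseedableEnv` and `default_factory` are hypothetical names made up for illustration, not part of Procgen or the competition starter kit. The only Procgen-specific piece is the `gym.make` call with `num_levels=1` and `start_level=seed`; whether the evaluation setup lets you swap the environment like this is a separate question.

```python
from typing import Any, Callable


def default_factory(env_name: str, seed: int) -> Any:
    """Build a single-level Procgen env (assumes `procgen` and `gym` are installed)."""
    import gym  # imported lazily so the wrapper itself has no hard dependency

    return gym.make(
        f"procgen:procgen-{env_name}-v0",
        num_levels=1,      # restrict the env to exactly one level ...
        start_level=seed,  # ... namely the one identified by `seed`
    )


class ReseedableEnv:
    """Hypothetical wrapper that re-creates the env whenever the seed changes."""

    def __init__(self, env_name: str, factory: Callable[[str, int], Any] = default_factory):
        self.env_name = env_name
        self.factory = factory
        self.env = None
        self.seed = None

    def reset(self, seed: int):
        # Re-instantiate only when a different seed is requested.
        if self.env is None or seed != self.seed:
            if self.env is not None:
                self.env.close()
            self.env = self.factory(self.env_name, seed)
            self.seed = seed
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)

    def close(self):
        if self.env is not None:
            self.env.close()
            self.env = None
```

Usage would look like `env = ReseedableEnv("coinrun")` followed by `obs = env.reset(seed=42)`. Re-creating the environment has a cost, so the wrapper keeps the current instance when the same seed is requested again.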
Agreed. I wanted to try curriculum learning (and a few other ideas) but basically wrote them off as not possible since we can’t change any of the environment wrapper code. Would appreciate it if anyone found a workaround.
This should be possible. We do support callbacks, so approach 2 is definitely doable. For the first approach, did you try passing it as a custom algorithm? According to this line, you should be able to pass the train function from approach 1 as a custom algorithm. Can you give it a try? If that doesn’t work, you should be able to extend the existing PPOTrainer class as a custom algorithm.
For the re-init part, can you try running env.close() followed by env.__init__() with parameters from the current env? I’m not sure if this is really the right way. I’ll get back in case I find a better solution.
Just to add a bit more discussion here. I thought a little more about curriculum learning, and perhaps it’s a bit against the spirit of the competition. For round 1 it doesn’t matter, but when we get into the final round where num_levels=200, curriculum learning just seems like a way to skirt that rule by having more levels to work with. It would only stay within the rule if you’re careful to allot exactly x levels to the easy distribution and 200-x levels to the hard distribution. Just something to keep in mind if anyone else wants to explore this idea.
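To make the budget point concrete, here is a rough sketch of how one might keep a curriculum inside a fixed level budget. The function names and the `p_hard` parameter are hypothetical and only for illustration. Each seed is assigned to exactly one of the two distributions, so the number of distinct (mode, seed) pairs never exceeds the total.

```python
import random
from typing import List, Tuple


def split_level_budget(total_levels: int, num_easy: int, first_level: int = 0) -> Tuple[List[int], List[int]]:
    """Assign each of `total_levels` seeds to exactly one pool: easy or hard."""
    if not 0 <= num_easy <= total_levels:
        raise ValueError("num_easy must be between 0 and total_levels")
    levels = list(range(first_level, first_level + total_levels))
    return levels[:num_easy], levels[num_easy:]


def sample_level(easy: List[int], hard: List[int], p_hard: float, rng: random.Random) -> Tuple[str, int]:
    """Pick a (mode, seed) pair; `p_hard` is the probability of drawing from the hard pool."""
    # Fall back to the non-empty pool if one of the two pools is empty.
    use_hard = bool(hard) and (not easy or rng.random() < p_hard)
    pool = hard if use_hard else easy
    return ("hard" if use_hard else "easy", rng.choice(pool))


easy, hard = split_level_budget(total_levels=200, num_easy=50)
rng = random.Random(0)
mode, seed = sample_level(easy, hard, p_hard=0.25, rng=rng)
```

A curriculum would then raise `p_hard` over the course of training while the two pools stay fixed, which keeps the total at 200 levels.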
My auto-curriculum algorithm just alters the way seeds are sampled, to provide much more useful data to the agent and hence improve sample efficiency. Having more than 100 or 200 seeds doesn’t even help, in my opinion.
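As an illustration of what "altering the way seeds are sampled" can mean without adding seeds, here is a minimal sketch of a non-uniform sampler over a fixed seed set. This is not the algorithm from the post above. It is a simple stand-in that assumes returns are normalized to [0, 1] and samples seeds with lower recent returns more often. `SeedSampler`, `alpha` and `eps` are hypothetical names chosen for this example.

```python
import random
from typing import Iterable, List, Optional


class SeedSampler:
    """Samples from a fixed seed set, favoring seeds with low recent returns."""

    def __init__(self, seeds: Iterable[int], eps: float = 0.05, rng: Optional[random.Random] = None):
        self.seeds = list(seeds)
        self.eps = eps  # keeps every seed's sampling weight strictly positive
        self.rng = rng or random.Random()
        # Running estimate of the normalized return per seed, starting at 0.
        self.scores = {s: 0.0 for s in self.seeds}

    def update(self, seed: int, normalized_return: float, alpha: float = 0.1) -> None:
        """Exponential moving average of the return observed on `seed`."""
        clipped = min(1.0, max(0.0, normalized_return))
        self.scores[seed] += alpha * (clipped - self.scores[seed])

    def weights(self) -> List[float]:
        """Normalized sampling weights: lower score means higher weight."""
        raw = [1.0 - self.scores[s] + self.eps for s in self.seeds]
        total = sum(raw)
        return [w / total for w in raw]

    def sample(self) -> int:
        return self.rng.choices(self.seeds, weights=self.weights(), k=1)[0]


sampler = SeedSampler(range(200), rng=random.Random(0))
next_seed = sampler.sample()
```

The seed set never grows, only the sampling distribution over it changes, so the 200-level limit is respected by construction.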