The following might be somewhat of a rant. My point of view is a research one; I understand there may be other considerations to make.
The rllib constraint of the competition seems really problematic from a research perspective. The framework has a very steep learning curve and imposes many constraints on memory management and algorithmic flow. It is great for large-scale training, but unnecessary overhead for single-machine training.
I’m sure many others have faced the following situation: you think of an idea and try it in a framework you know well, for example baselines or dopamine, and then spend far more time hacking it into rllib only to hit unnecessary memory-management issues, even though the same method works in baselines under the same constraints. This really eats into research time. In my opinion, research like this should allow more flexibility.
I understand no solution is perfect, but it would be better to provide an environment with a wrapper that limits the timesteps, plus a simple API and a model filename to call for the final evaluation. The reasoning the organizers gave for rllib is easy future integration of the top solutions, but I doubt RL algorithms can be joined that easily, for example an off-policy method with an on-policy method. Given this is a NeurIPS competition on a benchmark with very little prior research, I would imagine making the research easier should be an important goal.
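For concreteness, the kind of framework-agnostic wrapper I have in mind could look something like the sketch below. The names (`TimestepLimitWrapper`, `budget`, `DummyEnv`) are my own for illustration, not any actual competition API; the point is that enforcing a sample budget needs nothing from rllib, just the standard `reset()`/`step()` interface.

```python
# Minimal sketch of a sample-budget wrapper, independent of any RL framework.
# TimestepLimitWrapper and DummyEnv are hypothetical names for illustration.

class TimestepLimitWrapper:
    """Wraps any env exposing reset()/step() and enforces a total
    timestep budget across the whole run (not per episode)."""

    def __init__(self, env, budget):
        self.env = env
        self.budget = budget
        self.steps_used = 0

    def reset(self):
        return self.env.reset()

    def step(self, action):
        if self.steps_used >= self.budget:
            raise RuntimeError("sample budget exhausted")
        self.steps_used += 1
        obs, reward, done, info = self.env.step(action)
        # Force episode termination on the final budgeted step.
        if self.steps_used >= self.budget:
            done = True
        return obs, reward, done, info


class DummyEnv:
    """Stand-in environment with the standard gym-style interface."""

    def reset(self):
        return 0

    def step(self, action):
        return 0, 1.0, False, {}


if __name__ == "__main__":
    env = TimestepLimitWrapper(DummyEnv(), budget=5)
    env.reset()
    done = False
    while not done:
        obs, reward, done, info = env.step(0)
    print(env.steps_used)  # the whole budget was consumed
```

Participants would then train against this wrapped env with whatever framework they like, and submit only the model file the evaluation API loads.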
So I request that the organizers consider relaxing the rllib constraint.