Problems of using rllib for a research competition

The following might be somewhat of a rant. I'm writing from a research point of view, and I understand there may be other considerations to make.

The rllib constraint of the competition seems really problematic from a research point of view. The framework has a very steep learning curve and imposes many constraints on memory management and algorithmic flow; it's great for large-scale training, but unnecessary overhead for single-machine training.

I'm sure many others have faced the following situation: you think of an idea and try it in a framework you know well, for example baselines or dopamine. Then you spend much more time hacking it into rllib, only to hit unnecessary memory-management issues, even though the same method works in baselines under the same constraints. This really eats into research time. In my opinion, research like this calls for more flexibility.

I understand no solution is perfect, but it would be better to be given an environment with a wrapper that limits the timesteps, plus a simple API and a model filename to call for the final evaluation. The reasoning the organizers give for rllib is easy future integration of the top solutions, but I doubt RL algorithms can be joined that easily, for example an off-policy method with an on-policy method. Given that this is a NeurIPS competition on a benchmark with very little prior research, I would imagine making the research easier would be an important aspect.
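To make the suggestion concrete, here is a minimal sketch of what such a framework-agnostic setup could look like. All names here (`TimestepLimitWrapper`, `max_total_steps`, the toy environment) are illustrative assumptions, not anything the organizers have specified:

```python
class TimestepLimitWrapper:
    """Sketch of a wrapper that enforces a global training timestep
    budget, independent of which RL framework is used underneath.
    The budget check lives in the environment, so any agent code
    (baselines, dopamine, rllib, custom) is constrained equally."""

    def __init__(self, env, max_total_steps):
        self.env = env
        self.max_total_steps = max_total_steps
        self.total_steps = 0

    def reset(self):
        return self.env.reset()

    def step(self, action):
        # Refuse further interaction once the budget is spent.
        if self.total_steps >= self.max_total_steps:
            raise RuntimeError("training timestep budget exhausted")
        self.total_steps += 1
        return self.env.step(action)


class ToyEnv:
    """Minimal stand-in environment, just for demonstration."""

    def reset(self):
        return 0  # dummy observation

    def step(self, action):
        # observation, reward, done, info
        return 0, 0.0, False, {}
```

With something like this, participants could train however they like, and the organizers would only need the final model file and a simple `act(observation)` entry point for evaluation.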

So I request that the organizers consider relaxing the rllib constraint.


Yes, I have faced the same situation. It really takes time to rewrite our code into the rllib framework, and even more time to debug. It seems this question has been discussed before: FAQ: Regarding rllib based approach for submissions


I've read through FAQ: Implementing a custom random agent and use most concepts from it. I'll still say rllib makes development quite slow from a research perspective, and I still don't see the value of a distributed framework, with all its overhead, for a single-machine training system. I understand a lot of development must have gone into the organizers' side in terms of providing logs, tracking metrics, etc. using rllib, but it's a big burden from a research perspective.


I decided not to participate in this competition, due to the aforementioned constraints. I'll continue using Procgen, though, with my established workflow and code.

Right now I plan to test everything on my own machine and then rewrite the final code in rllib after refining it. But of course it will be much harder to check everything properly on 16 environments in the final round, since I only have a single machine. Anyone without a local machine would be at an even higher disadvantage. Maybe one strategy is to team up with someone who is already an rllib expert.