Is it OK to wrap the gym env directly in `create_single_env` in local_evaluation.py when evaluating a model on AIcrowd?

CH_do · August 5, 2022, 2:12am

If I need to process the obs and reset some data whenever env.reset() is called, is it OK to wrap the env directly with my own wrapper in the `create_single_env` function in local_evaluation.py?

dipam · August 8, 2022, 2:56am

Hi @CH_do

Unfortunately, we do not support this. Changes to local_evaluation.py are not used in the actual evaluation.

Please add all wrapper-related logic to the agent class you're submitting in agents/user_config.py.
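A minimal sketch of what moving wrapper logic into the agent can look like. It assumes the evaluator calls a method `act(observation, done)` on the submitted agent; the class name `PreprocessingAgent`, the `preprocess` and `policy` methods, the scaling step and the threshold policy are all hypothetical placeholders, not the challenge's actual interface.

```python
from typing import List


class PreprocessingAgent:
    """Hypothetical agent that applies observation preprocessing itself,
    instead of relying on a gym wrapper added in create_single_env."""

    def __init__(self, scale: float = 1.0 / 255.0) -> None:
        # Hypothetical normalisation factor (e.g. for 8-bit pixel values).
        self.scale = scale

    def preprocess(self, observation: List[float]) -> List[float]:
        # Logic that would otherwise live in a gym.ObservationWrapper.
        return [x * self.scale for x in observation]

    def policy(self, features: List[float]) -> int:
        # Placeholder policy: replace with the real model's forward pass.
        return 0 if sum(features) < 1.0 else 1

    def act(self, observation: List[float], done: bool) -> int:
        # Called by the evaluator; preprocessing happens here, not in the env.
        features = self.preprocess(observation)
        return self.policy(features)
```

Because the agent owns the preprocessing, it is applied identically in local and remote evaluation, regardless of how the env is constructed.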

CH_do · August 8, 2022, 6:21am

Thanks for your reply.
Some feature-processing methods, such as stacking the most recent k frames, need to clear their historical data when the env is reset. However, this is hard to do in the current evaluation framework. It could be solved if agent.act were allowed to receive an extra parameter, for example a boolean is_first_obs indicating whether the env has just been reset.
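To illustrate why frame stacking needs a reset hook, here is a minimal k-frame stacker, assuming observations are flat sequences of floats; `FrameStacker`, `reset` and `step` are hypothetical names. Padding the buffer with the first frame on reset mirrors the approach used by gym's `FrameStack` wrapper. If `reset` is never called between episodes, frames from the previous episode stay in the stack.

```python
from collections import deque
from typing import Deque, List, Sequence


class FrameStacker:
    """Keeps the k most recent observations; must be cleared on env reset."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.frames: Deque[Sequence[float]] = deque(maxlen=k)

    def reset(self, first_obs: Sequence[float]) -> List[Sequence[float]]:
        # Drop history from the previous episode and pad with the first frame.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_obs)
        return list(self.frames)

    def step(self, obs: Sequence[float]) -> List[Sequence[float]]:
        # deque(maxlen=k) evicts the oldest frame automatically.
        self.frames.append(obs)
        return list(self.frames)
```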

dipam · August 8, 2022, 6:34am

Hi @CH_do

The `done` parameter is the same as the one the env outputs, so you can use it to detect resets.

CH_do · August 8, 2022, 6:42am

In the `evaluate` function of local_evaluation.py, `done` is always reset to False when an episode ends.

dipam · August 8, 2022, 6:58am

Oh, this is a bug. Thanks for pointing it out; I'll fix it ASAP.

dipam · August 8, 2022, 7:21am

@CH_do

`done` will now be True after the env resets. Thanks again for checking this.
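A sketch of how an agent could use `done` to clear its frame history, assuming (per the fix described above) that `act` receives `done=True` together with the first observation of a new episode. `StackingAgent`, its `act(observation, done)` signature and the placeholder policy are hypothetical. An empty buffer is also treated as a reset, to cover the very first call of the run.

```python
from collections import deque
from typing import Deque, List, Sequence


class StackingAgent:
    """Hypothetical agent that clears its frame history when done is True.

    Assumes the evaluator passes done=True together with the first
    observation of a new episode.
    """

    def __init__(self, k: int = 4) -> None:
        self.k = k
        self.frames: Deque[Sequence[float]] = deque(maxlen=k)

    def _reset_history(self, first_obs: Sequence[float]) -> None:
        # Forget the previous episode and pad with the new first frame.
        self.frames.clear()
        self.frames.extend([first_obs] * self.k)

    def act(self, observation: Sequence[float], done: bool) -> int:
        # An empty buffer means this is the very first call of the run.
        if done or not self.frames:
            self._reset_history(observation)
        else:
            self.frames.append(observation)
        stacked: List[Sequence[float]] = list(self.frames)
        # Placeholder policy on the stacked frames; replace with the real model.
        return int(sum(sum(f) for f in stacked) > 0)
```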

CH_do · August 8, 2022, 7:38am

Hi, shouldn't it be `observations` (not `observations_agent`) here?

dipam · August 8, 2022, 7:59am

Indeed, it should be `observation`; I've fixed it. Thanks.

CH_do · August 9, 2022, 6:45am

Hi, sorry to bother you again.
What's the difference between the local evaluator and the actual one? Performance in the actual evaluation drops sharply compared with the local evaluation (I have already set LIMIT_TASKS to None in LocalEvalConfig).

dipam · August 9, 2022, 7:10am

@CH_do

The score comes from a private set of tasks. Please check whether your model is overfitting to the public tasks.
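One way to estimate private-set performance locally is to hold out some public tasks from training and evaluate only on those. The sketch below is a minimal, hypothetical helper: `split_tasks` and the string task ids are assumptions, since the thread does not say how tasks are identified in the challenge. A large gap between scores on the training tasks and on the held-out tasks suggests overfitting.

```python
import random
from typing import List, Sequence, Tuple


def split_tasks(task_ids: Sequence[str], val_fraction: float = 0.2,
                seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle task ids with a fixed seed and hold out a validation subset."""
    ids = list(task_ids)
    # A seeded Random instance keeps the split reproducible across runs.
    random.Random(seed).shuffle(ids)
    n_val = max(1, int(len(ids) * val_fraction))
    # Returns (train_ids, held_out_ids).
    return ids[n_val:], ids[:n_val]
```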