Submissions Q&A

Before you post a submissions question:

This guide is based on my experience debugging 35 submissions.
I am a fellow contestant, not an official staff member.
In the spirit of fair competition, I hope everyone can break through 100 floors.

The Unity environment has some inherent instability, which makes submissions a bit special.
Thanks to @mohanty for the help.

This assumes that you are already able to run tests locally.
Check these steps:

First, update your environment to the latest version.

Q: “mlagents_envs.exception.UnityTimeOutException: The Unity environment took too long to respond”
A: This is a known open issue. There are two steps.

Step 1: increase the timeout to more than 300 seconds:

# Raise timeout_wait from its default so the env has time to respond
env = ObstacleTowerEnv(args.environment_filename, docker_training=args.docker_training, retro=True, timeout_wait=600)

Step 2: until the official release of a new environment file, it is recommended to import TensorFlow and your agent only after env.is_grading(), since heavy imports performed before the environment connects can push the handshake past the timeout:

if env.is_grading():
    # Heavy imports happen only after the environment handshake has succeeded
    import tensorflow as tf
    import youragent

    while True:
        ...  # your evaluation loop goes here

Q: bash >>> ^M
A:
The newline sequence in a DOS file is ^M^J (CR LF).
The newline in a Unix file is ^J (LF).
So in some cases you will run into bash errors because the configuration file was edited on Windows. Use Vim or a similar editor to convert the line endings (in Vim: :set ff=unix, then save).
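
If you would rather script the conversion, here is a minimal Python sketch; the filename run.sh is only a placeholder for whichever file bash complains about:

# Strip DOS carriage returns (\r) so bash stops choking on ^M.
# "run.sh" is a placeholder; point it at the offending file.
with open("run.sh", "rb") as f:
    data = f.read()

with open("run.sh", "wb") as f:
    f.write(data.replace(b"\r\n", b"\n"))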

Q: Checkpoint (ckpt) import error (e.g. gzip broken)
A: Follow this
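
One way to catch a corrupted checkpoint before submitting is to try reading it locally. Here is a hedged sketch using the TF1-style checkpoint reader; the ./models directory is an assumption, so substitute your own:

# Attempt to read the checkpoint; a corrupted file raises here rather
# than during grading. "./models" is a placeholder directory.
import tensorflow as tf

ckpt_path = tf.train.latest_checkpoint("./models")
try:
    reader = tf.train.NewCheckpointReader(ckpt_path)
    print(reader.get_variable_to_shape_map())
except Exception as e:
    print("Checkpoint looks corrupted:", e)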


Excuse me, bro. There are some questions that need your help:

  • How can I create a GitLab repo to submit my code? It seems that their official GitLab website can't be accessed.
  • Can I make a submission after only following their “training the agent on GCP” tutorial and running the code successfully? (Except for the error: “The Unity environment took too long to respond....”)

You should read all the posts in this Discourse. Many people use Dopamine.
If you have a problem with your submission, check your code again and re-read the posts.
By now, all Dopamine users have submitted successfully.


Read again~

Hi @ChenKuanSun,

I have uploaded my repo to gitlab.aicrowd.com according to the instructions provided in Unity-Technologies/obstacle-tower-challenge.

I went to Issues to check and found nothing. I have been waiting for an hour, and I am not sure how long it takes to verify or receive my submission. Or maybe I configured something wrongly?


I have the same issue


maybe docker dead~~~~

It seems the evaluation server is dead.
Mine has been stuck for 7 hours.


Opened 7 hours ago, stuck at image_build_job_enqueued
…and no further progress

Acknowledging the issue! Will post an update soon!

Update: The evaluator is working as expected again now. Some of the key servers responsible for orchestrating the evaluations had unfortunately frozen. We suspect it was a weird/rare AWS tantrum, but we have manually restarted the affected nodes now.

Can you guys change your evaluator to delay the start of the env by 20 seconds? Like this in env.sh:


if [ -z "$2" ]
  then
    # Default environment binary if none was supplied
    ENV_FILENAME="./ObstacleTower/obstacletower.x86_64"
fi

# Give the agent container time to finish loading before the env starts
sleep 20

touch otc_out.json
xvfb-run --auto-servernum --server-args='-screen 0 640x480x24' $ENV_FILENAME --port $ENV_PORT > /dev/null 2>&1 &

Because in my case, I have to load the model before initializing the env. Thanks.

@arthurj @harperj @anhad: Are you guys okay with a 20-second wait for every evaluation?

I have mixed feelings, as these synchronizations are best dealt with in a single place. The timeout_wait param in the env is where we handle it now, and I would want to stick with that and maybe just set a more sensible default value.

@mohanty

Right now it seems that people are running into two main issues. The first is that the environment starts too late after the agent is initialized; this is what timeout_wait fixes. The second issue some seem to have is that the environment starts too early and throws an error before the agent even attempts to connect. I believe the latter issue is what the 20-second environment wait is designed to address.

@arthurj
The core issue is that if an agent imports any large libraries (Rainbow, TensorFlow, Dopamine, etc.) before it loads the environment, the test will never start, no matter how large the timeout_wait parameter is.

The easiest fix would be to add a delay to the environment container's startup to give the agent time to load, rather than resorting to hacky methods such as deferring imports.

In local Docker testing, the agent container must always be ready before the environment container starts as well.

Hi @mohanty,
I have uploaded, and the evaluation succeeded when I used debug mode;
however, it failed when I used non-debug mode (debug == False).
The error is: https://gitlab.aicrowd.com/STAR.Lab/obstacle-tower-challenge/issues/7
I can't solve the problem; please give me some suggestions or let me know what the error in non-debug mode means.
Thanks a lot.

@STAR.Lab: This is the same timeout exception again. I will add a 20-second wait to the environment instantiation now, and hopefully that fixes the problem.
The updated 20-second wait should be available on the evaluators in about 2-3 minutes from now.

Edit: Also, another approach could be to ensure that you instantiate the env first, and only then go on to loading any of the heavy models, libraries, etc.

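A minimal sketch of that ordering, assuming the starter kit's ObstacleTowerEnv; youragent and its load() call are placeholders for your own code:

# Connect to the environment immediately so the handshake cannot time out.
from obstacle_tower_env import ObstacleTowerEnv

env = ObstacleTowerEnv('./ObstacleTower/obstacletower', retro=True, timeout_wait=600)

# Only now pay for the heavy imports and the model restore.
import tensorflow as tf
import youragent  # placeholder for your own agent module

agent = youragent.load('model.ckpt')  # hypothetical loader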

Thanks for the update. My agent already creates the environment first thing, so there should be nothing ahead of that aside from importing modules, checking whether CUDA is present, and seeding the RNG. Hopefully the 20-second delay does the trick, though.

Edit: For the record, in my eval run script, the time from launching the Python script (a timestamp in the shell is passed into the script) to the call that creates the Obstacle Tower environment is 0.33 seconds on my hardware; the GPU makes no difference here. The creation of the ObstacleTowerEnv takes just over 3 seconds. So from launch to having an instantiated environment is, on average, just under 3.5 seconds.
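
For anyone who wants to reproduce this kind of measurement, here is a rough sketch of the harness described above; the script name and env path are placeholders. Invoke it as python time_env.py "$(date +%s.%N)" so the shell's launch timestamp is passed in:

# Measure launch -> script time and script -> env-instantiation time.
import sys
import time

launch_ts = float(sys.argv[1])  # timestamp captured in the shell

from obstacle_tower_env import ObstacleTowerEnv

t0 = time.time()
env = ObstacleTowerEnv('./ObstacleTower/obstacletower', retro=True)
t1 = time.time()

print("launch -> script: %.2fs, env creation: %.2fs" % (t0 - launch_ts, t1 - t0))
env.close()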