Here is the error that I get from a Python Process that I spawn to handle one instance of the ObstacleTowerEnv, among many others:
E0712 18:36:48.452364988 30043 ev_epoll1_linux.cc:1061] assertion failed: next_worker->initialized_cv
I am training by harvesting experience from multiple instances of ObstacleTowerEnv with multiple processes, each environment being spawned with a different worker_id (following a previous discussion: Running multiple instances).
Nevertheless, the issue occurs independently of the number of environments/processes spawned.
I have traced it back to gRPC, which is used for the client-server communication of each ObstacleTowerEnv instance.
Since it terminates my harvesting processes with a SIGABRT, my plan was simply to let the process die, close its environment instance, and then start a new process and a new environment instance with another worker_id, but there seems to be something I am still not grasping.
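To make the restart idea concrete, here is a minimal sketch of it rather than my actual code. `run_env_worker` and `supervise` are hypothetical names, the worker does not touch ObstacleTowerEnv at all, and the crash is simulated with `os.abort()` on the first attempt only. I also assume Linux, where the `fork` start method is available.

```python
import multiprocessing as mp
import os
import signal


def run_env_worker(worker_id, crash):
    """Hypothetical stand-in for a process that steps one environment."""
    if crash:
        # Simulate the failed gRPC assertion, which aborts the process.
        os.abort()


def supervise(first_worker_id, max_restarts=3):
    """Restart the worker with a fresh worker_id each time it dies from SIGABRT."""
    ctx = mp.get_context("fork")  # assumption: Linux
    worker_id = first_worker_id
    history = []
    for attempt in range(max_restarts + 1):
        crash = attempt == 0  # demo only: the first attempt crashes
        proc = ctx.Process(target=run_env_worker, args=(worker_id, crash))
        proc.start()
        proc.join()
        history.append((worker_id, proc.exitcode))
        # A process killed by signal N reports exitcode -N.
        if proc.exitcode != -signal.SIGABRT:
            break
        worker_id += 1  # use another worker_id for the replacement
    return history
```

Calling `supervise(10)` returns the list of `(worker_id, exitcode)` pairs, so a SIGABRT shows up as a negative exit code followed by the replacement worker's entry.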
Since I cannot work around the problem, I would appreciate any advice pointing me in a better direction.
I am training with PyTorch and using the following packages on Python 3.6.8 and Ubuntu 16.04.6 LTS (Xenial Xerus); I also reproduced the error on Ubuntu 18.04.2 LTS (Bionic Beaver):
absl-py==0.7.1
astor==0.8.0
atari-py==0.2.3
atomicwrites==1.3.0
attrs==19.1.0
backcall==0.1.0
cloudpickle==1.2.1
cycler==0.10.0
decorator==4.4.0
dill==0.3.0
docopt==0.6.2
future==0.17.1
gast==0.2.2
google-pasta==0.1.7
grpcio==1.11.1
gym==0.13.1
gym-rock-paper-scissors==0.1
h5py==2.9.0
importlib-metadata==0.18
ipdb==0.12
ipython==7.6.1
ipython-genutils==0.2.0
jedi==0.14.0
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.1.0
Markdown==3.1.1
matplotlib==3.1.1
mlagents-envs==0.6.2
more-itertools==7.1.0
numpy==1.16.1
-e git+https://github.com/Unity-Technologies/obstacle-tower-env/@474fbf00564ae1357373b1e2d72dcb9af095540b#egg=obstacle_tower_env
opencv-python==18.104.22.168
packaging==19.0
pandas==0.24.2
parso==0.5.0
pexpect==4.7.0
pickleshare==0.7.5
Pillow==5.4.1
pluggy==0.12.0
prompt-toolkit==2.0.9
protobuf==3.6.1
ptyprocess==0.6.0
py==1.8.0
pyglet==1.3.2
Pygments==2.4.2
PyOpenGL==3.1.0
pyparsing==2.4.0
pytest==3.10.1
python-dateutil==2.8.0
pytz==2019.1
PyYAML==5.1.1
-e git+https://github.com/Danielhp95/Generalized-RL-Self-Play-Framework/@bd872b3b547a008fe126a3584b83448157f5ee3d#egg=regym
scipy==1.3.0
seaborn==0.9.0
six==1.12.0
tensorboard==1.12.0
tensorboardX==1.8
tensorflow==1.12.0
tensorflow-estimator==1.14.0
termcolor==1.1.0
torch==1.1.0
torchvision==0.3.0
tqdm==4.32.2
traitlets==4.3.2
wcwidth==0.1.7
Werkzeug==0.15.4
wrapt==1.11.2
zipp==0.5.2
I realize it may be important to mention how I spawn the processes and create the environment instances: I call the ObstacleTowerEnv() constructor in the main process (once per environment), and then pass each instance as an argument to a new process that communicates with the main process via Queues.
If I create the environment inside the spawned process instead, I end up with a UnityTimedOutException…
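The layout I describe above (environments built in the main process, then handed to workers that talk back over Queues) can be sketched as follows. This is a simplified illustration, not my training code: `DummyEnv`, `env_worker` and `run_demo` are hypothetical names, `DummyEnv` replaces ObstacleTowerEnv so that no Unity binary is needed, and I assume the `fork` start method on Linux.

```python
import multiprocessing as mp


class DummyEnv:
    """Hypothetical stand-in for ObstacleTowerEnv (no Unity binary needed)."""

    def __init__(self, worker_id):
        self.worker_id = worker_id

    def step(self, action):
        # Return a fake observation tagged with this environment's worker_id.
        return (self.worker_id, action)


def env_worker(env, actions, results):
    """Step the environment for each received action; None means stop."""
    while True:
        action = actions.get()
        if action is None:
            break
        results.put(env.step(action))


def run_demo(num_envs=2):
    ctx = mp.get_context("fork")  # assumption: Linux
    # Environments are created in the main process, one per worker_id.
    envs = [DummyEnv(worker_id=i) for i in range(num_envs)]
    channels = [(ctx.Queue(), ctx.Queue()) for _ in envs]
    procs = [
        ctx.Process(target=env_worker, args=(env, actions, results))
        for env, (actions, results) in zip(envs, channels)
    ]
    for proc in procs:
        proc.start()
    observations = []
    for actions, results in channels:
        actions.put(1)  # send one action to the worker
        observations.append(results.get(timeout=10))
        actions.put(None)  # ask the worker to stop
    for proc in procs:
        proc.join()
    return observations
```

With the real environment, the object passed to each process would also carry whatever connection state it opened in the main process, which is the part of my setup I am least sure about.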
I am using PyTorch, which implements its own flavour of multiprocessing, and I am using that as well. At some point I suspected that it was colliding with ObstacleTowerEnv’s own multiprocessing needs, but my inquiries were not fruitful…