Hi everyone,
Posting some warnings / errors I consistently have in my training logs so people can tell me if they’re experiencing the same crashes. (Errors from this log).
- The dashboard crashes with `error while attempting to bind on address ('::1', 8265, 0, 0): cannot assign requested address`. On Google Colab I solved this by adding `webui_host='127.0.0.1'` in `ray_init` in `train.py` (cf. Stack Overflow). I'm not sure whether I need to do the same for the AIcrowd submission, which would mean touching `train.py`.
- `ls: cannot access '/outputs/ray-results/procgen-ppo/*/'`: this seems to come from how variables are set in `run.sh`. I don't know why they would want to access ray-results this early on.
- `given NumPy array is not writeable`: solved locally by downgrading to torch 1.3.1, but it is still unclear how to downgrade when submitting (cf. discussion).
- `[Errno 2] No such file or directory: 'merged-videos/training.mp4'`: this seems to be on the AIcrowd side, but maybe we need to change how we log videos? See this example or this PR.
- `WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 67108864 bytes available. This may slow down performance! ... you may need to pass an argument with the flag '--shm-size' to 'docker run'.`: this seems to be on the AIcrowd server side, but maybe we need to change the Dockerfile or do something clever?
- `The process_trial operation took 1.1798417568206787 seconds to complete, which may be a performance bottleneck`: this comes from just scaling the number of channels in the IMPALA baseline by 4x (so roughly 16x the params). Have people been experiencing the same performance bottlenecks with other models?
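
For the `given NumPy array is not writeable` item, one possible alternative to downgrading torch is to copy read-only arrays before they are converted to tensors. This is only a sketch: it assumes the warning is triggered by handing a read-only NumPy array to `torch.from_numpy`, and the helper name `ensure_writeable` is hypothetical. Where exactly the conversion happens in the starter kit or in RLlib is not something this example tries to answer.

```python
import numpy as np


def ensure_writeable(arr):
    """Return `arr` itself if it is writeable, otherwise a writeable copy."""
    if arr.flags.writeable:
        return arr
    return arr.copy()


if __name__ == "__main__":
    obs = np.zeros((4, 4), dtype=np.float32)
    obs.setflags(write=False)  # simulate a read-only observation buffer
    safe = ensure_writeable(obs)
    print(obs.flags.writeable, safe.flags.writeable)
    # `safe` could then be passed to torch.from_numpy(...);
    # torch is deliberately not imported in this sketch.
```

The copy costs extra memory and time per batch, so this trades a warning for some overhead.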
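
For the `/dev/shm` warning, the 67108864 bytes in the message is exactly 64 MiB, which matches Docker's default shared-memory size. A minimal sketch for checking what the container actually provides before Ray starts, using only the standard library. The helper name `shm_report` and the 2 GiB threshold are hypothetical choices for illustration, not values prescribed by Ray or AIcrowd.

```python
import os
import shutil

MIB = 1024 ** 2


def bytes_to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / MIB


def shm_report(path="/dev/shm", min_bytes=2 * 1024 ** 3):
    """Return (free_bytes, is_enough) for a shared-memory mount.

    `min_bytes` is an arbitrary illustrative threshold (assumption),
    not a value taken from Ray's documentation.
    Returns (None, False) when the path does not exist.
    """
    if not os.path.isdir(path):
        return None, False
    free = shutil.disk_usage(path).free
    return free, free >= min_bytes


if __name__ == "__main__":
    # The figure from the warning in the log, converted to MiB.
    print(bytes_to_mib(67108864), "MiB")
    free, ok = shm_report()
    print("free bytes in /dev/shm:", free, "| enough:", ok)
```

If the reported size is small and you do not control the `docker run` invocation, the `--shm-size` flag mentioned in the warning is not available to you, so the check mainly helps confirm that the limit comes from the host side.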
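
The "4x channels, so 16x params" estimate in the last item follows from how convolution weights are counted: a layer has `in_channels * out_channels * k * k` weights, so scaling both channel counts by 4 scales the weights by 16. A rough sketch of that arithmetic follows. The channel sizes are made-up illustrative numbers, not the actual IMPALA baseline configuration, and the plain layer stack is a stand-in for the real architecture.

```python
def conv_params(in_ch, out_ch, kernel=3, bias=True):
    """Parameter count of a single 2D convolution layer."""
    weights = in_ch * out_ch * kernel * kernel
    return weights + (out_ch if bias else 0)


def total_params(channels, input_ch=3, kernel=3):
    """Parameter count of a plain stack of conv layers.

    `channels` lists the output channels of each layer; the layout is a
    hypothetical stand-in, not the real IMPALA architecture.
    """
    total, prev = 0, input_ch
    for ch in channels:
        total += conv_params(prev, ch, kernel)
        prev = ch
    return total


if __name__ == "__main__":
    base = [16, 32, 32]           # illustrative channel sizes (assumption)
    wide = [4 * c for c in base]  # every layer made 4x wider
    ratio = total_params(wide) / total_params(base)
    print(f"params grow by about {ratio:.1f}x")
```

The ratio lands slightly under 16 because the first layer's input channels are fixed by the observation, so that layer (and every bias vector) only grows 4x.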