I’ve spent countless hours over the past 3 days trying to figure out why I could neither upload/evaluate a new model nor reproduce the problem locally.
Basically, there is a breaking change whereby existing code will no longer run server-side. It would have saved me hours had there been a better error and/or someplace to look for notifications (I don’t think it helps to have 2 Unity repos with issue trackers as well as the AIcrowd message board).
- June 13th was my last successful upload of a model.
- On July 5th I tried to upload a new model - the only change to my code base was the addition of the model and a reference to that model.
aicrowd-bot posted the log; it just said this:
2019-07-06T07:13:55.1056876Z root
2019-07-06T07:13:55.129380334Z Traceback (most recent call last):
2019-07-06T07:13:55.12943787Z File "run_evaluation.py", line 7, in <module>
2019-07-06T07:13:55.129471922Z import gym
2019-07-06T07:13:55.129478407Z ModuleNotFoundError: No module named 'gym'
- I learned that we can now do debug submits (Announcement: Debug your submissions); however, all it gave was the same log.
- I tried to reproduce locally; however, the build.sh script was giving me the error:
AttributeError: /srv/conda/bin/python: undefined symbol: archive_errno
- I thought maybe a conda or pip package update may have broken something, so I manually pinned each one to the valid version from June 13th
- I thought there may be some local issue with my docker, so I cleaned, deleted, reset
- I ran pip install --upgrade aicrowd-repo2docker and saw that it updated. This solved my local issue, and I was able to reproduce the server-side error.
- Given that aicrowd-repo2docker had been updated, I thought to look at the commit logs and found that this commit https://github.com/Unity-Technologies/obstacle-tower-challenge/commit/99c68faf2ed0f01ee8bc3e411bbdd4e85484a733 removed source activate base from run.sh
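For anyone hitting the same ModuleNotFoundError, here is a minimal sketch of a pre-flight check, assuming the cause is that the script ends up running under an interpreter that does not have the conda packages (for example because the environment is no longer activated in run.sh). The function names and the list of required modules are my own illustration, not part of the challenge starter kit.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    # find_spec returns None when a top-level module is not importable.
    return [name for name in names if importlib.util.find_spec(name) is None]


def preflight_report(names):
    """Build a short text report: which interpreter is running, what is missing."""
    missing = missing_modules(names)
    lines = ["interpreter: " + sys.executable]
    if missing:
        lines.append("missing modules: " + ", ".join(missing))
    else:
        lines.append("all required modules found")
    return "\n".join(lines)


# Hypothetical list of requirements; adjust to whatever your agent imports.
print(preflight_report(["gym"]))
```

Running something like this at the very top of the entry script, before the real imports, should show in the log which Python is being used. If the interpreter path is not the conda one, the missing activation is the likely culprit rather than the package list.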
Note: I still cannot test locally - the agent code runs, but the environment docker immediately drops out. I also have to manually delete the docker image to force it to rebuild (this was not the case prior to the