@mohanty @shivam @arjun_nemani
I can’t find any logs for my failed submission.
Could you please share them?
I see many other participants’ submissions failing as well. When a submission fails, do we just message the organisers to get the logs, or is there another way?
@nilabha: can you please pull in the latest changes from the main starter kit?
Also, if you are using environment.yaml for packaging your software runtime, please delete the Dockerfile at the root of your repo. That is the main cause of your submission failing at the moment. Sorry for the confusion.
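The advice above can be checked before pushing. This is a minimal sketch, assuming the conda file is named environment.yml or environment.yaml and sits at the repo root next to the Dockerfile; the function name packaging_conflict is hypothetical and not part of the starter kit.

```python
from pathlib import Path

# File names come from the thread; accepting both spellings of the
# conda file is an assumption.
CONDA_FILES = ("environment.yml", "environment.yaml")


def packaging_conflict(repo_root="."):
    """Return True when a conda environment file and a Dockerfile
    both exist at the root of the repository."""
    root = Path(repo_root)
    has_conda = any((root / name).is_file() for name in CONDA_FILES)
    has_dockerfile = (root / "Dockerfile").is_file()
    return has_conda and has_dockerfile


if __name__ == "__main__":
    if packaging_conflict():
        print("Conda environment file and Dockerfile both present: "
              "delete the Dockerfile if you package with conda.")
    else:
        print("No Dockerfile/conda conflict found at the repo root.")
```

Run it from the root of the submission repository before committing.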
Thanks for the reply. I pushed some new submissions after committing, but they are not being picked up for evaluation.
Can you let me know what the problem is?
@nilabha: We had a small outage yesterday, and some of the evaluations were affected, including yours. It has been resolved now.
Your submission was re-queued, but the image build still failed because certain packages were not found on Conda for Linux:
I have uploaded another version of the environment.yml file, but the image build still fails.
Can you please provide the error logs?
The build logs are here: https://gitlab.aicrowd.com/nilabha/snake-species-identification-challenge/snippets/7288
From the look of it, you can solve it by adding an apt.txt at the root of your repository with gcc in it.
I am trying to submit, but I am getting the error below:
Submission failed : Malformed JSON provided in aicrowd.json
I don’t see any issues with this file.
Do you know what the problem is?
Looking at the logs, it seems that your aicrowd.json is checked into LFS, while the evaluator expects it to be checked in directly in the repository. This is a bug on our end, and we will fix it. In the meantime, if you move your aicrowd.json from LFS to a direct check-in in the git repository, the evaluation should go through!
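A file tracked by Git LFS is stored in the repository as a small text pointer whose first line is "version https://git-lfs.github.com/spec/v1", which is not JSON. The sketch below classifies the committed bytes of a file as an LFS pointer, invalid JSON, or valid JSON. The helper names classify_blob and committed_blob are hypothetical, and this is only an illustration of the check, not the evaluator's actual logic.

```python
import json
import subprocess

# Git LFS pointer files start with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def classify_blob(raw: bytes) -> str:
    """Return 'lfs-pointer', 'invalid-json', or 'ok' for file contents."""
    if raw.startswith(LFS_POINTER_PREFIX):
        return "lfs-pointer"
    try:
        json.loads(raw)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return "invalid-json"
    return "ok"


def committed_blob(path="aicrowd.json", rev="HEAD") -> bytes:
    """Read the file as stored in the commit. `git show` prints the raw
    blob, so an LFS-tracked file shows up as its pointer text."""
    result = subprocess.run(
        ["git", "show", f"{rev}:{path}"],
        check=True,
        capture_output=True,
    )
    return result.stdout


# Usage, from inside the submission repository:
#   print(classify_blob(committed_blob("aicrowd.json")))
```

The working copy on disk usually contains the real JSON, because LFS swaps the pointer for the content on checkout. That is why the file can look fine locally while the committed blob is a pointer.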
Thanks. It’s working now. The build was also failing because environment.yml was checked in using LFS.
Unfortunately, the evaluation is now failing after a successful build. Could you kindly help me with the error logs?
@nilabha: I had posted earlier on the relevant issue. The problem was that run.sh does not have execution permissions, so a chmod +x run.sh (and a subsequent commit) should fix the problem.
It seems run.sh already has executable permission, so running chmod +x run.sh makes no change to my local files.
The permissions are -rwxrwxrwx on my local machine.
I have looked at the permissions for the other files and they are the same.
Do you think there is some other error?
I pulled down the image built from your submission, and run.sh indeed does not have execution permission:
root@408dfe6f4a7c:~# ls -al
drwxr-xr-x 1 aicrowd aicrowd 4096 Jul 24 16:54 .
drwxr-xr-x 1 root root 4096 Jun 14 08:09 ..
-rw-rw-r-- 1 aicrowd aicrowd 3033 Jul 24 16:46 aicrowd_helpers.py
-rw-rw-r-- 1 aicrowd aicrowd 194 Jul 24 16:46 aicrowd.json
-rw-r--r-- 1 aicrowd aicrowd 220 Jun 14 08:09 .bash_logout
-rw-r--r-- 1 aicrowd aicrowd 3771 Jun 14 08:09 .bashrc
-rw-rw-r-- 1 aicrowd aicrowd 339 Jul 24 16:46 build.sh
drwx------ 3 aicrowd aicrowd 4096 Jul 24 16:54 .cache
drwxrwsr-x 2 aicrowd aicrowd 4096 Jul 24 16:53 .conda
drwx------ 3 aicrowd aicrowd 4096 Jun 19 08:45 .config
drwxrwxr-x 1 aicrowd aicrowd 4096 Jul 24 16:46 data
-rw-rw-r-- 1 aicrowd aicrowd 349 Jul 24 16:46 debug.sh
drwxr-xr-x 2 aicrowd aicrowd 4096 Jul 24 16:53 .empty
-rw-rw-r-- 1 aicrowd aicrowd 2123 Jul 24 16:46 environment.yml
-rw-rw-r-- 1 aicrowd aicrowd 94 Jul 24 16:46 environ.sh
-rw-rw-r-- 1 aicrowd aicrowd 94 Jul 24 16:46 .gitattributes
-rw-rw-r-- 1 aicrowd aicrowd 57 Jul 24 16:46 .gitignore
drwxrwxr-x 1 aicrowd aicrowd 4096 Jul 24 16:46 models
-rw-r--r-- 1 aicrowd aicrowd 807 Jun 14 08:09 .profile
-rw-rw-r-- 1 aicrowd aicrowd 8559 Jul 24 16:46 README.md
-rw-rw-r-- 1 aicrowd aicrowd 8461 Jul 24 16:46 run.py
-rw-rw-r-- 1 aicrowd aicrowd 28 Jul 24 16:46 run.sh
drwxrwxr-x 1 aicrowd aicrowd 4096 Jul 24 16:46 sample
-rw-rw-r-- 1 aicrowd aicrowd 132 Jul 24 16:46 trainnew.csv
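One possible explanation for the mismatch between the local permissions and the listing above is that git records its own file mode in the index (100644 or 100755), and on some setups a local chmod does not change it. This is an assumption and is not confirmed in the thread. The sketch below reads the recorded mode with git ls-files -s; the helper names are hypothetical.

```python
import subprocess


def parse_index_mode(ls_files_line: str) -> str:
    """Extract the mode from one line of `git ls-files -s` output.
    Line format: "<mode> <object id> <stage>\t<path>"."""
    return ls_files_line.split()[0]


def is_executable_mode(mode: str) -> bool:
    """Git stores regular files as 100644 and executables as 100755."""
    return mode == "100755"


def index_mode(path="run.sh") -> str:
    """Return the mode git has recorded for `path` in the index."""
    out = subprocess.run(
        ["git", "ls-files", "-s", "--", path],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    if not out.strip():
        raise FileNotFoundError(f"{path} is not tracked by git")
    return parse_index_mode(out.splitlines()[0])


# Usage, from inside the submission repository:
#   print(is_executable_mode(index_mode("run.sh")))
```

If the recorded mode turns out to be 100644, `git update-index --chmod=+x run.sh` followed by a commit sets the executable bit in the repository even when chmod reports no local change.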
Hi @mohanty I am getting the following error in my submission:
RuntimeError: DataLoader worker (pid(s) 16) exited unexpectedly
(I read that --shm-size 50G and --runtime=nvidia could fix the issue. Could we try that? I do not know where to add them in the container config.)
I have redone it with a fresh copy of the repo and updated the files.
Sorry for asking you again, but I still get the error in my latest submission: http://gitlab.aicrowd.com/nilabha/snake-species-identification-challenge/issues/22
Do you know what the error is?
I feel guilty having to ask you every time.
I wouldn’t mind if all my logs were made public, to be honest.
On another note, will the deadline for this competition be extended? With little time left in the competition, I haven’t done much to replicate this submission process locally (I think there are some instructions at https://github.com/stanfordnmbl/neurips2019-learning-to-move-starter-kit). As of now I just run bash run.sh to test that the code works locally.
Looks like this is because the DataLoader is trying to spawn way too many workers. Can you set num_workers=0 in the DataLoader so that it does all the data loading in the main process?
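For reference, a minimal sketch of that setting with PyTorch's torch.utils.data.DataLoader. The helper names and the batch size of 32 are placeholders, not values from the submission; only num_workers=0 comes from the suggestion above.

```python
def main_process_loader_kwargs(batch_size=32):
    """Keyword arguments that keep all data loading in the main process.
    batch_size is a placeholder value, not taken from the thread."""
    return {
        "batch_size": batch_size,
        "shuffle": False,
        "num_workers": 0,  # 0 means no worker subprocesses are spawned
    }


def build_loader(dataset, batch_size=32):
    """Wrap `dataset` in a DataLoader that spawns no worker processes."""
    # Imported lazily so the helper above works without torch installed.
    from torch.utils.data import DataLoader

    return DataLoader(dataset, **main_process_loader_kwargs(batch_size))
```

With no worker processes, loading is slower but does not rely on the shared memory that workers use to pass batches, which is often limited inside containers.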
I have set num_workers=0, but I still get the error.
Could you share the error logs?
@nilabha Commented on the issue. Hope this helps!
I have been using trial and error to find which line of code the submission is failing on, but unfortunately I have now exceeded the submission limit.
Could you provide the error logs so I can debug the issue?
I have isolated the issue to a few lines of code, but it would help to know what the error is.