Submission Errors

@mohanty @shivam @arjun_nemani
I can’t quite find any logs available for my failed submission.
https://gitlab.aicrowd.com/nilabha/snake-species-identification-challenge/issues/3

Request you to kindly share the logs

I see many submissions from other participants failing as well. If a submission fails, do we just message the organisers to get the logs, or is there another way?

@nilabha: can you please pull in the latest changes from the main starter kit

Also, if you are using environment.yaml for packaging your software runtime, please delete the Dockerfile at the root of your repo. That is the main cause of your submission failing at the moment. Sorry for the confusion.

Hi Mohanty

Thanks for the reply. I pushed some new submissions after committing, but somehow they are not being picked up for evaluation.
For example:
https://gitlab.aicrowd.com/nilabha/snake-species-identification-challenge/commits/submission-v3.2

Can you let me know what is the problem?

Thanks,
Nilabha

@nilabha: We had a small outage yesterday, and some of the evaluations were affected including yours. The same has been resolved now.

Your submission was requeued, but the image build still failed because certain packages were not found on Conda for Linux:

ResolvePackageNotFound: 
  - m2w64-gcc-libs-core=5.3.0
  - m2w64-gcc-libgfortran=5.3.0
  - win_inet_pton=1.1.0
  - m2w64-gcc-libs=5.3.0
  - vs2015_runtime=14.15.26706
  - m2w64-gmp=6.1.0
  - winpty=0.4.3
  - msys2-conda-epoch=20160418
  - icc_rt=2019.0.0
  - m2w64-libwinpthread-git=5.0.0.4634.697f757
  - vc=14.1
  - pywinpty=0.5.5
  - pyreadline=2.1
  - wincertstore=0.2
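
Most of the names in that error (the m2w64-*, msys2-*, win* and vs2015_runtime entries, among others) are Windows builds, which suggests the environment.yml was exported on a Windows machine. Below is a minimal sketch of one way to strip such entries before committing, assuming the file lists dependencies as `- name=version` lines. The helper names are hypothetical, the name list is taken only from the log above (so it may be incomplete), and the `snakes` environment with its `numpy` pin is a made-up example.

```python
# Names reported as unresolvable on Linux in the build log above.
UNRESOLVED_EXACT = {
    "win_inet_pton", "vs2015_runtime", "winpty", "msys2-conda-epoch",
    "icc_rt", "vc", "pywinpty", "pyreadline", "wincertstore",
}
UNRESOLVED_PREFIXES = ("m2w64-",)


def package_name(line):
    """Return the package name from a '  - name=version' line, else None."""
    stripped = line.strip()
    if not stripped.startswith("- "):
        return None
    spec = stripped[2:].strip()
    # Cut the spec at the first version separator or space.
    for sep in ("=", "<", ">", " "):
        spec = spec.split(sep, 1)[0]
    return spec


def drop_windows_only(yaml_text):
    """Remove dependency lines whose package is in the unresolved list."""
    kept = []
    for line in yaml_text.splitlines():
        name = package_name(line)
        if name is not None and (
            name in UNRESOLVED_EXACT or name.startswith(UNRESOLVED_PREFIXES)
        ):
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical environment file, for illustration only.
example = """name: snakes
dependencies:
  - numpy=1.16.4
  - vc=14.1
  - m2w64-gmp=6.1.0
  - pywinpty=0.5.5
"""
print(drop_windows_only(example))
```

The filter works on text lines rather than parsed YAML so that it needs only the standard library. The resulting file still has to be checked by actually building the environment on Linux.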

@mohanty

I have uploaded another version of the environment.yml file, but the image build still fails.

Can you please provide me with the error logs?

https://gitlab.aicrowd.com/nilabha/snake-species-identification-challenge/issues/10

Thanks,
Nilabha

The build logs are here : https://gitlab.aicrowd.com/nilabha/snake-species-identification-challenge/snippets/7288

From the look of it, you can solve it by adding an apt.txt at the root of your repository with gcc in it.

@mohanty

I am trying to submit, but I am getting the error below:
Submission failed : Malformed JSON provided in aicrowd.json
https://gitlab.aicrowd.com/nilabha/snake-species-identification-challenge/issues/15

I don’t see any issues with this file.

Do you know what is the problem?

Thanks,
Nilabha

Looking at the logs, it seems that your aicrowd.json is checked into LFS, while the evaluator expects it to be checked in directly in the repository. This is a bug on our end, and we will fix it. In the meantime, if you move your aicrowd.json out of LFS and check it directly into the git repository, the evaluation should go through!
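
A file tracked by Git LFS is stored in the repository as a small text pointer, which is not JSON. That would explain a "Malformed JSON" message even when the local file looks fine. Below is a minimal sketch of a local check, assuming the evaluator reads the raw repository content (the thread does not confirm how it reads the file). The function names, the all-zero oid and the sample JSON are hypothetical.

```python
import json

# Every Git LFS pointer file begins with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(raw):
    """True if the bytes look like a Git LFS pointer rather than real content."""
    return raw.lstrip().startswith(LFS_POINTER_PREFIX)


def check_json_bytes(raw):
    """Classify content as 'lfs-pointer', 'valid-json' or 'invalid-json'."""
    if is_lfs_pointer(raw):
        return "lfs-pointer"
    try:
        json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return "invalid-json"
    return "valid-json"


# Illustrative inputs: a pointer with a placeholder oid, and a tiny JSON object.
pointer = (
    b"version https://git-lfs.github.com/spec/v1\n"
    b"oid sha256:" + b"0" * 64 + b"\n"
    b"size 194\n"
)
print(check_json_bytes(pointer))
print(check_json_bytes(b'{"key": "value"}'))
```

To apply the check to a committed file, the bytes would need to come from the repository copy (for example, the raw file view on GitLab) rather than from the working tree, where LFS has already replaced the pointer with the real content.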

Thanks. It's working now. The build was also failing because environment.yml was checked in using LFS.
Unfortunately the evaluation is failing now after a successful build. Can you kindly help me with the error logs?
https://gitlab.aicrowd.com/nilabha/snake-species-identification-challenge/issues/18

@nilabha: I had posted earlier on the relevant issue. The problem is that run.sh does not have execute permission, so a chmod +x run.sh (and a subsequent commit) should fix it.

@mohanty
It seems run.sh already has execute permission, so running chmod +x run.sh makes no change to my local files.
The permissions are -rwxrwxrwx on my local machine.
I have looked at the permissions for the other files and they are the same.
Do you think there is some other error?

Thanks,
Nilabha

I pulled down the image built from your submission, and run.sh indeed does not have execute permission there:

root@408dfe6f4a7c:~# ls -al
total 112
drwxr-xr-x 1 aicrowd aicrowd 4096 Jul 24 16:54 .
drwxr-xr-x 1 root    root    4096 Jun 14 08:09 ..
-rw-rw-r-- 1 aicrowd aicrowd 3033 Jul 24 16:46 aicrowd_helpers.py
-rw-rw-r-- 1 aicrowd aicrowd  194 Jul 24 16:46 aicrowd.json
-rw-r--r-- 1 aicrowd aicrowd  220 Jun 14 08:09 .bash_logout
-rw-r--r-- 1 aicrowd aicrowd 3771 Jun 14 08:09 .bashrc
-rw-rw-r-- 1 aicrowd aicrowd  339 Jul 24 16:46 build.sh
drwx------ 3 aicrowd aicrowd 4096 Jul 24 16:54 .cache
drwxrwsr-x 2 aicrowd aicrowd 4096 Jul 24 16:53 .conda
drwx------ 3 aicrowd aicrowd 4096 Jun 19 08:45 .config
drwxrwxr-x 1 aicrowd aicrowd 4096 Jul 24 16:46 data
-rw-rw-r-- 1 aicrowd aicrowd  349 Jul 24 16:46 debug.sh
drwxr-xr-x 2 aicrowd aicrowd 4096 Jul 24 16:53 .empty
-rw-rw-r-- 1 aicrowd aicrowd 2123 Jul 24 16:46 environment.yml
-rw-rw-r-- 1 aicrowd aicrowd   94 Jul 24 16:46 environ.sh
-rw-rw-r-- 1 aicrowd aicrowd   94 Jul 24 16:46 .gitattributes
-rw-rw-r-- 1 aicrowd aicrowd   57 Jul 24 16:46 .gitignore
drwxrwxr-x 1 aicrowd aicrowd 4096 Jul 24 16:46 models
-rw-r--r-- 1 aicrowd aicrowd  807 Jun 14 08:09 .profile
-rw-rw-r-- 1 aicrowd aicrowd 8559 Jul 24 16:46 README.md
-rw-rw-r-- 1 aicrowd aicrowd 8461 Jul 24 16:46 run.py
-rw-rw-r-- 1 aicrowd aicrowd   28 Jul 24 16:46 run.sh
drwxrwxr-x 1 aicrowd aicrowd 4096 Jul 24 16:46 sample
-rw-rw-r-- 1 aicrowd aicrowd  132 Jul 24 16:46 trainnew.csv
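
One possible explanation for the mismatch (an assumption, not confirmed in this thread) is that git recorded run.sh as a regular file even though the local filesystem reports it as executable. This can happen when git is configured to ignore file mode changes. `git ls-files -s run.sh` prints the mode stored in the index, and the sketch below parses one such line. The function names are hypothetical and the all-zero object hash is a placeholder.

```python
def index_mode(ls_files_line):
    """Parse one line of `git ls-files -s` output and return (mode, path)."""
    # Format: "<mode> <object> <stage>\t<path>"
    meta, path = ls_files_line.rstrip("\n").split("\t", 1)
    mode = meta.split()[0]
    return mode, path


def is_executable_in_git(ls_files_line):
    """Git stores regular files as 100644 and executable files as 100755."""
    mode, _ = index_mode(ls_files_line)
    return mode == "100755"


# Placeholder object hash, for illustration only.
fake_hash = "0" * 40
not_exec = "100644 " + fake_hash + " 0\trun.sh"
is_exec = "100755 " + fake_hash + " 0\trun.sh"

print(index_mode(not_exec), is_executable_in_git(not_exec))
print(index_mode(is_exec), is_executable_in_git(is_exec))
```

If the recorded mode turns out to be 100644, `git update-index --chmod=+x run.sh` followed by a commit sets the executable bit in the repository regardless of what the local filesystem reports.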

@nilabha: I fixed that in this commit : https://gitlab.aicrowd.com/nilabha/snake-species-identification-challenge/commit/c51854e038a5bb600f536bcafcd944735955def1

Best of luck

Hi @mohanty I am getting the following error in my submission:

RuntimeError: DataLoader worker (pid(s) 16) exited unexpectedly

(I read that adding --shm-size 50G and --runtime=nvidia to the container config could fix the issue. Could we try that? I do not know where to add these options.)
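
For context on the --shm-size suggestion: DataLoader worker processes pass data to the main process through shared memory, and Docker containers get a 64 MB /dev/shm by default unless --shm-size is raised. Whether that is the cause here is not confirmed. The sketch below shows one way to print the shared-memory size at the start of a run so that it appears in the logs, assuming a Linux container with /dev/shm mounted. The function name is hypothetical.

```python
import shutil


def shm_total_bytes(path="/dev/shm"):
    """Return the total size of the shared-memory mount, or None if unavailable."""
    try:
        return shutil.disk_usage(path).total
    except OSError:
        # The path does not exist or cannot be queried on this system.
        return None


total = shm_total_bytes()
if total is None:
    print("/dev/shm is not available on this system")
else:
    print("/dev/shm total: %.1f MiB" % (total / 2**20))
```

A very small value here would be consistent with workers being killed when they run out of shared memory, but it does not prove it.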

@mohanty
I have redone the submission with a fresh copy of the repo and updated the files.
Sorry for asking you again but I still get the error in my latest submission http://gitlab.aicrowd.com/nilabha/snake-species-identification-challenge/issues/22
Do you know what is the error?

Thanks,
Nilabha

I am feeling guilty having to ask you every time :stuck_out_tongue:
I wouldn’t mind if all my logs are made public to be honest.

On another note, will the deadline for this competition be extended? With so little time left, I haven't done much to replicate the submission process locally (I think there are some instructions at https://github.com/stanfordnmbl/neurips2019-learning-to-move-starter-kit). As of now I just run bash run.sh to test that the code works on my local machine.

Looks like this is because the DataLoader is trying to spawn way too many workers. Can you set num_workers=0 in the DataLoader so that it does all the data loading in the main process?

@mohanty
I have set num_workers=0
I still get the error.
https://gitlab.aicrowd.com/nilabha/snake-species-identification-challenge/issues/24
Could you share the error logs?

Thanks,
Nilabha

@nilabha Commented on the issue. Hope this helps!

@mohanty
I have been using trial and error to find the line of code where the submission fails, but unfortunately I have now exceeded the submission limit.
Could you provide the error logs so I can debug the issue?
I have isolated the issue to a few lines of code, but it would be helpful to know what the error is.

Thanks,
Nilabha