Submission Errors

mohanty · July 24, 2019, 10:06pm

@nilabha: I had posted earlier on the relevant issue. The problem was that run.sh does not have execute permission, so running chmod +x run.sh (and committing the change) should fix it.
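
For anyone who wants to check this from Python before pushing, here is a minimal sketch using only the standard library. It inspects the permission bits of the local file only; the mode recorded in the commit can still differ, so treat it as a first sanity check. The function names are made up for illustration.

```python
import os
import stat


def is_executable(path):
    """Return True if the owner execute bit is set on `path`."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


def make_executable(path):
    """Add execute permission for user, group and others (roughly `chmod +x`)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

Calling is_executable("run.sh") in the repository root would report whether the local copy carries the bit.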

nilabha · July 25, 2019, 2:33pm

@mohanty
It seems run.sh already has execute permission, so running chmod +x run.sh makes no change to my local files.
The permissions are -rwxrwxrwx on my local copy.
I have looked at the permissions for the other files and they are the same.
Do you think there is some other error?

Thanks,
Nilabha

mohanty · July 25, 2019, 3:37pm

I pulled down the image built from your submission, and run.sh indeed does not have execute permission:

root@408dfe6f4a7c:~# ls -al
total 112
drwxr-xr-x 1 aicrowd aicrowd 4096 Jul 24 16:54 .
drwxr-xr-x 1 root    root    4096 Jun 14 08:09 ..
-rw-rw-r-- 1 aicrowd aicrowd 3033 Jul 24 16:46 aicrowd_helpers.py
-rw-rw-r-- 1 aicrowd aicrowd  194 Jul 24 16:46 aicrowd.json
-rw-r--r-- 1 aicrowd aicrowd  220 Jun 14 08:09 .bash_logout
-rw-r--r-- 1 aicrowd aicrowd 3771 Jun 14 08:09 .bashrc
-rw-rw-r-- 1 aicrowd aicrowd  339 Jul 24 16:46 build.sh
drwx------ 3 aicrowd aicrowd 4096 Jul 24 16:54 .cache
drwxrwsr-x 2 aicrowd aicrowd 4096 Jul 24 16:53 .conda
drwx------ 3 aicrowd aicrowd 4096 Jun 19 08:45 .config
drwxrwxr-x 1 aicrowd aicrowd 4096 Jul 24 16:46 data
-rw-rw-r-- 1 aicrowd aicrowd  349 Jul 24 16:46 debug.sh
drwxr-xr-x 2 aicrowd aicrowd 4096 Jul 24 16:53 .empty
-rw-rw-r-- 1 aicrowd aicrowd 2123 Jul 24 16:46 environment.yml
-rw-rw-r-- 1 aicrowd aicrowd   94 Jul 24 16:46 environ.sh
-rw-rw-r-- 1 aicrowd aicrowd   94 Jul 24 16:46 .gitattributes
-rw-rw-r-- 1 aicrowd aicrowd   57 Jul 24 16:46 .gitignore
drwxrwxr-x 1 aicrowd aicrowd 4096 Jul 24 16:46 models
-rw-r--r-- 1 aicrowd aicrowd  807 Jun 14 08:09 .profile
-rw-rw-r-- 1 aicrowd aicrowd 8559 Jul 24 16:46 README.md
-rw-rw-r-- 1 aicrowd aicrowd 8461 Jul 24 16:46 run.py
-rw-rw-r-- 1 aicrowd aicrowd   28 Jul 24 16:46 run.sh
drwxrwxr-x 1 aicrowd aicrowd 4096 Jul 24 16:46 sample
-rw-rw-r-- 1 aicrowd aicrowd  132 Jul 24 16:46 trainnew.csv

mohanty · July 25, 2019, 3:39pm

@nilabha: I fixed that in this commit: https://gitlab.aicrowd.com/nilabha/snake-species-identification-challenge/commit/c51854e038a5bb600f536bcafcd944735955def1

Best of luck

gloria_macia_munoz · July 25, 2019, 10:17pm

Hi @mohanty I am getting the following error in my submission:

RuntimeError: DataLoader worker (pid(s) 16) exited unexpectedly

(I read that --shm-size 50G and --runtime=nvidia could fix the issue. Could we try them? I do not know where to add them in the container config.)

nilabha · July 26, 2019, 2:12am

@mohanty
I have redone the submission with a fresh copy of the repo and updated the files.
Sorry for asking again, but I still get the error in my latest submission: http://gitlab.aicrowd.com/nilabha/snake-species-identification-challenge/issues/22
Do you know what the error is?

Thanks,
Nilabha

nilabha · July 26, 2019, 4:47am

I feel guilty having to ask you every time.
I wouldn’t mind if all my logs were made public, to be honest.

But on another note, will the deadline of this competition be extended? With so little time left, I haven’t done much to replicate the submission process locally (I think there are some instructions at https://github.com/stanfordnmbl/neurips2019-learning-to-move-starter-kit). As of now I just run bash run.sh to test that the code works locally.

mohanty · July 26, 2019, 9:07am

It looks like this is because the DataLoader is trying to spawn too many workers. Can you set num_workers=0 in the DataLoader so that all the data loading happens in the main process?

nilabha · July 26, 2019, 5:20pm

@mohanty
I have set num_workers=0
I still get the error.
https://gitlab.aicrowd.com/nilabha/snake-species-identification-challenge/issues/24
Could you share the error logs?

Thanks,
Nilabha

ashivani · July 26, 2019, 8:52pm

@nilabha Commented on the issue. Hope this helps!

nilabha · July 27, 2019, 8:06am

@mohanty
I have been using trial and error to find the line of code where the submission fails, and unfortunately I have exceeded the submission limit.
Could you provide the error logs to debug the issue?
I have isolated the issue to a few lines of code, but it would be helpful to know what the error is.

Thanks,
Nilabha

nilabha · July 29, 2019, 10:44am

@mohanty @arjun_nemani

Can you please help with the error logs? I keep getting an evaluation failure.
https://gitlab.aicrowd.com/nilabha/snake-species-identification-challenge/issues/31

I have also tried reducing the batch size in case it is a memory issue, but the error is still there.
From trial and error I notice that the error happens in the call to learn.tta from the fastai library.

Thanks,
Nilabha

mohanty · July 29, 2019, 10:51am

@nilabha: Unfortunately, some images in the test set are corrupt, and we have not removed them in order to stay consistent. If you cannot read an image with PIL, please use a random prediction for it.
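
One way to follow this advice is sketched below, under some assumptions: predict_fn stands for whatever callable turns a loaded image into class probabilities (hypothetical, not part of the starter kit), and the fallback used here is a uniform distribution rather than a truly random one. Pillow is imported lazily inside load_image, so the fallback logic can be exercised without it.

```python
def load_image(path):
    """Fully decode an image with Pillow; raises if the file is corrupt."""
    from PIL import Image  # lazy import: only needed when a real file is opened

    with Image.open(path) as img:
        return img.convert("RGB")  # convert() forces a full decode


def predict_or_uniform(path, predict_fn, num_classes, loader=load_image):
    """Return predict_fn(image), or equal probabilities if the image is unreadable."""
    try:
        image = loader(path)
    except Exception:
        # Broad on purpose: corrupt files can fail in several different ways.
        return [1.0 / num_classes] * num_classes
    return list(predict_fn(image))
```

Because nothing is deleted, this approach never needs write access to the test directory.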

gokuleloop · July 29, 2019, 11:24am

You can solve this issue as follows:
verify_images(test_images_path, delete=True)  # test_images_path is defined in run.py
This will give you 17686 files.
You will need some logic to generate predictions for the corrupted files.
I do it this way (not the best way):

Use the sample submission file and replace the random probabilities it contains with the predicted probabilities.

Do test it locally. TTA takes a long time to complete (~7 hrs).
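
A rough sketch of the sample-submission idea, using only the standard library. The column names (filename, class_a, class_b) and the function name are placeholders, not the real submission format; check the actual sample file for the correct headers.

```python
import csv
import io


def fill_submission(sample_csv_text, predictions, id_column="filename"):
    """Overwrite sample-submission rows with model predictions where available.

    `predictions` maps an image id to a dict {class_column: probability}.
    Rows without a prediction (e.g. corrupt images) keep their sample values.
    """
    reader = csv.DictReader(io.StringIO(sample_csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        predicted = predictions.get(row[id_column])
        if predicted is not None:
            row.update({col: str(p) for col, p in predicted.items()})
        writer.writerow(row)
    return out.getvalue()
```

The function works on text so it is easy to try out; in run.py the input would come from reading the sample file and the result would be written to the output path.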

nilabha · July 30, 2019, 4:33am

@mohanty
I have put in a workaround that finds the images which fail to load, deletes them, and later adds predictions for them with every probability equal to 1/45.
However, I still get an error:
https://gitlab.aicrowd.com/nilabha/snake-species-identification-challenge/issues/32

Can you please help with the error?

Thanks,
Nilabha

mohanty · July 30, 2019, 4:42am

The error is :

Traceback (most recent call last):
  File "run.py", line 258, in <module>
    run()
  File "run.py", line 146, in run
    os.remove(sRemove)
OSError: [Errno 30] Read-only file system: '/test_data/2e23ade63c4e32b728a423ff19e52ef1.jpg'

Your code is trying to remove images from the test set, which is on a read-only file system.

Instead of trying to delete the corrupt files, please try to just append a random prediction to the final prediction CSV file.

gokuleloop · July 30, 2019, 7:38am

@mohanty Evaluator is not picking up any of my submissions. Please look into it. Latest commit : https://gitlab.aicrowd.com/gokuleloop/snake-breed-identification/commit/c2d0dd9f55cd89796de929328b9a6e746941e774

mohanty · July 30, 2019, 7:54am

As mentioned in the starter kit, tag names have to begin with “submission-”, so pushing a tag called, say, submission-v010 should do the trick.

This was a loosely held rule and has only been enforced recently.
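
If you script your pushes, a small check like the one below can catch a badly named tag before it is pushed. It only tests the prefix mentioned above; the helper name is made up, and any further naming rules the evaluator may apply are not covered.

```python
SUBMISSION_PREFIX = "submission-"


def is_submission_tag(tag):
    """Return True if the tag name starts with the required prefix."""
    return tag.startswith(SUBMISSION_PREFIX)
```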

kristjan_kongas · July 30, 2019, 10:26am

You only have read access to the original test images directory, so you can’t remove files from there. However, you can copy the whole test images directory to a local directory and run verify_images there.

os.system("cp -r {0} test/".format(test_images_path))
verify_images("test", delete=True)
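
The same copy can be done without shelling out. The sketch below is one possible variant, not what the starter kit prescribes: it walks the source tree and copies file contents only (shutil.copyfile), so no permission bits are carried over from the source. The function name is hypothetical.

```python
import os
import shutil


def copy_to_writable(src_dir, dst_dir):
    """Copy every file under src_dir into dst_dir, keeping the folder layout."""
    for root, _dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        target = os.path.normpath(os.path.join(dst_dir, rel))
        os.makedirs(target, exist_ok=True)
        for name in files:
            # copyfile copies contents only, not the source file's mode bits
            shutil.copyfile(os.path.join(root, name), os.path.join(target, name))
    return dst_dir
```

After copy_to_writable(test_images_path, "test"), verify_images("test", delete=True) would operate on the copy and leave the original directory untouched.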

nilabha · July 31, 2019, 2:10pm

Thanks kongas.
I used the code below to filter out the images:

filter_func = lambda x: str(x) not in lsRemove
test_img = (ImageList.from_folder(path).filter_by_func(filter_func))

Though there is another error…
https://gitlab.aicrowd.com/nilabha/snake-species-identification-challenge/issues/33

@mohanty
Has the competition ended, or will it restart? I would have liked to get a score, as my validation results were good.
Is it possible to get the error logs?