@nilabha: I posted earlier on the relevant issue. The problem is that run.sh
does not have execute permission, so running chmod +x run.sh
(and committing the change) should fix it.
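For anyone who prefers to check this from Python, the same permission change can be sketched roughly as below. This is only an illustration: ensure_executable is a hypothetical helper name and is not part of the starter kit.

```python
import os
import stat


def ensure_executable(path):
    """Add the execute bits (user, group, other) if any are missing.

    Returns True if the mode was changed, False if it was already executable.
    """
    mode = os.stat(path).st_mode
    wanted = mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if wanted == mode:
        return False
    os.chmod(path, wanted)
    return True
```

The permission also has to be recorded in the repository, not just on the local disk. Running git ls-files -s run.sh prints the mode Git has stored for the file, where 100755 means executable and 100644 means not executable.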
@mohanty
It seems run.sh already has execute permission, so running chmod +x run.sh changes nothing in my local files.
Locally the permissions are -rwxrwxrwx.
I have looked at the permissions for the other files and they are the same.
Do you think there is some other error?
Thanks,
Nilabha
I pulled the image built from your submission, and run.sh
indeed has no execute permission:
root@408dfe6f4a7c:~# ls -al
total 112
drwxr-xr-x 1 aicrowd aicrowd 4096 Jul 24 16:54 .
drwxr-xr-x 1 root root 4096 Jun 14 08:09 ..
-rw-rw-r-- 1 aicrowd aicrowd 3033 Jul 24 16:46 aicrowd_helpers.py
-rw-rw-r-- 1 aicrowd aicrowd 194 Jul 24 16:46 aicrowd.json
-rw-r--r-- 1 aicrowd aicrowd 220 Jun 14 08:09 .bash_logout
-rw-r--r-- 1 aicrowd aicrowd 3771 Jun 14 08:09 .bashrc
-rw-rw-r-- 1 aicrowd aicrowd 339 Jul 24 16:46 build.sh
drwx------ 3 aicrowd aicrowd 4096 Jul 24 16:54 .cache
drwxrwsr-x 2 aicrowd aicrowd 4096 Jul 24 16:53 .conda
drwx------ 3 aicrowd aicrowd 4096 Jun 19 08:45 .config
drwxrwxr-x 1 aicrowd aicrowd 4096 Jul 24 16:46 data
-rw-rw-r-- 1 aicrowd aicrowd 349 Jul 24 16:46 debug.sh
drwxr-xr-x 2 aicrowd aicrowd 4096 Jul 24 16:53 .empty
-rw-rw-r-- 1 aicrowd aicrowd 2123 Jul 24 16:46 environment.yml
-rw-rw-r-- 1 aicrowd aicrowd 94 Jul 24 16:46 environ.sh
-rw-rw-r-- 1 aicrowd aicrowd 94 Jul 24 16:46 .gitattributes
-rw-rw-r-- 1 aicrowd aicrowd 57 Jul 24 16:46 .gitignore
drwxrwxr-x 1 aicrowd aicrowd 4096 Jul 24 16:46 models
-rw-r--r-- 1 aicrowd aicrowd 807 Jun 14 08:09 .profile
-rw-rw-r-- 1 aicrowd aicrowd 8559 Jul 24 16:46 README.md
-rw-rw-r-- 1 aicrowd aicrowd 8461 Jul 24 16:46 run.py
-rw-rw-r-- 1 aicrowd aicrowd 28 Jul 24 16:46 run.sh
drwxrwxr-x 1 aicrowd aicrowd 4096 Jul 24 16:46 sample
-rw-rw-r-- 1 aicrowd aicrowd 132 Jul 24 16:46 trainnew.csv
@nilabha: I fixed that in this commit: https://gitlab.aicrowd.com/nilabha/snake-species-identification-challenge/commit/c51854e038a5bb600f536bcafcd944735955def1
Best of luck
Hi @mohanty I am getting the following error in my submission:
RuntimeError: DataLoader worker (pid(s) 16) exited unexpectedly
(I read that passing --shm-size 50G and --runtime=nvidia to the container could fix this. Could we try that? I do not know where to add them in the container config.)
@mohanty
I redid the submission from a fresh copy of the repo and updated the files.
Sorry for asking you again, but I still get the error in my latest submission: http://gitlab.aicrowd.com/nilabha/snake-species-identification-challenge/issues/22
Do you know what the error is?
Thanks,
Nilabha
I feel guilty having to ask you every time.
I wouldn’t mind if all my logs were made public, to be honest.
On another note, will the deadline of this competition be extended? With so little time left, I haven’t done much to replicate the submission process locally (I think there are some instructions at https://github.com/stanfordnmbl/neurips2019-learning-to-move-starter-kit). For now I just run bash run.sh to check that the code works on my machine.
It looks like this happens because the DataLoader is trying to spawn too many workers. Can you set num_workers=0
in the DataLoader so that it does all the data loading in the main process?
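A rough way to choose the worker count at runtime is sketched below. It assumes the crash is related to limited shared memory in the container, which this thread does not confirm. The helper pick_num_workers, its threshold, and its cap are hypothetical; only num_workers itself is an actual DataLoader argument.

```python
import os
import shutil


def pick_num_workers(shm_path="/dev/shm", min_free_bytes=1 << 30, cap=4):
    """Return 0 when shared memory looks scarce, otherwise a small worker count.

    The 1 GiB threshold and the cap of 4 are illustrative assumptions,
    not tuned values.
    """
    try:
        free = shutil.disk_usage(shm_path).free
    except OSError:
        # Path missing or unreadable: fall back to loading in the main process.
        return 0
    if free < min_free_bytes:
        return 0
    return min(cap, os.cpu_count() or 1)


# The result would then be passed as num_workers=... when building the DataLoader.
print(pick_num_workers())
```

Returning 0 is the conservative choice here, since it avoids worker processes entirely at the cost of slower loading.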
@mohanty
I have set num_workers=0
I still get the error.
https://gitlab.aicrowd.com/nilabha/snake-species-identification-challenge/issues/24
Could you share the error logs?
Thanks,
Nilabha
@mohanty
I have been using trial and error to find the line of code where the submission fails, but unfortunately I have exceeded the submission limit.
Could you provide the error logs to debug the issue?
I have isolated the issue to a few lines of code, but it would be helpful to know what the error is.
Thanks,
Nilabha
Can you please help with the error logs? I keep getting an evaluation failure.
https://gitlab.aicrowd.com/nilabha/snake-species-identification-challenge/issues/31
I have also tried reducing the batch size in case it is a memory issue, but the error is still there.
By trial and error I found that the error happens at the call to learn.tta from the fastai library.
Thanks,
Nilabha
@nilabha: Unfortunately, some images in the test set are corrupt, and we have not removed them in order to stay consistent. If you cannot read an image with PIL, please use a random prediction for it.
You can use the following way to solve this issue:
verify_images(test_images_path,delete=True) # test_images_path in run.py
This will give you 17686 files.
You will need some logic to generate predictions for the corrupted files.
I do it this way (not the best way):
- Use the sample submission file and replace the given random probabilities with the predicted probabilities.
Do test it locally. TTA takes a long time to complete (~7 hrs).
@mohanty
I have added a workaround that finds the images that fail to load, deletes them, and later adds predictions for them with all probabilities equal to 1/45.
However, I still get an error:
https://gitlab.aicrowd.com/nilabha/snake-species-identification-challenge/issues/32
Can you please help with the error?
Thanks,
Nilabha
The error is:
Traceback (most recent call last):
  File "run.py", line 258, in <module>
    run()
  File "run.py", line 146, in run
    os.remove(sRemove)
OSError: [Errno 30] Read-only file system: '/test_data/2e23ade63c4e32b728a423ff19e52ef1.jpg'
Your code is trying to remove images from the test set here
Instead of trying to delete the corrupt files, please try to just append a random prediction to the final prediction CSV file.
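One way to do this without touching the test directory is sketched below. It is a minimal illustration, not the organisers' code: predict_fn is a hypothetical callable standing in for whatever model call run.py makes, the row layout is an assumption (the real columns should follow the sample submission file), and the fallback is the uniform 1/45 guess mentioned earlier in this thread rather than a truly random one.

```python
NUM_CLASSES = 45  # the thread mentions uniform probabilities of 1/45


def predict_with_fallback(filenames, predict_fn, num_classes=NUM_CLASSES):
    """Return one [filename, p_1, ..., p_n] row per file.

    predict_fn is a hypothetical callable that returns class probabilities
    for a file and raises an exception when the image cannot be read.
    """
    uniform = [1.0 / num_classes] * num_classes
    rows = []
    for name in filenames:
        try:
            probs = list(predict_fn(name))
        except Exception:
            # Unreadable image: keep the file in the output with a uniform guess.
            probs = list(uniform)
        rows.append([name] + probs)
    return rows
```

Because nothing is deleted, this works on a read-only file system, and every test file still gets a row in the output.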
@mohanty The evaluator is not picking up any of my submissions. Please look into it. Latest commit: https://gitlab.aicrowd.com/gokuleloop/snake-breed-identification/commit/c2d0dd9f55cd89796de929328b9a6e746941e774
As mentioned in the starter kit, tag names have to begin with “submission-”, so pushing a tag named, say, submission-v010
should do the trick.
This was a loosely held rule and has only been enforced recently.
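A quick local check of a tag name before pushing could look like the sketch below. The function name is hypothetical, and it only tests the prefix described above; the evaluator may apply further rules that this thread does not mention.

```python
def is_submission_tag(tag, prefix="submission-"):
    """Return True if the tag name begins with the required prefix."""
    return tag.startswith(prefix)


print(is_submission_tag("submission-v010"))  # True
print(is_submission_tag("v010"))             # False
```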
You only have read access to the original test images directory, so you can’t remove files from it. However, you can copy the whole test images directory to a local directory and run verify_images there.
os.system("cp -r {0} test/".format(test_images_path))
verify_images("test", delete=True)
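A more portable variant of the copy step, using only the Python standard library instead of a shell call, might look like this. It is a sketch under assumptions: copy_to_writable is a hypothetical helper, and the copy goes to a fresh temporary directory rather than to test/.

```python
import shutil
import tempfile
from pathlib import Path


def copy_to_writable(src_dir):
    """Copy src_dir into a fresh temporary directory and return the new path."""
    dst = Path(tempfile.mkdtemp()) / Path(src_dir).name
    shutil.copytree(src_dir, dst)
    return dst
```

The returned path can then be passed to verify_images in place of "test". Note that shutil.copytree also copies permission bits, so if the source directories are not writable by their owner, the copy may still need a chmod before files can be deleted from it.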
Thanks, kongas.
I used the code below to filter out the images:
filter_func = lambda x: str(x) not in lsRemove
test_img = (ImageList.from_folder(path).filter_by_func(filter_func))
Though there is another error…
https://gitlab.aicrowd.com/nilabha/snake-species-identification-challenge/issues/33
@mohanty
Has the competition ended, or will it restart? I would have liked to get a score, as my validation results were good.
Is it possible to get the error logs?