Our sincere apologies for the inconvenience you have faced.
Regarding the slow evaluation speeds: since we have to execute your code (models included) against a large number of test images, evaluations are indeed slow. We are trying to improve this experience by providing better feedback on progress, and will definitely address it in the upcoming version of the challenges.
Regarding the competition, we post all updates on this forum, and we are happy to answer any and all questions you have here. We are also working on better notification systems so that you receive relevant challenge updates by email and through the other notification channels you subscribe to on the platform.
In the meantime, we really appreciate your feedback. It helps us make the platform better for thousands of other users, and under no circumstances do we take it as an offense.
Thank You,
Mohanty
(on behalf of the organizing team)
@amapic: If you built the conda env from the initial environment.yml file, then conda env export --no-builds will export the updated state of the environment.
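For example (assuming the env is currently activated and you want to overwrite the existing environment.yml):

conda env export --no-builds > environment.yml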
@amapic This is happening because these packages are only available for Linux distributions, which is why installing them on Windows (I assume you are using Windows) fails. This is unfortunately a current limitation of conda.
In this scenario, I recommend removing the packages above from environment.yml and continuing with your conda env creation. These packages are usually pulled in as dependencies of the “main” dependencies, and conda should resolve equivalent packages for your system automatically.
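A minimal sketch, assuming the file is named environment.yml: delete the linux-only lines from it, then re-run the creation and let conda resolve platform-appropriate builds for the remaining dependencies.

# re-create the env from the trimmed file; conda picks suitable builds for your platform
conda env create -f environment.yml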
@devops @shivam what does the timeout mean? Does anyone know where I can find this information? I have asked this question numerous times after @devops commented on my failed subs, but it was ignored, so I am bringing it up here.
How am I supposed to debug a Timeout? Some of my successful subs took longer to execute than most of those that failed because of the timeout. I couldn’t come up with a reasonable explanation for this behaviour. I hope you can help me understand it.
Submissions should ideally take a few hours to run, but we have set a hard timeout of 8 hours. If your solution crosses 8 hours, it is marked as failed.
Roughly how long do you expect your code to run? Is the local runtime far off from what you see during the evaluation phase?
Otherwise, you can enable the GPU (if you are not doing so already) to speed up computation and finish the evaluation under 8 hours.
Please let us know if you require more help debugging your submission. If needed, we can try to see which step/part of the code is taking the most time.
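As a quick self-check before submitting, something like the following (assuming a PyTorch-based model; the TensorFlow check is analogous) will show whether a GPU is visible inside the environment and whether your framework build can actually use it:

nvidia-smi
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"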
I haven’t managed to submit and I don’t have time left for this competition at the moment. Can you keep the evaluation running after the 17th? I would like to add a line about this competition to my resume.
How come some of my subs took 14h and didn’t fail if the limit is 8h? Then again, how am I supposed to know that the timeout is set to 8h? Where is it written? I also thought for a moment that you keep changing the timeout limit; can you confirm that this is not true?
Inference time is way off. Locally my model takes ~10 minutes to execute on a 1080 Ti, so it obviously runs on CPU when submitted.
@ValAn No, I can confirm the timeouts haven’t been changed between your previous and current runs. The only issue is that the timeout wasn’t implemented properly in the past, which may be why your previous (1-week-old) submission wasn’t caught by it.
We can absolutely check why it is taking >8 hours instead of ~10 minutes locally. Can you help me with the following:
1. Is the local run using the GPU? I can check whether your code is utilising the GPU (when allocated) or running only on CPU for whatever reason.
2. How many images are you running on locally? The server/test dataset has 32,428 images to be exact, which may be causing the higher runtime (see the rough estimate below).
3. In case there is a significant difference from your local setup, the specs for the online environment are: 4 vCPUs, 16 GB memory, and a K80 GPU (when enabled).
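As a rough back-of-the-envelope estimate (the per-image timings below are purely illustrative, not measurements from your submission), multiplying your measured seconds-per-image by the 32,428 test images shows how quickly CPU-only inference can exceed the 8-hour limit:

python -c "print(f'{0.9 * 32428 / 3600:.1f} h')"    # ~8.1 h at 0.9 s/image (CPU-like)
python -c "print(f'{0.02 * 32428 / 3600:.2f} h')"   # ~0.18 h (about 11 min) at 0.02 s/image (GPU-like)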
Hello @shivam!
Like @ValAn, I feel this kind of information (resources, time limits, and so on) should be better documented, not just for this challenge but for every challenge you guys host.
Also, it seems you are constantly improving your platform, which is great, but as a user I don’t know when you do it or what you update, and when I run into an inconsistency with my previous experience I go nuts trying to figure out what I’m missing.
That being said, I think the timing issue is related more to the time spent accessing the images on disk than to GPU time, which is kind of… sad.
I’ll definitely write more about that when this round is over.
Thanks for the suggestions.
I completely agree that we need to improve how we communicate and organise information to provide a seamless experience for participants.
We would be glad to hear back from you after the competition and look forward to your input.
I checked all the submissions, and unfortunately multiple participants are facing the same issue, i.e. the GPU is being allocated but not used by the submission due to a CUDA version mismatch.
To make the GPU work out of the box, we have introduced a forced installation, as below, into our snakes challenge evaluation process:
conda install cudatoolkit=10.0
This should fix the timing issues and we will continue monitoring all the submissions closely.
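If you want to confirm on your side that the pin took effect, a check along these lines (PyTorch shown only as an example framework) should report the pinned toolkit and a CUDA-enabled build:

conda list cudatoolkit
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"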
@ignasimg I have verified the disk performance and it was good. Unfortunately, on debugging I found that your submission faced the same issue, i.e. cudatoolkit=10.1, which may have given the impression that the disk was the bottleneck (when in fact the GPU wasn’t being utilised). The current submission should finish much sooner now that the cudatoolkit version is pinned.
Congrats to all. It’s been 3 days since the competition finished and we haven’t received any info on how we should proceed from here. What happens next, @devops?
There is no update right now. The organisers will be reaching out to participants shortly with details about travel grants and other post-challenge follow-up.