Can I have an example of working code for making a submission on GitLab?

Please use
conda env export --no-build > environment.yml
Also, inference happens on a K80 (if you enable GPU). Make sure the CUDA version is 10.0 and not 10.1.
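Before pushing, it can help to check which cudatoolkit version the exported file actually pins. Below is a minimal sketch, assuming the export lists dependencies as `- name=version` lines (the usual shape of `conda env export --no-build` output). The helper name `find_cudatoolkit_pin`, the environment name and the versions in the sample text are made up for illustration, and the parsing is simple line matching rather than real YAML parsing.

```python
from typing import Optional


def find_cudatoolkit_pin(environment_yml_text: str) -> Optional[str]:
    """Return the version pinned for cudatoolkit, or None if it is not listed."""
    for raw_line in environment_yml_text.splitlines():
        line = raw_line.strip()
        if not line.startswith("- "):
            continue  # not a dependency entry
        spec = line[2:].strip()
        name, _, version = spec.partition("=")
        if name == "cudatoolkit":
            return version or None
    return None


# Illustrative sample only; a real export will list many more packages.
example = """\
name: snakes
dependencies:
  - python=3.7.3
  - cudatoolkit=10.1.168
"""

pin = find_cudatoolkit_pin(example)
if pin is None or not pin.startswith("10.0"):
    print(f"cudatoolkit pin is {pin!r}; the evaluation nodes expect 10.0")
```

In practice you would pass the contents of your own environment.yml instead of the sample string.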

Why does it need to be 10.0? I don’t understand. TBH, I am not sure the organizers enabled GPU for this competition.
@shivam @ashivani @mohanty is there a GPU allocated or not?

A relevant discussion.

Hi participants, @ValAn,

Yes, GPUs are available for snakes challenge submissions when gpu: true is set in aicrowd.json.
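A minimal sketch of flipping that flag programmatically, assuming aicrowd.json is a flat JSON object. Only the `gpu` key comes from this thread; the helper name `with_gpu_enabled` is hypothetical, and any other fields in your real file are passed through untouched.

```python
import json


def with_gpu_enabled(aicrowd_json_text: str) -> str:
    """Return the JSON text with "gpu" set to true, keeping all other keys."""
    config = json.loads(aicrowd_json_text)
    config["gpu"] = True  # the only key taken from this thread
    return json.dumps(config, indent=2)


# Illustrative input; a real aicrowd.json contains more fields than this.
print(with_gpu_enabled('{"gpu": false}'))
```

To apply it, read your aicrowd.json, pass the text through the function and write the result back before committing.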

It needs to be 10.0 because the nodes your code runs on currently use GKE version 1.12.x, which determines the Nvidia driver (410.79), which in turn determines the CUDA version (10.0).

We look forward to running future challenges on a higher CUDA version (GKE version). But to keep results, timings, etc. consistent, we do not want to change versions midway through the contest.

I apologize for overlooking this. Slow evaluation drove me crazy as I mentioned earlier in this discussion.

Now I wonder how I am supposed to know this?

Am I supposed to read through previous competitions to understand how to submit?

Also, I really think you should add an edit history for your challenge description. Two months ago I read it for this challenge, and now I see it has changed. Nothing important: you updated the number of images, which was originally just copy-pasted from stage 2. I hope you will not take my comments as an offense; I am just trying to understand, share my experience and give some suggestions on how to make it easier to participate.

Dear @ValAn,

Our sincere apologies for the inconveniences faced by you.

Regarding the slow evaluation speeds, given that we have to execute your code (and models etc) on a large number of test images, the evaluations are indeed slow. Your model has to make predictions for a large number of images. We are trying to improve this experience by providing better feedback in terms of progress etc, and will definitely address this in the upcoming version of the challenges.

Regarding the competition, we are providing all updates on this forum here, and we would be happy to answer any and all questions you have here. We are also working on better notification systems so that you get relevant updates from the challenge over emails and other notification channels on the platform that you subscribe to.

In the meantime, we really appreciate your feedback. It helps us make the platform much better for thousands of other users, and under no circumstances do we take it as an offense.

Thank You,
Mohanty
(on behalf of the organizing team)

How do you combine the yml coming from conda env export --no-build > environment.yml with the initial yml file from the starter pack?

@amapic: If you built the conda env from the initial environment.yml file, then conda env export --no-build will export the updated state of the environment.

@mohanty I did so, and I can’t find matching versions for these packages:

  • libedit=3.1.20181209
  • readline=7.0
  • ncurses=6.1
  • libgcc-ng=9.1.0
  • libstdcxx-ng=9.1.0

How do I deal with it?

@amapic This is happening because these packages are only available for Linux distributions, so installing them on Windows (I assume you are using Windows) fails. This is unfortunately a current limitation of conda.

Example:
https://anaconda.org/anaconda/ncurses has only osx and linux builds, but none for windows.

In such a scenario, I recommend removing the above packages from environment.yml and continuing with your conda env creation. These packages are often included only as dependencies of the “main” dependencies, so conda should resolve a similar package for your system automatically.
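The removal step can be scripted. This is a minimal sketch, assuming dependencies appear as `- name=version` lines. The function name `drop_packages` is hypothetical, the package set is the one listed earlier in this thread, and the numpy entry in the sample is illustrative only.

```python
# Packages reported in this thread as having no Windows build.
LINUX_ONLY = {"libedit", "readline", "ncurses", "libgcc-ng", "libstdcxx-ng"}


def drop_packages(environment_yml_text: str, names=LINUX_ONLY) -> str:
    """Return the environment file text without the named dependency lines."""
    kept = []
    for raw_line in environment_yml_text.splitlines():
        stripped = raw_line.strip()
        if stripped.startswith("- "):
            name = stripped[2:].split("=", 1)[0].strip()
            if name in names:
                continue  # skip this dependency entirely
        kept.append(raw_line)
    return "\n".join(kept) + "\n"


# Illustrative sample, not a full export.
sample = """\
dependencies:
  - libedit=3.1.20181209
  - ncurses=6.1
  - numpy=1.16.4
"""

print(drop_packages(sample))
```

Write the returned text back to environment.yml and rerun the conda env creation; conda then picks platform-appropriate replacements for any removed package that is still needed.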

@devops @shivam what does the timeout mean? Does anyone know where I can find this information? I have asked this question numerous times after @devops commented on my failed subs, but it was ignored, so I am bringing it up here.

How am I supposed to debug a timeout? Some of my successful subs took longer to execute than most of those which failed because of the timeout. I couldn’t come up with a reasonable explanation for such behaviour. I hope you can help me understand this.

Hi @ValAn,

Submissions should ideally take a few hours to run, but we have set a hard timeout of 8 hours. If your solution exceeds 8 hours, it is marked as failed.

Roughly how long do you expect your code to run? Is the local run time very different from the time during the evaluation phase?

Otherwise, you can enable the GPU (if you are not doing so already) to speed up computation and finish the evaluation in under 8 hours.

Please let us know if you need more help debugging your submission. We can try to see which step or part of the code is taking the most time if required.

I can’t manage to submit, and I don’t have any time left for this competition at the moment. Can you keep the evaluation running after the 17th? I would like to add a line on my resume about this competition.

Hi @amapic, let me get back to you on this after confirming with the organisers.

Meanwhile, we can create new questions instead of following up on this thread; it will make Q&A search simpler in the future. :sweat_smile:

How come some of my subs took 14h and didn’t fail if the limit is 8h? Then again, how am I supposed to know that the timeout is set to 8h? Where is it written? I also thought for a moment that you keep changing the timeout limit. Can you confirm that this is not true?

Inference time is way off. Locally my model takes ~10 minutes to execute on a 1080 Ti, so obviously it runs on CPU when submitted.

@amapic stay tuned for stage 4 :slight_smile:

@ValAn No, I can confirm the timeouts haven’t been changed between your previous and current runs. The only issue is that the timeout wasn’t implemented properly in the past, which may be why your previous (1 week old) submission escaped the timeout.

We can absolutely check why it is taking >8 hours instead of the ~10 minutes you see locally. Can you help me with the following:

  • Is the local run using a GPU? I can check whether your code is utilising the GPU (when allocated) or running only on CPU for whatever reason.
  • How many images are you running on locally? The server/test dataset has 32428 images to be exact, which may be causing the longer run time.

I think the specs of the online environment would help a bit in case they differ significantly from your local environment: 4 vCPUs, 16 GB memory, K80 GPU (when enabled).
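For a rough sense of scale, the 8-hour timeout and the 32428 test images quoted in this thread translate into a per-image time budget. The sketch below is back-of-the-envelope arithmetic only; the function name is hypothetical, and it ignores fixed costs such as environment setup and model loading.

```python
def per_image_budget_seconds(timeout_hours: float, n_images: int) -> float:
    """Average wall-clock time available per image before the hard timeout."""
    return timeout_hours * 3600 / n_images


budget = per_image_budget_seconds(8, 32428)
print(f"{budget:.3f} s per image")
```

That works out to a little under 0.9 seconds per image, so timing a few hundred local predictions on a comparable machine gives a quick indication of whether a run is likely to fit.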

Hello @shivam!
Like @ValAn, I feel this kind of information (resources, time limit, and so on) should be better documented, not just for this challenge but for every challenge you guys host.

Also, it seems you are constantly improving your platform, which is great, but as a user I don’t know when you do it or what you update, and when I see some inconsistency with previous experience I go nuts trying to figure out what I’m missing.

That being said, I think the timing issue is more related to the time for accessing the images on disk than to the GPU time, which is kind of… sad.
I’ll definitely write more about that when this round is over. :slight_smile:

Hi @ignasimg,

Thanks for the suggestions.
I completely agree that we need to improve our communication and the organisation of information to provide a seamless experience to participants.

We would be glad to hear back from you after the competition and look forward to your input.


I checked all the submissions, and unfortunately multiple participants are facing the same issue, i.e. the GPU is allocated but not used by the submission due to a CUDA version mismatch.

To make the GPU work out of the box, we have introduced a forced installation, as below, in our snakes challenge evaluation process:

conda install cudatoolkit=10.0

This should fix the timing issues and we will continue monitoring all the submissions closely.
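A quick way to see whether a framework actually picks up the GPU is to log it at the start of the run. The sketch below assumes the submission uses PyTorch, which this thread does not state; other frameworks have their own equivalents. The function name `gpu_report` is hypothetical.

```python
def gpu_report() -> dict:
    """Report which CUDA version the framework was built for and whether a GPU is usable."""
    try:
        import torch  # assumption: the submission uses PyTorch
    except ImportError:
        return {"framework": None, "cuda_build": None, "gpu_available": False}
    return {
        "framework": "torch",
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "gpu_available": torch.cuda.is_available(),
    }


print(gpu_report())
```

If `gpu_available` is false on a GPU-enabled submission, a CUDA version mismatch like the one described above is a likely culprit.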


@ignasimg I have verified the disk performance and it was good. Unfortunately, on debugging, I found your submission faced the same issue, i.e. cudatoolkit=10.1, which may have given the impression that the disk was the bottleneck (when in fact the GPU wasn’t being utilised). The current submission should finish much sooner after the cudatoolkit version pinning.

@shivam Thanks for your explanation. Do you know on which day you started forcing the installation of CUDA 10.0? It could explain some problems I had.