Can I have an example of code that works for making a submission on GitLab?

@amapic in the sample submission from starter kit you can find:

AICROWD_TEST_IMAGES_PATH = os.getenv("AICROWD_TEST_IMAGES_PATH", "./data/test_images_small/")
AICROWD_TEST_METADATA_PATH = os.getenv("AICROWD_TEST_METADATA_PATH", "./data/test_metadata_small.csv")
AICROWD_PREDICTIONS_OUTPUT_PATH = os.getenv("AICROWD_PREDICTIONS_OUTPUT_PATH", "random_prediction.csv")

as @ValAn told you, it’s better if you don’t change the defaults. But if you still need to, make sure to change only the second argument of the os.getenv call.

This is because when you submit your code, AIcrowd expects you to “read” those paths from environment variables they have set.

To test that it works on your local machine, the default values should be enough; just uncompress both test_metadata_small.tar.gz and test_images_small.tar.gz into the data folder. You can download both of those files from the Resources page.
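Putting those three paths together, here is a minimal hedged sketch of a run script; the "filename" column and the class list are placeholders I made up, so check the starter kit's test_metadata_small.csv and random_prediction.csv for the real format:

import os
import numpy as np
import pandas as pd

AICROWD_TEST_IMAGES_PATH = os.getenv("AICROWD_TEST_IMAGES_PATH", "./data/test_images_small/")
AICROWD_TEST_METADATA_PATH = os.getenv("AICROWD_TEST_METADATA_PATH", "./data/test_metadata_small.csv")
AICROWD_PREDICTIONS_OUTPUT_PATH = os.getenv("AICROWD_PREDICTIONS_OUTPUT_PATH", "random_prediction.csv")

metadata = pd.read_csv(AICROWD_TEST_METADATA_PATH)
class_names = ["class_a", "class_b", "class_c"]  # placeholder class list

rows = []
for filename in metadata["filename"]:  # "filename" is an assumed column name
    # a real model would load os.path.join(AICROWD_TEST_IMAGES_PATH, filename) here
    probs = np.random.dirichlet(np.ones(len(class_names)))  # random probabilities summing to 1
    rows.append([filename] + probs.tolist())

pd.DataFrame(rows, columns=["filename"] + class_names).to_csv(AICROWD_PREDICTIONS_OUTPUT_PATH, index=False)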


As for dealing with corrupt files, you can see how @gokuleloop did it for round 2 @ https://github.com/GokulEpiphany/contests-final-code/blob/master/aicrowd-snake-species/inference/run.py#L196
Disclaimer: you can’t reuse his exact approach, since we no longer have a sample submission .csv file, but it gives you an idea of how to handle those files.

Personally, what I do is simply generate a “fake” random image, but I guess there are better ways (more efficient / higher-scoring). In rough sketch code, using PIL and NumPy (image_path and the 224×224 size are placeholders), it looks something like this:

import numpy as np
from PIL import Image

try:
    image = Image.open(image_path).convert("RGB")
except Exception:
    # corrupt / missing file: fall back to a random "fake" image
    image = Image.fromarray(np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8))

Final tip: Be sure to add a line with a corrupt / non-existent image file to the test_metadata_small.csv mentioned earlier, so you can also be sure your code can handle errors when reading the images.

Best of luck! :slight_smile:


Thank you. Can you give me a .yml file with Keras and TensorFlow 1?

I use neither Keras nor TensorFlow, but if you are using conda - which you totally should, not just because it makes dependency management way easier but also because it’s easy to use and just works - it’s as easy as activating your environment and typing:

$ conda env export > environment.yml

Please use
conda env export --no-build > environment.yml
Also, inference happens on a K80 (if you enable the GPU). Make sure your CUDA version is 10.0 and not 10.1.
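If it helps, here is a rough environment.yml sketch for Keras + TensorFlow 1.x that targets CUDA 10.0 (the exact pins are assumptions and should be double-checked against what your code needs; tensorflow-gpu 1.13.x is built against CUDA 10.0):

name: snakes-tf1
channels:
  - defaults
dependencies:
  - python=3.6
  - cudatoolkit=10.0
  - tensorflow-gpu=1.13.1
  - keras=2.2.4
  - numpy
  - pandas
  - pillow
  - pip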


Why does it need to be 10.0? I don’t understand. TBH, I am not sure that the organizers enabled the GPU for this comp?
@shivam @ashivani @mohanty is there a GPU allocated or not?

A relevant discussion.


Hi participants, @ValAn,

Yes, GPUs are available for Snakes Challenge submissions when gpu: true is set in aicrowd.json.
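For reference, a minimal aicrowd.json sketch with the flag enabled; only the gpu key matters here, the other fields are placeholders and you should keep whatever your starter kit repository already contains:

{
  "challenge_id": "snake-species-identification-challenge",
  "authors": ["your-aicrowd-username"],
  "description": "sample submission with GPU enabled",
  "gpu": true
}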

It needs to be 10.0 because the nodes your code runs on currently have GKE version 1.12.x -> which ships Nvidia driver 410.79 -> which supports CUDA 10.0.

We are looking forward to running future challenges on a higher CUDA version (GKE version), but to keep results, timings, etc. consistent, we do not want to change versions midway through a contest.

I apologize for overlooking this. Slow evaluation drove me crazy as I mentioned earlier in this discussion.

Now I wonder how I am supposed to know this?

Am I supposed to read through previous competitions to understand how to submit?

Also, I really think you should add an edit history for your challenge description. Two months ago I read it for this challenge and now I see it has changed. Nothing important: you updated the number of images, which had originally just been copy-pasted from stage 2. I hope you won’t take my comments as an offense; I am just trying to understand, share my experience, and give some suggestions on how to make it easier to participate.


Dear @ValAn,

Our sincere apologies for the inconveniences faced by you.

Regarding the slow evaluation speeds: given that we have to execute your code (and models, etc.) on a large number of test images, the evaluations are indeed slow. We are trying to improve this experience by providing better feedback in terms of progress, and will definitely address this in the upcoming version of the challenges.

Regarding the competition, we are providing all updates on this forum here, and we would be happy to answer any and all questions you have here. We are also working on better notification systems so that you get relevant updates from the challenge over emails and other notification channels on the platform that you subscribe to.

In the meantime, we really appreciate your feedback. Your feedback helps us make the platform much better for thousands of other users, and under no circumstances do we take it as an offense.

Thank You,
Mohanty
(on behalf of the organizing team)


How do you merge the yml coming from conda env export --no-build > environment.yml with the initial yml file coming from the starter pack?

@amapic: If you built the conda env from the initial environment.yml file, then conda env export --no-build will export the updated state of the environment.
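In other words, the rough workflow is (the env name and extra packages are just placeholders):

$ conda env create -f environment.yml              # build the env from the starter kit file
$ conda activate <your-env-name>
$ conda install <any-extra-packages>               # and/or pip install ...
$ conda env export --no-build > environment.yml    # overwrite with the updated state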

@mohanty I did so, and I can’t find matching versions for these packages:

  • libedit=3.1.20181209
  • readline=7.0
  • ncurses=6.1
  • libgcc-ng=9.1.0
  • libstdcxx-ng=9.1.0

How do I deal with this?

@amapic This is happening because these packages are only available for Linux distributions, which is why installing them on Windows (I assume you are using Windows) fails. This is unfortunately a current limitation of conda.

Example:
https://anaconda.org/anaconda/ncurses has only osx & linux builds, but not windows.

In such a scenario, I recommend removing the above packages from environment.yml and continuing with your conda env creation. These packages are often included as dependencies of the “main” dependencies, and conda should resolve a similar package for your system automatically.
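For example, a small hedged helper (works the same on Windows) that strips those packages from the exported file before you create the env:

drop = ("libedit", "readline", "ncurses", "libgcc-ng", "libstdcxx-ng")
with open("environment.yml") as src:
    # keep every line that does not mention one of the Linux-only packages
    kept = [line for line in src if not any(pkg in line for pkg in drop)]
with open("environment_clean.yml", "w") as dst:
    dst.writelines(kept)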

@devops @shivam what does the timeout mean? Does anyone know where I can find this information? I have asked this question numerous times after @devops commented on my failed subs, but they were ignored, so I am bringing it up here.

How am I supposed to debug a timeout? Some of my successful subs took longer to execute than most of those which failed because of the timeout. I couldn’t come up with a reasonable explanation for this behaviour. I hope you can help me understand it.

Hi @ValAn,

Submissions should ideally take a few hours to run, but we have put a hard timeout of 8 hours. If your solution crosses 8 hours, it is marked as failed.
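(For a rough sense of the per-image budget: 8 h × 3,600 s/h = 28,800 s over the 32,428 server-side test images comes to roughly 0.89 s per image, and that has to cover model loading and I/O as well.)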

Roughly how long do you expect your code to run? Is the evaluation time way off compared to your local runs?

Otherwise, you can enable the GPU (if you are not doing so already) to speed up computation and finish the evaluation under 8 hours.

Please let us know in case you require more help with debugging your submission. We can try to see which step/part of the code is taking the most time if required.

I haven’t managed to submit and I don’t have time left for this competition at the moment. Can you keep the evaluation working after the 17th? I would like to add a line about this competition to my resume.

Hi @amapic, let me get back to you on this after confirming with the organisers.

Meanwhile, we can create new questions instead of following up on this thread; it will make searching the Q&A simpler in the future. :sweat_smile:

How come some of my subs took 14h and didn’t fail if the limit is 8h? Then again, how am I supposed to know that the timeout is set to 8h? Where is it written? I also thought for a moment that you keep changing the timeout limit; can you confirm that this is not true?

Inference time is way off. Locally my model takes ~10 minutes to run on a 1080 Ti, so it apparently runs on the CPU when submitted.

@amapic stay tuned for stage 4 :slight_smile:

@ValAn No, I can confirm the timeouts haven’t been changed between your previous and current runs. The only issue is that the timeout wasn’t enforced properly in the past, which may be why your previous (one-week-old) submission escaped it.

We can absolutely check why it is taking >8 hours instead of the ~10 minutes it takes locally. Can you help me with the following:

  • Is the local run with a GPU? I can check whether your code is utilising the GPU (when allocated) or running only on the CPU for whatever reason.
  • How many images are you using when running locally? The server-side test dataset has 32,428 images to be exact, which may be causing the longer runtime.

I think the specs of the online environment will help a bit, in case there is a significant difference from your local environment: 4 vCPUs, 16 GB memory, K80 GPU (when enabled).
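If it helps while debugging, here is a small sketch (assuming TensorFlow 1.x, as in the environment discussed earlier; other frameworks have equivalent checks, e.g. torch.cuda.is_available() in PyTorch) that you can print at the start of your run script to confirm the GPU is actually visible during evaluation:

import tensorflow as tf
from tensorflow.python.client import device_lib

# Log every device TensorFlow can see; on the evaluation nodes this should
# include a K80 entry when gpu: true is set in aicrowd.json.
print(device_lib.list_local_devices())
print("GPU available:", tf.test.is_gpu_available())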