Submissions taking too long

Are submissions currently stuck? My submission has been waiting in queue for evaluation for over an hour (usually it’s a couple of minutes).

EDIT: the submission went through eventually.

EDIT 2: new submissions seem to be stuck again, this time frozen for over two hours currently.

15 hours and the submissions are still waiting in queue for evaluation.

@shivam @mohanty - is there a server-side issue that’s causing the stuck submissions?

Hi @simon_mezgec,

We had an issue in the submission queue that caused submissions to get stuck.

We have manually cleaned up the ongoing submissions that got stuck and re-queued them (to be exact: 65632, 65262, 65404, 65411).

Please let us know if any other submission ID is stuck for you.

Thanks @shivam!

My two submissions (65636 and 65637) got unstuck and finished successfully.

However, my new submission (65790) appears to also be stuck, so if you could get it unstuck as well, I would appreciate it. :slight_smile:

Hi @simon_mezgec,

Sorry for the trouble. Submission 65790 is on its way to evaluation now too. :smiley:

I will keep a close eye on new submissions to make sure this doesn’t happen again.

Awesome, much appreciated @shivam!

@shivam Submission 65948 seems to have failed but I don’t think it should have (similarly to my two submissions yesterday). Can you check it out?

Thanks!

Hi @simon_mezgec, your submission has been processed properly now, and I have made a post about the error here.

Fantastic - I figured it was some kind of system-wide error related to Docker. Thanks for sorting it out! :+1:

@shivam - encountered a new error (submission 66390) and I think it’s the server again.

By the way, sorry for pinging you here as well - didn’t know where you prefer it (here vs. GitLab).

Hi,

No worries. You can ping me at either place.

It isn’t a server-side issue this time.

The issue occurs when the Dockerfile tries to install the mmdetection package. I think it is due to a new release of a package it depends on (or something similar). I am debugging it on my side and will let you know as soon as I find a fix for your Dockerfile.

https://gitlab.aicrowd.com/simon_mezgec/food-recognition-challenge-starter-kit/snippets/20588#L1854

Hi @simon_mezgec,

The issue is fixed now and you should be able to make a submission. Please remember to pull the latest commit from the mmdetection starter kit.

Explanation:

This happened because mmcv had a new release (0.5.2) about 7 hours ago.

mmdetection requires mmcv and pulls in its latest release.

Because of this, the mmdetection installation started failing. I have now pinned the mmcv version to 0.5.1 in the starter kit: https://gitlab.aicrowd.com/nikhil_rayaprolu/food-pytorch-baseline/commit/84eadc1ca353b5741423e0e1ea9f8db5d4bfd49f

Following this, submissions using this starter kit will go through as usual.
Thanks for notifying us about the issue!
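
For readers who hit the same kind of failure, here is a minimal sketch of why pinning helps. It is a toy model, not how pip actually resolves dependencies: `pick_version` and `parse_version` are hypothetical helpers written for this illustration, and the only version numbers used are the two mentioned above.

```python
# Toy illustration only: this is NOT pip's real resolution logic.

def parse_version(text):
    """Turn '0.5.2' into (0, 5, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def pick_version(requirement, available):
    """Return the version a naive installer would choose.

    requirement is either 'name' (unpinned) or 'name==X.Y.Z' (pinned).
    """
    name, _, pinned = requirement.partition("==")
    if pinned:
        if pinned not in available:
            raise ValueError(f"{name} {pinned} is not available")
        return pinned
    # Unpinned: the newest release wins, even if it breaks the build.
    return max(available, key=parse_version)


available_mmcv = ["0.5.1", "0.5.2"]
print(pick_version("mmcv", available_mmcv))         # 0.5.2
print(pick_version("mmcv==0.5.1", available_mmcv))  # 0.5.1
```

With an unpinned requirement, the build changes as soon as a new release appears; with `==0.5.1`, the same version is installed on every build.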

Ah, interesting - good catch!

Uploaded another submission (66451) and the image was built successfully without a hitch upon adding the mmcv version requirement like you suggested. Will re-upload the same model from submission 66390 later today.

Thanks a lot - really appreciate the quick fix for this! :slight_smile:

@shivam - I think the submissions might be stuck again. My submission (67214) has been waiting in queue for evaluation for almost an hour now.

Hey @simon_mezgec

The evaluation for the submission has already started.

Evaluation for my submission is taking a long time. I set debug=true. Can someone tell me whether there is a problem in the inference script on my end? @shivam @nikhil_rayaprolu

My new submission (67274) is very slow as well - over one hour in the evaluation queue. There seems to be an issue with the submissions today.

Hi @simon_mezgec,

Your submission 67274 went through without any problem as far as I can see, while 67214 took longer because the existing VMs were already busy evaluating other submissions. We didn’t account for the surge in submissions just before the end of the round. I have increased the number of submissions evaluated in parallel (from 4 to 8), which should keep the queue clear.

I hope it helps.
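
To get a feel for why raising the parallel limit from 4 to 8 shortens the queue, here is a rough sketch. It assumes all submissions arrive at once and are served first-come, first-served. The 16 submissions and the 30-minute evaluation time are made-up numbers, and `queue_waits` is a hypothetical helper, not part of the actual evaluation system.

```python
import heapq


def queue_waits(durations, workers):
    """Return how long each job waits before an evaluator picks it up.

    Assumes every job is already queued at time 0 and jobs are taken
    in order by whichever evaluator becomes free first.
    """
    free_at = [0] * workers  # min-heap of times when each evaluator is free
    heapq.heapify(free_at)
    waits = []
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available evaluator
        waits.append(start)
        heapq.heappush(free_at, start + duration)
    return waits


durations = [30] * 16  # hypothetical: 16 submissions, 30 minutes each
print(max(queue_waits(durations, 4)))  # 90
print(max(queue_waits(durations, 8)))  # 30
```

Under these assumptions, the last submission in the batch waits 90 minutes with 4 evaluators and 30 minutes with 8.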

Hi @naveen_narayanan,

I see that all the submissions you made have failed, either due to an image build failure caused by an improper Dockerfile or due to exceptions in your code.

I can exclude the failed submissions from your daily count so you can make a submission right now (given that only a few hours are left), but whether those are considered for the final leaderboard will be decided by the challenge organisers later.

Please go ahead and make a submission!

Thanks to both @shivam and @ashivani - my subsequent submissions went through without any problems.
