Reasons for submission failure

My submission #210059 failed with “Evaluation timed out”, even though its “inference speed” was 0.926x on the validation dataset (2/2) and 0.913x on the test dataset (9/27). The minimal allowed inference speed is 0.637x, right?

After I submitted #210079, my #210078 got stuck.

@dipam could you please look into this?

It happened to me as well: my submission timed out after 21/27 at 4.028x speed, with the exact same model as my previous successful submissions. After waiting a few hours, I submitted the same model again and it succeeded.

Today my new submissions were evaluated successfully, but unexpected failures waste attempts and can hold participants back on the last day of the challenge.

@StefanUhlich @dipam
I think a separate counter for unsuccessful submissions should be enabled.

@alina_porechina @subatomicseer

I’ll investigate it. Could you please share the respective submission IDs?

Please take a look at the failed #210059 and #210078 (there is no need to restart them). The failed #210078 and the successful #210117 are identical.

Mine was #209954. No need to restart it either, but it would be nice to know what happened.

I seem to be having a similar issue. Submission: #210149 (music demixing leaderboard C)

@dipam

Mine has been stuck at demixing 3/27 for 2 hours now. I also used up a submission, since debug was set to false in aicrowd.json.

My submission also failed after 2.5 hours despite an inference speed of 2.39x. It had previously succeeded twice in a row; the only difference was that debug was set to false instead of true in aicrowd.json.

submission_hash: 81a881bc88d002c09e600b5496fb1c98f59d3ad6
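
In case it helps others, here is a minimal sketch of flipping that debug flag before submitting. It assumes a top-level "debug" field in aicrowd.json, which is all this thread confirms about the file’s schema:

```python
import json

# Minimal sketch: toggle the evaluator's debug flag before submitting.
# Assumes a top-level "debug" field in aicrowd.json (an assumption based
# on the posts above, not on the official schema).
with open("aicrowd.json") as f:
    config = json.load(f)

# debug=True runs a short debug evaluation; debug=False runs the full
# test set and, per the posts above, consumes a counted attempt.
config["debug"] = True

with open("aicrowd.json", "w") as f:
    json.dump(config, f, indent=2)
```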

@dipam

#210197 failed at 3/27 with speed 0.769x.

@alina_porechina, unfortunately I haven’t found what the issue is on our side yet. I’m looking into it, and it should be resolved soon.

Can you let me know whether you expect your model’s speed to be consistent for every song, or whether it can vary? The evaluator checks that every prediction is above the speed constraint.
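
For clarity, here is a minimal sketch of that per-song check, assuming “inference speed” means song duration divided by wall-clock processing time and using the 0.637x minimum quoted earlier in the thread; the actual evaluator code may differ:

```python
import time

SPEED_LIMIT = 0.637  # minimum allowed speed quoted earlier (assumption)

def check_speed(song_duration_s, separate_fn):
    """Demix one song and fail if its speed falls below the limit."""
    start = time.perf_counter()
    separate_fn()                      # run the model on a single song
    elapsed = time.perf_counter() - start
    speed = song_duration_s / elapsed  # e.g. 0.913x = slower than real time
    if speed < SPEED_LIMIT:
        raise TimeoutError(f"Prediction too slow: {speed:.3f}x < {SPEED_LIMIT}x")
    return speed

# Dummy usage: "demixing" a 1.0 s song in ~1.2 s gives ~0.83x, which passes.
# Under a per-song rule like this, one slow track fails the whole submission.
print(f"{check_speed(1.0, lambda: time.sleep(1.2)):.3f}x")
```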

@dipam

My understanding is that the speed of my model is constant.

#210197 (2/2 - 0.771x, 3/27 - 0.769x) may have failed due to low speed.

It is less likely that #210059 failed due to low speed (2/2 - 0.926x, 9/27 - 0.913x).

#210078 and #210117 are identical: #210078 returned “Evaluation timed out” during the validation phase, while #210117 was evaluated quickly and successfully (2/2 - 3.369x, 27/27 - 3.479x).

I still think it is better to enable a separate counter for failed submissions than to keep hunting for the causes of rare crashes.

@dipam

Today I successfully resubmitted the code from the failed #210197 (2/2 - 0.771x, 3/27 - 0.769x) as #210246 (2/2 - 0.757x, 27/27 - 0.749x).

I’ve made some changes to how the cloud instances are provisioned for inference; I think the failures shouldn’t occur anymore. However, if you do observe one again, please let me know.

#210438
After demixing I got the message: “Scoring failed. Please contact the admins.”

#210461
After successful demixing (2/2 - 1.73x, 27/27 - 1.749x), I got the message “Evaluation timed out”.

#210459
Demixing got stuck (2/2 - 0.689x, 6/27 - 0.692x), and then I got “Evaluation timed out”.

@dipam

Today, 3 of my 5 submissions failed unexpectedly.

@alina_porechina

I’ve rerun the submissions after adding an extra buffer to the timeout. They’re scored now.

@dipam

#214686
#214701
Could you please rerun these submissions?
(They failed with “Scoring failed. Please contact the admins”.)

Today, due to my own mistake, 4 of my submissions failed with “Prediction too slow”. They ran simultaneously. Could that have somehow overloaded the system and caused the scoring failures?