Reasons for submission failure

alina_porechina · February 15, 2023, 12:38pm

My #210059 failed with “Evaluation timed out”, but at 2/2 of the validation dataset and 9/27 of the test dataset the “inference speed” was 0.926x and 0.913x, respectively. The minimum inference speed is 0.637x, right?

After I submitted #210079, my #210078 got stuck.

GiorgioFabbro · February 15, 2023, 4:50pm

@dipam could you please look into this?

subatomicseer · February 15, 2023, 11:40pm

It happened to me as well: it timed out after 21/27 at 4.028x speed, with the exact same model as my previous successful submissions. However, when I resubmitted the same model a few hours later, it was successful.

alina_porechina · February 16, 2023, 3:47am

Today my new submissions were successfully evaluated, but unexpected failures waste attempts and can hinder the participants on the last day of the challenge.

@StefanUhlich @dipam
I think a separate counter for unsuccessful submissions should be enabled.

dipam · February 16, 2023, 4:57am

@alina_porechina @subatomicseer

I’ll investigate. Could I please have the respective submission IDs?

alina_porechina · February 16, 2023, 6:30am

Take a look at failed #210059 and #210078 (there is no need to restart them). Failed #210078 and successful #210117 are identical.

subatomicseer · February 16, 2023, 6:47am

Mine was #209954. No need to restart it either, but would be nice to know what happened.

MalcolmX · February 16, 2023, 7:30pm

I seem to be having a similar issue. Submission: #210149 (music demixing leaderboard C)

@dipam

Mine has been stuck at demixing 3/27 for 2 hours now. I also used up a submission, since debug was set to false in the aicrowd.json.

MalcolmX · February 16, 2023, 8:16pm

My submission also failed after 2.5hrs despite having an inference speed of 2.39x… it previously succeeded twice in a row, the only difference was that debug was set to false instead of true within the aicrowd.json.

submission_hash : 81a881bc88d002c09e600b5496fb1c98f59d3ad6.
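
Since a non-debug run appears to count against the submission quota, a small pre-submit check could catch an unintended `debug` value. The following is only a sketch: it assumes `aicrowd.json` sits in the repository root and that `debug` is stored as a JSON boolean, and the helper name `is_debug_submission` is hypothetical.

```python
import json
from pathlib import Path


def is_debug_submission(config_path="aicrowd.json"):
    """Return True only if the config file sets "debug" to JSON true.

    Assumption: "debug" is a top-level boolean key. A missing key or
    any non-boolean value is treated as a non-debug submission.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return config.get("debug") is True
```

Calling `is_debug_submission()` before pushing and printing a warning when it returns `False` would make it harder to spend an attempt by accident.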

alina_porechina · February 17, 2023, 4:47pm

@dipam

#210197 failed at 3/27, speed 0.769x

dipam · February 17, 2023, 6:34pm

@alina_porechina, unfortunately I haven’t found what the issue is on our side yet. I’m looking into it, and it should be resolved soon.

Can you let me know whether you expect your model’s speed to be consistent for every song, or whether it can vary? I ask because the evaluator checks that every prediction is above the speed constraint.
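
To illustrate why a per-song check matters, here is a rough sketch, not the actual evaluator code. It assumes the speed factor is audio duration divided by inference time, and it uses the 0.637x threshold mentioned earlier in this thread, which has not been confirmed. The names `SongTiming`, `speed_factor`, and `slow_songs` and the timings are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SongTiming:
    name: str
    audio_seconds: float      # duration of the song
    inference_seconds: float  # wall-clock time spent demixing it


def speed_factor(t: SongTiming) -> float:
    # Assumed definition: 1.0x means real time, below 1.0x is slower.
    return t.audio_seconds / t.inference_seconds


def slow_songs(timings, min_speed):
    """Names of songs whose individual speed is below min_speed."""
    return [t.name for t in timings if speed_factor(t) < min_speed]


MIN_SPEED = 0.637  # unconfirmed value quoted earlier in the thread

# Hypothetical timings: the average speed passes, one song does not.
timings = [
    SongTiming("song_a", 200.0, 260.0),  # about 0.769x
    SongTiming("song_b", 180.0, 300.0),  # 0.600x
]

print(slow_songs(timings, MIN_SPEED))
```

In this made-up example the mean speed is about 0.685x, above the threshold, yet `song_b` alone would fail a per-prediction check. A model whose speed varies between songs could therefore fail even when the reported average looks safe.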

alina_porechina · February 17, 2023, 7:56pm

@dipam

My understanding is that the speed of my model is constant.

#210197 (2/2 - 0.771x, 3/27 - 0.769x) may have failed due to low speed.

It is less likely that #210059 failed due to low speed (2/2 - 0.926x, 9/27 - 0.913x).

#210078 and #210117 are identical. #210078 returned “Evaluation timed out” during the validation phase. #210117 was evaluated quickly and successfully (2/2 - 3.369x, 27/27 - 3.479x).

I think it’s better to turn on the second counter instead of looking for the causes of rare crashes.

alina_porechina · February 18, 2023, 7:28am

@dipam

Today I successfully resubmitted code from failed #210197 (2/2 - 0.771x, 3/27 - 0.769x) as #210246 (2/2 - 0.757x, 27/27 - 0.749x)

dipam · February 18, 2023, 7:48am

I’ve made some changes to how the cloud instances are provisioned for inference, so I think the failures shouldn’t occur now. However, if you do observe one again, please let me know.

alina_porechina · February 21, 2023, 3:34pm

#210438
After demixing I got the message: “Scoring failed. Please contact the admins.”

alina_porechina · February 21, 2023, 8:12pm

#210461
After successful demixing (2/2 - 1.73x, 27/27 - 1.749x) I got the message: “Evaluation timed out”

alina_porechina · February 21, 2023, 8:43pm

#210459
Demixing got stuck (2/2 - 0.689x, 6/27 - 0.692x), then I got “Evaluation timed out”

alina_porechina · February 21, 2023, 8:44pm

@dipam

Today 3/5 of my submissions failed unexpectedly.

dipam · February 22, 2023, 3:20pm

@alina_porechina

I’ve rerun the submissions after adding an extra buffer to the timeout. They’re scored now.

alina_porechina · April 6, 2023, 6:48pm

@dipam

#214686
#214701
Can you please rerun these submissions?
(“Scoring failed. Please contact the admins”)

Today, due to my mistake, 4 submissions failed with “Prediction too slow”. They ran simultaneously. Could this have somehow overloaded the system and caused scoring to fail?