My submission #210059 failed with “Evaluation timed out”, even though the reported “inference speed” was 0.926x at 2/2 of the validation dataset and 0.913x at 9/27 of the test dataset. The minimum allowed inference speed is 0.637x, right?
It happened to me as well: my submission timed out after 21/27 at 4.028x speed, with the exact same model as my previous successful submissions. However, I resubmitted the same model after waiting a few hours and it was successful.
Today my new submissions were successfully evaluated, but unexpected failures waste attempts and can hinder the participants on the last day of the challenge.
@StefanUhlich @dipam
I think a separate counter for unsuccessful submissions should be enabled.
My submission also failed after 2.5 hours despite having an inference speed of 2.39x. It had previously succeeded twice in a row; the only difference was that debug was set to false instead of true in aicrowd.json.
@alina_porechina, unfortunately I haven’t found what the issue is on our side yet. I’m looking into it and it should be resolved soon.
Can you let me know whether you expect your model’s speed to be consistent for every song, or whether it can vary? I ask because the evaluator checks that every prediction is above the speed constraint.
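To illustrate why per-song consistency matters, here is a minimal sketch of a per-prediction speed check. It is not the evaluator's actual code. It assumes that "speed" means audio duration divided by inference time, and it uses the 0.637x threshold quoted earlier in this thread, which has not been confirmed here. The names `speed_factor` and `slow_songs` are hypothetical.

```python
from typing import List, Sequence

# Threshold quoted by a participant in this thread; treat it as an assumption.
MIN_SPEED = 0.637


def speed_factor(audio_seconds: float, inference_seconds: float) -> float:
    """Assumed definition: seconds of audio processed per second of wall time."""
    return audio_seconds / inference_seconds


def slow_songs(
    audio_seconds: Sequence[float],
    inference_seconds: Sequence[float],
    min_speed: float = MIN_SPEED,
) -> List[int]:
    """Return the indices of songs whose individual speed is below min_speed."""
    return [
        i
        for i, (audio, elapsed) in enumerate(zip(audio_seconds, inference_seconds))
        if speed_factor(audio, elapsed) < min_speed
    ]


if __name__ == "__main__":
    # Hypothetical timings: the second song is much slower than the others.
    durations = [240.0, 180.0, 300.0]
    timings = [260.0, 300.0, 310.0]
    print(slow_songs(durations, timings))
```

With these made-up timings the average speed over the three songs is above the threshold, yet the second song alone falls below it, so a check applied to every prediction would still reject the run.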
My understanding is that the speed of my model is constant.
#210197 (2/2 - 0.771x, 3/27 - 0.769x) may have failed due to low speed.
It is less likely that #210059 failed due to low speed (2/2 - 0.926x, 9/27 - 0.913x).
#210078 and #210117 are identical. #210078 returned “Evaluation timed out” during the validation phase, while #210117 was evaluated quickly and successfully (2/2 - 3.369x, 27/27 - 3.479x).
I think it’s better to turn on the second counter instead of looking for the causes of rare crashes.
I’ve made some changes to how the cloud instances are provisioned for inference, so I think the failures shouldn’t occur anymore. However, if you do observe one again, please let me know.
#214686 #214701
Can you please rerun these submissions?
(“Scoring failed. Please contact the admins”)
Today, due to my mistake, 4 submissions failed with “Prediction too slow”. They ran simultaneously. Could this have somehow overloaded the system and caused scoring to fail?