Submissions are quite unstable

MPWARE · August 28, 2023, 8:58am

It looks like the evaluation of a submission is quite unstable: the same code submitted twice can either fail or succeed. It is probably related to cluster workload and the 1-second-per-image check. The same image that took 800ms in one submission might take 1200ms in another.

Example below:

Not sure how the organizers could improve this, but it makes things more difficult for us.

tfriedel · August 28, 2023, 10:41am

Not sure how the performance is measured, but if it’s like “NO image is allowed to take longer than 1sec” it could be relaxed to “ON AVERAGE no image is allowed to take longer than 1sec”
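
To make the difference between the two rules concrete, here is a small Python sketch. The function names and the sample timings are made up for illustration, and this is not how the evaluator is actually implemented; only the 1-second limit comes from this thread.

```python
from statistics import mean

LIMIT_S = 1.0  # per-image limit discussed in this thread


def passes_strict(times_s, limit_s=LIMIT_S):
    """Strict rule: every single image must finish within the limit."""
    return all(t <= limit_s for t in times_s)


def passes_average(times_s, limit_s=LIMIT_S):
    """Relaxed rule: only the mean time per image must stay within the limit."""
    return mean(times_s) <= limit_s


# Hypothetical timings in seconds: mostly 0.8 s with a single 1.2 s outlier.
timings = [0.8, 0.8, 0.8, 1.2, 0.8]
print(passes_strict(timings))   # False: one image exceeded the limit
print(passes_average(timings))  # True: the mean is 0.88 s
```

With the strict rule a single slow image fails the whole run, while the averaged rule tolerates occasional outliers.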

MPWARE · August 28, 2023, 11:45am

Totally agree that an average would be much better. You can verify the 1s-per-image assumption by adding/checking some logs during the Validation stage.
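
One way to add such logs is sketched below, assuming your inference is wrapped in a single callable. `timed_predict` and `dummy_predict` are hypothetical names, not part of the challenge starter kit.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("timing")


def timed_predict(predict_fn, image, limit_s=1.0):
    """Run predict_fn(image), log the wall-clock time, return (result, elapsed)."""
    start = time.perf_counter()
    result = predict_fn(image)
    elapsed = time.perf_counter() - start
    # Log slow images at WARNING level so they stand out in the submission logs.
    level = logging.WARNING if elapsed > limit_s else logging.INFO
    log.log(level, "inference took %.0f ms (limit %.0f ms)",
            elapsed * 1000, limit_s * 1000)
    return result, elapsed


def dummy_predict(image):
    """Stand-in for a real model; replace with your own inference call."""
    time.sleep(0.01)
    return "label"


result, elapsed = timed_predict(dummy_predict, image=None)
```

Comparing the logged times of two submissions of the same code should show whether the same image really takes a different amount of time.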

dipam · August 31, 2023, 1:32pm

@MPWARE @tfriedel the time is measured per image. Upon request from the organizer, we have increased this number to 2 seconds per image.

@MPWARE, would it be possible to share links to your submissions where the same image got different times? It’s not expected to vary with cluster workload, since your submission container gets the full node.

MPWARE · August 31, 2023, 3:41pm

@dipam Sure, here are the 2 submissions:

Failed (id=23): AIcrowd
Succeed (id=24): AIcrowd

hca97 · August 31, 2023, 7:49pm

Hi @dipam,

These submissions failed the validation step due to a timeout. Could you let me know how long each iteration took?

237048
237047
237046

I am trying to simulate 2 CPU constraints by setting torch.set_num_threads(2) and in my local setup I can process 2 to 3 images per second.
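
A rough sketch of that kind of local measurement follows. `images_per_second` is a hypothetical helper and the dummy workload stands in for a real model. `torch.set_num_threads(2)` only caps the threads PyTorch uses for intra-op parallelism; it does not reproduce the speed of the evaluation node's CPUs, so treat the result as an optimistic estimate.

```python
import time

try:
    import torch  # optional: only needed for the thread cap
    torch.set_num_threads(2)  # cap intra-op CPU threads at 2
except ImportError:
    torch = None  # the timing helper below works without PyTorch


def images_per_second(predict_fn, images):
    """Run predict_fn over all images and return the measured throughput."""
    start = time.perf_counter()
    for image in images:
        predict_fn(image)
    elapsed = time.perf_counter() - start
    return len(images) / max(elapsed, 1e-9)  # guard against a zero interval


# Dummy workload: each "image" takes about 5 ms to process.
rate = images_per_second(lambda image: time.sleep(0.005), [None] * 4)
print(f"{rate:.1f} images/s")
```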

tfriedel · September 1, 2023, 4:00pm

I think it’s pretty tricky to get this right. Besides some nondeterminism / caching issues that naturally occur, on cloud instances you additionally have to deal with things like noisy neighbors or “steal time”.
see Understanding CPU Steal Time - when should you be worried? | Scout APM Blog
While you say the container gets the full node, it’s not quite clear whether that means it gets the full bare metal server. You are probably using EC2 instances with 2 cores, which are VMs on a bigger machine, and thus you have to deal with the problems mentioned.
Increasing time to 2 sec doesn’t solve the problem, as people may just deploy bigger models and then run over the limit again. Imo only averaging can prevent the issue.
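
For anyone who wants to check for steal time on a Linux machine, here is a minimal sketch that compares two samples of the aggregate `cpu` line from `/proc/stat`. The helper names and the sample counter values are made up for illustration, and whether `/proc/stat` is readable inside the evaluation container is an assumption.

```python
def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line (Linux only)."""
    with open(path) as f:
        return f.readline()


def steal_fraction(before: str, after: str) -> float:
    """Fraction of CPU time reported as 'steal' between two 'cpu' lines."""
    def counters(line):
        # Columns after the "cpu" label:
        # user nice system idle iowait irq softirq steal [guest guest_nice]
        # Guest time is already counted in user/nice, so only the first 8 are summed.
        return [int(x) for x in line.split()[1:9]]

    deltas = [a - b for b, a in zip(counters(before), counters(after))]
    total = sum(deltas)
    return deltas[7] / total if total else 0.0


# Made-up samples, as if taken before and after an inference run.
before = "cpu  1000 0 500 8000 100 0 50 200 0 0"
after = "cpu  1600 0 700 8500 100 0 50 350 0 0"
print(f"steal: {steal_fraction(before, after):.1%}")
```

In a real run you would call `read_cpu_line()` before and after the inference loop and pass both lines to `steal_fraction`.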

MPWARE · September 5, 2023, 10:11pm

@dipam Any status on the 2 submissions I provided?
With the new 2-second limit, I can see that my model (executed on a dummy 4000x3000 image) takes 1125ms. It passes the MosquitoalertValidation step but fails the MosquitoalertPrediction step after a few moments. For instance:

I agree with @tfriedel that average after final evaluation would be better.

hca97 · September 6, 2023, 2:41pm

Hi, I am having a similar problem too. My inference takes around 900-1200ms depending on the aspect ratio of the input image. I can understand a 20% deviation in inference time, but an almost 100% difference in speed is, I think, too much. I think @tfriedel’s idea of averaging is a good proposal.

saidinesh_pola · October 9, 2023, 2:32am

Do we have any solution for this submission failure at the prediction step other than waiting for some cooling period?

MPWARE · October 9, 2023, 6:56am

Hi, not on my side. I submit the same code twice and usually it works when I’m quite sure I’m under 1200ms. Everything above 1200ms fails.

saidinesh_pola · October 9, 2023, 7:20am

Thanks, so unofficially it is ~1.2 seconds.
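
As a back-of-the-envelope check, the arithmetic behind that unofficial number can be sketched as follows. It assumes the 1200ms cutoff reported above was measured locally and that the 2-second limit applies on the evaluation node; both function names are hypothetical.

```python
def implied_slowdown(limit_ms: float, local_cutoff_ms: float) -> float:
    """Slowdown factor at which a run at the local cutoff just hits the limit."""
    return limit_ms / local_cutoff_ms


def local_budget_ms(limit_ms: float, slowdown: float) -> float:
    """Local time budget per image for a given worst-case slowdown factor."""
    return limit_ms / slowdown


print(round(implied_slowdown(2000, 1200), 2))  # 1.67
print(local_budget_ms(2000, 2.0))              # 1000.0
```

So a 1200ms local cutoff corresponds to roughly a 1.67x worst-case slowdown, and planning for a full 2x slowdown would mean keeping local inference under about 1000ms.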