Submissions are quite unstable

It looks like the evaluation of a submission is quite unstable: the same code submitted twice can either fail or succeed. It is probably related to cluster workload and the 1-second requirement check. The same image that took 800 ms in one submission might take 1200 ms in another.

Example below:

Not sure how the organizers could make it better, but it makes things more difficult for us.

Not sure how the performance is measured, but if the rule is “NO image is allowed to take longer than 1 sec”, it could be relaxed to “ON AVERAGE, no image is allowed to take longer than 1 sec”.
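To make the difference between the two rules concrete, here is a minimal sketch in Python. The function names and the timings are made up for illustration (they are not the organizers' actual check); the timings only mirror the 800 ms vs 1200 ms spread described above.

```python
from statistics import mean

def passes_per_image(times_ms, limit_ms=1000.0):
    """Strict rule: every single image must finish within the limit."""
    return all(t <= limit_ms for t in times_ms)

def passes_on_average(times_ms, limit_ms=1000.0):
    """Relaxed rule: only the mean time per image must stay within the limit."""
    return mean(times_ms) <= limit_ms

# Hypothetical timings: mostly around 800 ms, with one slow outlier at 1200 ms.
times_ms = [800, 820, 790, 1200, 810]

print(passes_per_image(times_ms))   # False: the 1200 ms image breaks the strict rule
print(passes_on_average(times_ms))  # True: the mean is 884 ms
```

With the strict rule, a single slow image fails the whole submission; with the averaged rule, an occasional outlier is absorbed by the faster images.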


Totally agree that an average would be much better. You can verify the 1 s per image assumption by adding/checking some logs during the Validation stage.
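A rough sketch of what such timing logs could look like. The `timed_predict` wrapper and the `predict` callable are hypothetical names for this example; the actual entry point depends on the challenge's starter kit.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("timing")

def timed_predict(predict, image, limit_s=1.0):
    """Call predict(image), log the elapsed wall-clock time, and return both."""
    start = time.perf_counter()
    result = predict(image)
    elapsed = time.perf_counter() - start
    # Log slow images as warnings so they are easy to spot in the logs.
    level = logging.WARNING if elapsed > limit_s else logging.INFO
    log.log(level, "inference took %.0f ms (limit %.0f ms)",
            elapsed * 1000, limit_s * 1000)
    return result, elapsed
```

Wrapping the per-image call this way shows which images approach or exceed the limit, assuming the logs of the Validation stage are visible to you.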

@MPWARE @tfriedel the time is measured per image. Upon request from the organizer, we have increased this number to 2 seconds per image.

@MPWARE, would it be possible to give a link to your submissions where the same image got different times? It's not expected to vary based on cluster workload, since your submission container gets the full node.


@dipam Sure, here are the 2 submissions:

Hi @dipam,

These submissions failed the validation step due to a timeout. Could you let me know how fast each iteration was? :slight_smile:

  • 237048
  • 237047
  • 237046

I am trying to simulate the 2-CPU constraint by setting torch.set_num_threads(2), and in my local setup I can process 2 to 3 images per second.
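One caveat: torch.set_num_threads(2) only caps PyTorch's intra-op thread pool, so other work in the process (image decoding, other libraries) can still use more cores. A possibly closer local simulation is to pin the whole process to two CPUs. The sketch below assumes Linux, where `os.sched_setaffinity` is available; `limit_to_cpus` is a name chosen for this example, and this is not necessarily how the evaluation environment enforces its limit.

```python
import os

def limit_to_cpus(n=2):
    """Pin the current process to at most n CPUs (Linux only).

    Returns the resulting CPU set, or None where affinity is unsupported.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. macOS or Windows
    allowed = sorted(os.sched_getaffinity(0))
    os.sched_setaffinity(0, set(allowed[:n]))
    return os.sched_getaffinity(0)

cpus = limit_to_cpus(2)
print(cpus)  # the CPU ids this process may now run on

# After pinning, one would still call torch.set_num_threads(2) and then
# time the model as usual.
```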

I think it’s pretty tricky to get this right. Besides the nondeterminism and caching issues that naturally occur, on cloud instances you additionally have to face things like noisy neighbors or “steal time”.
See: Understanding CPU Steal Time - when should you be worried? | Scout APM Blog
While you say the container gets the full node, it’s not quite clear whether that means it gets the full bare-metal server. You are probably using EC2 instances with 2 cores, which are VMs on a bigger machine, and thus you have to deal with the problems mentioned.
Increasing the time to 2 sec doesn’t solve the problem, as people may just deploy bigger models and then run over the limit again. IMO only averaging can prevent the issue.
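For anyone who wants to check steal time themselves: on Linux, the aggregate `cpu` line of /proc/stat lists cumulative counters in the order user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice. A minimal sketch that compares two samples of that line follows; the function names and the sample values are invented for illustration.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of counters."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(v) for v in fields[1:]]

def steal_share(before, after):
    """Fraction of CPU time between two samples that was stolen by the hypervisor."""
    deltas = [a - b for a, b in zip(after, before)]
    # Sum user..steal only; guest time is already included in user/nice.
    total = sum(deltas[:8])
    return deltas[7] / total if total else 0.0

# Made-up samples, as if taken before and after an inference run.
before = parse_cpu_line("cpu  100 0 50 800 10 0 5 35 0 0")
after = parse_cpu_line("cpu  200 0 100 1540 15 0 10 135 0 0")
print(steal_share(before, after))  # 0.1, i.e. 10% of CPU time was stolen
```

On a live machine, the two samples would come from reading the first line of /proc/stat before and after the timed section.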


@dipam Any status on the 2 submissions I provided?
With the new 2-second limit, my model takes 1125 ms on a dummy 4000x3000 image. It passes the MosquitoalertValidation step but fails the MosquitoalertPrediction step after a few moments. For instance:

I agree with @tfriedel that an average computed after the final evaluation would be better.

Hi, I am having a similar problem too. My inference takes around 900-1200 ms depending on the aspect ratio of the input image. I can understand a 20% deviation in inference time, but an almost 100% speed difference is, I think, too much. I think @tfriedel's idea of averaging is a good proposal.
