Suggestion to switch from spot to on-demand

Hi,

Lately it’s been impossible to get a submission to train and evaluate successfully because of spot instance timeouts. Every time, either some training runs don't reach 8M steps or some rollouts fail. On top of that, the submission queue has grown to more than 12 hours.

With 7 days remaining, if this continues, I don’t see how every participant will manage to get at least one good submission with their latest version.

I’d like to suggest the following to the organizers:

  • Switch to running on dedicated on-demand instances, not spot instances.
  • Limit submissions to 1 a day.

As I see it, this way everyone wins. Participants can get at least one good submission a day, rather than spamming 5 submissions and hoping at least one finishes. And the total compute cost shouldn’t increase, because one on-demand submission costs about as much as three spot submissions.

@jurgisp: This is a valid suggestion. We will discuss it internally early tomorrow, then adjust the submission quotas for the last week and move to on-demand instances.

We should be able to manage 2 submissions a day until the end of the competition.

We will post an update on this thread after a decision has been made.

Cheers,
Mohanty

Any updates on this? Looking forward to the change.

Hello @jurgisp

We plan to use on-demand instances for a week with a reduced submission quota. We will make an announcement as soon as we do this.