Training Error?

Paseul · July 3, 2020, 12:49am

There seems to be a problem where the session ends as soon as I submit.
There was no problem when I tested locally, and no error was left in the log, so I can’t see what the problem is.

tim_whitaker · July 3, 2020, 1:02am

I just had a submission error as well. Stopped in the middle of training (after 5 million timesteps). Still under the 2 hour time limit and no out of memory errors as far as I can tell. Interestingly, it did move on to the rollout phase and give me a score.

Paseul · July 3, 2020, 2:03am

I waited until I was able to submit again. I resubmitted the same model, and this time the training began normally.
It would be nice if we were given a way to check if there was an error in advance.

jyotish · July 3, 2020, 5:32am

Hello @Paseul

As mentioned in this comment, there was a problem with your interpreter line (shebang) in run.sh. If this line is not provided, the script is run with the default shell, which happens to be /bin/sh in the evaluation environment. It worked locally for you because your default shell might be bash, zsh, or equivalent. Please let me know if you are referring to a different submission.
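
For anyone who wants to catch this before submitting, here is a minimal sketch of a local check, assuming the entry script is a text file such as run.sh. The helper name `read_shebang` is hypothetical and not part of any competition tooling. It only reads the first line of the script and reports the interpreter named after `#!` (for example `/bin/bash`), or `None` when the line is missing, in which case the script would be run with whatever the default shell is.

```python
def read_shebang(script_path):
    """Return the interpreter named on the script's first line, or None.

    Hypothetical helper for illustration: it only inspects the file,
    it does not run the script or validate that the interpreter exists.
    """
    # Read as bytes so unusual encodings in the file cannot raise here.
    with open(script_path, "rb") as f:
        first_line = f.readline()

    # A shebang must be the very first two bytes of the file.
    if not first_line.startswith(b"#!"):
        return None

    # Drop the "#!" marker and surrounding whitespace or newline.
    return first_line[2:].decode("utf-8", errors="replace").strip()
```

Calling `read_shebang("run.sh")` locally and getting `None` back would suggest adding an interpreter line at the top of the script before submitting.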