There seems to be a problem: the session ends as soon as I submit.
There was no problem when I tested locally, and no error was left in the log, so I can’t tell what the problem is.
I just had a submission error as well. It stopped in the middle of training (after 5 million timesteps), while still under the 2-hour time limit and, as far as I can tell, without any out-of-memory errors. Interestingly, it did move on to the rollout phase and gave me a score.
I waited until I was able to submit again and resubmitted the same model. This time, training began normally.
It would be nice to have a way to check for errors before submitting.
Hello @Paseul1
As mentioned in this comment, there was a problem with the interpreter line (shebang) in your `run.sh`. If this line is not provided, the script is run with the default shell, which happens to be `/bin/sh` in the evaluation environment. It worked locally for you because your default shell is probably `bash`, `zsh`, or an equivalent. Please let me know if you are referring to a different submission.