I often see my training jobs stop before completing the full 8M steps (roughly 2 hours of training). They don't fail, and they show up as Succeeded, but sometimes they run for only 2M or 4M training steps.
In the Docker logs I see the following message:

MaxRuntimeExceeded - Training job runtime exceeded MaxRuntimeInSeconds provided
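If the MaxRuntimeInSeconds cap is what's cutting the jobs short, I'd need to raise it to cover a full run. Here's the rough back-of-the-envelope estimate I'm using (the numbers below are illustrative, not from my actual runs): scale the elapsed time of a truncated run linearly to the full step count and add some headroom.

```python
def required_max_runtime(steps_done, elapsed_s, total_steps, headroom=1.2):
    """Estimate the MaxRuntimeInSeconds needed for a full run, by scaling
    a truncated run's per-step time to total_steps plus a safety margin."""
    per_step = elapsed_s / steps_done
    return int(per_step * total_steps * headroom)

# Illustrative: a run killed at 2M of 8M steps after 1800 s (30 min)
# would need roughly 7200 s for the full run, ~8640 s with 20% headroom.
print(required_max_runtime(2_000_000, 1800, 8_000_000))
```

The resulting value would then go into the job's StoppingCondition (or the equivalent runtime-limit setting wherever the job is configured).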
Anyone else having this problem?