My latest submission (133671) is being rejected, but I can’t find any sensible reason why. It failed in the “generate predictions” step, and here are the logs I get:
Selecting runtime language: python
[NbConvertApp] Converting notebook predict.ipynb to notebook
[NbConvertApp] Executing notebook with kernel: python
[NbConvertApp] Writing 17040 bytes to predict.nbconvert.ipynb
Any idea what went wrong? Is it a timeout issue or out-of-memory error of some kind?
Shared the traceback with you privately.
Hi @jyotish and thank you for your help!
For anyone interested, it seems to be an out-of-memory issue, because the evaluation pods have only 4 GB of RAM.
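If you want to check whether your own notebook would fit, here is a rough sketch of how I would measure peak memory locally. It only uses Python's standard `resource` module (Unix only); the helper names `peak_rss_gib` and `fits_in_budget` and the 4 GiB default are my own assumptions for illustration, not anything provided by the evaluation setup.

```python
import resource
import sys


def peak_rss_gib() -> float:
    """Peak resident set size of the current process, in GiB.

    ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 ** 3 if sys.platform == "darwin" else 1024 ** 2
    return peak / divisor


def fits_in_budget(budget_gib: float = 4.0) -> bool:
    """True if the peak memory so far is within the (assumed) budget."""
    return peak_rss_gib() <= budget_gib


# Call this at the end of the notebook, after loading models and predicting.
print(f"peak RSS: {peak_rss_gib():.2f} GiB, within budget: {fits_in_budget()}")
```

This only reports the peak of the Python process itself, so treat it as a lower bound if your notebook spawns subprocesses.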
I think it would make sense to have the same hardware capabilities on the training and evaluation machines, since you need to be able to load your models/assets on both, right? It feels weird to me to be allowed to develop with 14 GB but to test with only 4 GB. What do you think?
We upgraded the evaluation pods with 4 vCPUs and 16 GB RAM to match the specs from the AD workbench. We are re-evaluating the submissions that might have failed due to this.
In case anyone’s submission failed with the message “Inference failed” and the logs say something like “kernel died”, please let us know.
Great news! Thanks for acting so quickly!