Hi,
I’ve made submission # 113515 and it has failed with error code 132. The code runs fine locally and on colab notebooks so I’m not sure what is wrong. Could you tell me what produces this error code please?
Thanks,
Matt
Hi @tangohead
We are re-evaluating all submissions, and soon you will have detailed error logs, privately available on the submissions page.
Great news - thank you!
Hi @alfarzan
I looked into what is causing this: it was an illegal instruction, which triggered a core dump. After a bit of digging, it seems it might be related to TensorFlow requiring AVX support.
Could you confirm whether your servers support AVX? I’ve also tried using an older version of TF (1.5), but the Docker image that the submission server uses doesn’t appear to support this.
Thanks in advance,
Matt
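For anyone else hitting the same code: assuming the platform reports the usual shell convention of 128 plus the signal number when a process is killed by a signal, 132 maps to signal 4, which is SIGILL (illegal instruction) on Linux. The helper below is a small sketch of that decoding; the function name is made up for illustration and is not part of the submission platform.

```python
import signal


def signal_from_exit_code(code):
    """Return the signal name implied by a shell-style exit code, or None.

    Assumes the 128 + signal-number convention used by POSIX shells.
    """
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        # The number does not correspond to a known signal on this OS.
        return None


if __name__ == "__main__":
    print(signal_from_exit_code(132))  # SIGILL on Linux
```

An illegal instruction usually means the binary was compiled for CPU features the host does not have, which fits the AVX suspicion above.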
Hi Matt (@tangohead)
The machines should support any software supported by APT.
Could you please try installing libmkl-dev and, say, libmkl-avx2 by applying the same instructions in this post to your required APT packages?
Additionally, we’re looking into backend dependencies that are not automatically gathered by the package managers, to make sure this kind of error doesn’t happen too often (e.g. with TensorFlow, XGBoost etc.).
I’ve given this a go but sadly I’ve had no luck. For some reason apt cannot find either libmkl-avx2 or libmkl-dev. I’ve also tried intel-mkl just in case, but apt can’t locate that either.
Here’s a clip from the Docker log:
Step 8/12 : RUN apt -qq update && apt -qq install -y libmkl-avx2 libmkl-dev && rm -rf /var/lib/apt/*
---> Running in be9b5903f205
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
12 packages can be upgraded. Run 'apt list --upgradable' to see them.
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
Package libmkl-dev is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
E: Unable to locate package libmkl-avx2
E: Package 'libmkl-dev' has no installation candidate
Sorry about the formatting, the first bit won’t play ball.
Thanks for all your help on this!
I should have checked earlier, but could you please include TensorFlow in your requirements.txt and try one more time?
That should do the trick.
No problem! I think I had that in on all the earlier ones, both those which core dumped and the ones where apt can’t find the package. Will double check now and run again though!
Hello @tangohead
Can you try installing tensorflow==2.3?
The machine already supports AVX, so things should work fine. From TF 2.4, it seems AVX2 is needed, which we do not support. As for why TF 1.5 failed, it looks like it was taken off the PyPI index (probably because of deprecation), and the build probably failed with an error saying "No matching distribution found".
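Since the explanation above hinges on AVX versus AVX2, here is a rough sketch for checking which of the two a Linux x86 machine advertises. It reads the "flags" line of /proc/cpuinfo; the function names are illustrative only. It shows what the local CPU reports and says nothing about the submission servers themselves.

```python
def cpu_flags(cpuinfo_text):
    """Collect CPU feature flags from /proc/cpuinfo-style text (x86 layout)."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def report_avx(cpuinfo_text):
    """Report whether the avx and avx2 flags are present."""
    flags = cpu_flags(cpuinfo_text)
    return {"avx": "avx" in flags, "avx2": "avx2" in flags}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(report_avx(f.read()))
    except FileNotFoundError:
        print("/proc/cpuinfo is not available on this system")
```

If avx2 comes back False on the target machine, pinning an older TensorFlow release, as suggested above, is the simpler route than trying to add the missing libraries.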
That seems to have sorted it. Many thanks!