I’m having trouble using TensorFlow 2.4.1.
Error message:
Could not load dynamic library ‘libcudart.so.11.0’
Submission ID:
119710
Hi @sho and welcome to the forums
Unfortunately 2.4 is not supported. Could you downgrade to 2.3?
Hi alfarzan,
Thank you for your answer.
I tried TensorFlow versions 2.3.0 and 2.3.2, but I got the same error with both.
Error message:
Could not load dynamic library ‘libcudart.so.10.1’
Submission ID:
119955 (tensorflow==2.3.0)
119949 (tensorflow==2.3.2)
Hi @sho
I’ve looked a bit more deeply into this and I think I know what’s going on
Your full error message is:
W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "predict.py", line 30, in <module>
    model = load_model(submission_config["model_path"])
  File "/home/aicrowd/load_model.py", line 6, in load_model
    return Model(model_path)
  File "/home/aicrowd/utils.py", line 42, in __init__
    self.load(model_path)
  File "/home/aicrowd/utils.py", line 83, in load
    self.model = tf.keras.models.load_model(model_file)
  File "/usr/local/lib/python3.8/site-packages/tensorflow/python/keras/saving/save.py", line 186, in load_model
    loader_impl.parse_saved_model(filepath)
  File "/usr/local/lib/python3.8/site-packages/tensorflow/python/saved_model/loader_impl.py", line 110, in parse_saved_model
    raise IOError("SavedModel file does not exist at: %s/{%s|%s}" %
OSError: SavedModel file does not exist at: model/model1/{saved_model.pbtxt|saved_model.pb}
Since we don’t have GPUs on the servers you can ignore that initial error.
In your case the real error is the OSError at the very end: your saved model is not found.
I can see that you’re making a submission through a Colab notebook. Could you please update MODEL_OUTPUT_PATH with the name of your saved_model object? (See attached image.)
Let me know if this doesn’t solve the issue and we will dig in further.
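For anyone hitting the same OSError, a quick check before submitting can confirm whether the path really points at a SavedModel directory. This is only a sketch: the helper names `has_saved_model` and `describe_model_dir` are made up for illustration, and the only thing taken from the thread is the pair of file names listed in the error message.

```python
from pathlib import Path

# File names the loader reports looking for in the error message above.
SAVED_MODEL_FILES = ("saved_model.pb", "saved_model.pbtxt")


def has_saved_model(model_dir):
    """Return True if model_dir contains one of the SavedModel files."""
    model_dir = Path(model_dir)
    return any((model_dir / name).is_file() for name in SAVED_MODEL_FILES)


def describe_model_dir(model_dir):
    """Return a short report that helps spot a wrong model path."""
    model_dir = Path(model_dir)
    if not model_dir.exists():
        return f"{model_dir} does not exist"
    if model_dir.is_file():
        return f"{model_dir} is a single file, not a SavedModel directory"
    if has_saved_model(model_dir):
        return f"{model_dir} looks like a SavedModel directory"
    contents = sorted(p.name for p in model_dir.iterdir())
    return f"{model_dir} has no saved_model.pb/.pbtxt; contents: {contents}"
```

Calling `print(describe_model_dir("model/model1"))` in the notebook, with the path from the traceback, should show whether the directory exists and what is actually in it.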
Hi alfarzan,
Thank you for your information.
I modified the model file and it worked fine.
Hi alfarzan,
Could you please look at submission #122698? I’m getting similar errors, although I think the issue lies with how I’ve attempted to zip and unzip my model components.
Many thanks,
Tom
Hi @tom_snowdon
So it seems everything is done correctly except that you have this model.zip file that I think is supposed to contain all the necessary models (?) but actually that compressed file is empty (only 22 bytes). Maybe fixing that would solve the issue?
If you’re going through the trouble of doing it this way, it might be a lot easier to go through the zip submission path and just put the models inside a directory that you submit inside your zip submission.
But I understand that if you have it working this way then it’s easier to keep pushing and make it work properly
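As a side note on the 22 bytes: that is the size of a zip archive with no members, since it holds only the end-of-central-directory record. A small sketch for checking an archive before submitting follows; `inspect_zip` is a hypothetical helper name, and it uses only Python's standard `zipfile` module.

```python
import os
import zipfile


def inspect_zip(path):
    """Return (size_in_bytes, member_names) for the zip archive at path."""
    size = os.path.getsize(path)
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
    return size, names
```

If `inspect_zip("model.zip")` returns an empty list of names, the zipping step ran before the model files were written, or pointed at the wrong folder.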
Thanks, that’s got me past a slight inaccuracy with folder paths, but I think I’m now hitting a genuine issue with loading the TensorFlow models back. Could you please give me your views on the issues with #122718? I’ve tested loading the models back (unzipping to a different folder than the one used to zip things up) and everything loads and works as expected in Colab.
Edit: I can see that “print(tf.__version__)” yields 2.4.1 in my Colab notebook, so that could be the issue? (Although I am specifying ‘tensorflow==2.3’ in the config, so I’m not sure how to correct this.)
Hi @tom_snowdon
I haven’t looked at the submission yet, but since you mention that the colab version is 2.4, that is likely the issue. Can you try and downgrade the version and see if that solves the issue?
If it doesn’t then I’ll look deeper and we’ll figure it out
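One way to catch this kind of mismatch early is to compare the version the notebook actually imports against the version pinned in the config. The sketch below is an illustration only: `same_release_series` is a hypothetical helper, and its prefix comparison is deliberately looser than pip's `==` operator, so it only answers "is this the same 2.3.x series?".

```python
def same_release_series(installed, pinned):
    """Return True if installed starts with the same components as pinned.

    Example: "2.3.2" is in the "2.3" series, "2.4.1" is not.
    This is a loose prefix check, not pip's version-matching rules.
    """
    installed_parts = installed.split(".")
    pinned_parts = pinned.split(".")
    return installed_parts[: len(pinned_parts)] == pinned_parts


# In a notebook one would pass tf.__version__ as the first argument, e.g.
#   import tensorflow as tf
#   same_release_series(tf.__version__, "2.3")
```

If this returns False in the notebook where the model is trained and saved, the model is being saved by a different TensorFlow version than the one the submission config asks for.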
Thanks, that forced me to pick between the latest 1.x version or 2.4. However, I’ve got a working submission using pip install. I’m sure what I’ve done is nowhere near best practice, but I’ll write a post on it so there’s a framework people can copy (and critique).