Error on TensorFlow

sho · February 6, 2021, 10:08am

I’m having trouble using TensorFlow 2.4.1.

Error message:
Could not load dynamic library ‘libcudart.so.11.0’

Submission ID:
119710

alfarzan · February 6, 2021, 12:28pm

Hi @sho and welcome to the forums

Unfortunately 2.4 is not supported. Could you downgrade to 2.3?

sho · February 7, 2021, 9:25am

Hi alfarzan,

Thank you for your answer.
I tried TensorFlow versions 2.3.0 and 2.3.2, but I got the same error with both.

Error message:
Could not load dynamic library ‘libcudart.so.10.1’

Submission ID:
119955 (tensorflow==2.3.0)
119949 (tensorflow==2.3.2)

alfarzan · February 7, 2021, 11:01am

Hi @sho

I’ve looked a bit more deeply into this and I think I know what’s going on

Your full error message is:

W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "predict.py", line 30, in <module>
    model = load_model(submission_config["model_path"])
  File "/home/aicrowd/load_model.py", line 6, in load_model
    return Model(model_path)
  File "/home/aicrowd/utils.py", line 42, in __init__
    self.load(model_path)
  File "/home/aicrowd/utils.py", line 83, in load
    self.model = tf.keras.models.load_model(model_file)
  File "/usr/local/lib/python3.8/site-packages/tensorflow/python/keras/saving/save.py", line 186, in load_model
    loader_impl.parse_saved_model(filepath)
  File "/usr/local/lib/python3.8/site-packages/tensorflow/python/saved_model/loader_impl.py", line 110, in parse_saved_model
    raise IOError("SavedModel file does not exist at: %s/{%s|%s}" %
OSError: SavedModel file does not exist at: model/model1/{saved_model.pbtxt|saved_model.pb}

Since we don’t have GPUs on the servers, you can ignore the initial libcudart warning.

In your case the real error is the OSError at the very end. Your saved model is not found.
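A minimal sketch of a pre-flight check that could be run before submitting, assuming the model was saved in TensorFlow's SavedModel directory format. The helper name `has_saved_model` is hypothetical; it uses only the standard library and looks for the two filenames named in the OSError above.

```python
from pathlib import Path

# Filenames that the OSError above reports as missing from the model directory.
SAVED_MODEL_FILES = ("saved_model.pb", "saved_model.pbtxt")


def has_saved_model(model_dir):
    """Return True if model_dir contains a SavedModel protobuf file."""
    directory = Path(model_dir)
    return any((directory / name).is_file() for name in SAVED_MODEL_FILES)


if __name__ == "__main__":
    # "model/model1" is the path taken from the traceback above.
    print(has_saved_model("model/model1"))
```

If this prints False for the path the submission points at, `tf.keras.models.load_model` would be expected to fail in the same way as in the traceback.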

I can see that you’re making a submission through a colab notebook. Could you please:

Make sure that you fill in the MODEL_OUTPUT_PATH with the name of your saved_model object? (See attached image)

If you have multiple saved files then follow the advice on this thread.

Let me know if this doesn’t solve the issue and we will dig in further

sho · February 13, 2021, 6:25am

Hi alfarzan,

Thank you for the information.
I modified the model file and it worked fine.

tom_snowdon · February 21, 2021, 2:59pm

Hi alfarzan,

Could you please look at submission #122698? I’m getting similar errors, although I think the issue lies with how I’ve attempted to zip and unzip my model components.

Many thanks,
Tom

alfarzan · February 21, 2021, 4:16pm

Hi @tom_snowdon

It seems everything is done correctly, except that your model.zip file, which I think is supposed to contain all the necessary models, is actually empty (only 22 bytes). Maybe fixing that would solve the issue?
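For context, a zip archive with no members is exactly 22 bytes, which matches the size reported above. A minimal sketch of a check that could be run before submitting, using Python's standard `zipfile` module; the helper names `zip_member_names` and `make_empty_zip_bytes` are hypothetical.

```python
import io
import zipfile


def zip_member_names(zip_source):
    """Return the names stored in a zip archive (given a path or file object)."""
    with zipfile.ZipFile(zip_source) as archive:
        return archive.namelist()


def make_empty_zip_bytes():
    """Build an archive with no members, to show how small it is."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w"):
        pass  # write nothing into the archive
    return buffer.getvalue()


empty = make_empty_zip_bytes()
print(len(empty))                           # size in bytes of an empty archive
print(zip_member_names(io.BytesIO(empty)))  # list of members, none here
```

Calling `zip_member_names("model.zip")` on the real file before submitting would show whether the model files actually made it into the archive.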

If you’re going through the trouble of doing it this way, it might be a lot easier to go through the zip submission path and just put the models inside a directory that you submit inside your zip submission.

But I understand that if you have it working this way then it’s easier to keep pushing and make it work properly

tom_snowdon · February 21, 2021, 5:42pm

Thanks, that’s got me past a slight inaccuracy with folder paths, but I think I’m now hitting a genuine issue with loading back the TensorFlow models. Could you please give me your views on the issues with #122718? I’ve tested loading back the models (unzipping to a different folder than the one used to zip stuff up) and everything loads and functions as expected in Colab.

Edit: I can see that “print(tf.__version__)” yields 2.4.1 in my Colab notebook, so that could be the issue? (Although I am specifying ‘tensorflow==2.3’ in the config, so I’m not sure how to correct this.)

alfarzan · February 21, 2021, 6:51pm

Hi @tom_snowdon

I haven’t looked at the submission yet, but since you mention that the Colab version is 2.4, that is likely the issue. Can you try downgrading the version and see if that solves it?

If it doesn’t then I’ll look deeper and we’ll figure it out
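To check whether the notebook and the config agree on the TensorFlow version, one option is to compare the installed version against the pinned one. A minimal sketch using only the standard library; the helper names are hypothetical, and the pinned value "2.3" is the one mentioned for the config above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def same_minor_release(installed, pinned):
    """Compare only the major.minor parts, so '2.3.2' matches '2.3'."""
    return installed.split(".")[:2] == pinned.split(".")[:2]


found = installed_version("tensorflow")
if found is None:
    print("tensorflow is not installed in this environment")
else:
    print(found, "matches 2.3:", same_minor_release(found, "2.3"))
```

A model saved under one minor release and loaded under another is a plausible source of load failures, so a mismatch here is worth resolving before looking further.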

tom_snowdon · February 21, 2021, 7:14pm

Thanks, that forced me to pick between the latest 1.x version and 2.4. However, I’ve got a working submission using pip install. I’m sure what I’ve done is nowhere near best practice, but I’ll write a post on it so there’s a framework people can copy (and critique).