Tensorflow/Keras folks, you are not being left behind in this competition

huynhngoc · February 18, 2022, 6:59pm

Have you experienced frustration when trying to submit your solution to this competition using your favorite framework - Keras? Some of you might have run into missing-GPU problems, and others might have gotten worse scores than expected. What actually happened? Does this competition hate you for using something other than PyTorch?

Here I will cover two problems that tensorflow/keras users may run into when submitting their solutions.

1. Your code is not running on GPUs
While other folks can just set gpu:true in the aicrowd.json file and the GPU will work automatically, tf.keras users need one additional step.

You have to specify tensorflow==2.3.0 in the requirements.txt file. In addition, if you want to save your model in HDF5 format (*.h5), you must also add h5py==2.10.0, or else you will get errors when loading your model from checkpoints.
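If you want to confirm that the pins actually took effect in your environment, a small check like the one below may help. This is only a sketch: it assumes Python 3.8+ (for importlib.metadata), and the helper names check_pins and visible_gpus are made up for illustration; they are not part of the competition starter kit.

```python
from importlib import metadata

# Pins taken from the post above; adjust them if the requirements change.
EXPECTED_PINS = {"tensorflow": "2.3.0", "h5py": "2.10.0"}


def check_pins(pins):
    """Return {package: (expected, found)} for every pin that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # the package is not installed at all
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


def visible_gpus():
    """List the GPUs TensorFlow can see (TensorFlow is imported lazily)."""
    import tensorflow as tf
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    problems = check_pins(EXPECTED_PINS)
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found}")
    # Only query TensorFlow when its pinned version is actually installed.
    if "tensorflow" not in problems:
        print("GPUs visible to TensorFlow:", visible_gpus())
```

An empty GPU list with the right versions installed would suggest the problem lies elsewhere, for example in the gpu setting of aicrowd.json.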

2. Your code got 0.6~0.7 accuracy when running locally, but you only got 0.2~0.4 on the leaderboard.
If you are using ImageDataGenerator with flow_from_dataframe (the most suitable method for the file structure of this dataset), remember to do the following when predicting on the test data:

Sort the DataFrame by filename before passing it into the function.
Call reset() before iterating through the generator.
Map the predictions back to the original DataFrame order using the argsort() function (see code below).

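# imports needed by this snippet (the snippet itself lives inside a method of the submission class)
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# sort the dataframe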
df = test_dataset.labels_df.copy().sort_values('filename').reset_index(drop=True)
# creating the ImageDataGenerator for prediction phase
image_gen = ImageDataGenerator().flow_from_dataframe(df, test_dataset.images_dir,
            x_col='filename', shuffle=False, batch_size=self.BATCH_SIZE)
# call reset before prediction
image_gen.reset()
# now the real prediction
predictions = []
steps = int(np.ceil(test_dataset.labels_df.shape[0]/self.BATCH_SIZE))
for i in range(steps):
    batch_res = self.model.predict(next(image_gen))
    predictions.extend(batch_res)
predictions = (np.array(predictions) > 0.5).astype(float)
# finally match the index back to their data
predictions = predictions[test_dataset.labels_df.sort_values('filename').index.argsort()]
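To see why that last argsort() step puts the predictions back in the original row order, here is a small dependency-free sketch. The file names and the pure-Python argsort helper are hypothetical stand-ins for the real DataFrame index and numpy/pandas argsort; only the permutation logic is meant to carry over.

```python
def argsort(values):
    """Pure-Python stand-in for argsort: the positions that would sort `values`."""
    return sorted(range(len(values)), key=values.__getitem__)


# Hypothetical file names in their original (unsorted) DataFrame order.
filenames = ["img_c.png", "img_a.png", "img_b.png"]

# order[i] is the original row that ends up at position i after sorting by filename.
# This plays the role of labels_df.sort_values('filename').index.
order = argsort(filenames)

# Stand-in for the model: one "prediction" per file, produced in sorted order.
sorted_predictions = ["pred for " + filenames[i] for i in order]

# argsort of the sorted index is the inverse permutation: for each original row
# it gives the position of that row's prediction in the sorted output.
inverse = argsort(order)
restored = [sorted_predictions[i] for i in inverse]

print(restored)
```

After this, restored[i] belongs to filenames[i] again, which is the order the evaluator expects.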

Hope this helps tensorflow/keras users who are reluctant to switch frameworks this close to the deadline.

PS: Don’t worry about the pretrained models; make something that works first. I got 0.7 on the leaderboard by training the model from scratch (no purchases).
PPS: Feel free to tell me if I’m missing something.

moto · February 19, 2022, 3:45pm

@huynhngoc : Many thanks. I did not know that we could access labels_df.