Have you experienced frustration when trying to submit your solution to this competition using your favorite framework, Keras? Some of you might have hit missing-GPU problems, and others might be getting worse scores than expected. What actually happened? Does this competition hate you for using something other than PyTorch?
Here I will cover two problems that tensorflow/keras users may run into when submitting their solutions.
1. Your code is not running on GPUs
While other folks can just set gpu:true in the aicrowd.json file and have the GPU work automatically, tf.keras users need one additional step: you have to specify tensorflow==2.3.0 in the requirements.txt file. In addition, if you want to save your model in HDF5 format (*.h5), you must also add h5py==2.10.0, or else you will get errors while loading your model from checkpoints.
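If you want to double-check this before burning a submission, here is a rough sketch of a sanity check. The helper names (missing_pins, visible_gpus) are my own and not part of the starter kit. The first one only verifies that the two pins mentioned above appear in your requirements.txt; the second one asks TensorFlow which GPUs it can see through tf.config.list_physical_devices.

```python
# Pins from the post above. The h5py pin only matters if you save *.h5 files.
REQUIRED_PINS = {"tensorflow": "2.3.0", "h5py": "2.10.0"}


def missing_pins(requirements_text, required=REQUIRED_PINS):
    """Return the 'name==version' pins that are absent from the given text."""
    present = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            present.add(line.replace(" ", "").lower())
    return [
        f"{name}=={version}"
        for name, version in required.items()
        if f"{name}=={version}" not in present
    ]


def visible_gpus():
    """List the GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    # Assumes requirements.txt sits in the current working directory.
    with open("requirements.txt") as handle:
        print("missing pins:", missing_pins(handle.read()))
    print("visible GPUs:", visible_gpus())
```

An empty GPU list when running inside the evaluation environment would suggest the code is still on CPU, even with gpu:true set.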
2. Your code got 0.6~0.7 accuracy when running locally, but you only got 0.2~0.4 on the leaderboard.
If you are using flow_from_dataframe (the most suitable method for the file structure of this dataset), remember to do the following when predicting on the test data:
- Sort the DataFrame by filename before passing it into the function.
- Call reset() before iterating through the generator.
- Match the index back to the original DataFrame using the argsort() function (see code below).
    # module-level imports this snippet relies on
    import numpy as np
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # sort the dataframe
    df = test_dataset.labels_df.copy().sort_values('filename').reset_index(drop=True)

    # create the ImageDataGenerator for the prediction phase
    image_gen = ImageDataGenerator().flow_from_dataframe(
        df, test_dataset.images_dir, x_col='filename',
        shuffle=False, batch_size=self.BATCH_SIZE)

    # call reset before prediction
    image_gen.reset()

    # now the real prediction
    predictions = []
    steps = int(np.ceil(test_dataset.labels_df.shape[0] / self.BATCH_SIZE))
    for i in range(steps):
        batch_res = self.model.predict(next(image_gen))
        predictions.extend(batch_res)
    predictions = (np.array(predictions) > 0.5).astype(float)

    # finally match the predictions back to the original row order
    predictions = predictions[test_dataset.labels_df.sort_values('filename').index.argsort()]
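In case the final argsort() line looks like magic, here is a small toy sketch of the idea in plain Python. The filenames and the string "predictions" are made up for illustration, and the argsort helper is a stand-in I wrote to mimic what the pandas/numpy argsort() does here. It also assumes the original DataFrame has the default 0..n-1 index, which is what the line above relies on as far as I can tell.

```python
def argsort(values):
    """Indices that would sort `values` (same idea as numpy's argsort)."""
    return sorted(range(len(values)), key=values.__getitem__)


# Hypothetical test set, listed in its original (unsorted) row order.
filenames = ["img_2.png", "img_0.png", "img_3.png", "img_1.png"]

# Step 1: sort by filename. order[i] is the original row of sorted row i,
# which plays the role of labels_df.sort_values('filename').index.
order = argsort(filenames)
sorted_names = [filenames[i] for i in order]

# Step 2: stand-in for the model, producing one prediction per sorted row.
sorted_preds = ["pred:" + name for name in sorted_names]

# Step 3: the argsort of a permutation is its inverse, so inverse[j] is the
# position that original row j ended up at after sorting.
inverse = argsort(order)
restored = [sorted_preds[k] for k in inverse]

# Every prediction is now back next to the row it belongs to.
assert restored == ["pred:" + name for name in filenames]
```

Skipping step 3 would leave the predictions in sorted-filename order while the submission rows stay in the original order, which is one way to end up with a much lower leaderboard score than your local one.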
Hope this helps tensorflow/keras users who are uncomfortable with switching platforms due to the close deadline.
PS: Don’t worry about the pretrained models; make something that works first. I got 0.7 on the leaderboard training the model from scratch (no purchases).
PPS: Feel free to tell me if I’m missing something.