Baseline Submission for the Challenge FOODC
Author - Pulkit Gera
Download the files
These include the train and test images as well as the CSV files indexing them.
!wget -q https://datasets.aicrowd.com/default/aicrowd-practice-challenges/public/foodc/v0.1/train_images.zip
!wget -q https://datasets.aicrowd.com/default/aicrowd-practice-challenges/public/foodc/v0.1/test_images.zip
!wget -q https://datasets.aicrowd.com/default/aicrowd-practice-challenges/public/foodc/v0.1/train.csv
!wget -q https://datasets.aicrowd.com/default/aicrowd-practice-challenges/public/foodc/v0.1/test.csv
We create directories and unzip the images
!mkdir data
!mkdir data/test
!mkdir data/train
!unzip train_images -d data/train
!unzip test_images -d data/test
Import necessary packages
import torchvision.transforms as transforms
from torch.utils.data.sampler import SubsetRandomSampler
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader, Dataset
import torchvision
from torchvision import models
import torch.optim as optim
import pandas as pd
import numpy as np
import cv2
import os
from sklearn import preprocessing
import matplotlib.pyplot as plt
%matplotlib inline
Loading Data
In PyTorch we can either load our files directly with torchvision's built-in dataset classes or create a custom class to load the data. A custom class must implement the __init__, __len__ and __getitem__ functions. We create a custom dataset class to suit our needs. More information on custom datasets can be found in the PyTorch documentation.
class FoodData(Dataset):
    def __init__(self,data_list,data_dir = './',transform=None,train=True):
        super().__init__()
        self.data_list = data_list
        self.data_dir = data_dir
        self.transform = transform
        self.train = train
    def __len__(self):
        return self.data_list.shape[0]
    def __getitem__(self,item):
        if self.train:
            img_name,label = self.data_list.iloc[item]
        else:
            img_name = self.data_list.iloc[item]['ImageId']
        img_path = os.path.join(self.data_dir,img_name)
        img = cv2.imread(img_path,1)
        img = cv2.resize(img,(256,256))
        if self.transform is not None:
            img = self.transform(img)
        if self.train:
            return {
                'gt' : img,
                'label' : torch.tensor(label)
            }
        else:
            return {
                'gt':img
            }
We first convert the class labels into integer encodings using a LabelEncoder, which maps each distinct class name to a number. This is an important step: the loss function expects integer class indices, so without it we cannot train our network.
train = pd.read_csv('train.csv')
le = preprocessing.LabelEncoder()
targets = le.fit_transform(train['ClassName'])
ntrain = train.copy()  # copy so that train keeps its original string labels
ntrain['ClassName'] = targets
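To make the encoding concrete, here is a minimal pure-Python sketch of what a label encoder does: it sorts the unique class names and maps each one to its index. The helper names fit_label_map, encode and decode are hypothetical and only illustrate the idea on toy labels; the notebook itself relies on scikit-learn's LabelEncoder.

```python
def fit_label_map(labels):
    """Return the sorted list of unique class names (index = encoded id)."""
    return sorted(set(labels))

def encode(labels, classes):
    """Map each class name to its integer id."""
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels]

def decode(ids, classes):
    """Map integer ids back to class names (as needed for a submission file)."""
    return [classes[i] for i in ids]

sample = ["water", "egg", "water", "apple"]  # toy labels, not the real CSV
classes = fit_label_map(sample)
ids = encode(sample, classes)
print(classes)               # ['apple', 'egg', 'water']
print(ids)                   # [2, 1, 2, 0]
print(decode(ids, classes))  # round-trips back to the original names
```

The decode step mirrors the inverse transform used later when writing predictions back as class names.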
We load our train data and apply some necessary transforms: converting to a PIL image, converting to a tensor, and normalizing across channels. We can add more augmentations such as Random Flip, Random Rotation, etc.; more on these can be found in the torchvision transforms documentation.
transforms_train = transforms.Compose([
    transforms.ToPILImage(),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5,0.5,0.5))
])
train_path = 'data/train/train_images'
train_data = FoodData(data_list= ntrain,data_dir = train_path,transform = transforms_train)
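As a quick check of what the normalization step does: ToTensor scales 8-bit pixel values to [0, 1], and Normalize with mean 0.5 and std 0.5 then computes (x - 0.5) / 0.5, which maps them to [-1, 1]. The helper normalize_pixel below is a hypothetical stand-in written in plain Python to show the arithmetic for a single channel value; it is not part of torchvision.

```python
def normalize_pixel(value, mean=0.5, std=0.5):
    """Mimic ToTensor followed by Normalize for one 8-bit channel value."""
    scaled = value / 255.0          # ToTensor: [0, 255] -> [0.0, 1.0]
    return (scaled - mean) / std    # Normalize: (x - mean) / std

for v in (0, 127.5, 255):
    print(v, "->", normalize_pixel(v))
```

Black maps to -1, mid-gray to 0, and white to 1, so the network sees inputs centered around zero.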
EDA
Let us do some exploratory data analysis. The idea is to see the class distribution, how the images are and much more.
train = pd.read_csv('train.csv')
num = train['ClassName'].value_counts()
classes = train['ClassName'].unique()
print("Percentage of each class")
for cl in classes:
    print(cl,'\t',num[cl]/train.shape[0]*100,"%")
Percentage of each class
water 9.25667703528907 %
pizza-margherita-baked 1.179877721763381 %
broccoli 0.9009975329829454 %
salad-leaf-salad-green 5.738496192212807 %
egg 2.2417676713504235 %
butter 3.71125174300118 %
bread-white 6.382065858629196 %
apple 2.0486967714255067 %
dark-chocolate 0.9439021774107046 %
white-coffee-with-caffeine 1.3085916550466588 %
sweet-pepper 0.9009975329829454 %
mixed-salad-chopped-without-sauce 1.8127212270728308 %
tomato-sauce 1.179877721763381 %
cucumber 1.1476992384425615 %
cheese 1.4694840716507562 %
pasta-spaghetti 1.040437627373163 %
rice 2.7458972433765956 %
zucchini 0.9653544996245843 %
salmon 0.5470342164539311 %
mixed-vegetables 2.542100182344739 %
espresso-with-caffeine 2.0916014158532663 %
banana 1.9414351603561086 %
strawberries 0.9331760163037649 %
mayonnaise 0.4612249275984125 %
almonds 0.740105116378848 %
bread-wholemeal 4.269012120562051 %
wine-white 1.619650327147914 %
hard-cheese 1.2013300439772605 %
ham-raw 0.7079266330580285 %
tomato 3.8399656762844576 %
french-beans 0.8044620830204869 %
mandarine 0.740105116378848 %
wine-red 2.585004826772498 %
potatoes-steamed 1.673281132682613 %
croissant 0.8044620830204869 %
carrot 3.185669848761128 %
salami 0.5255818942400515 %
boisson-au-glucose-50g 0.9117236940898853 %
biscuits 0.7293789552719082 %
corn 0.39686796095677357 %
leaf-spinach 0.9331760163037649 %
tea-green 0.740105116378848 %
chips-french-fries 1.4587579105438164 %
parmesan 0.7293789552719082 %
beer 0.8580928885551861 %
bread-french-white-flour 0.6542958275233294 %
coffee-with-caffeine 4.043762737316314 %
chicken 1.1369730773356215 %
soft-cheese 0.5148557331331117 %
tea 1.8985305159283494 %
avocado 0.9439021774107046 %
bread-sourdough 0.6757481497372091 %
gruyere 0.7615574385927276 %
sauce-savoury 0.6542958275233294 %
honey 0.6972004719510887 %
mixed-nuts 0.868819049662126 %
jam 1.7483642604311918 %
bread-whole-wheat 0.7937359219135471 %
water-mineral 0.922449855196825 %
onion 0.4397726053845329 %
pickle 0.3003325109943151 %
We observe that water is the most common class (about 9.3% of the images), while every other class accounts for between about 0.3% and 6.4%, so no single class dominates. Let us plot some images of the bread-french-white-flour and chips-french-fries classes to get a feel for the kind of images we have.
imgs = train.loc[train['ClassName'] == 'bread-french-white-flour']
plt.figure(figsize=(10,10))
for i in range(imgs[:16].shape[0]):
    path = imgs.iloc[i]['ImageId']
    image = cv2.imread(os.path.join(train_path,path),1)
    image = cv2.cvtColor(image,cv2.COLOR_BGR2RGB)
    plt.subplot(4,4,i+1)
    plt.axis('off')
    plt.imshow(image)
imgs = train.loc[train['ClassName'] == 'chips-french-fries']
plt.figure(figsize=(10,10))
for i in range(imgs[:16].shape[0]):
    path = imgs.iloc[i]['ImageId']
    image = cv2.imread(os.path.join(train_path,path),1)
    image = cv2.cvtColor(image,cv2.COLOR_BGR2RGB)
    plt.subplot(4,4,i+1)
    plt.axis('off')
    plt.imshow(image)
Split Data into Train and Validation
Now we want to see how well our model is performing, but we don't have the test labels to check against. What do we do? We split our dataset into train and validation sets. The idea is to test our classifier on the validation set to get an idea of how well it works on unseen data. This way we can also detect overfitting on the train dataset. There are many ways to do validation, such as k-fold, leave-one-out, etc.
We also make dataloaders, which split the dataset into minibatches that are iterated over in each epoch.
batch = 128
valid_size = 0.2
num = len(train_data)
# Dividing the indices between train and validation
indices = list(range(num))
np.random.shuffle(indices)
split = int(np.floor(valid_size*num))
train_idx,valid_idx = indices[split:], indices[:split]
#Create Samplers
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)
train_loader = DataLoader(train_data, batch_size = batch, sampler = train_sampler)
valid_loader = DataLoader(train_data, batch_size = batch, sampler = valid_sampler)
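The text above mentions k-fold validation as an alternative to a single hold-out split. A minimal sketch of how k-fold index splits could be generated is shown below; kfold_indices is a hypothetical helper written with the standard library only, and in practice a library implementation may be preferable.

```python
import random

def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, valid_idx) pairs for k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, valid_idx
        start += size

for fold, (train_idx, valid_idx) in enumerate(kfold_indices(10, 5)):
    print(fold, len(train_idx), len(valid_idx))
```

Each pair of index lists could then be wrapped in SubsetRandomSampler objects in the same way as the single split above, training one model per fold.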
Here we load test images. Note: This file will not have any labels with it
transforms_test = transforms.Compose([
    transforms.ToPILImage(),
    transforms.ToTensor(),
    transforms.Normalize((0.5,0.5,0.5) , (0.5,0.5,0.5))
])
test_path = 'data/test/test_images'
test = pd.read_csv('test.csv')
test_data = FoodData(data_list= test,data_dir = test_path,transform = transforms_test,train=False)
test_loader = DataLoader(test_data, batch_size=batch, shuffle=False)
Here we check whether a GPU is available. If it is, we just need to move our data and model to the GPU for faster computation.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Assuming that we are on a CUDA machine, this should print a CUDA device:
print(device)
cuda:0
Define the Model
Now we come to the juicy part. We define our model here. We need to create a class with __init__ and forward functions, which define the layers and the forward pass respectively. We can also load pretrained models, freeze their layers, and add more layers on top to train. More on pretrained models and on building models can be found in the PyTorch documentation.
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    # Define layers here
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 61 * 61, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 61)
    def forward(self, x):
        # Forward pass
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 61 * 61)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
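The 16 * 61 * 61 input size of the first fully connected layer follows from the 256x256 input: each 5x5 convolution (stride 1, no padding) shrinks each side by 4, and each 2x2 max-pool halves it. The helper conv_out below is hypothetical and simply reproduces this arithmetic with the standard output-size formula.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

side = 256
side = conv_out(side, kernel=5)            # conv1: 256 -> 252
side = conv_out(side, kernel=2, stride=2)  # pool:  252 -> 126
side = conv_out(side, kernel=5)            # conv2: 126 -> 122
side = conv_out(side, kernel=2, stride=2)  # pool:  122 -> 61
print(side, 16 * side * side)              # 61 59536
```

If the input resolution or kernel sizes change, the same calculation gives the new value to use in the first linear layer and in the view call.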
Here we define our model object along with our optimizer and error function. Typically, for multi-class classification we use Cross Entropy Loss. More about the different loss functions can be found in the PyTorch documentation.
We use the popular Adam optimizer with its default parameters. There are other optimizers such as SGD, RMSprop, Adamax, etc.; see the torch.optim documentation for a detailed look.
model = Net().to(device)
error = nn.CrossEntropyLoss().to(device)
optimizer = optim.Adam(model.parameters())
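For intuition, the cross-entropy loss for one sample is the negative log of the softmax probability assigned to the true class. The plain-Python sketch below (the function cross_entropy is a hypothetical illustration, not the PyTorch implementation) also shows that a model predicting uniformly over 61 classes scores ln(61), roughly 4.11, which is a useful reference point when reading training logs.

```python
import math

def cross_entropy(logits, target):
    """Negative log-softmax of the target class for a single sample."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

uniform_logits = [0.0] * 61  # 61 classes, matching the last layer above
print(round(cross_entropy(uniform_logits, target=3), 4))

confident = [5.0, 0.0, 0.0]  # toy logits favouring class 0
print(cross_entropy(confident, 0) < cross_entropy(confident, 1))  # True
```

A loss that stays close to this uniform baseline suggests the network has learned little beyond the class prior.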
Train
Alright, enough talk; time to train. We define the number of epochs and train the model. An epoch is one forward and backward pass over all the training data points. It consists of iterations, whose number depends on the batch size. In each iteration we take a batch, compute its output, do a backward pass, and let the optimizer take a step. This is the typical workflow for PyTorch training code.
Validate
After each epoch ends, we run the validation set through the same steps, except for the backward pass and the optimizer step. If the validation loss decreases, we save the model. This is best-model checkpointing, a simple relative of early stopping: training still runs for all epochs, but we keep the weights with the lowest validation loss.
n_epochs = 5
valid_loss_min = np.inf  # np.Inf was removed in NumPy 2.0
train_losses = []
valid_losses = []
for epoch in range(n_epochs):
    train_loss = 0.0
    valid_loss = 0.0
    model.train()
    for images in train_loader:
        data = images['gt'].to(device)
        target = images['label'].to(device)
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass through the model
        output = model(data)
        # compute the batch loss and backpropagate
        loss = error(output,target)
        loss.backward()
        # perform a single optimization step
        optimizer.step()
        train_loss += loss.item()*data.size(0)
    model.eval()
    # gradients are not needed during validation
    with torch.no_grad():
        for images in valid_loader:
            data = images['gt'].to(device)
            target = images['label'].to(device)
            # forward pass now
            output = model(data)
            # calculate the batch loss
            loss = error(output, target)
            # accumulate the validation loss, weighted by batch size
            valid_loss += loss.item()*data.size(0)
    # average the losses over the number of samples
    train_loss /= len(train_loader.sampler)
    valid_loss /= len(valid_loader.sampler)
    train_losses.append(train_loss)
    valid_losses.append(valid_loss)
    print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
        epoch, train_loss, valid_loss))
    # save the model whenever the validation loss improves
    if valid_loss <= valid_loss_min:
        print("Validation Loss decreased {:0.6f} -> {:0.6f}".format(valid_loss_min,valid_loss))
        valid_loss_min = valid_loss
        torch.save(model.state_dict(), 'best_model_so_far.pth')
Epoch: 0 Training Loss: 4.116804 Validation Loss: 4.030159
Validation Loss decreased inf -> 4.030159
Epoch: 1 Training Loss: 3.891067 Validation Loss: 3.763175
Validation Loss decreased 4.030159 -> 3.763175
Epoch: 2 Training Loss: 3.794735 Validation Loss: 3.759849
Validation Loss decreased 3.763175 -> 3.759849
Epoch: 3 Training Loss: 3.792089 Validation Loss: 3.758870
Validation Loss decreased 3.759849 -> 3.758870
Epoch: 4 Training Loss: 3.791188 Validation Loss: 3.757242
Validation Loss decreased 3.758870 -> 3.757242
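The loop above saves the best model but always runs for all n_epochs epochs. If actual early stopping is wanted, one common approach is to stop once the validation loss has not improved for a fixed number of epochs (the patience). A minimal sketch follows; the class EarlyStopping and its parameters are hypothetical, not something this notebook or PyTorch provides, and the losses in the demo are made up.

```python
class EarlyStopping:
    """Track validation loss and signal when training should stop."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, valid_loss):
        """Record one epoch's loss; return True if training should stop."""
        if valid_loss < self.best - self.min_delta:
            self.best = valid_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=2)
for epoch, loss in enumerate([4.03, 3.76, 3.77, 3.78, 3.70]):  # made-up losses
    if stopper.step(loss):
        print("stopping after epoch", epoch)
        break
```

In a training loop, stopper.step(valid_loss) would be called once per epoch right after the validation loss is computed, alongside the existing checkpointing.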
Predict on Validation
Now we run our trained model on the validation set and evaluate its predictions.
model.load_state_dict(torch.load('best_model_so_far.pth'))
model.eval()
correct = 0
total = 0
pred_list = []
correct_list = []
with torch.no_grad():
    for images in valid_loader:
        data = images['gt'].to(device)
        target = images['label'].to(device)
        outputs = model(data)
        # the predicted class is the index of the largest logit
        _, predicted = torch.max(outputs, 1)
        total += target.size(0)
        pred_list.extend(predicted.cpu().numpy())
        correct_list.extend(target.cpu().numpy())
        correct += (predicted == target).sum().item()
print('Accuracy of the network on the validation images: %f %%' % (
    100 * correct / total))
Accuracy of the network on the validation images: 9.388412 %
from sklearn.metrics import f1_score,precision_score,log_loss
print("F1 score :",f1_score(correct_list,pred_list,average='micro'))
F1 score : 0.09388412017167383
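Note that for single-label multi-class predictions, micro-averaged F1 equals plain accuracy, which is why the F1 score above (0.0939) matches the accuracy (9.39%). The standard-library sketch below illustrates this by pooling true positives, false positives and false negatives over all classes; micro_f1 is a hypothetical helper run on toy labels, not scikit-learn's implementation.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all classes."""
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for c in labels:
        for t, p in zip(y_true, y_pred):
            tp += (t == c and p == c)
            fp += (t != c and p == c)
            fn += (t == c and p != c)
    return 2 * tp / (2 * tp + fp + fn)

y_true = [0, 1, 2, 2, 1, 0]   # toy labels
y_pred = [0, 2, 2, 2, 0, 0]   # toy predictions
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(micro_f1(y_true, y_pred), accuracy)
```

Every wrong prediction counts once as a false positive and once as a false negative, so the pooled precision and recall both reduce to accuracy. A macro average would weight rare classes equally and can differ noticeably on imbalanced data.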
Predict on test set
Time for the moment of truth! We predict on the test set and prepare the submission.
model.load_state_dict(torch.load('best_model_so_far.pth'))
model.eval()
preds = []
with torch.no_grad():
    for images in test_loader:
        data = images['gt'].to(device)
        outputs = model(data)
        # the predicted class is the index of the largest logit
        _, predicted = torch.max(outputs, 1)
        preds.extend(predicted.cpu().numpy())
Save it in correct format
# Create Submission file
df = pd.DataFrame(le.inverse_transform(preds),columns=['ClassName'])
df.to_csv('submission.csv',index=False)
To download the generated CSV in Colab, run the command below
from google.colab import files
files.download('submission.csv')