The Challenge¶
Maintaining a healthy diet is difficult. As the saying goes, the best way to escape a problem is to solve it. So why not leverage the power of deep learning and computer vision to build the foundation of a semi-automated food tracking application?
With over 9,300 hand-annotated images spanning 61 classes, the challenge is to train accurate models that can look at an image of food and detect the food items present in it.
It's time to unleash the food (data) scientist in you! Given any image, identify the food item present in it.
Downloads and Installs¶
!wget -q https://s3.eu-central-1.wasabisys.com/aicrowd-practice-challenges/public/foodc/v0.1/train_images.zip
!wget -q https://s3.eu-central-1.wasabisys.com/aicrowd-practice-challenges/public/foodc/v0.1/test_images.zip
!wget -q https://s3.eu-central-1.wasabisys.com/aicrowd-practice-challenges/public/foodc/v0.1/train.csv
!wget -q https://s3.eu-central-1.wasabisys.com/aicrowd-practice-challenges/public/foodc/v0.1/test.csv
!mkdir -p data/train data/test models
!unzip -q train_images -d data/train
!unzip -q test_images -d data/test
Imports¶
import sys
import os
import gc
import warnings
import torch
import torch.nn as nn
import numpy as np
import pandas as pd
import torch.nn.functional as F
from fastai.script import *
from fastai.vision import *
from fastai.callbacks import *
from fastai.distributed import *
from fastprogress import fastprogress
from torchvision.models import *
np.random.seed(23)        # reproducible numpy randomness (e.g. the validation split)
torch.cuda.set_device(0)  # note: torch.cuda.device(0) on its own is a no-op context manager
warnings.filterwarnings("ignore")
torch.multiprocessing.freeze_support()
print("[INFO] GPU:", torch.cuda.get_device_name())
DataBunch and Model¶
Here we use a technique called progressive resizing: the model is first trained on small images, and at each subsequent step it is initialized with the weights learned at the previous, smaller image size.
def get_data(size, batch_size):
    """
    Returns a DataBunch for the Learner at the given image size and batch size.
    """
    train = pd.read_csv("train.csv")
    # 90/10 random train/validation split, labels taken from the dataframe
    src = (ImageList.from_df(train, path="data/", folder="train/train_images/")
           .split_by_rand_pct(0.1)
           .label_from_df())
    src.add_test_folder("test/test_images/")
    # Standard fastai augmentations: flips, small rotations, zoom, lighting, warp
    tfms = get_transforms(do_flip=True, flip_vert=False, max_rotate=10.0,
                          max_zoom=1.1, max_lighting=0.2, max_warp=0.2,
                          p_affine=0.75, p_lighting=0.75)
    data = (src.transform(tfms, size=size, resize_method=ResizeMethod.SQUISH)
            .databunch(bs=batch_size)
            .normalize(imagenet_stats))
    assert sorted(set(train.ClassName.unique())) == sorted(data.classes), "Class Mismatch"
    print("[INFO] Number of Classes: ", data.c)
    data.num_workers = 4
    return data
sample_data = get_data(32, (2048//32))
sample_data.show_batch(3, 3)
As you can see, the transforms have been applied and the images are normalized as well!
We first initialize and save the model at each image size, so that each progressive-resizing step has a checkpoint to load from.
learn = create_cnn(get_data(32, (2048//32)), models.densenet161,
                   metrics=[accuracy, FBeta(beta=1, average='macro')])
learn.model_dir = "models/"
learn.save("densenet_32")

learn = create_cnn(get_data(64, (2048//64)), models.densenet161,
                   metrics=[accuracy, FBeta(beta=1, average='macro')]).load("densenet_32")
learn.model_dir = "models/"
learn.save("densenet_64")

learn = create_cnn(get_data(128, (2048//128)), models.densenet161,
                   metrics=[accuracy, FBeta(beta=1, average='macro')]).load("densenet_64")
learn.model_dir = "models/"
learn.save("densenet_128")

learn = create_cnn(get_data(256, (2048//256)), models.densenet161,
                   metrics=[accuracy, FBeta(beta=1, average='macro')]).load("densenet_128")
learn.model_dir = "models/"
learn.save("densenet_256")
def train_model(size, iter1, iter2, mixup=False):
    """
    Trains the model at the given image size: iter1 epochs with the backbone
    frozen, then iter2 epochs with all layers unfrozen.
    """
    size_match = {"256": "128", "128": "64", "64": "32"}
    learn = create_cnn(get_data(size, (2048//size)), models.densenet161,
                       metrics=[accuracy, FBeta(beta=1, average='macro')])
    learn.model_dir = "models/"
    if mixup:
        learn.mixup()
    # Load the weights trained at the previous (smaller) image size
    if size != 32:
        learn.load("densenet_" + size_match[str(size)])
    name = "densenet_" + str(size)
    print("[INFO] Training for : ", name)
    learn.fit_one_cycle(iter1, 1e-4, callbacks=[ShowGraph(learn),
        SaveModelCallback(learn, monitor='f_beta', mode='max', name=name)])
    learn.unfreeze()
    learn.fit_one_cycle(iter2, 5e-5, callbacks=[ShowGraph(learn),
        SaveModelCallback(learn, monitor='f_beta', mode='max', name=name)])
Here you might notice the use of a function called mixup. mixup is a callback in fastai that is extremely effective at regularizing models in computer vision. Instead of feeding the model the raw images, we take two images (not necessarily from the same class) and make a linear combination of them. In terms of tensors:
new_image = t * image1 + (1-t) * image2
where t is a float between 0 and 1. The target we assign to that new image is the same combination of the original targets:
new_target = t * target1 + (1-t) * target2
assuming the targets are one-hot encoded (which usually isn't the case in PyTorch). And it's as simple as that.
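To make this concrete, here is a minimal standalone PyTorch sketch of the idea (an illustration we added, not the actual fastai callback, which also samples t from a Beta distribution):
import torch

def mixup_pair(image1, image2, target1, target2, t=0.7):
    # Blend two images and their one-hot targets with the same weight t
    new_image = t * image1 + (1 - t) * image2
    new_target = t * target1 + (1 - t) * target2
    return new_image, new_target

# Two dummy 3x224x224 "images" and one-hot targets for a 61-class problem
img1, img2 = torch.rand(3, 224, 224), torch.rand(3, 224, 224)
tgt1, tgt2 = torch.zeros(61), torch.zeros(61)
tgt1[5], tgt2[17] = 1.0, 1.0
mixed_img, mixed_tgt = mixup_pair(img1, img2, tgt1, tgt2, t=0.7)
print(mixed_tgt[5].item(), mixed_tgt[17].item())  # ~0.7 and ~0.3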
For example:
[Image: a mixup blend of a dog photo and a cat photo (source linked in the original). Dog or cat? The right answer here is 70% dog and 30% cat!]
train_model(32, 5, 3)
train_model(64, 5, 4)
train_model(128, 7, 4, mixup=True)
train_model(256, 7, 5, mixup=True)
learn = create_cnn(get_data(300, (2048//300)), models.densenet161,
                   metrics=[accuracy, FBeta(beta=1, average='macro')]).load("densenet_256")
learn.model_dir = "models/"
learn.mixup()
size = 300
name = "densenet_" + str(size)
print("[INFO] Training for : ", name)
learn.fit_one_cycle(5, 1e-4, callbacks=[ShowGraph(learn),
    SaveModelCallback(learn, monitor='f_beta', mode='max', name=name)])
learn.load("densenet_300")
interp = ClassificationInterpretation.from_learner(learn)
losses, idxs = interp.top_losses()
interp.plot_top_losses(9, figsize=(15,11))
interp.plot_confusion_matrix(figsize=(12,12), dpi=100)
print("[INFO] MOST CONFUSED:")
interp.most_confused(min_val=5)
The model gets confused between some very similar categories, such as coffee-with-caffeine and espresso-with-caffeine. Making the model more robust here calls for appropriate augmentations.
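One option (an untuned assumption on our part, not something validated on this dataset) is to pass extra transforms such as cutout to get_transforms, occluding random patches so the model cannot rely on a single local cue:
# Illustrative only: the hole counts and sizes are not tuned for this dataset
tfms = get_transforms(do_flip=True, flip_vert=False, max_rotate=10.0,
                      max_zoom=1.1, max_lighting=0.2, max_warp=0.2,
                      p_affine=0.75, p_lighting=0.75,
                      xtra_tfms=[cutout(n_holes=(1, 4), length=(10, 80), p=0.5)])
This tfms could then be used inside get_data in place of the original call.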
def make_submission(learn, name):
    images = []
    prediction = []
    probability = []
    test_path = "data/test/test_images/"
    test = pd.read_csv("test.csv")
    files = test.ImageId
    for i in files:
        images.append(i)
        img = open_image(os.path.join(test_path, i))
        # learn.predict returns (predicted Category, class index, class probabilities)
        pred_class, pred_idx, outputs = learn.predict(img)
        prediction.append(pred_class.obj)
        # outputs are softmax probabilities (already non-negative), so abs() is not needed
        probability.append(outputs.max().item())
    answer = pd.DataFrame({'ImageId': images, 'ClassName': prediction, 'probability': probability})
    display(answer.head())
    answer[["ImageId", "ClassName"]].to_csv(name, index=False)
make_submission(learn, name="submission_size300.csv")
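Calling learn.predict one image at a time is slow. Since get_data already added the test folder to the DataBunch, batched inference with get_preds should give the same predictions much faster; a sketch, assuming the file order from learn.data.test_ds matches what the submission expects ("submission_batched.csv" is an illustrative name):
# Hypothetical batched alternative to the per-image loop above
preds, _ = learn.get_preds(ds_type=DatasetType.Test)  # softmax outputs, one row per test image
labels = [learn.data.classes[i] for i in preds.argmax(dim=1)]
files = [p.name for p in learn.data.test_ds.items]    # file names in DataBunch order
pd.DataFrame({"ImageId": files, "ClassName": labels}).to_csv("submission_batched.csv", index=False)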
Improving Further¶
- Appropriate augmentations
- Different models, like densenet201 or resnet50
- Mixed Precision training (i.e. to_fp16() in fastai; see the sketch below)
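For the last point, a minimal sketch of mixed-precision training in fastai v1 (assuming a GPU with FP16 support; the epochs and learning rate are illustrative, not tuned):
# to_fp16() wraps the Learner so forward/backward passes run in half precision,
# with loss scaling handled by fastai
learn = create_cnn(get_data(256, (2048//256)), models.densenet161,
                   metrics=[accuracy, FBeta(beta=1, average='macro')])
learn.model_dir = "models/"
learn.load("densenet_256")
learn = learn.to_fp16()        # switch to half precision after loading the FP32 weights
learn.fit_one_cycle(3, 1e-4)   # epochs/LR illustrative, not tuned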