Image Classification Using fastAI

Image classification is the task of assigning a class label to an input image. For example, an image classification model may be trained to determine whether a given image contains a dog or a cat. The model learns to classify images during the training process, where it is shown many labeled example images of dogs and cats.

Some common uses of image classification include:

  • Photo organization and search - Identify and categorize images by their content
  • Face recognition - Determine the identity of a face in an image
  • Medical imaging - Detect abnormalities in medical images like X-rays or MRI scans
  • Self-driving vehicles - Recognize traffic signs, pedestrians, etc.
  • Satellite/aerial image analysis - Identify land use patterns like buildings, forest, water, etc.

In a classical pipeline, image classification involves two main steps:

  1. Feature extraction: The input image is preprocessed and salient features are extracted. This may involve techniques like edge detection, color analysis, shape analysis, texture analysis, etc. The goal is to reduce the raw pixel data into a more manageable representation.
  2. Classification: The extracted features are then used as input to a machine learning model like a neural network to classify the image. The model has been trained on many labeled example images to recognize patterns in features that correspond to different object classes.
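The two steps above can be sketched in plain Python. This is a toy illustration only: the tiny grayscale grids, the two hand-picked features, the nearest-centroid classifier, and the "plain" and "striped" labels are all assumptions made for the example, not part of any library.

```python
# Toy illustration of the two steps: hand-crafted features, then a classifier.
# Images are tiny grayscale grids (lists of rows, values in 0..1).

def extract_features(image):
    """Step 1: reduce raw pixels to two numbers (mean brightness, edge strength)."""
    pixels = [p for row in image for p in row]
    mean_brightness = sum(pixels) / len(pixels)
    # Average absolute difference between horizontally adjacent pixels.
    diffs = [abs(row[i + 1] - row[i]) for row in image for i in range(len(row) - 1)]
    edge_strength = sum(diffs) / len(diffs)
    return (mean_brightness, edge_strength)

def train_centroids(labeled_images):
    """Step 2a: average the feature vectors of each class."""
    sums, counts = {}, {}
    for image, label in labeled_images:
        features = extract_features(image)
        running = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            running[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in s) for label, s in sums.items()}

def classify(image, centroids):
    """Step 2b: pick the class whose centroid is nearest in feature space."""
    features = extract_features(image)
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

flat = [[0.5, 0.5, 0.5, 0.5]] * 4       # uniform gray, no edges
striped = [[0.0, 1.0, 0.0, 1.0]] * 4    # strong vertical stripes
centroids = train_centroids([(flat, "plain"), (striped, "striped")])

noisy_stripes = [[0.1, 0.9, 0.2, 0.8]] * 4
print(classify(noisy_stripes, centroids))
```

The classifier never sees raw pixels, only the two extracted numbers, which is the "more manageable representation" mentioned in step 1.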

Most image classification models today are deep neural networks, in particular convolutional neural networks (CNNs). Rather than relying on hand-crafted features, a CNN learns feature extraction and classification together: its stacked layers each learn to detect progressively more complex patterns, building up a highly descriptive and discriminative representation of the input image.
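To make the idea of a layer detecting a feature concrete, here is a minimal sketch of a single convolution (strictly, a cross-correlation, which is what deep learning libraries call convolution) followed by a ReLU. The vertical-edge kernel is written by hand for illustration; in a real CNN the kernel weights are learned during training.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(feature_map):
    """Keep positive responses and zero out the rest."""
    return [[max(0, v) for v in row] for row in feature_map]

# 5x5 image: dark on the left, bright on the right.
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# Hand-written vertical-edge kernel; a CNN would learn such weights from data.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

feature_map = relu(conv2d_valid(image, vertical_edge))
for row in feature_map:
    print(row)
```

The feature map is zero where the window covers a uniform region and positive where it straddles the dark-to-bright edge. Stacking many such layers lets later layers combine simple responses like this one into more complex patterns.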

The fastAI Library

fastAI is a popular Python library for deep learning that provides a high-level API to quickly build and train neural network models. Under the hood, it leverages PyTorch for the actual model implementations. Some key advantages of fastAI include:

  • Provides sane defaults and best practices for training neural nets
  • Makes it easy to do transfer learning from pretrained models
  • Includes helpers for vision, text, tabular data, etc.
  • Focuses on rapid iteration and fast modeling

The vision module of fastAI contains tools tailored for image classification, including data loading and augmentation functions. It also provides pretrained models that we can leverage for transfer learning.

Transfer learning is where we take an existing model that has been pretrained on a large dataset like ImageNet, and reuse it for our own task. This is faster and easier than training a model from scratch. The pretrained model already has learned good feature representations that are useful for many vision tasks. We then fine-tune the model on our target dataset to adapt those features to our specific classes.
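The core idea, reusing fixed features and training only a small new part, can be sketched without any deep learning library. In this toy example the "pretrained" feature function, the two-number inputs, and the logistic-regression head are all stand-ins invented for illustration; fastAI does the equivalent with a real pretrained network.

```python
import math

def pretrained_features(x):
    """Stand-in for a frozen pretrained backbone: fixed and never updated."""
    a, b = x
    return (a + b, a - b)

def train_head(samples, epochs=200, lr=0.5):
    """Fit only the new head (logistic regression) on top of the frozen features."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in samples:
            f = pretrained_features(x)
            z = weights[0] * f[0] + weights[1] * f[1] + bias
            p = 1 / (1 + math.exp(-z))
            grad = p - y  # gradient of the log loss with respect to z
            weights[0] -= lr * grad * f[0]
            weights[1] -= lr * grad * f[1]
            bias -= lr * grad
    return weights, bias

def predict_label(x, weights, bias):
    f = pretrained_features(x)
    return 1 if weights[0] * f[0] + weights[1] * f[1] + bias > 0 else 0

# Tiny made-up target dataset: class 1 when the first value dominates.
samples = [((1.0, 0.0), 1), ((0.9, 0.2), 1), ((0.0, 1.0), 0), ((0.1, 0.8), 0)]
weights, bias = train_head(samples)
print([predict_label(x, weights, bias) for x, _ in samples])
```

Only three numbers are learned here (two weights and a bias), which is why adapting a pretrained model needs far less data and time than training every layer from scratch.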

Installation and Imports

If you don't have the fastAI library installed, install it first.

We create a new Conda environment, activate it, and then install the fastai package into it:

conda create -n fastai python=3.8
conda activate fastai
conda install -c fastai fastai

Binary Image Classification Project Overview

To demonstrate how to do image classification with fastAI, we will build a model to classify images as either dogs or cats. This is a binary classification task since there are two possible classes.

We will use the Dogs vs Cats dataset from a past Kaggle competition. Its training set contains 25,000 labeled images of dogs and cats.

Our project workflow will look like:

  1. Install fastAI and other dependencies
  2. Load and split the dataset
  3. Preprocess and augment the data
  4. Fine-tune a pretrained model
  5. Evaluate the model
  6. Use the model for predictions

Next, let's go through each of these steps to build the image classifier. In addition to fastAI, the notebook uses OpenCV and Matplotlib, which we install with pip:

!pip install opencv-python
!pip install matplotlib

Now we import the packages we need:

from fastai.vision.all import *
from pathlib import Path  # also exported by fastai.vision.all, which adds the .ls() method
import cv2
import matplotlib.pyplot as plt

We set up a path to the dataset directory:

PATH = Path('../input/dogs-vs-cats')

# print files in the dataset
PATH.ls()

Load and Split Data

Next we load and split the training data into training and validation sets.

We initialize the data blocks which handle loading and transforming batches of images:

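def label_func(fname):
    # Kaggle names the training files 'cat.0.jpg', 'dog.0.jpg', ...
    return fname.split('.')[0]

data = ImageDataLoaders.from_name_func(
    PATH,
    get_image_files(PATH/'train'),
    label_func,
    valid_pct=0.2,
    seed=42,
    bs=32,
    item_tfms=Resize(224)  # images must share one size to be batched
)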

This loads the image files from the train subfolder, does an 80/20 split into training and validation sets, and creates data loaders with a batch size of 32.
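As a rough picture of what valid_pct=0.2 and seed=42 mean, here is a minimal sketch of a seeded random split together with a file-name labelling rule. It is not fastAI's implementation (fastAI uses its own splitter and random number generator, so the actual assignment of files will differ); random_split and label_from_name are hypothetical helpers, and the file names are generated just for the example.

```python
import random

def label_from_name(file_name):
    """'cat.0.jpg' -> 'cat' (assumes the label is the prefix before the first dot)."""
    return file_name.split(".")[0]

def random_split(items, valid_pct=0.2, seed=42):
    """Shuffle indices with a fixed seed and hold out valid_pct of them for validation."""
    rng = random.Random(seed)
    indices = list(range(len(items)))
    rng.shuffle(indices)
    n_valid = round(len(items) * valid_pct)
    valid = [items[i] for i in indices[:n_valid]]
    train = [items[i] for i in indices[n_valid:]]
    return train, valid

# Made-up file names in the same style as the dataset.
file_names = [f"{kind}.{i}.jpg" for kind in ("cat", "dog") for i in range(50)]
train, valid = random_split(file_names)

print(len(train), len(valid))
print(valid[0], "->", label_from_name(valid[0]))
```

Fixing the seed makes the split reproducible, so the validation metrics from different runs are measured on the same held-out images.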

# Plot one validation batch with its labels as a sanity check
batch = data.valid.one_batch()

rows, cols = 4, 8
fig, ax = plt.subplots(rows, cols, figsize=(cols*2, rows*2))

for i in range(rows*cols):
    # move the image to the CPU before plotting; the label is a class index
    img,label = batch[0][i].cpu(),batch[1][i]
    ax[i//cols,i%cols].imshow(img.permute(1,2,0))
    ax[i//cols,i%cols].set_title(data.vocab[int(label)])
    ax[i//cols,i%cols].axis('off')

plt.tight_layout()
plt.show()

Preprocess and Augment Data

Before feeding data into the model, we apply some preprocessing and augmentation transforms. This helps improve model accuracy and generalizability.

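In fastAI, transforms are attached when the data loaders are built: item transforms run on each image, and batch transforms run on whole batches. Random augmentations are applied only to the training set; on the validation set the random crop becomes a center crop, so no separate list of validation transforms is needed.

imagenet_stats = ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

def label_func(fname):
    # the label is the file-name prefix, e.g. 'cat.0.jpg' -> 'cat'
    return fname.split('.')[0]

item_tfms = RandomResizedCrop(224, min_scale=0.5)
batch_tfms = [
    *aug_transforms(size=224, min_scale=0.75),  # random flips, rotation, zoom, lighting
    Normalize.from_stats(*imagenet_stats)
]

# build the data loaders with the transforms attached
data = ImageDataLoaders.from_name_func(
    PATH,
    get_image_files(PATH/'train'),
    label_func,
    valid_pct=0.2,
    seed=42,
    bs=32,
    item_tfms=item_tfms,
    batch_tfms=batch_tfms
)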

#  when we access a batch, the images will be preprocessed:
batch = data.valid.one_batch()
print(batch[0].shape)
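The Normalize.from_stats(*imagenet_stats) step rescales each color channel as (value - mean) / std, using the per-channel mean and standard deviation of ImageNet. A minimal sketch of that arithmetic on a single pixel, assuming channel values already scaled to the range 0 to 1 (normalize_pixel and denormalize_pixel are hypothetical helpers written for this example):

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Normalize one RGB pixel whose channels are already scaled to [0, 1]."""
    return tuple((value - m) / s for value, m, s in zip(rgb, mean, std))

def denormalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Invert the normalization, e.g. before displaying an image."""
    return tuple(value * s + m for value, m, s in zip(rgb, mean, std))

mid_gray = (0.5, 0.5, 0.5)
normalized = normalize_pixel(mid_gray)
print(normalized)
print(denormalize_pixel(normalized))
```

Using the same statistics as the pretrained model matters because its weights were learned on inputs normalized this way.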

Fine-Tune ResNet34 Transfer Model

For the model architecture, we leverage a pretrained ResNet34 model, which fastAI makes available. ResNet34 is the 34-layer variant of the ResNet architecture, which won the ImageNet (ILSVRC) 2015 classification competition.

First we define the model:

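base_model = models.resnet34
# vision_learner loads the pretrained weights and adds a new head for our classes
# (it is called cnn_learner in older fastai releases)
learn = vision_learner(data, base_model, metrics=error_rate)
learn.model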

This shows the ResNet34 architecture, with a final classification layer added for our 2 image classes. Under the hood, fastAI will automatically download and load the pretrained ImageNet weights into the model.

Now we fine-tune the model on our dataset:

learn.fine_tune(4)

This first trains only the newly added classification head for one epoch while the pretrained layers are frozen, then unfreezes the whole network and trains all layers for 4 more epochs. In each epoch we pass through the data in small batches and update the weights to minimize the loss.
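As a rough picture of what fine_tune(4) schedules, the sketch below lists its two phases as plain data; no training happens here. fine_tune_plan is a hypothetical helper, and the default values (base_lr=2e-3, freeze_epochs=1, lr_mult=100, with the base learning rate halved for the second phase) reflect recent fastai releases and may differ in your version.

```python
def fine_tune_plan(epochs, base_lr=2e-3, freeze_epochs=1, lr_mult=100):
    """Describe the two training phases as dictionaries (illustration only)."""
    unfrozen_lr = base_lr / 2  # the base rate is halved once all layers are trainable
    return [
        {
            "phase": "frozen",
            "epochs": freeze_epochs,
            "trainable": "new head (last parameter group)",
            "max_lr": base_lr,
        },
        {
            "phase": "unfrozen",
            "epochs": epochs,
            "trainable": "all layers",
            # discriminative rates: smallest for early layers, largest for the head
            "lr_range": (unfrozen_lr / lr_mult, unfrozen_lr),
        },
    ]

plan = fine_tune_plan(4)
for phase in plan:
    print(phase)
print("total epochs:", sum(phase["epochs"] for phase in plan))
```

The lower learning rates for early layers reflect the assumption that their general-purpose features (edges, textures) need less adjustment than the task-specific head.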

After fine-tuning, we should achieve high accuracy on both the training and validation sets. We can evaluate the model:

print(learn.validate())

This prints the validation loss followed by the error rate, the metric we passed to the learner. We should see an error rate below 5%, that is, more than 95% accuracy, on this binary classification task.
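Error rate and accuracy are two views of the same count: error rate = 1 - accuracy. A minimal sketch with made-up predictions (compute_accuracy and compute_error_rate are hypothetical helpers, not fastAI functions):

```python
def compute_accuracy(predictions, targets):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)

def compute_error_rate(predictions, targets):
    """Fraction of predictions that are wrong."""
    return 1 - compute_accuracy(predictions, targets)

# Made-up labels for eight validation images; one prediction is wrong.
targets     = ["dog", "cat", "dog", "dog", "cat", "cat", "dog", "cat"]
predictions = ["dog", "cat", "dog", "cat", "cat", "cat", "dog", "cat"]

print(compute_accuracy(predictions, targets))
print(compute_error_rate(predictions, targets))
```

With 7 of 8 predictions correct, the accuracy is 0.875 and the error rate is 0.125.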

Make Predictions

We can now use the trained model to classify new images.

First we define a predict function for a single image. learn.predict applies the same preprocessing as the validation set, so we do not transform the image ourselves:

def predict_image(img_file):
    img = PILImage.create(img_file)
    # pred is the class name, pred_idx its index, probs the per-class probabilities
    pred,pred_idx,probs = learn.predict(img)
    return img, pred, pred_idx, probs

Then we pass a dog image to predict:

img, pred, pred_idx, probs = predict_image(PATH/'test'/'dog.jpg')

print(f'Prediction: {pred}')
print(f'Probabilities: {probs}')

plt.imshow(img)
plt.title(f'Predicted: {pred}, Probability: {probs[pred_idx]:.2f}');

This should successfully predict "dog" with a high probability.
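The probabilities come from applying a softmax to the raw scores (logits) the network outputs for each class, and the prediction is the class with the highest probability. A minimal sketch of that last step, with made-up logits and an assumed class order:

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["cat", "dog"]   # class order is an assumption for this sketch
logits = [0.3, 2.5]      # made-up scores for a single image

probs = softmax(logits)
pred_idx = max(range(len(probs)), key=probs.__getitem__)
print(f"Prediction: {vocab[pred_idx]}, Probability: {probs[pred_idx]:.2f}")
```

Because the probabilities always sum to 1, a high value for one class in a two-class problem necessarily means a low value for the other.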

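To reuse the model outside this notebook, for example behind an upload form built with a notebook widget library or in a small web app, we export the trained learner and load it back for inference:

learn.export('export.pkl')  # saved under learn.path, which here is PATH
learn_inf = load_learner(PATH/'export.pkl')

learn_inf.predict(PATH/'test'/'dog.jpg')

The loaded learner exposes the same predict method, so an interactive front end only needs to hand it each uploaded image.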

Conclusion

In this blog post, we walked through an end-to-end workflow for binary image classification using fastAI. We loaded the Dogs vs Cats dataset, preprocessed and augmented the images, fine-tuned a pretrained ResNet34 model, and made predictions on new images.

The fastAI library provided a high-level API for each step, allowing us to quickly build an accurate model without needing to handle the implementation details. With just a few lines of code, we leveraged transfer learning to effectively classify images of dogs and cats.

This demonstrates how deep learning can enable powerful computer vision applications. Image classification forms the basis for many real-world systems today across industries like social media, medicine, autonomous vehicles, and more. With libraries like fastAI that promote rapid iteration, it is easier than ever to prototype and experiment with image recognition in Python.