Understanding LeNet: History and Implementation with PyTorch

Introduction

LeNet-5, introduced by Yann LeCun and his colleagues in 1998 and building on their earlier convolutional network work from the late 1980s, is considered one of the pioneering architectures in Convolutional Neural Networks (CNNs). It was designed primarily for handwritten digit recognition, specifically on the MNIST dataset of 28x28 grayscale images of handwritten digits. The introduction of this architecture marked a significant milestone in image processing and computer vision, showcasing the power of deep learning in pattern recognition tasks (LeCun et al., 1998).

History of LeNet

Historically, the development of LeNet stemmed from the need for automated systems to read handwritten numbers (for example, on bank checks and postal codes), a task made difficult by the variability of handwriting styles. LeNet-5 consists of seven layers, excluding the input layer, and combines convolutions, pooling, and fully connected layers, an integration that was innovative at the time. The architecture processes input images through successive feature-extraction and classification stages, and it greatly influenced subsequent neural network designs.

Architecture Overview

The LeNet-5 architecture consists of the following layers:

1. Input Layer: Accepts 32x32 pixel grayscale images (MNIST's 28x28 digits are padded to this size).
2. Convolutional Layer 1: Applies six 5x5 filters, producing feature maps of size 28x28.
3. Subsampling Layer 1: Uses average pooling (subsampling) to downsample the feature maps to 14x14.
4. Convolutional Layer 2: Applies sixteen 5x5 filters, producing feature maps of size 10x10.
5. Subsampling Layer 2: A second average pooling layer reduces the maps to 5x5.
6. Fully Connected Layer 1: Flattens the 16x5x5 = 400 feature-map values and maps them to 120 neurons, introducing non-linearity with an activation function.
7. Fully Connected Layer 2: Contains 84 neurons.
8. Output Layer: Produces the final scores for the ten digit classes (0-9).
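
These sizes follow from the standard output-size formula for convolutions and pooling, out = (in - kernel + 2*padding) / stride + 1. The short sketch below reproduces each stage; out_size is a throwaway helper written only for this check, not part of the implementation that follows:

# Verify the LeNet-5 feature-map sizes stage by stage
def out_size(n, kernel, stride=1, padding=0):
    return (n - kernel + 2 * padding) // stride + 1

s = out_size(32, 5)           # conv1: 32 -> 28
s = out_size(s, 2, stride=2)  # pool1: 28 -> 14
s = out_size(s, 5)            # conv2: 14 -> 10
s = out_size(s, 2, stride=2)  # pool2: 10 -> 5
print(16 * s * s)             # 400 flattened features feeding fc1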

Implementing LeNet in PyTorch

Now that we have an understanding of the architecture, let’s implement LeNet-5 using PyTorch.

Step-by-Step Implementation

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
from torchvision import datasets
from torch.utils.data import DataLoader

# Define the LeNet-5 architecture
class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)  # Flatten
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x
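
# Illustrative sanity check (not required for training): one random
# 32x32 grayscale image should yield a batch of 10 class logits.
_dummy = torch.randn(1, 1, 32, 32)
assert LeNet5()(_dummy).shape == (1, 10)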

# Initialize the model, define a loss function and an optimizer
# (CrossEntropyLoss applies log-softmax internally, so the model
# correctly returns raw logits)
lenet = LeNet5()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(lenet.parameters(), lr=0.001)

# Load the MNIST dataset; its images are 28x28, so pad them to the
# 32x32 input size the network expects (without this, the flattened
# feature size would not match fc1)
transform = transforms.Compose([
    transforms.Pad(2),
    transforms.ToTensor(),
])
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
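
# The held-out test split can be loaded the same way; it is used by the
# evaluation sketch after the training loop.
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)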

# Training loop
def train_model(model, criterion, optimizer, train_loader, num_epochs=5):
    model.train()
    for epoch in range(num_epochs):
        total_loss = 0
        for images, labels in train_loader:
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {total_loss/len(train_loader):.4f}')

train_model(lenet, criterion, optimizer, train_loader)
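
After training, it is worth checking accuracy on the held-out test split. Below is a minimal evaluation sketch; it assumes the test_loader defined alongside the training data above:

# Evaluation loop: report accuracy on the MNIST test split
def evaluate_model(model, test_loader):
    model.eval()  # switch to inference mode
    correct, total = 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for images, labels in test_loader:
            outputs = model(images)
            predictions = outputs.argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    print(f'Test accuracy: {100 * correct / total:.2f}%')

evaluate_model(lenet, test_loader)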

Visualizing the Architecture

The LeNet architecture alternates feature-extraction and classification stages: each convolutional layer is followed by an activation function and a pooling layer, and stacking these stages lets the network learn spatial hierarchies, from edges and strokes in early layers to digit-level shapes near the output. A simple way to see this structure is to print the shape of each intermediate feature map, as in the sketch below.
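
This sketch registers a PyTorch forward hook on every submodule of the LeNet5 class defined earlier and prints each layer's output shape during a single forward pass. The shape_hook helper is written here purely for illustration:

# Print each layer's output shape during one forward pass
def shape_hook(name):
    def hook(module, inputs, output):
        print(f'{name}: {tuple(output.shape)}')
    return hook

model = LeNet5()
for name, module in model.named_children():
    module.register_forward_hook(shape_hook(name))

model(torch.randn(1, 1, 32, 32))
# Expected output (the shared pooling module fires twice):
# conv1: (1, 6, 28, 28)
# pool: (1, 6, 14, 14)
# conv2: (1, 16, 10, 10)
# pool: (1, 16, 5, 5)
# fc1: (1, 120)
# fc2: (1, 84)
# fc3: (1, 10)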

Summary

LeNet-5 was instrumental in the development of convolutional neural networks, laying the foundation for modern deep learning architectures in image processing. Its PyTorch implementation shows how readily the architecture can be put to practical use, and it reinforces the value of understanding historical models when tracing the evolution of machine learning.

References

  • LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
  • Chollet, F. (2017). Deep Learning with Python. Manning Publications.