Overview
As AI-generated images become increasingly photorealistic, detecting synthetic content is a growing challenge in computer vision. This project explores how different neural network architectures — from a hand-built MLP to a convolutional network — perform on the same binary classification task: is this image real or AI-generated?
The project takes two approaches. The first is an MLP implemented entirely from scratch in NumPy, to expose the underlying mechanics. The second is a CNN in PyTorch, which shows what a standard framework model achieves on the same data and is compared against the MLP baseline.
Approach 1 — MLP from Scratch (NumPy)
Built without any deep learning framework. Every component is implemented manually: parameter initialisation (He scaling), forward propagation, cost computation (Binary Cross-Entropy), backpropagation, and gradient descent with learning rate decay. Both full-batch and mini-batch variants were implemented.
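As a rough illustration of three of the pieces listed above, here is a minimal NumPy sketch of He initialisation, the Binary Cross-Entropy cost, and a learning rate decay rule. The function names, the `eps` clipping constant, and the inverse-time decay schedule are assumptions made for this example; the project's own code may differ.

```python
import numpy as np

def init_params_he(layer_dims, seed=0):
    """He initialisation: W ~ N(0, 2 / fan_in), biases start at zero."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        fan_in = layer_dims[l - 1]
        params[f"W{l}"] = rng.standard_normal((layer_dims[l], fan_in)) * np.sqrt(2.0 / fan_in)
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return params

def bce_cost(A, Y, eps=1e-12):
    """Mean binary cross-entropy. A (predictions) and Y (labels) have shape (1, m)."""
    A = np.clip(A, eps, 1.0 - eps)  # keep log() away from 0
    return float(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A)))

def decayed_lr(lr0, epoch, decay_rate=0.01):
    """Inverse-time decay (an assumed schedule, chosen only for illustration)."""
    return lr0 / (1.0 + decay_rate * epoch)
```

For a 32×32 RGB input flattened to 3072 features, `init_params_he([3072, 64, 1])` would give a 2-layer network; the hidden width of 64 is a placeholder, not the project's setting.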
2-Layer Network
Architecture: LINEAR → ReLU → LINEAR → Sigmoid
- Full-batch (7000 epochs): 75.44% test accuracy
- Mini-batch (100 epochs): 78.40% test accuracy
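The 2-layer architecture above can be sketched as a single gradient-descent step in NumPy. This is an illustrative version under assumed shapes (examples stored as columns) and an assumed function name, not the project's exact implementation. It relies on the standard result that, for a sigmoid output with Binary Cross-Entropy, the gradient with respect to the final pre-activation simplifies to `A2 - Y`.

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def two_layer_step(X, Y, W1, b1, W2, b2, lr=0.01):
    """One full-batch update. X: (n_x, m), Y: (1, m). Returns new params and predictions."""
    m = X.shape[1]

    # Forward pass: LINEAR -> ReLU -> LINEAR -> Sigmoid
    Z1 = W1 @ X + b1
    A1 = relu(Z1)
    Z2 = W2 @ A1 + b2
    A2 = sigmoid(Z2)

    # Backward pass (sigmoid + BCE gives dZ2 = A2 - Y)
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (Z1 > 0)  # ReLU passes gradient only where Z1 > 0
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m

    # Plain gradient-descent update
    return W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2, A2
```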
4-Layer Network
Architecture: [LINEAR → ReLU] × 3 → LINEAR → Sigmoid
- Full-batch (7000 epochs): 76.44% test accuracy
- Mini-batch (100 epochs): 81.94% test accuracy
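One likely reason mini-batch training reaches higher accuracy in far fewer epochs is the number of parameter updates. Full-batch training makes one update per epoch, whereas mini-batch training makes one per batch. With 100,000 training images and an assumed batch size of 64, that is about 1,563 updates per epoch, so 100 epochs amount to roughly 156,000 updates against 7,000. The sketch below shows a shuffled mini-batch split and the deeper forward pass; the batch size and helper names are assumptions for illustration.

```python
import numpy as np

def random_mini_batches(X, Y, batch_size=64, seed=0):
    """Shuffle examples (columns) and split into batches; the last one may be smaller.
    X: (n_x, m), Y: (1, m)."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    perm = rng.permutation(m)
    X_shuf, Y_shuf = X[:, perm], Y[:, perm]  # same permutation keeps pairs aligned
    return [
        (X_shuf[:, k:k + batch_size], Y_shuf[:, k:k + batch_size])
        for k in range(0, m, batch_size)
    ]

def forward_deep(X, params):
    """[LINEAR -> ReLU] x (L-1) -> LINEAR -> Sigmoid, with params keyed W1, b1, ..., WL, bL."""
    L = len(params) // 2
    A = X
    for l in range(1, L):
        A = np.maximum(0, params[f"W{l}"] @ A + params[f"b{l}"])
    ZL = params[f"W{L}"] @ A + params[f"b{L}"]
    return 1.0 / (1.0 + np.exp(-ZL))
```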
Approach 2 — CNN (PyTorch)
Three progressively improved CNN versions, each addressing the overfitting observed in the previous one.
Baseline CNN
Train: 99.39% / Test: 95.08% / Gap: 4.31%
CNN + Dropout
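Dropout(0.5) is inserted after FC1's ReLU. During training it zeroes each activation independently with probability 0.5 and rescales the survivors, which discourages memorisation; at evaluation time it is disabled.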
Train: 99.89% / Test: 95.91% / Gap: 3.98%
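To make the dropout behaviour concrete, the short snippet below applies `nn.Dropout(p=0.5)` to a tensor of ones in both modes. It is a standalone demonstration, not part of the project's model code. In training mode PyTorch zeroes each element with probability `p` and scales the rest by `1 / (1 - p)`; in eval mode the layer is the identity.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(4, 8)

drop.train()
y_train = drop(x)  # each entry is either 0.0 or 2.0 (kept values scaled by 1 / (1 - 0.5))

drop.eval()
y_eval = drop(x)   # unchanged: dropout is a no-op at evaluation time
```

This is why `model.train()` and `model.eval()` need to be called at the right points; measuring test accuracy in training mode would evaluate a randomly thinned network.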
CNNv3 — Augmentation + BatchNorm + Dropout (Best)
Three regularisation techniques combined:
| Technique | Where Applied | Effect |
|---|---|---|
| Data Augmentation | Training loader | RandomHorizontalFlip + RandomCrop: prevents memorising fixed orientations and positions |
| Batch Normalisation | After Conv1 and Conv2 | Normalises activations per batch, stabilises training |
| Dropout(0.5) | After FC1 ReLU | Forces distributed representations in FC layers |
Results Summary
| Model | Train | Test | Gap |
|---|---|---|---|
| MLP 2-layer full-batch (7000ep) | 75.75% | 75.44% | 0.31% |
| MLP 4-layer full-batch (7000ep) | 76.64% | 76.44% | 0.20% |
| MLP 2-layer mini-batch (100ep) | 80.56% | 78.40% | 2.16% |
| MLP 4-layer mini-batch (100ep) | 86.82% | 81.94% | 4.88% |
| CNN baseline (10ep) | 99.39% | 95.08% | 4.31% |
| CNN + Dropout (25ep) | 99.89% | 95.91% | 3.98% |
| CNNv3 + Aug + BN + Dropout (50ep) | 97.22% | 96.47% | 0.76% |
Dataset
CIFAKE: Real and AI-Generated Synthetic Images (Kaggle). 100,000 training images (50K real / 50K fake) and 20,000 test images at 32×32 RGB. Real images sourced from CIFAR-10; fake images generated using Stable Diffusion v1.4.
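For the NumPy MLP, each 32×32 RGB image has to be flattened into a vector of 32 × 32 × 3 = 3072 features. The sketch below shows one way to do that, assuming the images are already loaded as a `uint8` array of shape `(m, 32, 32, 3)` and that examples are stored as columns. The random array stands in for real data, and `prepare_for_mlp` is a hypothetical helper name.

```python
import numpy as np

def prepare_for_mlp(images_uint8):
    """(m, 32, 32, 3) uint8 -> (3072, m) float32 scaled to [0, 1]."""
    m = images_uint8.shape[0]
    return images_uint8.reshape(m, -1).T.astype(np.float32) / 255.0

rng = np.random.default_rng(0)
fake_batch = rng.integers(0, 256, size=(5, 32, 32, 3), dtype=np.uint8)  # stand-in for CIFAKE images
X = prepare_for_mlp(fake_batch)
print(X.shape)  # (3072, 5)
```

The CNN, by contrast, keeps the spatial layout and takes tensors of shape `(batch, 3, 32, 32)`.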