Deep Convolutional Generative Adversarial Networks

In our introduction to generative adversarial networks (GANs), we introduced the basic ideas behind how GANs work. We showed that they can draw samples from some simple, easy-to-sample distribution, like a uniform or normal distribution, and transform them into samples that appear to match the distribution of some data set. And while our example of matching a 2D Gaussian distribution got the point across, it’s not especially exciting.

In this notebook, we’ll demonstrate how you can use GANs to generate photorealistic images. We’ll base our models on the deep convolutional GANs (DCGANs) introduced by Radford et al. We’ll borrow the convolutional architectures that have proven so successful for discriminative computer vision problems and show how, via GANs, they can be leveraged to generate photorealistic images.

In this tutorial, we’ll concentrate on the LFW Face Dataset (Labeled Faces in the Wild), which contains roughly 13,000 images of faces. By the end of the tutorial, you’ll know how to generate photo-realistic images of your own, given any dataset of images. First, let’s get the preliminaries out of the way.

In [1]:
from __future__ import print_function
import os
import matplotlib as mpl
import tarfile
import matplotlib.image as mpimg
from matplotlib import pyplot as plt

import mxnet as mx
from mxnet import gluon
from mxnet import ndarray as nd
from mxnet.gluon import nn, utils
from mxnet import autograd
import numpy as np

Set training parameters

In [2]:
epochs = 2 # Set low by default for tests, set higher when you actually run this code.
batch_size = 64
latent_z_size = 100

use_gpu = True
ctx = mx.gpu() if use_gpu else mx.cpu()

lr = 0.0002
beta1 = 0.5
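If you are not sure whether a GPU is present, a defensive variant (our addition, not part of the original notebook) attempts a small allocation on the GPU and falls back to the CPU if it fails:

try:
    # raises MXNetError when no usable GPU (or no CUDA build) is available
    mx.nd.zeros((1,), ctx=mx.gpu()).asnumpy()
    ctx = mx.gpu()
except mx.MXNetError:
    ctx = mx.cpu()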

Download and preprocess the LFW Face Dataset

In [3]:
lfw_url = 'http://vis-www.cs.umass.edu/lfw/lfw-deepfunneled.tgz'
data_path = 'lfw_dataset'
if not os.path.exists(data_path):
    os.makedirs(data_path)
    data_file = utils.download(lfw_url)
    with tarfile.open(data_file) as tar:
        tar.extractall(path=data_path)
Downloading lfw-deepfunneled.tgz from http://vis-www.cs.umass.edu/lfw/lfw-deepfunneled.tgz...

First, we resize images to size \(64\times64\). Then, we normalize all pixel values to the \([-1, 1]\) range.

In [4]:
target_wd = 64
target_ht = 64
img_list = []

def transform(data, target_wd, target_ht):
    # resize to target_wd * target_ht
    data = mx.image.imresize(data, target_wd, target_ht)
    # transpose from (target_ht, target_wd, 3)
    # to (3, target_ht, target_wd)
    data = nd.transpose(data, (2,0,1))
    # normalize to [-1, 1]
    data = data.astype(np.float32)/127.5 - 1
    # if image is greyscale, repeat 3 times to get RGB image.
    if data.shape[0] == 1:
        data = nd.tile(data, (3, 1, 1))
    return data.reshape((1,) + data.shape)

for path, _, fnames in os.walk(data_path):
    for fname in fnames:
        if not fname.endswith('.jpg'):
            continue
        img = os.path.join(path, fname)
        img_arr = mx.image.imread(img)
        img_arr = transform(img_arr, target_wd, target_ht)
        img_list.append(img_arr)
train_data = mx.io.NDArrayIter(data=nd.concatenate(img_list), batch_size=batch_size)
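As a quick check (our addition, not part of the original notebook), each preprocessed image should be a float32 array of shape (1, 3, 64, 64) with values in \([-1, 1]\):

print(img_list[0].shape, img_list[0].dtype)  # (1, 3, 64, 64) float32
print(img_list[0].min().asscalar(), img_list[0].max().asscalar())  # both within [-1, 1]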

Visualize 4 images:

In [5]:
def visualize(img_arr):
    plt.imshow(((img_arr.asnumpy().transpose(1, 2, 0) + 1.0) * 127.5).astype(np.uint8))
    plt.axis('off')

for i in range(4):
    plt.subplot(1,4,i+1)
    visualize(img_list[i + 10][0])
plt.show()
[Output: four sample face images from the dataset]

Defining the networks

At its core, the DCGAN architecture uses a standard CNN for the discriminator. For the generator, convolutions are replaced with transposed convolutions (upconvolutions), so the representation at each layer of the generator grows successively larger as it maps a low-dimensional latent vector onto a high-dimensional image; a quick check of this shape arithmetic follows the architecture figure below. The DCGAN paper’s guidelines are:

  • Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
  • Use batch normalization in both the generator and the discriminator.
  • Remove fully connected hidden layers for deeper architectures.
  • Use ReLU activation in generator for all layers except for the output, which uses Tanh.
  • Use LeakyReLU activation in the discriminator for all layers.
[Figure: DCGAN Architecture]
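To see why each generator layer doubles the spatial size, note that a transposed convolution with kernel size \(k\), stride \(s\), and padding \(p\) maps an input of width \(n\) to an output of width \((n-1)s - 2p + k\). With \(k=4\), \(s=2\), \(p=1\), a \(4\times4\) input becomes \(8\times8\). A minimal check of this arithmetic (our addition, not part of the original notebook):

up = nn.Conv2DTranspose(channels=8, kernel_size=4, strides=2, padding=1)
up.initialize()
x = nd.zeros((1, 16, 4, 4))   # batch x channels x height x width
print(up(x).shape)            # (1, 8, 8, 8): spatial size doubled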
In [6]:
# build the generator
nc = 3
ngf = 64
netG = nn.Sequential()
with netG.name_scope():
    # input is Z, going into a convolution
    netG.add(nn.Conv2DTranspose(ngf * 8, 4, 1, 0, use_bias=False))
    netG.add(nn.BatchNorm())
    netG.add(nn.Activation('relu'))
    # state size. (ngf*8) x 4 x 4
    netG.add(nn.Conv2DTranspose(ngf * 4, 4, 2, 1, use_bias=False))
    netG.add(nn.BatchNorm())
    netG.add(nn.Activation('relu'))
    # state size. (ngf*4) x 8 x 8
    netG.add(nn.Conv2DTranspose(ngf * 2, 4, 2, 1, use_bias=False))
    netG.add(nn.BatchNorm())
    netG.add(nn.Activation('relu'))
    # state size. (ngf*2) x 16 x 16
    netG.add(nn.Conv2DTranspose(ngf, 4, 2, 1, use_bias=False))
    netG.add(nn.BatchNorm())
    netG.add(nn.Activation('relu'))
    # state size. (ngf) x 32 x 32
    netG.add(nn.Conv2DTranspose(nc, 4, 2, 1, use_bias=False))
    netG.add(nn.Activation('tanh'))
    # state size. (nc) x 64 x 64

# build the discriminator
ndf = 64
netD = nn.Sequential()
with netD.name_scope():
    # input is (nc) x 64 x 64
    netD.add(nn.Conv2D(ndf, 4, 2, 1, use_bias=False))
    netD.add(nn.LeakyReLU(0.2))
    # state size. (ndf) x 32 x 32
    netD.add(nn.Conv2D(ndf * 2, 4, 2, 1, use_bias=False))
    netD.add(nn.BatchNorm())
    netD.add(nn.LeakyReLU(0.2))
    # state size. (ndf*2) x 16 x 16
    netD.add(nn.Conv2D(ndf * 4, 4, 2, 1, use_bias=False))
    netD.add(nn.BatchNorm())
    netD.add(nn.LeakyReLU(0.2))
    # state size. (ndf*4) x 8 x 8
    netD.add(nn.Conv2D(ndf * 8, 4, 2, 1, use_bias=False))
    netD.add(nn.BatchNorm())
    netD.add(nn.LeakyReLU(0.2))
    # state size. (ndf*8) x 4 x 4
    netD.add(nn.Conv2D(1, 4, 1, 0, use_bias=False))

Set Up the Loss Function and Optimizer

We use binary cross-entropy as our loss function and use the Adam optimizer. We initialize the network’s parameters by sampling from a normal distribution.
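Concretely, for a raw discriminator score \(x\) (a logit) and a label \(y \in \{0, 1\}\), SigmoidBinaryCrossEntropyLoss computes \(-y \log \sigma(x) - (1-y) \log(1-\sigma(x))\), where \(\sigma\) is the sigmoid function; this is why neither network applies a sigmoid in its final layer.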

In [7]:
# loss
loss = gluon.loss.SigmoidBinaryCrossEntropyLoss()

# initialize the generator and the discriminator
netG.initialize(mx.init.Normal(0.02), ctx=ctx)
netD.initialize(mx.init.Normal(0.02), ctx=ctx)

# trainer for the generator and the discriminator
trainerG = gluon.Trainer(netG.collect_params(), 'adam', {'learning_rate': lr, 'beta1': beta1})
trainerD = gluon.Trainer(netD.collect_params(), 'adam', {'learning_rate': lr, 'beta1': beta1})
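Before training, a quick sanity check (our addition, not part of the original notebook) confirms that the generator maps a latent vector to a \(64\times64\) image and that the discriminator maps that image to a single raw score:

z = mx.nd.random_normal(0, 1, shape=(1, latent_z_size, 1, 1), ctx=ctx)
print(netG(z).shape)        # (1, 3, 64, 64)
print(netD(netG(z)).shape)  # (1, 1, 1, 1): one logit per image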

Training Loop

We recommend that you use a GPU for training this model. After a few epochs, we can see that human-face-like images are generated.
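As a reminder, the alternating updates below approximate the GAN minimax objective

\[\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))],\]

except that, as is standard practice, the generator is trained to maximize \(\log D(G(z))\) rather than to minimize \(\log(1 - D(G(z)))\), which gives stronger gradients early in training. That is exactly what the second update in the loop does.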

In [8]:
from datetime import datetime
import time
import logging

real_label = nd.ones((batch_size,), ctx=ctx)
fake_label = nd.zeros((batch_size,),ctx=ctx)

def facc(label, pred):
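    # netD produces raw scores (logits); predictions above 0.5 count as 'real'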
    pred = pred.ravel()
    label = label.ravel()
    return ((pred > 0.5) == label).mean()
metric = mx.metric.CustomMetric(facc)

stamp = datetime.now().strftime('%Y_%m_%d-%H_%M')
logging.basicConfig(level=logging.DEBUG)

for epoch in range(epochs):
    tic = time.time()
    btic = time.time()
    train_data.reset()
    iter = 0
    for batch in train_data:
        ############################
        # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
        ###########################
        data = batch.data[0].as_in_context(ctx)
        latent_z = mx.nd.random_normal(0, 1, shape=(batch_size, latent_z_size, 1, 1), ctx=ctx)

        with autograd.record():
            # train with real image
            output = netD(data).reshape((-1, 1))
            errD_real = loss(output, real_label)
            metric.update([real_label,], [output,])

            # train with fake image
            fake = netG(latent_z)
            output = netD(fake).reshape((-1, 1))
            errD_fake = loss(output, fake_label)
            errD = errD_real + errD_fake
            errD.backward()
            metric.update([fake_label,], [output,])

        trainerD.step(batch.data[0].shape[0])

        ############################
        # (2) Update G network: maximize log(D(G(z)))
        ###########################
        with autograd.record():
            fake = netG(latent_z)
            output = netD(fake).reshape((-1, 1))
            errG = loss(output, real_label)
            errG.backward()

        trainerG.step(batch.data[0].shape[0])

        # Print log information every ten batches
        if iter % 10 == 0:
            name, acc = metric.get()
            logging.info('speed: {} samples/s'.format(batch_size / (time.time() - btic)))
            logging.info('discriminator loss = %f, generator loss = %f, binary training acc = %f at iter %d epoch %d'
                     %(nd.mean(errD).asscalar(),
                       nd.mean(errG).asscalar(), acc, iter, epoch))
        iter = iter + 1
        btic = time.time()

    name, acc = metric.get()
    metric.reset()
    # logging.info('\nbinary training acc at epoch %d: %s=%f' % (epoch, name, acc))
    # logging.info('time: %f' % (time.time() - tic))

    # Visualize one generated image for each epoch
    # fake_img = fake[0]
    # visualize(fake_img)
    # plt.show()
INFO:root:speed: 7.52706169454569 samples/s
INFO:root:discriminator loss = 1.267250, generator loss = 3.865829, binary training acc = 0.593750 at iter 0 epoch 0
INFO:root:speed: 571.6135857413599 samples/s
INFO:root:discriminator loss = 0.157171, generator loss = 7.747376, binary training acc = 0.889205 at iter 10 epoch 0
INFO:root:speed: 571.3860582336627 samples/s
INFO:root:discriminator loss = 0.096981, generator loss = 16.442284, binary training acc = 0.921503 at iter 20 epoch 0
INFO:root:speed: 576.1666283894149 samples/s
INFO:root:discriminator loss = 0.620363, generator loss = 20.799423, binary training acc = 0.929940 at iter 30 epoch 0
INFO:root:speed: 573.9958132065749 samples/s
INFO:root:discriminator loss = 0.315033, generator loss = 23.815233, binary training acc = 0.929306 at iter 40 epoch 0
INFO:root:speed: 586.6403019356136 samples/s
INFO:root:discriminator loss = 0.695109, generator loss = 28.778854, binary training acc = 0.922947 at iter 50 epoch 0
INFO:root:speed: 569.9520700498324 samples/s
INFO:root:discriminator loss = 0.833436, generator loss = 21.695778, binary training acc = 0.924180 at iter 60 epoch 0
INFO:root:speed: 569.7440013923349 samples/s
INFO:root:discriminator loss = 2.281753, generator loss = 18.117245, binary training acc = 0.917694 at iter 70 epoch 0
INFO:root:speed: 578.2505191504029 samples/s
INFO:root:discriminator loss = 0.599319, generator loss = 12.404789, binary training acc = 0.911651 at iter 80 epoch 0
INFO:root:speed: 579.2137542939014 samples/s
INFO:root:discriminator loss = 0.544385, generator loss = 12.193027, binary training acc = 0.898695 at iter 90 epoch 0
INFO:root:speed: 571.0299219298432 samples/s
INFO:root:discriminator loss = 2.558564, generator loss = 15.532148, binary training acc = 0.890393 at iter 100 epoch 0
INFO:root:speed: 569.3210746106583 samples/s
INFO:root:discriminator loss = 1.515995, generator loss = 5.084762, binary training acc = 0.880068 at iter 110 epoch 0
INFO:root:speed: 579.3225066416898 samples/s
INFO:root:discriminator loss = 0.753073, generator loss = 6.322435, binary training acc = 0.876872 at iter 120 epoch 0
INFO:root:speed: 569.6146399636717 samples/s
INFO:root:discriminator loss = 0.472531, generator loss = 3.662419, binary training acc = 0.871899 at iter 130 epoch 0
INFO:root:speed: 582.8483187785251 samples/s
INFO:root:discriminator loss = 0.569863, generator loss = 5.218751, binary training acc = 0.870567 at iter 140 epoch 0
INFO:root:speed: 575.3528092970221 samples/s
INFO:root:discriminator loss = 0.770430, generator loss = 12.046432, binary training acc = 0.870137 at iter 150 epoch 0
INFO:root:speed: 566.8923997034979 samples/s
INFO:root:discriminator loss = 0.602391, generator loss = 6.901675, binary training acc = 0.869177 at iter 160 epoch 0
INFO:root:speed: 575.4502474923844 samples/s
INFO:root:discriminator loss = 0.996327, generator loss = 8.763069, binary training acc = 0.864446 at iter 170 epoch 0
INFO:root:speed: 566.8349406316714 samples/s
INFO:root:discriminator loss = 0.138103, generator loss = 4.784034, binary training acc = 0.863130 at iter 180 epoch 0
INFO:root:speed: 569.1122597387179 samples/s
INFO:root:discriminator loss = 0.353630, generator loss = 4.181261, binary training acc = 0.862075 at iter 190 epoch 0
INFO:root:speed: 583.3220103263043 samples/s
INFO:root:discriminator loss = 0.551197, generator loss = 5.818136, binary training acc = 0.861590 at iter 200 epoch 0
INFO:root:speed: 572.8688444473847 samples/s
INFO:root:discriminator loss = 0.750586, generator loss = 5.310256, binary training acc = 0.859375 at iter 0 epoch 1
INFO:root:speed: 572.0825957696201 samples/s
INFO:root:discriminator loss = 0.408226, generator loss = 6.662098, binary training acc = 0.901989 at iter 10 epoch 1
INFO:root:speed: 567.7917131126909 samples/s
INFO:root:discriminator loss = 0.192051, generator loss = 5.185369, binary training acc = 0.883929 at iter 20 epoch 1
INFO:root:speed: 571.0347808698834 samples/s
INFO:root:discriminator loss = 0.686932, generator loss = 7.547240, binary training acc = 0.881300 at iter 30 epoch 1
INFO:root:speed: 433.1853898314931 samples/s
INFO:root:discriminator loss = 0.282612, generator loss = 3.748997, binary training acc = 0.875381 at iter 40 epoch 1
INFO:root:speed: 567.3908667209182 samples/s
INFO:root:discriminator loss = 0.516654, generator loss = 7.254321, binary training acc = 0.870558 at iter 50 epoch 1
INFO:root:speed: 572.405868304333 samples/s
INFO:root:discriminator loss = 0.719410, generator loss = 4.754397, binary training acc = 0.874103 at iter 60 epoch 1
INFO:root:speed: 570.3977510087949 samples/s
INFO:root:discriminator loss = 0.972063, generator loss = 3.013078, binary training acc = 0.875110 at iter 70 epoch 1
INFO:root:speed: 577.151467198734 samples/s
INFO:root:discriminator loss = 0.492403, generator loss = 4.267198, binary training acc = 0.874325 at iter 80 epoch 1
INFO:root:speed: 584.5032509243251 samples/s
INFO:root:discriminator loss = 1.047324, generator loss = 8.863756, binary training acc = 0.868561 at iter 90 epoch 1
INFO:root:speed: 571.3848419959216 samples/s
INFO:root:discriminator loss = 0.435001, generator loss = 4.618430, binary training acc = 0.871210 at iter 100 epoch 1
INFO:root:speed: 572.3070179365983 samples/s
INFO:root:discriminator loss = 0.552533, generator loss = 5.671945, binary training acc = 0.873170 at iter 110 epoch 1
INFO:root:speed: 565.6798556906115 samples/s
INFO:root:discriminator loss = 0.202356, generator loss = 3.845943, binary training acc = 0.873063 at iter 120 epoch 1
INFO:root:speed: 556.1665544397321 samples/s
INFO:root:discriminator loss = 0.313401, generator loss = 3.627820, binary training acc = 0.876312 at iter 130 epoch 1
INFO:root:speed: 571.695150390912 samples/s
INFO:root:discriminator loss = 0.717447, generator loss = 3.168487, binary training acc = 0.875609 at iter 140 epoch 1
INFO:root:speed: 560.0631260497189 samples/s
INFO:root:discriminator loss = 0.344885, generator loss = 5.198402, binary training acc = 0.878104 at iter 150 epoch 1
INFO:root:speed: 576.5651138264992 samples/s
INFO:root:discriminator loss = 0.560294, generator loss = 5.221199, binary training acc = 0.878494 at iter 160 epoch 1
INFO:root:speed: 569.9484396431286 samples/s
INFO:root:discriminator loss = 0.562455, generator loss = 7.966813, binary training acc = 0.882173 at iter 170 epoch 1
INFO:root:speed: 571.6099341372577 samples/s
INFO:root:discriminator loss = 0.398122, generator loss = 7.471758, binary training acc = 0.883287 at iter 180 epoch 1
INFO:root:speed: 575.2073301391115 samples/s
INFO:root:discriminator loss = 1.594285, generator loss = 13.767216, binary training acc = 0.881749 at iter 190 epoch 1
INFO:root:speed: 565.8670725306137 samples/s
INFO:root:discriminator loss = 1.298736, generator loss = 5.583053, binary training acc = 0.875428 at iter 200 epoch 1
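Once training finishes, it is worth persisting the weights so that the results below can be reproduced without retraining. A short sketch using the stamp timestamp defined earlier (our addition, not part of the original notebook; on MXNet versions before 1.3 the method is save_params rather than save_parameters):

netG.save_parameters('netG_%s.params' % stamp)
netD.save_parameters('netD_%s.params' % stamp)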

Results

Given a trained generator, we can generate some images of faces.

In [9]:
num_image = 8
for i in range(num_image):
    latent_z = mx.nd.random_normal(0, 1, shape=(1, latent_z_size, 1, 1), ctx=ctx)
    img = netG(latent_z)
    plt.subplot(2,4,i+1)
    visualize(img[0])
plt.show()
[Output: eight generated face images]

We can also interpolate along the manifold between images by moving linearly between points in the latent space and visualizing the corresponding images. We can see that small changes in the latent space result in smooth changes in the generated images.

In [10]:
num_image = 12
latent_z = mx.nd.random_normal(0, 1, shape=(1, latent_z_size, 1, 1), ctx=ctx)
step = 0.05
for i in range(num_image):
    img = netG(latent_z)
    plt.subplot(3,4,i+1)
    visualize(img[0])
    latent_z += step
plt.show()
[Output: twelve images generated from nearby latent points]
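The loop above nudges every coordinate of a single latent vector by a fixed step. A sketch of interpolating between two distinct latent points \(z_0\) and \(z_1\) instead (our addition, not part of the original notebook):

z0 = mx.nd.random_normal(0, 1, shape=(1, latent_z_size, 1, 1), ctx=ctx)
z1 = mx.nd.random_normal(0, 1, shape=(1, latent_z_size, 1, 1), ctx=ctx)
for i in range(num_image):
    alpha = i / float(num_image - 1)           # sweep from 0.0 to 1.0
    img = netG((1 - alpha) * z0 + alpha * z1)  # linear interpolation
    plt.subplot(3, 4, i + 1)
    visualize(img[0])
plt.show()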

For whinges or inquiries, open an issue on GitHub.