Multilayer perceptrons in gluon

Building a multilayer perceptron to classify MNIST images with gluon is not much harder than implementing `softmax regression with gluon <../chapter02_supervised-learning/softmax-regression-gluon.ipynb>`__, like we did in Chapter 2. In that chapter, our entire neural network consisted of one Dense layer (net = gluon.nn.Dense(num_outputs)).

In this chapter, we’re going to show you how to compose multiple layers together into a neural network. There are two main ways to do this in Gluon and we’ll walk through both. The first is to define a custom Block. In Gluon, everything is a Block! Layers, losses, whole networks: they’re all Blocks! So naturally, that’s a flexible way to do nearly anything you want.

We’ll also make use of gluon.nn.Sequential. Sequential gives us a special way of rapidly building networks that follow a common design pattern: they look like a stack of pancakes. Many networks follow this pattern: a bunch of layers, one stacked on top of another, where the output of each layer is the input to the next layer. Sequential just takes a list of layers, which we pass in one at a time by calling net.add(<Layer goes here!>).

Imports

First we’ll import the necessary bits.

In [52]:
from __future__ import print_function
import numpy as np
import mxnet as mx
from mxnet import nd, autograd, gluon

We’ll also want to set the contexts for our data and our models.

In [53]:
data_ctx = mx.cpu()
model_ctx = mx.cpu()
# model_ctx = mx.gpu(0)
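
If you’re not sure whether a GPU is available, one option is to pick the model context programmatically. This is a minimal sketch, not part of the original notebook, and it assumes a reasonably recent MXNet where mx.context.num_gpus() exists; otherwise just set the context by hand as above.

# Hedged sketch: fall back to CPU when no GPU is detected.
# model_ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu()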

The MNIST dataset

In [54]:
batch_size = 64
num_inputs = 784
num_outputs = 10
num_examples = 60000
def transform(data, label):
    return data.astype(np.float32)/255, label.astype(np.float32)
train_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=True, transform=transform),
                                      batch_size, shuffle=True)
test_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=False, transform=transform),
                                     batch_size, shuffle=False)

Define the model with gluon.Block

Now instead of having one gluon.nn.Dense layer, we’ll want to compose several together. First let’s go through the most fundamental way of doing this. Then we’ll introduce some shortcuts. In gluon, a Block has one main job: define a forward method that takes some NDArray input x and generates an NDArray output. Because the output and input are related to each other via NDArray operations, MXNet can take derivatives through the block automatically. A Block can just do something simple like apply an activation function. But it can also combine a bunch of other Blocks together in creative ways. In this case, we’ll just want to instantiate three Dense layers. The forward method can then invoke the layers in turn to generate its output.

In [55]:
class MLP(gluon.Block):
    def __init__(self, **kwargs):
        super(MLP, self).__init__(**kwargs)
        with self.name_scope():
            self.dense0 = gluon.nn.Dense(64)
            self.dense1 = gluon.nn.Dense(64)
            self.dense2 = gluon.nn.Dense(10)

    def forward(self, x):
        x = nd.relu(self.dense0(x))
        x = nd.relu(self.dense1(x))
        x = self.dense2(x)
        return x

We can now instantiate a multilayer perceptron using our MLP class. And just as with any other Block, we can grab its parameters with collect_params and initialize them.

In [56]:
net = MLP()
net.collect_params().initialize(mx.init.Normal(sigma=.01), ctx=model_ctx)
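
If you’re curious what collect_params actually returns, you can print it. Note that Gluon uses deferred shape inference: since we never told the Dense layers their input dimensions, the weight shapes aren’t fully known until data first flows through the network. A quick look (this snippet isn’t in the original notebook):

# The ParameterDict lists every weight and bias in the network. Before the
# first forward pass, the input dimension of each weight still shows up as 0
# because of Gluon's deferred initialization.
print(net.collect_params())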

And we can synthesize some gibberish data just to demonstrate one forward pass through the network.

In [57]:
data = nd.ones((1,784))
net(data.as_in_context(model_ctx))
Out[57]:

[[  4.40923759e-05  -8.20533780e-04   9.26479988e-04   8.04695825e-04
   -7.55993300e-04  -6.38230820e-04   5.50494005e-05  -1.17325678e-03
    7.58020557e-04   2.63349182e-04]]
<NDArray 1x10 @gpu(0)>

Because we’re working with an imperative framework and not a symbolic framework, debugging Gluon Blocks is easy. If we want to see what’s going on at each layer of the neural network, we can just plug in a bunch of Python print statements.

In [58]:
class MLP(gluon.Block):
    def __init__(self, **kwargs):
        super(MLP, self).__init__(**kwargs)
        with self.name_scope():
            self.dense0 = gluon.nn.Dense(64, activation="relu")
            self.dense1 = gluon.nn.Dense(64, activation="relu")
            self.dense2 = gluon.nn.Dense(10)

    def forward(self, x):
        x = self.dense0(x)
        print("Hidden Representation 1: %s" % x)
        x = self.dense1(x)
        print("Hidden Representation 2: %s" % x)
        x = self.dense2(x)
        print("Network output: %s" % x)
        return x

net = MLP()
net.collect_params().initialize(mx.init.Normal(sigma=.01), ctx=model_ctx)
net(data.as_in_context(model_ctx))
Hidden Representation 1:
[[ 0.          0.21691252  0.          0.33119828  0.          0.          0.
   0.21983771  0.          0.          0.4556309   0.          0.08249515
   0.31085208  0.04958198  0.          0.330221    0.          0.          0.
   0.13425761  0.37306851  0.04791637  0.          0.          0.          0.
   0.23431879  0.          0.          0.          0.0448049   0.14588076
   0.          0.0239118   0.          0.25473717  0.03351231  0.20005098
   0.          0.          0.00603895  0.10416938  0.10464748  0.23973437
   0.          0.33381382  0.          0.24913697  0.29079285  0.12793788
   0.29657096  0.07166591  0.          0.43335861  0.32743987  0.          0.
   0.          0.          0.04985283  0.10861691  0.          0.        ]]
<NDArray 1x64 @gpu(0)>
Hidden Representation 2:
[[ 0.          0.          0.01573334  0.          0.          0.02613701
   0.00248956  0.          0.          0.02152583  0.          0.
   0.01183741  0.00089611  0.00513365  0.00952989  0.          0.          0.
   0.00989626  0.          0.00950431  0.          0.          0.
   0.01269766  0.00485498  0.          0.          0.00033371  0.00123863
   0.02299101  0.          0.01520418  0.          0.00365212  0.00016546
   0.00049757  0.00220794  0.          0.01853371  0.02050827  0.00796316
   0.02365419  0.          0.          0.          0.          0.00056281
   0.          0.0158518   0.00588764  0.02745012  0.02089521  0.02061545
   0.01254779  0.00096457  0.          0.00426208  0.          0.          0.
   0.00827779  0.00288925]]
<NDArray 1x64 @gpu(0)>
Network output:
[[  8.51602003e-04   4.21012577e-04  -3.94555100e-05   4.91072249e-04
   -2.73533806e-05  -9.80906654e-04  -2.85841583e-04  -1.03790930e-03
   -5.04873577e-04   7.01223849e-04]]
<NDArray 1x10 @gpu(0)>
Out[58]:

[[  8.51602003e-04   4.21012577e-04  -3.94555100e-05   4.91072249e-04
   -2.73533806e-05  -9.80906654e-04  -2.85841583e-04  -1.03790930e-03
   -5.04873577e-04   7.01223849e-04]]
<NDArray 1x10 @gpu(0)>

Faster modeling with gluon.nn.Sequential

MLPs, like many deep neural networks, follow a pretty boring architecture. Just take a list of the layers, chain them together, and return the output. There’s no reason why we have to actually define a new class every time we want to do this. Gluon’s Sequential class provides a nice way of rapidly implementing this standard network architecture. We just

  • Instantiate a Sequential (let’s call it net)
  • Add a bunch of layers to it using net.add(...)

Sequential assumes that the layers arrive bottom to top (with the input at the very bottom). We can implement the same architecture as shown above using Sequential in just 6 lines.

In [59]:
num_hidden = 64
net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Dense(num_hidden, activation="relu"))
    net.add(gluon.nn.Dense(num_hidden, activation="relu"))
    net.add(gluon.nn.Dense(num_outputs))

Parameter initialization

In [60]:
net.collect_params().initialize(mx.init.Normal(sigma=.1), ctx=model_ctx)

Softmax cross-entropy loss

In [61]:
softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()
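
Note that this loss works on unnormalized scores (logits): it applies the softmax and the cross-entropy in one numerically stable step, so we should not apply a softmax to the network output ourselves. A tiny sanity check on made-up values (not from the original notebook):

# A confident, correct prediction should give a loss near zero, while a
# confident, wrong prediction should be heavily penalized.
toy_scores = nd.array([[10., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
print(softmax_cross_entropy(toy_scores, nd.array([0])))  # close to 0
print(softmax_cross_entropy(toy_scores, nd.array([1])))  # close to 10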

Optimizer

In [62]:
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': .01})
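
The Trainer ties an optimizer to the network’s parameters; trainer.step, called in the training loop below, also rescales the gradients by the batch size we pass it. Switching optimizers only requires changing the name and hyperparameters. For example, an Adam variant would look like the commented-out line below (shown commented out, in the spirit of the model_ctx line above, so the SGD results later in this chapter stay reproducible; this variant isn’t used in the original notebook).

# Illustration only: an Adam-based trainer would look like this.
# trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': .001})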

Evaluation metric

In [63]:
def evaluate_accuracy(data_iterator, net):
    acc = mx.metric.Accuracy()
    for i, (data, label) in enumerate(data_iterator):
        data = data.as_in_context(model_ctx).reshape((-1, 784))
        label = label.as_in_context(model_ctx)
        output = net(data)
        predictions = nd.argmax(output, axis=1)
        acc.update(preds=predictions, labels=label)
    return acc.get()[1]
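
Because the network’s parameters are still random at this point, evaluating now should give accuracy close to chance, roughly 1 in 10 for ten balanced classes. Running this optional sanity check (not in the original notebook) is a cheap way to verify the evaluation code before training:

# With randomly initialized weights we expect accuracy in the neighborhood of 0.10.
print(evaluate_accuracy(test_data, net))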

Training loop

In [64]:
epochs = 10
smoothing_constant = .01

for e in range(epochs):
    cumulative_loss = 0
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(model_ctx).reshape((-1, 784))
        label = label.as_in_context(model_ctx)
        with autograd.record():
            output = net(data)
            loss = softmax_cross_entropy(output, label)
            loss.backward()
        trainer.step(data.shape[0])
        cumulative_loss += nd.sum(loss).asscalar()


    test_accuracy = evaluate_accuracy(test_data, net)
    train_accuracy = evaluate_accuracy(train_data, net)
    print("Epoch %s. Loss: %s, Train_acc %s, Test_acc %s" %
          (e, cumulative_loss/num_examples, train_accuracy, test_accuracy))
Epoch 0. Loss: 1.27231270386, Train_acc 0.836933333333, Test_acc 0.846
Epoch 1. Loss: 0.477833755287, Train_acc 0.881066666667, Test_acc 0.8889
Epoch 2. Loss: 0.381976018492, Train_acc 0.89735, Test_acc 0.9035
Epoch 3. Loss: 0.33866001844, Train_acc 0.907533333333, Test_acc 0.9125
Epoch 4. Loss: 0.309403327727, Train_acc 0.913033333333, Test_acc 0.9165
Epoch 5. Loss: 0.285777178836, Train_acc 0.92025, Test_acc 0.9219
Epoch 6. Loss: 0.266318054875, Train_acc 0.925, Test_acc 0.9281
Epoch 7. Loss: 0.249801190837, Train_acc 0.931183333333, Test_acc 0.9323
Epoch 8. Loss: 0.235263404306, Train_acc 0.935483333333, Test_acc 0.9357
Epoch 9. Loss: 0.222571320128, Train_acc 0.9379, Test_acc 0.936
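
With training done, it’s satisfying to see the model in action. The following sketch (not part of the original notebook) runs the trained network on a handful of test images and compares its predictions to the true labels:

# Grab a small batch from the test set and print predicted vs. true digits.
sample_data = mx.gluon.data.DataLoader(
    mx.gluon.data.vision.MNIST(train=False, transform=transform),
    5, shuffle=True)
for data, label in sample_data:
    data = data.as_in_context(model_ctx).reshape((-1, 784))
    predictions = nd.argmax(net(data), axis=1)
    print("true labels:     ", label.asnumpy())
    print("predicted labels:", predictions.asnumpy())
    break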

Conclusion

In this chapter, we showed two ways to build multilayer perceptrons with Gluon. We demonstrated how to subclass gluon.Block and define your own forward pass. We also showed how you might debug your network by lacing your forward pass with print statements. Finally, we showed how you can define and instantiate an equivalent network with just 6 lines of code by using gluon.nn.Sequential. Now that you understand the basics, you’re ready to leap ahead. If you’re following the book in order, then the next stop will be dropout regularization. Other possible choices would be to start learning about convolutional neural networks, which are especially handy for working with images, or recurrent neural networks, which are especially useful for natural language processing.