Multiclass logistic regression with gluon

Now that we’ve built a logistic regression model from scratch, let’s make this more efficient with gluon. If you completed the corresponding chapters on linear regression, you might be tempted to rest your eyes a little in this one. We’ll be using gluon in a rather similar way, and since the interface is reasonably well designed, you won’t have to do much work. To keep you awake we’ll introduce a few subtle tricks.

Let’s start by importing the standard packages.

In [1]:
from __future__ import print_function
import mxnet as mx
from mxnet import nd, autograd
from mxnet import gluon
import numpy as np

Set the context

Now, let’s set the context. In the linear regression tutorial we did all of our computation on the CPU (mx.cpu()) just to keep things simple. When you’ve got 2-dimensional data and scalar labels, a smartwatch can probably handle the job. In this tutorial, however, we’ll be working with a considerably larger dataset. If you happen to be running this code on a server with a GPU and have installed the GPU-enabled version of MXNet (or built MXNet with CUDA=1), you might want to replace the active line below with its commented-out counterpart.

In [2]:
# Set the context to CPU
ctx = mx.cpu()

# To set the context to GPU use this
# ctx = mx.gpu()

The MNIST Dataset

We won’t suck up too much wind describing the MNIST dataset for a second time. If you’re unfamiliar with the dataset and are reading these chapters out of sequence, take a look at the data section in the previous chapter on softmax regression from scratch.

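We’ll load up data iterators corresponding to the training and test splits of the MNIST dataset. Each example passes through transform, which scales the pixel values to [0, 1] and casts the labels to float32.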

In [3]:
batch_size = 64
num_inputs = 784
num_outputs = 10
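def transform(data, label):
    # Scale pixel values from [0, 255] to [0, 1] and cast labels to float32
    return data.astype(np.float32)/255, label.astype(np.float32)
# Training iterator: reshuffle the examples every epoch
train_data = gluon.data.DataLoader(gluon.data.vision.MNIST(train=True, transform=transform),
                                   batch_size, shuffle=True)
# Test iterator: a fixed order is fine for evaluation
test_data = gluon.data.DataLoader(gluon.data.vision.MNIST(train=False, transform=transform),
                                  batch_size, shuffle=False)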
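To make the preprocessing concrete, here is a NumPy-only sketch of what transform does to MNIST-style images and how a shuffled minibatch iterator hands them out. The random uint8 images are a stand-in for real MNIST data, and iterate_minibatches is a hypothetical helper, not part of gluon; it only approximates what gluon.data.DataLoader does with shuffle=True.

```python
import numpy as np

def transform(data, label):
    # Same transform as above: uint8 pixels in [0, 255] become float32 in [0, 1]
    return data.astype(np.float32) / 255, label.astype(np.float32)

def iterate_minibatches(images, labels, batch_size, shuffle, seed=0):
    # Hypothetical helper approximating a data loader: optionally shuffle, then slice
    idx = np.arange(len(images))
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)
    for start in range(0, len(idx), batch_size):
        batch = idx[start:start + batch_size]
        yield images[batch], labels[batch]

# Fake stand-in for MNIST: 150 grayscale 28x28x1 images with labels 0-9
rng = np.random.default_rng(42)
raw_images = rng.integers(0, 256, size=(150, 28, 28, 1), dtype=np.uint8)
raw_labels = rng.integers(0, 10, size=150, dtype=np.int32)

images, labels = transform(raw_images, raw_labels)
batches = list(iterate_minibatches(images, labels, batch_size=64, shuffle=True))

# The last batch is smaller because 150 is not a multiple of 64
batch_sizes = [len(x) for x, _ in batches]
# The model expects flat 784-dimensional vectors, hence reshape((-1, 784)) later on
flat = batches[0][0].reshape((-1, 784))
print(batch_sizes, flat.shape)
```

Shuffling the training set each epoch keeps the minibatches from arriving in the same order every time. The test set doesn't need that, since evaluation just visits every example once.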

Note that we loaded an iterator over the test data as well. After we train on the training dataset, we’re going to want to test our model on the test data. Otherwise, for all we know, our model could be doing something stupid (or treacherous?) like memorizing the training examples and regurgitating the labels on command.

Multiclass Logistic Regression

Now we’re going to define our model. Remember from our tutorial on linear regression with gluon that we add Dense layers by calling net.add(gluon.nn.Dense(num_outputs)). This leaves the parameter shapes underspecified, but gluon will infer the desired shapes the first time we pass real data through the network.

In [4]:
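net = gluon.nn.Sequential()
with net.name_scope():
    # A single Dense layer maps the 784 input pixels to 10 class scores (logits)
    net.add(gluon.nn.Dense(num_outputs))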
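So what does that single Dense layer compute? Here is a minimal NumPy sketch. It assumes Gluon's convention that a Dense weight has shape (units, in_units) and that the layer computes dot(x, W.T) + b. The random values are placeholders, and the bias is simply set to zero here. In gluon the input width of 784 is only discovered when the first batch arrives.

```python
import numpy as np

num_inputs, num_outputs, batch = 784, 10, 64
rng = np.random.default_rng(0)

# Weight layout (units, in_units), as gluon would infer for Dense(10) on 784 inputs
W = rng.normal(0.0, 1.0, size=(num_outputs, num_inputs)).astype(np.float32)
b = np.zeros(num_outputs, dtype=np.float32)

def dense_forward(x):
    # One affine map from 784 pixels to 10 unnormalized class scores (logits)
    return x @ W.T + b

x = rng.random((batch, num_inputs), dtype=np.float32)
logits = dense_forward(x)
num_params = W.size + b.size
print(logits.shape, num_params)  # (64, 10) 7850
```

That is the entire model: 7850 parameters and no hidden layer. The softmax that turns the logits into probabilities is left to the loss function.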

Parameter initialization

Before we can start training, we need to initialize our parameters. As before, we register an initializer for them: here, a normal distribution with standard deviation 1 (mx.init.Normal(sigma=1.)). Remember that gluon doesn’t yet know what shape the parameters should take, because we never specified the input dimension. So the following call doesn’t allocate anything right away. The parameters get initialized during the first call to the forward method.

In [5]:
net.collect_params().initialize(mx.init.Normal(sigma=1.), ctx=ctx)
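Deferred initialization can be puzzling at first, so here is a toy version of the idea in plain NumPy. LazyDense is a hypothetical class written for illustration, not how gluon is actually implemented. Its initialize() only records how to initialize, and the weight is allocated on the first forward call, once the input width is known.

```python
import numpy as np

class LazyDense:
    """Toy dense layer whose input dimension is inferred on first use (hypothetical)."""

    def __init__(self, units):
        self.units = units
        self.weight = None   # shape unknown until data arrives
        self.bias = None
        self._init_fn = None

    def initialize(self, init_fn):
        # Only remember how to initialize; no shapes are known yet
        self._init_fn = init_fn

    def __call__(self, x):
        if self.weight is None:
            in_units = x.shape[1]  # infer the input width from the first batch
            self.weight = self._init_fn((self.units, in_units))
            self.bias = np.zeros(self.units, dtype=np.float32)
        return x @ self.weight.T + self.bias

rng = np.random.default_rng(0)

def normal_sigma_1(shape):
    # Rough analogue of mx.init.Normal(sigma=1.): zero mean, standard deviation 1
    return rng.normal(0.0, 1.0, size=shape).astype(np.float32)

layer = LazyDense(10)
layer.initialize(normal_sigma_1)
print(layer.weight)  # None: nothing allocated yet
out = layer(np.ones((64, 784), dtype=np.float32))
print(layer.weight.shape, out.shape)  # (10, 784) (64, 10)
```

This is also why evaluating the network before training works: the first batch that passes through triggers the allocation.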

Softmax Cross Entropy Loss

Note that we didn’t have to include a softmax layer, because MXNet has an efficient function that simultaneously computes the softmax activation and the cross-entropy loss. However, if you ever need the output probabilities, you can apply nd.softmax to the network’s output yourself.

In [6]:
softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()
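Why fuse softmax and cross-entropy? Taking the softmax first and the log afterwards can overflow or underflow when the logits are large. The NumPy sketch below illustrates the math, not MXNet's actual kernel. It computes the loss directly from log-softmax using the log-sum-exp trick: for logits z and integer label y, the loss is -z[y] + log(sum_j exp(z[j])). The naive version is shown for contrast.

```python
import numpy as np

def log_softmax(z):
    # Subtract the row max before exponentiating (log-sum-exp trick) to avoid overflow
    z_max = z.max(axis=1, keepdims=True)
    return z - z_max - np.log(np.exp(z - z_max).sum(axis=1, keepdims=True))

def softmax_cross_entropy(z, labels):
    # Per-example loss: -log p(correct class); labels are class indices
    return -log_softmax(z)[np.arange(len(labels)), labels.astype(int)]

def naive_cross_entropy(z, labels):
    # Softmax first, then log: breaks down when logits are large
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return -np.log(p[np.arange(len(labels)), labels.astype(int)])

z = np.array([[2.0, 1.0, 0.1],
              [1000.0, 0.0, -1000.0]])
labels = np.array([0.0, 1.0])  # float labels, like the ones transform() produces

stable = softmax_cross_entropy(z, labels)
with np.errstate(over="ignore", invalid="ignore", divide="ignore"):
    naive = naive_cross_entropy(z, labels)
probs = np.exp(log_softmax(z))  # output probabilities, if you need them
print(stable)
print(naive)
```

Both versions agree on the first row. On the second row, the naive version overflows (exp(1000) is inf) and returns an infinite loss. The stable version returns the correct value of about 1000.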


Now let’s instantiate an optimizer to make our updates: plain stochastic gradient descent ('sgd') with a learning rate of 0.1.

In [7]:
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
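The 'sgd' optimizer applies a plain gradient step to every parameter. In gluon, calling backward on a per-example loss vector accumulates the gradient summed over the batch. trainer.step(batch_size) then rescales it by 1/batch_size, so each update follows the average gradient. Here is a NumPy sketch of that update. The params and grads dicts are hypothetical stand-ins for gluon Parameters, and options such as weight decay and momentum are left out.

```python
import numpy as np

def sgd_step(params, grads, learning_rate, batch_size):
    # grads hold gradients summed over the batch; dividing by batch_size averages them
    for name in params:
        params[name] -= learning_rate * grads[name] / batch_size

params = {"weight": np.ones((10, 784), dtype=np.float32),
          "bias": np.zeros(10, dtype=np.float32)}
# Pretend gradients, already summed over a batch of 64 examples
grads = {"weight": np.full((10, 784), 64.0, dtype=np.float32),
         "bias": np.full(10, 6.4, dtype=np.float32)}

sgd_step(params, grads, learning_rate=0.1, batch_size=64)
print(params["weight"][0, 0], params["bias"][0])
```

Normalizing by the batch size keeps the effective step size roughly independent of batch_size. This also means the smaller final batch of an epoch doesn't take an outsized step.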

Evaluation Metric

This time, let’s simplify the evaluation code by relying on MXNet’s built-in metric package.

In [8]:
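def evaluate_accuracy(data_iterator, net):
    # Accumulates correct predictions across all batches
    acc = mx.metric.Accuracy()
    for i, (data, label) in enumerate(data_iterator):
        # Flatten each 28x28 image into a 784-dimensional vector
        data = data.as_in_context(ctx).reshape((-1,784))
        label = label.as_in_context(ctx)
        output = net(data)
        # The predicted class is the index of the largest score
        predictions = nd.argmax(output, axis=1)
        acc.update(preds=predictions, labels=label)
    # get() returns a (name, value) pair; keep only the value
    return acc.get()[1]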
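mx.metric.Accuracy keeps running counts across update calls, and get() returns a (name, value) pair, which is why evaluate_accuracy returns acc.get()[1]. The class below is a hypothetical NumPy stand-in that mimics that bookkeeping for predictions that have already been argmaxed. It is not MXNet's implementation.

```python
import numpy as np

class RunningAccuracy:
    """Hypothetical stand-in for mx.metric.Accuracy: accumulate counts over batches."""

    def __init__(self):
        self.num_correct = 0
        self.num_inst = 0

    def update(self, preds, labels):
        # preds are predicted class indices, e.g. argmax(output, axis=1)
        preds = np.asarray(preds).astype(np.int64)
        labels = np.asarray(labels).astype(np.int64)
        self.num_correct += int((preds == labels).sum())
        self.num_inst += len(labels)

    def get(self):
        return "accuracy", self.num_correct / self.num_inst

acc = RunningAccuracy()
outputs = np.array([[0.1, 2.0, 0.3],   # argmax is class 1
                    [1.5, 0.2, 0.1]])  # argmax is class 0
acc.update(preds=outputs.argmax(axis=1), labels=np.array([1.0, 2.0]))
acc.update(preds=np.array([2]), labels=np.array([2.0]))
print(acc.get())
```

Because it sums counts rather than averaging per-batch accuracies, every example carries equal weight, even the ones in a short final batch.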

Because we initialized our model randomly, and because roughly one tenth of all examples belong to each of the ten classes, we should get an accuracy in the ballpark of 0.10.

In [9]:
evaluate_accuracy(test_data, net)
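The 0.10 estimate is just chance level. With balanced classes, any predictor whose guesses ignore the true label matches it with probability 1/10 in expectation, even one that always outputs the same class. A quick simulation on synthetic labels (not MNIST) shows this. A randomly initialized network is not strictly independent of the labels, since it looks at the images, so its score will only be in the ballpark.

```python
import numpy as np

rng = np.random.default_rng(0)
n, num_classes = 100_000, 10
labels = rng.integers(0, num_classes, size=n)        # balanced synthetic labels

random_preds = rng.integers(0, num_classes, size=n)  # guess uniformly at random
constant_preds = np.full(n, 3)                       # always guess class 3

random_acc = (random_preds == labels).mean()
constant_acc = (constant_preds == labels).mean()
print(random_acc, constant_acc)
```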

Execute training loop

In [10]:
epochs = 4
moving_loss = 0.
smoothing_constant = .01

for e in range(epochs):
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(ctx).reshape((-1,784))
        label = label.as_in_context(ctx)
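        with autograd.record():
            output = net(data)
            loss = softmax_cross_entropy(output, label)
        # Compute gradients, then update the parameters with gradients averaged over the batch
        loss.backward()
        trainer.step(data.shape[0])
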
        #  Keep a moving average of the losses
        curr_loss = nd.mean(loss).asscalar()
        moving_loss = (curr_loss if ((i == 0) and (e == 0))
                       else (1 - smoothing_constant) * moving_loss + (smoothing_constant) * curr_loss)

    test_accuracy = evaluate_accuracy(test_data, net)
    train_accuracy = evaluate_accuracy(train_data, net)
    print("Epoch %s. Loss: %s, Train_acc %s, Test_acc %s" % (e, moving_loss, train_accuracy, test_accuracy))
Epoch 0. Loss: 1.11375964824, Train_acc 0.792366666667, Test_acc 0.8063
Epoch 1. Loss: 0.851715001028, Train_acc 0.836433333333, Test_acc 0.8474
Epoch 2. Loss: 0.692273606673, Train_acc 0.855266666667, Test_acc 0.8653
Epoch 3. Loss: 0.646741591093, Train_acc 0.864016666667, Test_acc 0.8726
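To see what the gradient computation and the trainer's update amount to in each iteration, here is a self-contained NumPy sketch of the same procedure: minibatch SGD on a softmax regression model, with the same exponential moving average of the loss. In place of autograd, it uses the closed-form gradient of softmax cross-entropy with respect to the logits, which is the softmax probabilities minus the one-hot labels. It trains on a small synthetic 3-class problem rather than MNIST, so its numbers are not comparable to the output above.

```python
import numpy as np

rng = np.random.default_rng(0)
num_inputs, num_outputs, n = 20, 3, 600
# Synthetic data: labels come from a hidden random linear model
true_W = rng.normal(size=(num_outputs, num_inputs))
X = rng.normal(size=(n, num_inputs))
y = (X @ true_W.T).argmax(axis=1)

W = rng.normal(0.0, 0.01, size=(num_outputs, num_inputs))
b = np.zeros(num_outputs)
lr, batch_size, smoothing_constant = 0.1, 64, 0.01
moving_loss = 0.0

def forward(xb):
    # Softmax probabilities, with the row max subtracted for numerical stability
    z = xb @ W.T + b
    z = z - z.max(axis=1, keepdims=True)
    return np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)

for e in range(4):
    order = rng.permutation(n)  # reshuffle every epoch, like shuffle=True
    for i, start in enumerate(range(0, n, batch_size)):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        p = forward(xb)
        loss = -np.log(p[np.arange(len(yb)), yb])  # per-example cross-entropy
        # Gradient of the summed loss w.r.t. the logits: softmax minus one-hot
        dz = p.copy()
        dz[np.arange(len(yb)), yb] -= 1.0
        # Like trainer.step(batch_size): divide the summed gradients by the batch size
        W -= lr * (dz.T @ xb) / len(yb)
        b -= lr * dz.sum(axis=0) / len(yb)
        curr_loss = loss.mean()
        moving_loss = (curr_loss if (i == 0 and e == 0)
                       else (1 - smoothing_constant) * moving_loss + smoothing_constant * curr_loss)
    train_acc = (forward(X).argmax(axis=1) == y).mean()
    print("Epoch %s. Loss: %s, Train_acc %s" % (e, moving_loss, train_acc))
```

With a smoothing constant of 0.01, the moving average remembers many past batches, so it lags behind the current loss. Over only a few dozen updates it stays close to the initial loss. That is a deliberate trade of responsiveness for a less noisy curve.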