Overfitting and regularization (with gluon)

Now that we’ve built a regularized multiclass logistic regression model from scratch, let’s see how to do the same thing more efficiently with gluon.


In [9]:
from __future__ import print_function
import mxnet as mx
from mxnet import nd, autograd
from mxnet import gluon
import numpy as np
# Use a GPU if one is available, otherwise fall back to the CPU.
ctx = mx.gpu() if mx.test_utils.list_gpus() else mx.cpu()

The MNIST Dataset

In [10]:
mnist = mx.test_utils.get_mnist()
num_examples = 1000
batch_size = 64
train_data = mx.gluon.data.DataLoader(
    mx.gluon.data.ArrayDataset(mnist["train_data"][:num_examples],
                               mnist["train_label"][:num_examples].astype(np.float32)),
    batch_size, shuffle=True)
test_data = mx.gluon.data.DataLoader(
    mx.gluon.data.ArrayDataset(mnist["test_data"][:num_examples],
                               mnist["test_label"][:num_examples].astype(np.float32)),
    batch_size, shuffle=False)
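
We keep only 1,000 training examples here, a deliberately small sample that will make it easy to overfit. As a quick sanity check (a sketch, assuming get_mnist returns images in 1x28x28 layout), we can peek at a single batch:

# Sketch: peek at one batch to confirm what the loaders yield.
for data, label in train_data:
    print(data.shape, label.shape)  # expect something like (64, 1, 28, 28) and (64,)
    break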

Multiclass Logistic Regression

In [11]:
net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Dense(10))

Parameter initialization

In [12]:
net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)
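
Because we never told the Dense(10) layer how many inputs it has, gluon defers allocating its parameters until it sees the first batch of data. A minimal sketch, assuming a flattened 784-dimensional input, to confirm the inferred shapes:

# Sketch: shapes are resolved lazily, so run one dummy forward pass first.
sample = nd.ones((1, 784), ctx=ctx)
net(sample)
for param in net.collect_params().values():
    print(param.name, param.shape)  # expect a (10, 784) weight and a (10,) bias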

Softmax Cross Entropy Loss

In [13]:
loss = gluon.loss.SoftmaxCrossEntropyLoss()
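
The loss function takes the raw, unnormalized scores produced by the Dense layer together with integer class labels; the softmax is applied internally for numerical stability. A small sketch with made-up logits:

# Sketch: SoftmaxCrossEntropyLoss returns one loss value per example.
example_output = nd.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]], ctx=ctx)
example_label = nd.array([0, 2], ctx=ctx)
print(loss(example_output, example_label))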

Optimizer

In [14]:
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01, 'wd': 0.00})
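
The 'wd' argument is the weight decay coefficient: with plain SGD the update becomes w = w - lr * (grad + wd * w), which amounts to an L2 penalty on the weights. We set it to 0.00, so this first run is unregularized. A toy sketch (the parameter, the values, and the toy_trainer name are purely illustrative) that checks the update rule:

# Sketch: verify that 'wd' applies L2 weight decay in the SGD update,
# i.e. w = w - lr * (grad + wd * w). Everything here is illustrative.
w = gluon.Parameter('w', shape=(1,), init=mx.init.Constant(2.0))
w.initialize(ctx=ctx)
toy_trainer = gluon.Trainer([w], 'sgd', {'learning_rate': 0.1, 'wd': 0.5})
with autograd.record():
    toy_loss = w.data(ctx) * 3.0   # gradient with respect to w is 3
toy_loss.backward()
toy_trainer.step(1)
print(w.data(ctx))                 # expect 2 - 0.1 * (3 + 0.5 * 2) = 1.6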

Evaluation Metric

In [15]:
def evaluate_accuracy(data_iterator, net):
    numerator = 0.
    denominator = 0.
    for i, (data, label) in enumerate(data_iterator):
        data = data.as_in_context(ctx).reshape((-1,784))
        label = label.as_in_context(ctx)
        output = net(data)
        predictions = nd.argmax(output, axis=1)
        numerator += nd.sum(predictions == label)
        denominator += data.shape[0]
    return (numerator / denominator).asscalar()
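
The same bookkeeping can also be done with mxnet's built-in metric API. A sketch of an equivalent helper (evaluate_accuracy_metric is our own name, not part of the library):

# Sketch: the same accuracy computation expressed with mx.metric.Accuracy.
def evaluate_accuracy_metric(data_iterator, net):
    acc = mx.metric.Accuracy()
    for data, label in data_iterator:
        data = data.as_in_context(ctx).reshape((-1, 784))
        label = label.as_in_context(ctx)
        predictions = nd.argmax(net(data), axis=1)
        acc.update(preds=predictions, labels=label)
    return acc.get()[1]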

Execute training loop

In [16]:
epochs = 200
moving_loss = 0.

for e in range(epochs):
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(ctx).reshape((-1,784))
        label = label.as_in_context(ctx)
        with autograd.record():
            output = net(data)
            cross_entropy = loss(output, label)
        cross_entropy.backward()
        trainer.step(data.shape[0])  # normalize the gradient by the batch size

        ##########################
        #  Keep a moving average of the losses
        ##########################
        if i == 0:
            moving_loss = nd.mean(cross_entropy).asscalar()
        else:
            moving_loss = .99 * moving_loss + .01 * nd.mean(cross_entropy).asscalar()

    test_accuracy = evaluate_accuracy(test_data, net)
    train_accuracy = evaluate_accuracy(train_data, net)
    if e % 20 == 0:
        print("Completed epoch %s. Loss: %s, Train_acc %s, Test_acc %s" %
              (e+1, moving_loss, train_accuracy, test_accuracy))
Completed epoch 1. Loss: 2.323024052, Train_acc 0.25, Test_acc 0.205
Completed epoch 21. Loss: 0.93319004051, Train_acc 0.832, Test_acc 0.73
Completed epoch 41. Loss: 0.726637174525, Train_acc 0.861, Test_acc 0.765
Completed epoch 61. Loss: 0.6199838456, Train_acc 0.883, Test_acc 0.783
Completed epoch 81. Loss: 0.612719210524, Train_acc 0.89, Test_acc 0.795
Completed epoch 101. Loss: 0.491416902033, Train_acc 0.901, Test_acc 0.806
Completed epoch 121. Loss: 0.409430377293, Train_acc 0.909, Test_acc 0.813
Completed epoch 141. Loss: 0.523765346183, Train_acc 0.917, Test_acc 0.822
Completed epoch 161. Loss: 0.379422619004, Train_acc 0.918, Test_acc 0.828
Completed epoch 181. Loss: 0.419000796174, Train_acc 0.922, Test_acc 0.83
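
With only 1,000 training examples, training accuracy climbs past 0.92 while test accuracy stalls around 0.83: the model is overfitting. A natural follow-up, sketched below with an illustrative (untuned) weight decay of 0.001 and with no results claimed, is to wrap the experiment in a function and rerun it with regularization switched on:

# Sketch: rerun the experiment with a fresh network and non-zero weight decay,
# then compare the resulting train/test accuracies against the run above.
def train(weight_decay, epochs=200):
    net = gluon.nn.Sequential()
    with net.name_scope():
        net.add(gluon.nn.Dense(10))
    net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)
    trainer = gluon.Trainer(net.collect_params(), 'sgd',
                            {'learning_rate': 0.01, 'wd': weight_decay})
    for e in range(epochs):
        for data, label in train_data:
            data = data.as_in_context(ctx).reshape((-1, 784))
            label = label.as_in_context(ctx)
            with autograd.record():
                cross_entropy = loss(net(data), label)
            cross_entropy.backward()
            trainer.step(data.shape[0])
    return evaluate_accuracy(train_data, net), evaluate_accuracy(test_data, net)

print(train(weight_decay=0.001))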

Conclusion

Now let’s take a look at how to implement modern neural networks.

For whinges or inquiries, open an issue on GitHub.