Faster and portable through hybridizing

The tutorials we have seen so far adopt the imperative, or define-by-run, programming paradigm. This is how we normally write Python programs. Another programming paradigm commonly used by deep learning frameworks is symbolic, or define-then-run, programming. It consists of three steps:

  • define the workloads, such as creating the neural network
  • compile the program into a format that is independent of the front-end language, e.g. Python
  • feed in data to run the program

The compilation step can optimize the program so that it runs more efficiently, and the resulting language-independent format makes the program portable to various front-end languages.
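To make the three steps concrete, here is a minimal sketch using MXNet's low-level symbol API (the layer name fc and the shapes are our own choices; the rest of this tutorial does not require this API):

import mxnet as mx

# 1. Define the workload symbolically; no computation happens yet.
data = mx.sym.var('data')
fc = mx.sym.FullyConnected(data, num_hidden=2, name='fc')

# 2. Compile: bind the program to a device and allocate memory.
exe = fc.simple_bind(mx.cpu(), data=(1, 4))
exe.arg_dict['fc_weight'][:] = 1.0  # parameters are allocated but not initialized
exe.arg_dict['fc_bias'][:] = 0.0

# 3. Feed in data to run.
exe.forward(is_train=False, data=mx.nd.ones((1, 4)))
print(exe.outputs[0])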

gluon provides a hybrid mechanism that seamlessly combines symbolic (declarative) programming and imperative programming. Users can freely switch between the two and enjoy the advantages of both paradigms.

HybridSequential

We already learned how to use Sequential to stack layers. Now there is HybridSequential, which constructs a hybrid network. Its usage is similar to Sequential:

In [5]:
import mxnet as mx
from mxnet.gluon import nn
from mxnet import nd

def get_net():
    # construct an MLP
    net = nn.HybridSequential()
    with net.name_scope():
        net.add(nn.Dense(256, activation="relu"))
        net.add(nn.Dense(128, activation="relu"))
        net.add(nn.Dense(2))
    # initialize the parameters
    net.collect_params().initialize()
    return net

# forward
x = nd.random_normal(shape=(1, 512))
net = get_net()
print('=== net(x) ==={}'.format(net(x)))
=== net(x) ===
[[-0.11171372  0.13755465]]
<NDArray 1x2 @cpu(0)>

You can call the hybridize method on a Block to compile and optimize it. Only a HybridBlock, e.g. HybridSequential, can be compiled, but you can still call hybridize on a normal Block; in that case its HybridBlock children will be compiled instead (see the sketch after the next example). We will talk more about HybridBlock later.

In [6]:
net.hybridize()
print('=== net(x) ==={}'.format(net(x)))
=== net(x) ===
[[-0.11171372  0.13755465]]
<NDArray 1x2 @cpu(0)>
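As noted above, hybridize also works on a normal Block whose children are HybridBlocks. Here is a minimal sketch (the variable mixed is our own): the outer Sequential stays imperative, while its Dense children get compiled.

mixed = nn.Sequential()
with mixed.name_scope():
    mixed.add(nn.Dense(4, activation='relu'))  # nn.Dense is a HybridBlock
    mixed.add(nn.Dense(2))
mixed.collect_params().initialize()
mixed.hybridize()  # compiles only the HybridBlock children
y = mixed(nd.ones((1, 8)))  # still called like any other Block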

Performance

We compare the performance before and after hybridizing by running the forward pass 1000 times.

In [8]:
from time import time
def bench(net, x):
    mx.nd.waitall()  # finish any pending asynchronous work before timing
    start = time()
    for i in range(1000):
        y = net(x)
    mx.nd.waitall()  # execution is asynchronous; wait for all forwards to finish
    return time() - start

net = get_net()
print('Before hybridizing: %.4f sec'%(bench(net, x)))
net.hybridize()
print('After hybridizing: %.4f sec'%(bench(net, x)))
Before hybridizing: 0.3096 sec
After hybridizing: 0.1652 sec

As can be seen, there is a significant speedup after hybridizing.

Get the symbolic program

Previously, we fed net the NDArray x, and net(x) returned the forward result. If we instead feed it a Symbol placeholder, the corresponding symbolic program is returned.

In [9]:
from mxnet import sym
x = sym.var('data')
print('=== input data holder ===')
print(x)

y = net(x)
print('\n=== the symbolic program of net ===')
print(y)

y_json = y.tojson()
print('\n=== the corresponding JSON definition ===')
print(y_json)
=== input data holder ===
<Symbol data>

=== the symbolic program of net ===
<Symbol hybridsequential4_dense2_fwd>

=== the corresponding JSON definition ===
{
  "nodes": [
    {
      "op": "null",
      "name": "data",
      "inputs": []
    },
    {
      "op": "null",
      "name": "hybridsequential4_dense0_weight",
      "attr": {
        "__dtype__": "0",
        "__lr_mult__": "1.0",
        "__shape__": "(256, 0)",
        "__wd_mult__": "1.0"
      },
      "inputs": []
    },
    {
      "op": "null",
      "name": "hybridsequential4_dense0_bias",
      "attr": {
        "__dtype__": "0",
        "__init__": "zeros",
        "__lr_mult__": "1.0",
        "__shape__": "(256,)",
        "__wd_mult__": "1.0"
      },
      "inputs": []
    },
    {
      "op": "FullyConnected",
      "name": "hybridsequential4_dense0_fwd",
      "attr": {"num_hidden": "256"},
      "inputs": [[0, 0, 0], [1, 0, 0], [2, 0, 0]]
    },
    {
      "op": "Activation",
      "name": "hybridsequential4_dense0_relu_fwd",
      "attr": {"act_type": "relu"},
      "inputs": [[3, 0, 0]]
    },
    {
      "op": "null",
      "name": "hybridsequential4_dense1_weight",
      "attr": {
        "__dtype__": "0",
        "__lr_mult__": "1.0",
        "__shape__": "(128, 0)",
        "__wd_mult__": "1.0"
      },
      "inputs": []
    },
    {
      "op": "null",
      "name": "hybridsequential4_dense1_bias",
      "attr": {
        "__dtype__": "0",
        "__init__": "zeros",
        "__lr_mult__": "1.0",
        "__shape__": "(128,)",
        "__wd_mult__": "1.0"
      },
      "inputs": []
    },
    {
      "op": "FullyConnected",
      "name": "hybridsequential4_dense1_fwd",
      "attr": {"num_hidden": "128"},
      "inputs": [[4, 0, 0], [5, 0, 0], [6, 0, 0]]
    },
    {
      "op": "Activation",
      "name": "hybridsequential4_dense1_relu_fwd",
      "attr": {"act_type": "relu"},
      "inputs": [[7, 0, 0]]
    },
    {
      "op": "null",
      "name": "hybridsequential4_dense2_weight",
      "attr": {
        "__dtype__": "0",
        "__lr_mult__": "1.0",
        "__shape__": "(2, 0)",
        "__wd_mult__": "1.0"
      },
      "inputs": []
    },
    {
      "op": "null",
      "name": "hybridsequential4_dense2_bias",
      "attr": {
        "__dtype__": "0",
        "__init__": "zeros",
        "__lr_mult__": "1.0",
        "__shape__": "(2,)",
        "__wd_mult__": "1.0"
      },
      "inputs": []
    },
    {
      "op": "FullyConnected",
      "name": "hybridsequential4_dense2_fwd",
      "attr": {"num_hidden": "2"},
      "inputs": [[8, 0, 0], [9, 0, 0], [10, 0, 0]]
    }
  ],
  "arg_nodes": [0, 1, 2, 5, 6, 9, 10],
  "node_row_ptr": [
    0,
    1,
    2,
    3,
    4,
    5,
    6,
    7,
    8,
    9,
    10,
    11,
    12
  ],
  "heads": [[11, 0, 0]],
  "attrs": {"mxnet_version": ["int", 1100]}
}
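The arg_nodes field above lists the indices of the symbol's arguments, i.e. the data placeholder and the parameters. These can also be queried directly from the symbol:

print(y.list_arguments())
print(y.list_outputs())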

Now we can save both the program and the parameters to disk, so that they can be loaded later not only in Python, but also in all other supported languages, such as C++, R, and Scala.

In [5]:
y.save('model.json')
net.save_params('model.params')
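For example, one way to load the model back in Python is through gluon.SymbolBlock. The following is only a sketch assuming the MXNet 1.x API; depending on the version, extra care with parameter-name prefixes may be needed:

from mxnet import gluon

# Recover the symbolic program and wrap it as a Block again.
loaded_sym = mx.sym.load('model.json')
net2 = gluon.SymbolBlock(outputs=loaded_sym, inputs=mx.sym.var('data'))
net2.load_params('model.params', ctx=mx.cpu())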

HybridBlock

Now let’s dive deeper into how hybridize works. Remember that another way to construct a network is to subclass gluon.Block, which lets us write the forward function flexibly.

Unsurprisingly, there is a hybridizable counterpart, HybridBlock. We implement the previous MLP as follows:

In [10]:
from mxnet import gluon

class Net(gluon.HybridBlock):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        with self.name_scope():
            self.fc1 = nn.Dense(256)
            self.fc2 = nn.Dense(128)
            self.fc3 = nn.Dense(2)

    def hybrid_forward(self, F, x):
        # F is a function space that depends on the type of x
        # If x's type is NDArray, then F will be mxnet.nd
        # If x's type is Symbol, then F will be mxnet.sym
        print('type(x): {}, F: {}'.format(
                type(x).__name__, F.__name__))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

Now we feed data into the network twice; we can see that hybrid_forward is called each time.

In [11]:
net = Net()
net.collect_params().initialize()
x = nd.random_normal(shape=(1, 512))
print('=== 1st forward ===')
y = net(x)
print('=== 2nd forward ===')
y = net(x)
=== 1st forward ===
type(x): NDArray, F: mxnet.ndarray
=== 2nd forward ===
type(x): NDArray, F: mxnet.ndarray

Now run it again after hybridizing.

In [12]:
net.hybridize()
print('=== 1st forward ===')
y = net(x)
print('=== 2nd forward ===')
y = net(x)
=== 1st forward ===
type(x): Symbol, F: mxnet.symbol
=== 2nd forward ===

It differs from the previous execution in two aspects:

  1. the input data type is now Symbol even though we fed an NDArray into net, because gluon implicitly constructed a symbolic data placeholder.
  2. hybrid_forward is called only once, the first time we run net(x), because gluon constructs the symbolic program on the first forward pass and then caches it for reuse.

One main reason the network is faster after hybridizing is that we no longer need to repeatedly invoke the Python forward function; all computations stay within the highly efficient C++ backend engine.

But the potential drawback is a loss of flexibility in writing the forward function. In other words, inserting a print statement for debugging, or control flow such as if and for that depends on intermediate values, is no longer possible.
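For example, the hypothetical block below (BadNet is our own illustration) runs fine imperatively, but fails once hybridized: hybrid_forward then receives a Symbol, which is only a placeholder and carries no value to branch on.

class BadNet(gluon.HybridBlock):
    def __init__(self, **kwargs):
        super(BadNet, self).__init__(**kwargs)
        with self.name_scope():
            self.fc = nn.Dense(2)

    def hybrid_forward(self, F, x):
        # Data-dependent control flow: fine when x is an NDArray, but a
        # Symbol has no value yet, so asscalar() raises an error after
        # the network has been hybridized.
        if x.sum().asscalar() > 0:
            x = x * 2
        return self.fc(x)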

Conclusion

Through HybridSequential and HybridBlock, we can convert an imperative program into a symbolic program by calling hybridize.