Optimizing neural network hyperparameters with Bayesian optimization
How do you decide on neural network hyperparameters? Training a neural network takes a long time, so in practice you often rely on intuition and experience. However, if a single training run finishes within a reasonable amount of time (say, at most 30 minutes to an hour), there is a method you can use: Bayesian optimization. Bayesian optimization is effective for any black-box function that you want to optimize but that takes a while to evaluate.
The day before yesterday, I wrote an article about performing Bayesian optimization in Python with GPyOpt. This time, I used that method to optimize the hyperparameters of a neural network so that its accuracy on MNIST is as high as possible. The full Python code is below, so please try it yourself.
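As a quick reminder of what that looks like, here is a minimal sketch of using GPyOpt on a toy black-box function (the toy function is just an illustration of mine, not part of this experiment):

```python
import GPyOpt
import numpy as np

# A toy "black box": in practice this would be an expensive computation,
# e.g. training a neural network and returning its error rate.
def black_box(x):
    return np.sin(3 * x) + 0.1 * x ** 2

bounds = [{'name': 'x', 'type': 'continuous', 'domain': (-3, 3)}]
opt = GPyOpt.methods.BayesianOptimization(f=black_box, domain=bounds)
opt.run_optimization(max_iter=15)
print(opt.x_opt, opt.fx_opt)  # best input found and its function value
```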
First, let me show you the results.
The task this time is to improve the classification rate on MNIST (handwritten digit recognition).
MNIST is split into training data and test data, the neural network is trained on the training data, and the trained network then classifies the test data. The goal is to make this test error rate as low as possible.
The hyperparameters to be optimized are as follows.
[Added 2017/2/8] I updated the code so that it specifies the number of epochs instead of the number of steps.
Hyperparameter | Range |
---|---|
Number of epochs | 1~20 |
Dropout rate | 0~0.9 |
Number of hidden layer channels (hidden layer size as a multiple of the input layer) | 0.51~5 |
Batch size | 50, 100, 200 |
Whether to use convolutions | True, False |
Number of layers | 1, 2, 3 |
Whether to use residual connections | True, False |
Convolution filter size | 5, 7, 9, 11, 13 |
First, as a baseline, I used a standard neural network, which gave an error rate of 2.58%.
After Bayesian optimization (20 random samples + 20 Bayesian optimization iterations), the error rate was 1.18%. The hyperparameters after optimization are:
Hyperparameter | Range | Baseline | Optimization result |
---|---|---|---|
Number of epochs | 1~20 | 10 | 20 (upper limit) |
Dropout rate | 0~0.9 | 0 | 0 |
Number of hidden layer channels (hidden layer size as a multiple of the input layer) | 0.51~5 | 1 | 5 (upper limit) |
Batch size | 50, 100, 200 | 100 | 200 (upper limit) |
Whether to use convolutions | True, False | False | True |
Number of layers | 1, 2, 3 | 2 | 3 (upper limit) |
Whether to use residual connections | True, False | False | True |
Convolution filter size | 5, 7, 9, 11, 13 | - | 9 |
The number of epochs and the number of hidden layer channels reached their upper limits.
This time, because of the limitations of my machine, I kept the upper limit on the number of epochs low (even so, with just 20 iterations, and with the later iterations almost all trying the maximum number of epochs and batch size, it took quite a long time).
If you raise upper limits such as the number of epochs, the accuracy should improve further.
Still, I'm glad the optimization seems to have worked.
I implemented the network with Chainer.
I wrote a test_mnist_nn function that takes the neural network's hyperparameters as input and returns the error rate. Bayesian optimization is then performed to minimize this error rate.
mnist_nn.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
A neural network for MNIST whose hyperparameters are to be optimized.
The parameters to be optimized are:
    epoch_num: number of epochs (1~20)
    dropout_r: dropout rate (0~0.9)
    h_cnl: number of hidden layer channels (hidden layer size as a multiple of the input layer) (0.51~5)
    batch_size: batch size (50, 100, 200)
    conv: whether to use convolutions (True or False)
    layers: number of layers (1~3)
    residual: whether to use residual connections (True or False)
    conv_ksize: convolution filter size (5, 7, 9, 11, 13)
"""
import numpy as np
import chainer
from chainer import optimizers, datasets
import chainer.functions as F
import chainer.links as L


class mnist_nn(chainer.Chain):
    '''
    Neural network for the MNIST classification task
    '''
    def __init__(self, dropout_r=0.5, h_cnl=0.51, conv=True, layers=3, residual=True, conv_ksize=9):
        '''
        Stores the hyperparameters and
        creates the layers (adds the links)
        '''
        super(mnist_nn, self).__init__()
        self.dropout_r = dropout_r
        self.h_cnl = h_cnl
        self.conv = conv
        self.layers = layers
        self.residual = residual
        self.conv_ksize = conv_ksize

        if conv:
            p_size = int((conv_ksize-1)/2)
            self.add_link('layer{}'.format(0), L.Convolution2D(1, round(h_cnl), (conv_ksize, conv_ksize), pad=(p_size, p_size)))
            for i in range(1, layers):
                self.add_link('layer{}'.format(i), L.Convolution2D(round(h_cnl), round(h_cnl), (conv_ksize, conv_ksize), pad=(p_size, p_size)))
            self.add_link('fin_layer', L.Convolution2D(round(h_cnl), 10, (28, 28)))
        else:
            self.add_link('layer{}'.format(0), L.Linear(784, round(784*h_cnl)))
            for i in range(1, layers):
                self.add_link('layer{}'.format(i), L.Linear(round(784*h_cnl), round(784*h_cnl)))
            self.add_link('fin_layer', L.Linear(round(784*h_cnl), 10))

    def __call__(self, x, train=False):
        '''
        The forward pass of the neural network
        '''
        if self.conv:
            batch_size = x.shape[0]
            x = x.reshape(batch_size, 1, 28, 28)
            h = chainer.Variable(x)
            h = F.dropout(F.relu(self['layer{}'.format(0)](h)), train=train, ratio=self.dropout_r)
            for i in range(1, self.layers):
                if self.residual:
                    h = F.dropout(F.relu(self['layer{}'.format(i)](h)), train=train, ratio=self.dropout_r) + h
                else:
                    h = F.dropout(F.relu(self['layer{}'.format(i)](h)), train=train, ratio=self.dropout_r)
            h = self['fin_layer'](h)[:, :, 0, 0]
        else:
            h = chainer.Variable(x)
            h = F.dropout(F.relu(self['layer{}'.format(0)](h)), train=train, ratio=self.dropout_r)
            for i in range(1, self.layers):
                if self.residual:
                    h = F.dropout(F.relu(self['layer{}'.format(i)](h)), train=train, ratio=self.dropout_r) + h
                else:
                    h = F.dropout(F.relu(self['layer{}'.format(i)](h)), train=train, ratio=self.dropout_r)
            h = self['fin_layer'](h)
        return h

    def loss(self, x, t):
        '''
        The loss function is the softmax cross entropy
        '''
        x = self.__call__(x, True)
        t = chainer.Variable(t)
        loss = F.softmax_cross_entropy(x, t)
        return loss


def test_mnist_nn(epoch_num=10, dropout_r=0.5, h_cnl=0.51, batch_size=100, conv=True, layers=3, residual=True, conv_ksize=9):
    '''
    Returns the MNIST error rate for the given hyperparameters
    '''
    # Fix the random seed for now (this makes the output deterministic for a given set of parameters)
    np.random.seed(1234)

    # Prepare the model and the optimizer
    model = mnist_nn(dropout_r, h_cnl, conv, layers, residual, conv_ksize)
    optimizer = optimizers.Adam()
    optimizer.setup(model)

    # Prepare the data
    train, test = datasets.get_mnist()
    trainn, testn = len(train), len(test)

    # Iteration numbers at which to print progress
    logging_num = np.linspace(0, epoch_num*int(trainn/batch_size), 11).astype(int)

    iter = 0
    for epoch in range(epoch_num):
        # Create mini-batches
        batch_idx = np.array(range(trainn))
        np.random.shuffle(batch_idx)
        for batch_num in np.hsplit(batch_idx, int(trainn/batch_size)):
            batch = train[batch_num]
            x, t = batch[0], batch[1]

            # Training step
            model.zerograds()
            loss = model.loss(x, t)
            # Just in case the loss becomes NaN, skip this batch
            if np.isnan(loss.data):
                print("pass")
                continue
            loss.backward()
            optimizer.update()

            # Progress output
            if iter in logging_num:
                print(str(np.where(logging_num == iter)[0][0]*10)+'%', '\tcross entropy =', loss.data)
            iter += 1

    # Performance evaluation (done in mini-batches to save memory)
    batch_idx = np.array(range(testn))
    false_p = []
    for batch_num in np.hsplit(batch_idx, 100):
        batch = test[batch_num]
        x, t = batch[0].reshape(len(batch_num), 1, 28, 28), batch[1]
        res = model(x).data.argmax(axis=1)
        false_p.append(np.mean(t != res))
    false_p = np.mean(false_p)
    print("False {:.2f}%".format(false_p*100))
    return false_p


if __name__ == '__main__':
    test_mnist_nn(epoch_num=10, dropout_r=0, h_cnl=1, batch_size=100, conv=False, layers=2, residual=False, conv_ksize=9)
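For reference, once mnist_nn.py is in place you can evaluate the optimized configuration from the table above directly. This is just a usage sketch (the exact error rate you get will depend on your environment):

```python
from mnist_nn import test_mnist_nn

# Hyperparameters reported as the optimization result in the table above
err = test_mnist_nn(epoch_num=20, dropout_r=0, h_cnl=5, batch_size=200,
                    conv=True, layers=3, residual=True, conv_ksize=9)
print("error rate: {:.2f}%".format(err * 100))
```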
I used GPyOpt for the Bayesian optimization.
I first sampled 20 random points and then tried to optimize with up to 100 further iterations (in reality, it never got through all 100 iterations).
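As an aside, the script below relies on GPyOpt's default number of initial random points when it starts from scratch. If you want to request the 20 initial random samples explicitly, GPyOpt's initial_design_numdata argument can be passed when constructing the optimizer. A small sketch of mine, using the f and bounds defined in bayesian_opt.py below (not part of the original script):

```python
# Sketch only: explicitly request 20 random initial samples before the
# Bayesian optimization iterations start (the original script uses the default).
myBopt = GPyOpt.methods.BayesianOptimization(f=f, domain=bounds,
                                             initial_design_numdata=20)
```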
bayesian_opt.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Fri Jan 27 20:09:10 2017

@author: marshi
"""
import GPyOpt
import numpy as np
import mnist_nn
import os


def f(x):
    '''
    Wrapper function for mnist_nn.test_mnist_nn
    _x[0] (1~20)        -> epoch_num = int(_x[0])
    _x[1] (0~0.9)       -> dropout_r = np.float32(_x[1])
    _x[2] (0.51~5)      -> h_cnl = np.float32(_x[2])
    _x[3] (50,100,200)  -> batch_size = int(_x[3])
    _x[4] (0,1)         -> conv = bool(_x[4])
    _x[5] (1,2,3)       -> layers = int(_x[5])
    _x[6] (0,1)         -> residual = bool(_x[6])
    _x[7] (5,7,9,11,13) -> conv_ksize = int(_x[7])
    '''
    ret = []
    for _x in x:
        print(_x)
        _ret = mnist_nn.test_mnist_nn(epoch_num=int(_x[0]),
                                      dropout_r=np.float32(_x[1]),
                                      h_cnl=np.float32(_x[2]),
                                      batch_size=int(_x[3]),
                                      conv=bool(_x[4]),
                                      layers=int(_x[5]),
                                      residual=bool(_x[6]),
                                      conv_ksize=int(_x[7]))
        ret.append(_ret)
    ret = np.array(ret)
    return ret


# Specify the domain of each variable
bounds = [{'name': 'epochs', 'type': 'continuous', 'domain': (1, 20)},
          {'name': 'dropout_r', 'type': 'continuous', 'domain': (0.0, 0.9)},
          {'name': 'h_cnl', 'type': 'continuous', 'domain': (0.51, 5)},
          {'name': 'batch_size', 'type': 'discrete', 'domain': (50, 100, 200)},
          {'name': 'conv', 'type': 'discrete', 'domain': (0, 1)},
          {'name': 'layers', 'type': 'discrete', 'domain': (1, 2, 3)},
          {'name': 'residual', 'type': 'discrete', 'domain': (0, 1)},
          {'name': 'conv_ksize', 'type': 'discrete', 'domain': (5, 7, 9, 11, 13)}]

# Create the Bayesian optimization object
# If previously saved X, Y exist, resume from them
# Otherwise, start by randomly sampling a few points
filename = "XY.npz"
if os.path.exists(filename):
    XY = np.load(filename)
    X, Y = XY['x'], XY['y']
    myBopt = GPyOpt.methods.BayesianOptimization(f=f, domain=bounds, X=X, Y=Y)
else:
    myBopt = GPyOpt.methods.BayesianOptimization(f=f, domain=bounds)

# Run Bayesian optimization one step at a time, showing the result after each step
for i in range(1000):
    print(len(myBopt.X), "Samples")
    myBopt.run_optimization(max_iter=1)
    print(myBopt.fx_opt)
    print(myBopt.x_opt)
    # Save after every step
    np.savez(filename, x=myBopt.X, y=myBopt.Y)
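When you stop the loop (or after it finishes), the best point found so far is in myBopt.x_opt. The snippet below is a small sketch of mine, not part of the original script, showing how to map x_opt back onto the arguments of test_mnist_nn, following the same conversions as f():

```python
# Sketch: convert the best point found by GPyOpt back into named
# hyperparameters and re-evaluate it (same conversions as in f above).
best = myBopt.x_opt
best_params = dict(epoch_num=int(best[0]),
                   dropout_r=np.float32(best[1]),
                   h_cnl=np.float32(best[2]),
                   batch_size=int(best[3]),
                   conv=bool(best[4]),
                   layers=int(best[5]),
                   residual=bool(best[6]),
                   conv_ksize=int(best[7]))
print(best_params)
print("error rate:", mnist_nn.test_mnist_nn(**best_params))
```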