Optimizing neural network hyperparameters with Bayesian optimization
How do you decide on neural network hyperparameters? Training a neural network takes a long time, so in practice you often rely on intuition and experience. However, if a single training run finishes within a reasonable amount of time (say, at most 30 minutes to an hour), there is a method you can use: Bayesian optimization. Bayesian optimization is effective for any black-box function that you want to optimize but that takes a while to evaluate.
The day before yesterday, I wrote an article about performing Bayesian optimization in Python with GPyOpt. This time, I used that method to optimize the hyperparameters of a neural network so that its accuracy on MNIST is as high as possible. The full Python code is below, so please try it yourself.
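As a quick reminder of what that looks like, here is a minimal sketch of using GPyOpt on a toy black-box function (the toy function is just an illustration of mine, not part of this experiment):

```python
import GPyOpt
import numpy as np

# A toy "black box": in practice this would be an expensive computation,
# e.g. training a neural network and returning its error rate.
def black_box(x):
    return np.sin(3 * x) + 0.1 * x ** 2

bounds = [{'name': 'x', 'type': 'continuous', 'domain': (-3, 3)}]
opt = GPyOpt.methods.BayesianOptimization(f=black_box, domain=bounds)
opt.run_optimization(max_iter=15)
print(opt.x_opt, opt.fx_opt)  # best input found and its function value
```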
First, let me show you the results.
The task this time is to improve the classification rate on MNIST (handwritten digit recognition).
MNIST is split into training data and test data, the neural network is trained on the training data, and the trained network then classifies the test data. The goal is to make this test error rate as low as possible.
The hyperparameters to be optimized are as follows.
[Added 2017/2/8] I updated the code so that it specifies the number of epochs instead of the number of steps.
Hyperparameter | Range |
---|---|
Number of epochs | 1~20 |
Dropout rate | 0~0.9 |
Number of hidden layer channels (hidden layer size as a multiple of the input layer) | 0.51~5 |
Batch size | 50, 100, 200 |
Whether to use convolutions | True, False |
Number of layers | 1, 2, 3 |
Whether to use residual connections | True, False |
Convolution filter size | 5, 7, 9, 11, 13 |
First, as a baseline, I used a standard neural network, which gave an error rate of 2.58%.
After Bayesian optimization (20 random samples + 20 Bayesian optimization iterations), the error rate was 1.18%. The hyperparameters after optimization are:
Hyperparameter | Range | Baseline | Optimization result |
---|---|---|---|
Number of epochs | 1~20 | 10 | 20 (upper limit) |
Dropout rate | 0~0.9 | 0 | 0 |
Number of hidden layer channels (hidden layer size as a multiple of the input layer) | 0.51~5 | 1 | 5 (upper limit) |
Batch size | 50, 100, 200 | 100 | 200 (upper limit) |
Whether to use convolutions | True, False | False | True |
Number of layers | 1, 2, 3 | 2 | 3 (upper limit) |
Whether to use residual connections | True, False | False | True |
Convolution filter size | 5, 7, 9, 11, 13 | - | 9 |
The number of epochs and the number of hidden layer channels reached their upper limits.
This time, because of the limitations of my machine, I kept the upper limit on the number of epochs low (even so, with just 20 iterations, and with the later iterations almost all trying the maximum number of epochs and batch size, it took quite a long time).
If you raise upper limits such as the number of epochs, the accuracy should improve further.
Still, I'm glad the optimization seems to have worked.
I implemented the network with Chainer.
I wrote a test_mnist_nn function that takes the neural network's hyperparameters as input and returns the error rate. Bayesian optimization is then performed to minimize this error rate.
mnist_nn.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
A neural network for MNIST whose hyperparameters are to be optimized.
The parameters to be optimized are:
    epoch_num: number of epochs (1~20)
    dropout_r: dropout rate (0~0.9)
    h_cnl: number of hidden layer channels (hidden layer size as a multiple of the input layer) (0.51~5)
    batch_size: batch size (50, 100, 200)
    conv: whether to use convolutions (True or False)
    layers: number of layers (1~3)
    residual: whether to use residual connections (True or False)
    conv_ksize: convolution filter size (5, 7, 9, 11, 13)
"""
import numpy as np
import chainer
from chainer import optimizers, datasets
import chainer.functions as F
import chainer.links as L


class mnist_nn(chainer.Chain):
    '''
    Neural network for the MNIST classification task
    '''
    def __init__(self, dropout_r=0.5, h_cnl=0.51, conv=True, layers=3, residual=True, conv_ksize=9):
        '''
        Stores the hyperparameters and
        creates the layers (adds the links)
        '''
        super(mnist_nn, self).__init__()
        self.dropout_r = dropout_r
        self.h_cnl = h_cnl
        self.conv = conv
        self.layers = layers
        self.residual = residual
        self.conv_ksize = conv_ksize

        if conv:
            p_size = int((conv_ksize-1)/2)
            self.add_link('layer{}'.format(0), L.Convolution2D(1, round(h_cnl), (conv_ksize, conv_ksize), pad=(p_size, p_size)))
            for i in range(1, layers):
                self.add_link('layer{}'.format(i), L.Convolution2D(round(h_cnl), round(h_cnl), (conv_ksize, conv_ksize), pad=(p_size, p_size)))
            self.add_link('fin_layer', L.Convolution2D(round(h_cnl), 10, (28, 28)))
        else:
            self.add_link('layer{}'.format(0), L.Linear(784, round(784*h_cnl)))
            for i in range(1, layers):
                self.add_link('layer{}'.format(i), L.Linear(round(784*h_cnl), round(784*h_cnl)))
            self.add_link('fin_layer', L.Linear(round(784*h_cnl), 10))

    def __call__(self, x, train=False):
        '''
        The forward pass of the neural network
        '''
        if self.conv:
            batch_size = x.shape[0]
            x = x.reshape(batch_size, 1, 28, 28)
            h = chainer.Variable(x)
            h = F.dropout(F.relu(self['layer{}'.format(0)](h)), train=train, ratio=self.dropout_r)
            for i in range(1, self.layers):
                if self.residual:
                    h = F.dropout(F.relu(self['layer{}'.format(i)](h)), train=train, ratio=self.dropout_r) + h
                else:
                    h = F.dropout(F.relu(self['layer{}'.format(i)](h)), train=train, ratio=self.dropout_r)
            h = self['fin_layer'](h)[:, :, 0, 0]
        else:
            h = chainer.Variable(x)
            h = F.dropout(F.relu(self['layer{}'.format(0)](h)), train=train, ratio=self.dropout_r)
            for i in range(1, self.layers):
                if self.residual:
                    h = F.dropout(F.relu(self['layer{}'.format(i)](h)), train=train, ratio=self.dropout_r) + h
                else:
                    h = F.dropout(F.relu(self['layer{}'.format(i)](h)), train=train, ratio=self.dropout_r)
            h = self['fin_layer'](h)
        return h

    def loss(self, x, t):
        '''
        The loss function is the softmax cross entropy
        '''
        x = self.__call__(x, True)
        t = chainer.Variable(t)
        loss = F.softmax_cross_entropy(x, t)
        return loss


def test_mnist_nn(epoch_num=10, dropout_r=0.5, h_cnl=0.51, batch_size=100, conv=True, layers=3, residual=True, conv_ksize=9):
    '''
    Returns the MNIST error rate for the given hyperparameters
    '''
    # Fix the random seed for now (this makes the output deterministic for a given set of parameters)
    np.random.seed(1234)

    # Prepare the model and the optimizer
    model = mnist_nn(dropout_r, h_cnl, conv, layers, residual, conv_ksize)
    optimizer = optimizers.Adam()
    optimizer.setup(model)

    # Prepare the data
    train, test = datasets.get_mnist()
    trainn, testn = len(train), len(test)

    # Iteration numbers at which to print progress
    logging_num = np.linspace(0, epoch_num*int(trainn/batch_size), 11).astype(int)

    iter = 0
    for epoch in range(epoch_num):
        # Create mini-batches
        batch_idx = np.array(range(trainn))
        np.random.shuffle(batch_idx)
        for batch_num in np.hsplit(batch_idx, int(trainn/batch_size)):
            batch = train[batch_num]
            x, t = batch[0], batch[1]

            # Training step
            model.zerograds()
            loss = model.loss(x, t)
            # Just in case the loss becomes NaN, skip this batch
            if np.isnan(loss.data):
                print("pass")
                continue
            loss.backward()
            optimizer.update()

            # Progress output
            if iter in logging_num:
                print(str(np.where(logging_num == iter)[0][0]*10)+'%', '\tcross entropy =', loss.data)
            iter += 1

    # Performance evaluation (done in mini-batches to save memory)
    batch_idx = np.array(range(testn))
    false_p = []
    for batch_num in np.hsplit(batch_idx, 100):
        batch = test[batch_num]
        x, t = batch[0].reshape(len(batch_num), 1, 28, 28), batch[1]
        res = model(x).data.argmax(axis=1)
        false_p.append(np.mean(t != res))
    false_p = np.mean(false_p)
    print("False {:.2f}%".format(false_p*100))
    return false_p


if __name__ == '__main__':
    test_mnist_nn(epoch_num=10, dropout_r=0, h_cnl=1, batch_size=100, conv=False, layers=2, residual=False, conv_ksize=9)
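For reference, once mnist_nn.py is in place you can evaluate the optimized configuration from the table above directly. This is just a usage sketch (the exact error rate you get will depend on your environment):

```python
from mnist_nn import test_mnist_nn

# Hyperparameters reported as the optimization result in the table above
err = test_mnist_nn(epoch_num=20, dropout_r=0, h_cnl=5, batch_size=200,
                    conv=True, layers=3, residual=True, conv_ksize=9)
print("error rate: {:.2f}%".format(err * 100))
```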
I used GPyOpt for the Bayesian optimization.
I first sampled 20 random points and then tried to optimize with up to 100 further iterations (in reality, it never got through all 100 iterations).
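As an aside, the script below relies on GPyOpt's default number of initial random points when it starts from scratch. If you want to request the 20 initial random samples explicitly, GPyOpt's initial_design_numdata argument can be passed when constructing the optimizer. A small sketch of mine, using the f and bounds defined in bayesian_opt.py below (not part of the original script):

```python
# Sketch only: explicitly request 20 random initial samples before the
# Bayesian optimization iterations start (the original script uses the default).
myBopt = GPyOpt.methods.BayesianOptimization(f=f, domain=bounds,
                                             initial_design_numdata=20)
```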
bayesian_opt.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Fri Jan 27 20:09:10 2017

@author: marshi
"""
import GPyOpt
import numpy as np
import mnist_nn
import os


def f(x):
    '''
    Wrapper function for mnist_nn.test_mnist_nn
    _x[0] (1~20)        -> epoch_num = int(_x[0])
    _x[1] (0~0.9)       -> dropout_r = np.float32(_x[1])
    _x[2] (0.51~5)      -> h_cnl = np.float32(_x[2])
    _x[3] (50,100,200)  -> batch_size = int(_x[3])
    _x[4] (0,1)         -> conv = bool(_x[4])
    _x[5] (1,2,3)       -> layers = int(_x[5])
    _x[6] (0,1)         -> residual = bool(_x[6])
    _x[7] (5,7,9,11,13) -> conv_ksize = int(_x[7])
    '''
    ret = []
    for _x in x:
        print(_x)
        _ret = mnist_nn.test_mnist_nn(epoch_num=int(_x[0]),
                                      dropout_r=np.float32(_x[1]),
                                      h_cnl=np.float32(_x[2]),
                                      batch_size=int(_x[3]),
                                      conv=bool(_x[4]),
                                      layers=int(_x[5]),
                                      residual=bool(_x[6]),
                                      conv_ksize=int(_x[7]))
        ret.append(_ret)
    ret = np.array(ret)
    return ret


# Specify the domain of each variable
bounds = [{'name': 'epochs', 'type': 'continuous', 'domain': (1, 20)},
          {'name': 'dropout_r', 'type': 'continuous', 'domain': (0.0, 0.9)},
          {'name': 'h_cnl', 'type': 'continuous', 'domain': (0.51, 5)},
          {'name': 'batch_size', 'type': 'discrete', 'domain': (50, 100, 200)},
          {'name': 'conv', 'type': 'discrete', 'domain': (0, 1)},
          {'name': 'layers', 'type': 'discrete', 'domain': (1, 2, 3)},
          {'name': 'residual', 'type': 'discrete', 'domain': (0, 1)},
          {'name': 'conv_ksize', 'type': 'discrete', 'domain': (5, 7, 9, 11, 13)}]

# Create the Bayesian optimization object
# If previously saved X, Y exist, resume from them
# Otherwise, start by randomly sampling a few points
filename = "XY.npz"
if os.path.exists(filename):
    XY = np.load(filename)
    X, Y = XY['x'], XY['y']
    myBopt = GPyOpt.methods.BayesianOptimization(f=f, domain=bounds, X=X, Y=Y)
else:
    myBopt = GPyOpt.methods.BayesianOptimization(f=f, domain=bounds)

# Run Bayesian optimization one step at a time, showing the result after each step
for i in range(1000):
    print(len(myBopt.X), "Samples")
    myBopt.run_optimization(max_iter=1)
    print(myBopt.fx_opt)
    print(myBopt.x_opt)
    # Save after every step
    np.savez(filename, x=myBopt.X, y=myBopt.Y)
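When you stop the loop (or after it finishes), the best point found so far is in myBopt.x_opt. The snippet below is a small sketch of mine, not part of the original script, showing how to map x_opt back onto the arguments of test_mnist_nn, following the same conversions as f():

```python
# Sketch: convert the best point found by GPyOpt back into named
# hyperparameters and re-evaluate it (same conversions as in f above).
best = myBopt.x_opt
best_params = dict(epoch_num=int(best[0]),
                   dropout_r=np.float32(best[1]),
                   h_cnl=np.float32(best[2]),
                   batch_size=int(best[3]),
                   conv=bool(best[4]),
                   layers=int(best[5]),
                   residual=bool(best[6]),
                   conv_ksize=int(best[7]))
print(best_params)
print("error rate:", mnist_nn.test_mnist_nn(**best_params))
```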