By sacrificing a little accuracy, the weight parameters were reduced dramatically ~ CNN's amazing results ~

1. Introduction

When it comes to image classification with neural networks, the focus is usually on how much accuracy can be improved. Being a bit of a contrarian, I will instead focus on **how much the weight parameters can be reduced if we accept a slightly lower accuracy**.

Most of a neural network's resources are often spent squeezing out the last 1 or 2% of accuracy, so if we sacrifice about 1 to 2% of accuracy, we should be able to reduce the weight parameters considerably.

The models used in this experiment are the **MLP (Multilayer Perceptron)** and **CNN (Convolutional Neural Network)** from the Keras tutorial that classify MNIST (handwritten digits from 0 to 9). Since the accuracy of these two models is about 98 to 99%, let's set the **target accuracy to the 97% range** and see how far the weight parameters can be reduced.

2. MLP (Multilayer Perceptron)

Here is the basic structure of the MLP in the Keras tutorial (the original also has two Dropout layers, but I omit them for simplicity).

(Figure: MLP model structure)

Since MNIST images are 28 * 28 pixels, there are 28 * 28 = 784 inputs. There are two hidden layers, both fully connected with n = 512.
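As a quick sanity check on the parameter counts used in the test loop below (a small sketch I added for reference, not part of the tutorial): with two hidden layers of width n, the total number of weights including biases is (784 + 1) * n + (n + 1) * n + (n + 1) * 10 = n^2 + 796n + 10.

# Parameter count of the MLP as a function of the hidden-layer width n
def mlp_params(n):
    dense_1 = (784 + 1) * n   # 784 inputs -> first hidden layer (the +1 is the bias)
    dense_2 = (n + 1) * n     # first hidden layer -> second hidden layer
    dense_3 = (n + 1) * 10    # second hidden layer -> 10-class softmax
    return dense_1 + dense_2 + dense_3

print(mlp_params(512))  # 669706 for the base model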

Let's first check what happens to the accuracy as this n is gradually reduced. Run the following code (execution time is about three and a half minutes on a Google Colab GPU).

from __future__ import print_function
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop

batch_size = 128
num_classes = 10
epochs = 20

# load data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# Multilayer Perceptron 1
def mlp(n):
    model = Sequential()
    model.add(Dense(n, activation='relu', input_shape=(784,)))
    model.add(Dense(n, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))

    #model.summary()

    model.compile(loss='categorical_crossentropy',
                   optimizer=RMSprop(),
                  metrics=['accuracy'])

    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              verbose=0, 
              validation_data=(x_test, y_test))
    
    score = model.evaluate(x_test, y_test, verbose=0)
    return score[1]

# test
list_n = [512, 256, 128, 64, 32, 16]  
x, y = [], []
for n in list_n:
    params = n*n + 796*n + 10  # (784+1)*n + (n+1)*n + (n+1)*10
    acc = mlp(n)
    x.append(params)
    y.append(acc)
    print ('n = ', n, ', ', 'params = ', params, ', ', 'accuracy = ', acc)

# graph
import matplotlib.pyplot as plt
plt.scatter(x, y)
plt.xscale('log')
plt.show()

(Figure: accuracy vs. number of weight parameters)

The horizontal axis of the graph is the number of weight parameters (log scale), and the vertical axis is the accuracy. If we want to hold on to about 97% accuracy, the turning point is n = 32, params = 26506, accuracy = 0.9711.

So by sacrificing some accuracy, we can reduce the weight parameters to about 1/25 of the base model, since 669706 / 26506 = 25.26.

Looking at the model outline with model.summary() (remove the leading # and run again to display it), it looks like this:

(Figure: model.summary() output for n = 32)

Is it impossible to reduce the weight parameters any further? No, there is still another move.

Where are most of the weight parameters consumed? Between the 784 inputs and **dense_1**: (784 + 1) * 32 = 25120, about 95% of all the weight parameters, are consumed here. (It is 784 + 1 because there is one bias term.)
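A quick check of that share (a small sketch for the n = 32 model above):

n = 32
dense_1 = (784 + 1) * n   # 25120
dense_2 = (n + 1) * n     # 1056
dense_3 = (n + 1) * 10    # 330
total = dense_1 + dense_2 + dense_3   # 26506
print(dense_1 / total)    # ~0.95, so dense_1 holds roughly 95% of the weights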

Let's hypothesize that the digits are still identifiable to some extent even at a lower resolution, and consider a model that uses a Max Pooling filter to reduce the 28 * 28 = 784 inputs to a quarter, 14 * 14 = 196.

(Figure: MLP model with a Max Pooling input layer)

Run the code for this model (execution time is about three and a half minutes on a Google Colab GPU).

from __future__ import print_function
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop
from keras.layers import MaxPooling2D, Flatten  # added

batch_size = 128
num_classes = 10
epochs = 20

# load data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(60000, 28, 28, 1)  # changed: keep the 2D image shape for MaxPooling2D
x_test = x_test.reshape(10000, 28, 28, 1)  # changed: keep the 2D image shape for MaxPooling2D
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# Multilayer Perceptron 2
def mlp(n):
    model = Sequential()
    model.add(MaxPooling2D(pool_size=(2, 2),input_shape=(28, 28, 1)))  # downsample 28*28 to 14*14
    model.add(Flatten())  # flatten for the fully connected layers
    model.add(Dense(n, activation='relu')) 
    model.add(Dense(n, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))

    #model.summary()

    model.compile(loss='categorical_crossentropy',
                   optimizer=RMSprop(),
                  metrics=['accuracy'])

    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              verbose=0, 
              validation_data=(x_test, y_test))
    
    score = model.evaluate(x_test, y_test, verbose=0)
    return score[1]

# test
list_n = [512, 256, 128, 64, 32, 16]  
x, y = [], []
for n in list_n:
    params = n*n + 208*n + 10  # (196+1)*n + (n+1)*n + (n+1)*10; inputs are now 14*14 = 196
    acc = mlp(n)
    x.append(params)
    y.append(acc)
    print ('n = ', n, ', ', 'params = ', params, ', ', 'accuracy = ', acc)

# graph
import matplotlib.pyplot as plt
plt.scatter(x, y)
plt.xscale('log')
plt.show()

(Figure: accuracy vs. number of weight parameters for the Max Pooling model)

Overall the points shift toward slightly lower accuracy, but the turning point is n = 64, params = 17418, accuracy = 0.9704.

So by sacrificing some accuracy, we can reduce the weight parameters to about 1/38 of the base model, since 669706 / 17418 = 38.44. That is a considerable reduction.

3. CNN's amazing results

Here is the basic structure of the CNN in the Keras tutorial (the original also has two Dropout layers, but I omit them for simplicity).

(Figure: CNN model structure)

The two convolution layers use 3 * 3 = 9 filters. After that, Max Pooling halves the width and height, and the result is connected to a fully connected layer of n * 4 units.
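For reference, here is how the parameter count used in the test loop below works out (a small sketch I added, derived from the layer shapes: 3 * 3 kernels with no padding, so the feature maps shrink 28 -> 26 -> 24, then 12 after pooling):

# Parameter count of CNN_1 as a function of n
def cnn1_params(n):
    conv_1 = (3*3*1 + 1) * n              # 1 input channel -> n filters
    conv_2 = (3*3*n + 1) * (n*2)          # n channels -> 2n filters
    dense_1 = (12*12*(n*2) + 1) * (n*4)   # flattened 12*12*2n values -> 4n units
    dense_2 = (n*4 + 1) * 10              # 4n units -> 10-class softmax
    return conv_1 + conv_2 + dense_1 + dense_2  # = 1170*n*n + 56*n + 10

print(cnn1_params(32))  # 1199882 for the base model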

Now let's look at the number of weight parameters and the accuracy as n is changed. Run the following code (execution time is about three and a half minutes on a Google Colab GPU).

from __future__ import print_function
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

batch_size = 128
num_classes = 10
epochs = 12
img_rows, img_cols = 28, 28

# load data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# CNN_1
def cnn(n):
    model = Sequential()
    model.add(Conv2D(n, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
    model.add(Conv2D(n*2, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(n*4, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))

    #model.summary()

    model.compile(loss=keras.losses.categorical_crossentropy,
                   optimizer=keras.optimizers.Adadelta(),
                   metrics=['accuracy'])

    model.fit(x_train, y_train,
             batch_size=batch_size,
             epochs=epochs,
             verbose=0,
             validation_data=(x_test, y_test))
    score = model.evaluate(x_test, y_test, verbose=0)
    return score[1]

# test
list_n = [32, 16, 8, 4, 2, 1] 
x, y = [], []
for n in list_n:
    params = 1170*n*n + 56*n + 10  # see the parameter-count sketch above
    acc = cnn(n)
    x.append(params)
    y.append(acc)
    print ('n = ', n, ', ', 'params = ', params, ', ', 'accuracy = ', acc)

# graph
import matplotlib.pyplot as plt
plt.scatter(x, y)
plt.xscale('log')
plt.show()

(Figure: accuracy vs. number of weight parameters for CNN_1)

This is amazing! The turning point is n = 2, params = 4802, accuracy = 0.9748.

So by sacrificing some accuracy, we could reduce the number of weight parameters to about 1/250 of the base model, since 1199882 / 4802 ≈ 250.

As before, looking at the model outline with model.summary():

(Figure: model.summary() output for n = 2)

In contrast to the MLP, the input and convolution parts use very few parameters. A convolution layer only needs the 3 * 3 = 9 weights of each filter, shared across the whole image, so its weight parameters stay small.

Instead, the connection from the final convolution layer into the fully connected layer, (12 * 12 * 4 + 1) * 8 = 4616, accounts for 96% of all the weight parameters. Can't we do something about that?
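A quick check of that share for the n = 2 model (a small sketch using the same layer shapes as above):

n = 2
conv_1 = (3*3*1 + 1) * n             # 20
conv_2 = (3*3*n + 1) * (n*2)         # 76
dense_1 = (12*12*(n*2) + 1) * (n*4)  # 4616
dense_2 = (n*4 + 1) * 10             # 90
total = conv_1 + conv_2 + dense_1 + dense_2   # 4802
print(dense_1 / total)  # ~0.96, the Flatten -> Dense connection dominates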

Since the convolution layers consume almost no weight parameters, a good next idea is a model that adds two more convolution layers after the Max Pooling and then applies Max Pooling once more, so that the big fully connected layer can be dropped entirely.
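Here is a sketch of the parameter count for that model (again my own derivation from the layer shapes: 28 -> 26 -> 24, pool to 12, then 12 -> 10 -> 8, pool to 4, so only 4 * 4 * 2n values are flattened straight into the softmax layer):

# Parameter count of CNN_2 as a function of n
def cnn2_params(n):
    conv_1 = (3*3*1 + 1) * n           # 1 channel -> n filters
    conv_2 = (3*3*n + 1) * n           # n channels -> n filters
    conv_3 = (3*3*n + 1) * (n*2)       # n channels -> 2n filters
    conv_4 = (3*3*(n*2) + 1) * (n*2)   # 2n channels -> 2n filters
    dense = (4*4*(n*2) + 1) * 10       # flattened 4*4*2n values -> 10-class softmax
    return conv_1 + conv_2 + conv_3 + conv_4 + dense  # = 63*n*n + 335*n + 10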

(Figure: CNN_2 model structure)

Now, let's run the code for this model (execution time is about three and a half minutes on a Google Colab GPU).

from __future__ import print_function
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

batch_size = 128
num_classes = 10
epochs = 12
img_rows, img_cols = 28, 28

# load data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# CNN_2
def cnn(n):
    model = Sequential()
    model.add(Conv2D(n, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
    model.add(Conv2D(n, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(n*2, (3, 3), activation='relu'))
    model.add(Conv2D(n*2, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(num_classes, activation='softmax'))

    #model.summary()

    model.compile(loss=keras.losses.categorical_crossentropy,
                   optimizer=keras.optimizers.Adadelta(),
                   metrics=['accuracy'])

    model.fit(x_train, y_train,
             batch_size=batch_size,
             epochs=epochs,
             verbose=0,
             validation_data=(x_test, y_test))
    score = model.evaluate(x_test, y_test, verbose=0)
    return score[1]

# test
list_n = [32, 16, 8, 4, 2, 1] 
x, y = [], []
for n in list_n:
    params = 63*n*n + 335*n + 10  # see the parameter-count sketch above
    acc = cnn(n)
    x.append(params)
    y.append(acc)
    print ('n = ', n, ', ', 'params = ', params, ', ', 'accuracy = ', acc)

# graph
import matplotlib.pyplot as plt
plt.scatter(x, y)
plt.xscale('log')
plt.show()

(Figure: accuracy vs. number of weight parameters for CNN_2)

Surprisingly, we were able to keep 97% accuracy with only 932 weight parameters. That is **less than 1/1000 of the base model!**

And compared with the MLP's 17418 parameters, since 17418 / 932 ≈ 18.7, **the same accuracy can be obtained with about 1/18 of the MLP's weight parameters**.

**This is a very clear demonstration that, for image recognition, the convolution layers of a CNN work extremely effectively**. CNNs are scary!
