(Deep learning) Collecting images with the Flickr API and classifying them by transfer learning with VGG16

Introduction

~~The article has been revised (March 10, 2020)~~ **Corrected the article (May 4, 2020)**

This article summarizes the following:

By the way, I am a beginner in machine learning, so I may use some inappropriate expressions. I apologize in advance, and I would appreciate it if you could point them out.

The execution environment is as follows.

The URLs that I referred to are listed below. https://qiita.com/ayumiya/items/e1e87df54c41519be6b4

Collect images

This time, I used the Flickr API to collect a large number of images. To do so, I registered with Flickr and obtained an API key and a secret key, following the URL below. http://ykubot.com/2017/11/05/flickr-api/

I was able to collect the images automatically by running the following Python program. Collecting 300 images took about 5 minutes in this case.

Below is the full text of the program.

collection.py



import os
import sys
import time
import traceback
from urllib.request import urlretrieve

import flickrapi
from retry import retry

flickr_api_key = "xxxx"  # Enter the API KEY issued here
secret_key = "xxxx"  # Enter the SECRET KEY issued here

# Search word, taken from the command line, e.g.: python collection.py child
keyword = sys.argv[1]


@retry()  # retry the download if it raises an exception
def get_photos(url, filepath):
    urlretrieve(url, filepath)
    time.sleep(1)  # wait between requests so the server is not overloaded


if __name__ == '__main__':

    flicker = flickrapi.FlickrAPI(flickr_api_key, secret_key, format='parsed-json')
    response = flicker.photos.search(
        text=keyword,  # the word you want to search
        per_page=300,  # how many images you want to collect
        media='photos',
        sort='relevance',
        safe_search=1,
        extras='url_q,license'
    )
    photos = response['photos']

    try:
        # Images are saved under ./image-data/<keyword>/
        save_dir = os.path.join('./image-data', keyword)
        os.makedirs(save_dir, exist_ok=True)  # also creates ./image-data if missing

        for photo in photos['photo']:
            url_q = photo['url_q']
            filepath = os.path.join(save_dir, photo['id'] + '.jpg')
            get_photos(url_q, filepath)

    except Exception as e:
        traceback.print_exc()

This time, I collected 300 images for each of the search words "child" and "elder". As I will explain later, the search word "elder" returned images of plants, young women, and so on in addition to elderly people, which is probably why the recognition rate was not high.
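Because the search results can contain off-topic images, it may help to check how many files actually ended up in each class folder before training. The following is a minimal sketch, assuming the ./image-data/<keyword>/ layout used by collection.py. The function name count_images is hypothetical and is not part of the original program.

```python
import os


def count_images(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return {class_name: number_of_image_files} for each subfolder of root."""
    counts = {}
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        if not os.path.isdir(folder):
            continue  # skip stray files next to the class folders
        counts[name] = sum(
            1 for f in os.listdir(folder) if f.lower().endswith(extensions)
        )
    return counts


if __name__ == "__main__":
    # Assumed location: the folder that collection.py writes to.
    if os.path.isdir("./image-data"):
        print(count_images("./image-data"))
```

The counts only show whether the two classes are balanced. Removing off-topic photos still requires looking at the images themselves.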

Creating an image recognition program

The rest of this article describes the image recognition program. I will first cover the points where I stumbled and the points to watch out for in the overall flow, and then give the full program.

Imported libraries, modules, etc.

child.py



import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
from keras.utils.np_utils import to_categorical
from keras.layers import Dense, Dropout, Flatten, Input
from keras.applications.vgg16 import VGG16
from keras.models import Model, Sequential
from keras import optimizers

Notes on defining the Model

child.py



model = Model(input=vgg16.input, output=top_model(vgg16.output))

Originally, my environment had TensorFlow installed, and I imported Keras through it. Example: from tensorflow.keras.utils import to_categorical

With those imports, defining this Model raised the following error:

('Keyword argument not understood:', 'input')

This seems to happen when the Keras version is old, so I updated Keras itself, but the error persisted. It was resolved when I changed the imports to load the standalone keras package directly, as in the import list above. Note that the Keras 2 API names these arguments inputs and outputs, so writing Model(inputs=vgg16.input, outputs=top_model(vgg16.output)) should also avoid the error when importing from tensorflow.keras.

For a summary of this error and how to update Keras, see: https://sugiyamayoshiaki.jp/%E3%80%90%E3%82%A8%E3%83%A9%E3%83%BC%E3%80%91typeerror-keyword-argument-not-understood-data_format/

VGG16 model overview

The model used this time is VGG16, a pretrained convolutional neural network (CNN).

Paper: https://arxiv.org/pdf/1409.1556.pdf Authors: Karen Simonyan and Andrew Zisserman, Visual Geometry Group, Department of Engineering Science, University of Oxford

002.jpeg

**Image D** above is the model handled this time. It is a neural network with 13 convolutional layers, all with a **3×3 kernel**, and 3 fully connected layers. The paper was published in 2015. Before that, networks used much larger kernels in their early layers (for example 11×11 in AlexNet), so the computational cost was high while the recognition rate remained relatively low. VGG16 addresses this by fixing the kernel size at 3×3 and making the network deeper (13 convolutional layers, 16 weight layers in total).

Another key point of this model is that weights trained on ImageNet, a large image dataset, are available. With those weights, the model classifies an image into 1000 classes. There are therefore two ways to use it.

One is to use the trained model as it is and find the closest of the 1000 classes. The other is to reuse the trained convolutional layers and train new classification layers on top of them for a different task (transfer learning). This time, I use the second method to distinguish children from elderly people.
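To make the kernel-size argument concrete, the arithmetic below compares a stack of 3×3 convolutions with a single larger kernel that covers the same area. This is only an illustrative sketch: the helper names and the channel count of 64 are assumptions chosen for the example, and biases are ignored.

```python
def receptive_field(num_layers, kernel=3):
    """Side length of the input area seen by num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, channels, num_layers=1):
    """Weights (no biases) of num_layers kernel x kernel convolutions
    that each map `channels` input channels to `channels` output channels."""
    return num_layers * kernel * kernel * channels * channels


channels = 64  # arbitrary channel count for illustration

print(receptive_field(2))                        # 5: two 3x3 layers cover a 5x5 area
print(receptive_field(3))                        # 7: three 3x3 layers cover a 7x7 area
print(conv_weights(3, channels, num_layers=3))   # 110592 weights for three 3x3 layers
print(conv_weights(7, channels))                 # 200704 weights for one 7x7 layer
```

Three stacked 3×3 layers see the same 7×7 area as one 7×7 layer but need roughly half the weights, and they apply a non-linearity after each layer.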

Let's take a look at the contents of this model.

child.py


model.summary()

output


_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312
_________________________________________________________________
predictions (Dense)          (None, 1000)              4097000
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________


Looking at the **Output Shape** of the input layer, it is (None, 224, 224, 3). This represents **(number of samples, height, width, number of channels (3 for an RGB color image))**. Following the values down the table, we can see that the number of channels increases from 64 to 128 to 256 to 512 through the convolution layers, while each pooling layer halves the height and width.
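The Param # column of the summary can be reproduced by hand. The sketch below assumes the standard formulas, (kernel height × kernel width × input channels + 1) × output channels for a convolution layer and (input units + 1) × output units for a dense layer, where the +1 is the bias. The helper names and the two layer lists are written out here for illustration, following the shapes in the summary.

```python
def conv_params(kernel, in_ch, out_ch):
    """Weights plus biases of one kernel x kernel convolution layer."""
    return (kernel * kernel * in_ch + 1) * out_ch


def dense_params(in_units, out_units):
    """Weights plus biases of one fully connected layer."""
    return (in_units + 1) * out_units


# (input channels, output channels) of the 13 convolution layers, in order.
VGG16_CONVS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]
# (input units, output units) of fc1, fc2 and predictions.
VGG16_DENSE = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]

total = sum(conv_params(3, i, o) for i, o in VGG16_CONVS)
total += sum(dense_params(i, o) for i, o in VGG16_DENSE)

print(conv_params(3, 3, 64))        # 1792, matches block1_conv1
print(dense_params(25088, 4096))    # 102764544, matches fc1
print(total)                        # 138357544, matches Total params
```

Most of the parameters sit in the fully connected layers, which is one reason transfer learning usually replaces that part with a smaller head.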

Modified for transfer learning

child.py



input_tensor = Input(shape=(50, 50, 3))
vgg16 = VGG16(include_top=False, weights='imagenet', input_tensor=input_tensor)

top_model = Sequential()
top_model.add(Flatten(input_shape=vgg16.output_shape[1:]))
top_model.add(Dense(256, activation='relu'))
top_model.add(Dropout(0.5))
top_model.add(Dense(2, activation='softmax'))

These layers are added after the VGG16 layers to classify images into two classes, children and elderly people.

https://qiita.com/MuAuan/items/86a56637a1ebf455e180
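With a 50×50 input, it is worth checking what shape reaches the added layers. The sketch below assumes that each of the five pooling layers in VGG16 halves the height and width and rounds down for odd sizes, which is the behavior of 2×2 max pooling with stride 2 and no padding. The helper names are hypothetical.

```python
def pooled_size(size, num_pools=5):
    """Spatial size after num_pools 2x2 max-pooling layers with stride 2."""
    for _ in range(num_pools):
        size //= 2  # odd sizes are rounded down
    return size


def dense_params(in_units, out_units):
    """Weights plus biases of one fully connected layer."""
    return (in_units + 1) * out_units


side = pooled_size(50)      # 50 -> 25 -> 12 -> 6 -> 3 -> 1
flat = side * side * 512    # units after Flatten (512 channels in the last block)
head = dense_params(flat, 256) + dense_params(256, 2)

print(side, flat, head)     # 1 512 131842
print(pooled_size(224))     # 7, as in the summary of the full model
```

Under these assumptions the convolutional part outputs a 1×1×512 feature map for a 50×50 image, so the new head has only about 130 thousand parameters to train.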

Create a function to determine whether it is a child or an old man

child.py



def child(img):
    img = cv2.resize(img, (50, 50))
    pred = np.argmax(model.predict(np.array([img])))
    if pred == 0:
        return 'This is a kid'
    else:
        return 'This is an old man'

As a result, the accuracy was not very good. Searching for "elder" on Flickr also returns photos of people who are not elderly, so I think the training did not go well. The photo below is an example.

003.png
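One way to check whether the "elder" class is what drags the result down is to look at the accuracy of each class separately instead of a single overall figure. The following is a minimal sketch. The function name is hypothetical, and the label lists are toy values for illustration only, not results from this article.

```python
def per_class_accuracy(y_true, y_pred, class_names):
    """Return {class_name: fraction of that class's samples predicted correctly}."""
    result = {}
    for index, name in enumerate(class_names):
        predictions = [p for t, p in zip(y_true, y_pred) if t == index]
        if predictions:  # skip classes with no samples
            result[name] = sum(p == index for p in predictions) / len(predictions)
    return result


# Toy labels: 0 = child, 1 = elder (the same order as in the child() function).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0, 0, 1]

print(per_class_accuracy(y_true, y_pred, ["child", "elder"]))
# {'child': 0.75, 'elder': 0.5}
```

In practice, y_pred would come from applying np.argmax to model.predict on held-out images, as in the child() function above.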

Conclusion

When using an already trained model such as VGG16, it is effective to use a dataset that can make good use of the trained content.

I hope you can make good use of such models by standing firmly on the shoulders of those who came before us.

The full program text is stored below. https://github.com/Fumio-eisan/vgg16_child20200310
