This is a memo to myself as I read *Introduction to Natural Language Processing Applications in 15 Steps*. This time I will write down my own key points from Step 07 in Chapter 2.
- Personal MacPC: macOS Mojave version 10.14.6
- docker version: 19.03.2 for both Client and Server
- Quantitatively evaluate the prediction accuracy of a machine learning system using various metrics
- When you improve an existing system, evaluation lets you confirm that performance has not degraded, so you can update the system with confidence
- Understand overfitting and prevent it from occurring
A classifier fitting excessively to its training data is called **overfitting**.
If a classifier is trained to identify 100% of the feature vectors in the training data, its decision boundary becomes so fine-grained that it correctly classifies even noise that should be ignored. Being able to make stable predictions on data other than the training data is called **generalization**.
If the same data is used for both training and evaluation, an overfitted system will be rated highly, so **the test data used for evaluation must be different from the training data**. (No matter how well the system scores on the training data, that alone means little; you have to confirm that it is not overfitting.)
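As a minimal sketch of holding out separate evaluation data (the toy arrays and the 70/30 split ratio below are my own assumptions, not from the book), scikit-learn's `train_test_split` can be used:

```python
from sklearn.model_selection import train_test_split

X = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [0.9]]  # toy feature vectors
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]                                          # toy labels

# Hold out 30% of the data for evaluation; the ratio is an arbitrary choice
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=0)
# Train only on (train_X, train_y) and evaluate only on (test_X, test_y)
```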
Item | Contents |
---|---|
Accuracy (correct answer rate) | Ratio of correctly predicted test data to all test data |
Precision | Of the test data predicted to be in the target class, the proportion that actually belongs to it |
Recall | Of the test data that actually belongs to the target class, the proportion that is predicted correctly |
F value | A metric indicating the balance between precision and recall (their harmonic mean) |
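scikit-learn provides functions for all four metrics; below is a minimal usage sketch (the `y_true`/`y_pred` label lists are made up by me):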
```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 1, 1, 0, 1]  # ground-truth labels
y_pred = [0, 1, 0, 0, 1]  # predicted labels

accuracy_score(y_true, y_pred)             # ratio of correct predictions
precision_score(y_true, y_pred)            # binary precision (pos_label=1 by default)
recall_score(y_true, y_pred)               # binary recall
f1_score(y_true, y_pred, average='macro')  # macro-averaged F value across classes
```
- By specifying the `average` argument, you can compute the macro average (`average='macro'`) or the micro average (`average='micro'`); see the first sketch after this list.
- Precision and recall are in a trade-off relationship: tuning a system to raise one typically lowers the other; see the second sketch after this list.
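As a sketch of the macro/micro difference (the three-class labels here are invented by me): the macro average computes the metric per class and then averages those values equally, while the micro average pools all individual decisions before computing the metric.

```python
from sklearn.metrics import f1_score

# Made-up 3-class example
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

f1_score(y_true, y_pred, average='macro')  # per-class F values, averaged equally
f1_score(y_true, y_pred, average='micro')  # F value over all decisions pooled together
```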
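To see the trade-off concretely, scikit-learn's `precision_recall_curve` sweeps the decision threshold over classifier scores (the labels and scores below are hypothetical):

```python
from sklearn.metrics import precision_recall_curve

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # hypothetical classifier confidence scores

precision, recall, thresholds = precision_recall_curve(y_true, scores)
# As the threshold rises, precision tends to rise while recall falls
```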
Item | Contents |
---|---|
Lower limit of accuracy | The baseline for accuracy is what you get by predicting blindly, e.g. always answering the most frequent class (see the sketch after this table) |
Number of classification classes | Multi-class classification is naturally more difficult than two-class classification, so the same metric value means different things depending on the application |
Test data type | Relative evaluation of different systems should be done on the same test data |
Bias in the number of data | The test data should contain data of each class as evenly as possible |
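For the "lower limit of accuracy" row, one way I can check the no-information baseline is scikit-learn's `DummyClassifier` (a sketch; the data below is made up):

```python
from sklearn.dummy import DummyClassifier

X = [[0], [1], [2], [3], [4], [5]]  # made-up features
y = [0, 0, 0, 0, 1, 1]              # imbalanced made-up labels

baseline = DummyClassifier(strategy='most_frequent')  # always predicts the majority class
baseline.fit(X, y)
baseline.score(X, y)  # ~0.67 here: the accuracy floor a real model should beat
```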