This is a memo to myself as I read *Introduction to Natural Language Processing Applications in 15 Steps*. This time I will write down my own key points from Step 07 in Chapter 2.
- Personal MacPC: macOS Mojave version 10.14.6
- docker version: 19.03.2 for both Client and Server
- Quantitatively evaluate the prediction accuracy of a machine learning system using various metrics
- When you improve an existing system, evaluation lets you confirm that performance has not degraded, so you can update the system with confidence
- Understand overfitting and prevent it from occurring
A classifier fitting excessively to its training data is called **overfitting**.
If a classifier is trained to identify 100% of the feature vectors in the training data, its decision boundary becomes so fine-grained that it correctly classifies even noise that should be ignored. Being able to make stable predictions on data other than the training data is called **generalization**.
If the same data is used for both training and evaluation, an overfitted system will be rated highly, so **the test data used for evaluation must be different from the training data**. (No matter how well the system scores on the training data, that alone means little; you have to confirm that it is not overfitting.)
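As a minimal sketch of holding out separate evaluation data (the toy arrays and the 70/30 split ratio below are my own assumptions, not from the book), scikit-learn's `train_test_split` can be used:

```python
from sklearn.model_selection import train_test_split

X = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [0.9]]  # toy feature vectors
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]                                          # toy labels

# Hold out 30% of the data for evaluation; the ratio is an arbitrary choice
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=0)
# Train only on (train_X, train_y) and evaluate only on (test_X, test_y)
```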
Item | Contents |
---|---|
Accuracy (correct answer rate) | Ratio of correctly predicted test data to all test data |
Precision | Of the test data predicted to be in the target class, the proportion that actually belongs to it |
Recall | Of the test data that actually belongs to the target class, the proportion that is predicted correctly |
F value | A metric indicating the balance between precision and recall (their harmonic mean) |
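scikit-learn provides functions for all four metrics; below is a minimal usage sketch (the `y_true`/`y_pred` label lists are made up by me):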
```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 1, 1, 0, 1]  # ground-truth labels
y_pred = [0, 1, 0, 0, 1]  # predicted labels

accuracy_score(y_true, y_pred)             # ratio of correct predictions
precision_score(y_true, y_pred)            # binary precision (pos_label=1 by default)
recall_score(y_true, y_pred)               # binary recall
f1_score(y_true, y_pred, average='macro')  # macro-averaged F value across classes
```
- By specifying the `average` argument, you can compute the macro average (`average='macro'`) or the micro average (`average='micro'`); see the first sketch after this list.
- Precision and recall are in a trade-off relationship: tuning a system to raise one typically lowers the other; see the second sketch after this list.
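As a sketch of the macro/micro difference (the three-class labels here are invented by me): the macro average computes the metric per class and then averages those values equally, while the micro average pools all individual decisions before computing the metric.

```python
from sklearn.metrics import f1_score

# Made-up 3-class example
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

f1_score(y_true, y_pred, average='macro')  # per-class F values, averaged equally
f1_score(y_true, y_pred, average='micro')  # F value over all decisions pooled together
```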
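To see the trade-off concretely, scikit-learn's `precision_recall_curve` sweeps the decision threshold over classifier scores (the labels and scores below are hypothetical):

```python
from sklearn.metrics import precision_recall_curve

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # hypothetical classifier confidence scores

precision, recall, thresholds = precision_recall_curve(y_true, scores)
# As the threshold rises, precision tends to rise while recall falls
```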
Item | Contents |
---|---|
Lower limit of accuracy | The baseline for accuracy is what you get by predicting blindly, e.g. always answering the most frequent class (see the sketch after this table) |
Number of classification classes | Multi-class classification is naturally more difficult than two-class classification, so the same metric value means different things depending on the application |
Test data type | Relative evaluation of different systems should be done on the same test data |
Bias in the number of data | The test data should contain data of each class as evenly as possible |
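For the "lower limit of accuracy" row, one way I can check the no-information baseline is scikit-learn's `DummyClassifier` (a sketch; the data below is made up):

```python
from sklearn.dummy import DummyClassifier

X = [[0], [1], [2], [3], [4], [5]]  # made-up features
y = [0, 0, 0, 0, 1, 1]              # imbalanced made-up labels

baseline = DummyClassifier(strategy='most_frequent')  # always predicts the majority class
baseline.fit(X, y)
baseline.score(X, y)  # ~0.67 here: the accuracy floor a real model should beat
```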