This post explains the important evaluation metrics to check when measuring the performance of a classification model: Accuracy, Precision, Recall, Sensitivity, Specificity, False Positive Rate, False Negative Rate, and F1 Score. We’ll cover each metric one by one, calculating its value for a real-life problem.

In the first step, we need to build a confusion matrix by comparing the classes predicted by the classification model with the actual labeled classes. Then we can calculate all eight metrics using the formulas explained below.

Let’s understand each metric using a real-life example.

Problem – Evaluate the performance of a classification model that predicts, from symptoms, whether a user is infected with coronavirus.

Actual – 10 users were suspected of having coronavirus. After laboratory reports, 3 users were found to be infected and 7 users were not infected.

Prediction – Our model observed the symptoms and predicted that 2 users are infected and 8 users are not infected.

Confusion Matrix

First, create a confusion matrix as explained in the post here.

                        PREDICTED CLASS
                        POSITIVE      NEGATIVE
ACTUAL CLASS  POSITIVE  TP = 2        FN = 1
              NEGATIVE  FP = 0        TN = 7
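
To make these counts concrete, here is a minimal Python sketch that derives the four cells from the actual and predicted labels. The exact label lists below are an assumption for illustration (the example does not say which users were misclassified), but they are consistent with the matrix above.

```python
# Assumed label lists consistent with the example
# (1 = infected / positive, 0 = not infected / negative).
actual    = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]   # 3 infected, 7 not infected (lab reports)
predicted = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]   # 2 predicted infected, 8 predicted not infected

# Count the four confusion-matrix cells by comparing each actual/predicted pair.
TP = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
FN = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
FP = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
TN = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)

print(TP, FN, FP, TN)  # 2 1 0 7
```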

Precision

It tells what proportion of predicted positive classes was actually positive.

Formula:   Precision =  TP / (TP + FP)

In the given example, the Precision value is: 2 / (2 + 0) = 1

The Result – Our model has a precision of 1, so when it predicts a positive class, it is correct 100% of the time.
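
A minimal Python sketch of this calculation, using the counts from the confusion matrix above:

```python
TP, FP = 2, 0  # counts from the confusion matrix above

precision = TP / (TP + FP)
print(precision)  # 1.0
```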

Recall

It tells what proportion of actual positive classes was predicted correctly.

Formula:   Recall = TP / (TP + FN)

In the given example, the Recall value is: 2 / (2 + 1) = 0.667

The Result – Our model has a recall of 0.667, which means it identified 66.7% of positive classes correctly.
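
The same calculation as a short Python sketch, again using the counts from the confusion matrix:

```python
TP, FN = 2, 1  # counts from the confusion matrix above

recall = TP / (TP + FN)
print(round(recall, 3))  # 0.667
```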

False Positive Rate

It is also known as the Type I error rate. It tells what proportion of negative classes was wrongly classified.

Formula – False Positive Rate = FP/(FP+TN)

In the given example, False Positive Rate is: 0 / (0 + 7) = 0

The Result – Our model has a False Positive Rate of zero, which means it misclassified 0% of negative classes.
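
A minimal Python sketch of the False Positive Rate calculation, using the counts from the confusion matrix:

```python
FP, TN = 0, 7  # counts from the confusion matrix above

false_positive_rate = FP / (FP + TN)
print(false_positive_rate)  # 0.0
```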

False Negative Rate

It is also known as the Type II error rate. It tells what proportion of positive classes was wrongly classified.

Formula – False Negative Rate = FN/(FN+TP)

In the given example, False Negative Rate is: 1 / (1 + 2) = 0.333

The Result – Our model has a False Negative Rate of 0.333, which means it misclassified 33.3% of positive classes.
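
A minimal Python sketch of the False Negative Rate calculation, using the counts from the confusion matrix:

```python
FN, TP = 1, 2  # counts from the confusion matrix above

false_negative_rate = FN / (FN + TP)
print(round(false_negative_rate, 3))  # 0.333
```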

Accuracy

It tells what proportion of all classes, both positive and negative, was correctly classified.

Formula:   Accuracy = (TP + TN) / (P + N), where P and N are the total numbers of positive and negative classes.

In the given example, Accuracy is: (2 + 7) / (3 + 7) = 0.90

The Result – Our model has an accuracy of 0.90, which means its predictions are correct 90% of the time.
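
A minimal Python sketch of the accuracy calculation; note that P + N (all actual positives and negatives) equals TP + FN + TN + FP:

```python
TP, TN, FP, FN = 2, 7, 0, 1  # counts from the confusion matrix above

accuracy = (TP + TN) / (TP + TN + FP + FN)
print(accuracy)  # 0.9
```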

Sensitivity

It tells what proportion of actual positive classes was predicted correctly. This is the same as the Recall value.

Formula:   Sensitivity =  TP / (TP + FN)

In the given example, Sensitivity is: 2 / (2 + 1) = 0.667

The Result – Our model has a sensitivity of 0.667, which means it identified 66.7% of positive classes correctly.
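
Since sensitivity is the same quantity as recall, this sketch simply checks the hand calculation against scikit-learn's recall_score (assuming scikit-learn is available; the label lists repeat the assumed ordering from the confusion-matrix sketch above):

```python
from sklearn.metrics import recall_score

actual    = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# recall_score computes TP / (TP + FN), which is exactly the sensitivity formula.
print(round(recall_score(actual, predicted), 3))  # 0.667
```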

Specificity

It tells what proportion of actual negative classes was predicted correctly.

Formula:   Specificity =  TN / (TN + FP)

In the given example, Specificity is: 7 / (7 + 0) = 1

The Result – Our model has a specificity of 1, which means it identified 100% of negative classes correctly.
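
A minimal Python sketch of the specificity calculation, using the counts from the confusion matrix:

```python
TN, FP = 7, 0  # counts from the confusion matrix above

specificity = TN / (TN + FP)
print(specificity)  # 1.0
```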

F1 Score

It tells whether precision and recall are balanced; the F1 Score is the harmonic mean of the two. Its best value is 1, when both precision and recall are high, and its worst value is 0, when either of them is very low.

Formula: F1 Score = 2 x ((Precision*Recall) / (Precision + Recall))

In the given example, the F1 Score is: 2 x ((1 * 0.667) / (1 + 0.667)) = 0.800


The Result – Our model has an F1 Score of 0.80, which is close to the best value of 1, so precision and recall are well balanced.
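
A minimal Python sketch of the F1 Score calculation, reusing the precision and recall values computed above:

```python
precision, recall = 1.0, 2 / 3  # values computed in the sections above

f1 = 2 * (precision * recall) / (precision + recall)
print(round(f1, 3))  # 0.8
```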

Conclusion

I used a very simple example so you can easily understand all of these metrics. The formulas are simple, but they are easy to mix up during implementation. The next question is which metrics to consider when deciding whether a model is good; I’ll write a dedicated post on this in the next few days.