This post explains important evaluation metrics for measuring the performance of a classification model: **Accuracy, Precision, Recall, Sensitivity, Specificity, False Positive Rate, False Negative Rate, and F1 Score**. We'll cover each metric one by one, calculating its value for a real-life problem.

In the first step, we need to build a **confusion matrix** by comparing the classes predicted by the classification model against the actual labeled classes. We can then calculate all eight metrics using the formulas explained below.

Let’s understand each metric using a real-life example.

**Problem** – Evaluate the performance of a classification model that predicts, from symptoms, whether a user is infected with coronavirus.

**Actual** – 10 users were suspected of coronavirus infection. After laboratory reports, 3 users were found infected and 7 users were not infected.

**Prediction** – Our model observed the symptoms and predicted that 2 users are infected and 8 users are not infected.

## Confusion Matrix

First, create a confusion matrix as explained in the post here.

|                     | PREDICTED POSITIVE | PREDICTED NEGATIVE |
|---------------------|--------------------|--------------------|
| **ACTUAL POSITIVE** | TP = 2             | FN = 1             |
| **ACTUAL NEGATIVE** | FP = 0             | TN = 7             |
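The four counts can be reproduced with a few lines of Python. The label vectors below are hypothetical, constructed to match the example (1 = infected, 0 = not infected):

```python
# Hypothetical labels matching the example: 1 = infected, 0 = not infected.
actual    = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]   # 3 infected, 7 not infected
predicted = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]   # the model flags 2 users as infected

# Count each cell of the confusion matrix.
tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))  # true positives
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))  # false negatives
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))  # false positives
tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))  # true negatives

print(tp, fn, fp, tn)  # → 2 1 0 7
```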

## Precision

It tells what proportion of **predicted positive classes** was **actually positive**.

**Formula: Precision = TP / (TP + FP)**

In the given example, Precision Value is: 2 / (2 + 0) = 1

**The Result** – Our model has a precision of 1, which means that when it predicts a positive class, it is correct 100% of the time.
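As a minimal sketch of the calculation (the function name is my own, not from any library):

```python
def precision(tp, fp):
    # Precision = TP / (TP + FP); return 0.0 when there are no positive predictions.
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

print(precision(2, 0))  # → 1.0
```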

## Recall

It tells what proportion of **actual positive classes** was **predicted correctly**.

**Formula: Recall = TP / (TP + FN)**

In the given example, the Recall value is: 2 / (2 + 1) = 0.667

**The Result** – Our model has a recall of 0.667, which means it identified 66.7% of positive classes correctly.
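A sketch of the same arithmetic (function name is my own):

```python
def recall(tp, fn):
    # Recall = TP / (TP + FN); return 0.0 when there are no actual positives.
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

print(round(recall(2, 1), 3))  # → 0.667
```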

## False Positive Rate

It is also known as the **Type I** error rate. It tells what proportion of **negative classes** was wrongly classified.

**Formula – False Positive Rate = FP/(FP+TN)**

In the given example, False Positive Rate is: 0 / (0 + 7) = 0

**The Result** – Our model has a zero False Positive Rate, which means it wrongly classified 0% of negative classes.
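A sketch of the calculation (function name is my own):

```python
def false_positive_rate(fp, tn):
    # FPR = FP / (FP + TN); return 0.0 when there are no actual negatives.
    return fp / (fp + tn) if (fp + tn) > 0 else 0.0

print(false_positive_rate(0, 7))  # → 0.0
```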

## False Negative Rate

It is also known as the **Type II** error rate. It tells what proportion of **positive classes** was wrongly classified.

**Formula – False Negative Rate = FN/(FN+TP)**

In the given example, False Negative Rate is: 1 / (1 + 2) = 0.333

**The Result** – Our model has a False Negative Rate of 0.333, which means it wrongly classified 33.3% of positive classes.
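A sketch of the calculation (function name is my own); note that the False Negative Rate is 1 minus recall:

```python
def false_negative_rate(fn, tp):
    # FNR = FN / (FN + TP); return 0.0 when there are no actual positives.
    return fn / (fn + tp) if (fn + tp) > 0 else 0.0

print(round(false_negative_rate(1, 2), 3))  # → 0.333
```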

## Accuracy

It tells what proportion of **negative** and **positive** classes was **correctly classified**.

**Formula** – **Accuracy = (TP + TN) / (P + N)**, where P and N are the total numbers of positive and negative classes.

In the given example, Accuracy is: (2 + 7) / (3 + 7) = 0.90

**The Result** – Our model has an accuracy of 0.90, which means its predictions are correct 90% of the time.
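A sketch of the calculation over all four cells (function name is my own):

```python
def accuracy(tp, tn, fp, fn):
    # Accuracy = (TP + TN) / (TP + TN + FP + FN)
    total = tp + tn + fp + fn
    return (tp + tn) / total if total > 0 else 0.0

print(accuracy(2, 7, 0, 1))  # → 0.9
```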

## Sensitivity

It tells what proportion of **actual positive classes** was **predicted correctly**. This is the same as the **Recall** value.

**Formula: Sensitivity = TP / (TP + FN)**

In the given example, Sensitivity is: 2 / (2 + 1) = 0.667

**The Result** – Our model has a sensitivity of 0.667, which means it identified 66.7% of positive classes correctly.
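Since sensitivity uses the same formula as recall, the sketch is identical (function name is my own):

```python
def sensitivity(tp, fn):
    # Sensitivity = TP / (TP + FN) — the same formula as recall.
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

print(round(sensitivity(2, 1), 3))  # → 0.667
```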

## Specificity

It tells what proportion of **actual negative classes** was **predicted correctly**.

**Formula: Specificity = TN / (TN + FP)**

In the given example, Specificity is: 7 / (7 + 0) = 1

**The Result** – Our model has a specificity of 1, which means it identified 100% of negative classes correctly.
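A sketch of the calculation (function name is my own); note that specificity equals 1 minus the False Positive Rate:

```python
def specificity(tn, fp):
    # Specificity = TN / (TN + FP); equals 1 - FPR.
    return tn / (tn + fp) if (tn + fp) > 0 else 0.0

print(specificity(7, 0))  # → 1.0
```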

## F1 Score

It tells whether precision and recall are balanced, being the harmonic mean of the two. Its best value is 1, reached when both precision and recall are perfect, and its worst value is 0, reached when either of them drops to 0.

**Formula: F1 Score = 2 x ((Precision*Recall) / (Precision + Recall))**

In the given example, F1 Score is: 2 x ((1 * 0.667) / (1 + 0.667)) = 0.800

**The Result** – Our model has an F1 Score of 0.80, which is close to the best value of 1, so precision and recall are in balance.
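A sketch of the calculation from the precision and recall values computed earlier (function name is my own):

```python
def f1_score(precision, recall):
    # F1 = 2 * (precision * recall) / (precision + recall) — the harmonic mean.
    denom = precision + recall
    return 2 * precision * recall / denom if denom > 0 else 0.0

print(round(f1_score(1.0, 2 / 3), 3))  # → 0.8
```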

## Conclusion

I used a very simple example so you can easily understand all of these metrics. The formulas are simple, but they are easy to mix up during implementation. The next question is which metrics we should consider when deciding whether a model is best; I'll write a dedicated post on this in the next few days.