First, I'd like to explain what a confusion matrix is used for. Classification problems are solved using supervised machine learning algorithms. In these problems, our goal is to categorize an object using its features. For example, identify a fruit by its taste, color, and size, or check whether a patient has a disease based on symptoms. Building a model is not a one-time deal: we run many experiments, record the output, and check the model's performance on each experiment.

Confusion Matrix

The confusion matrix is a technique we use to measure the performance of classification models. This post is dedicated to explaining the confusion matrix with real-life examples, and by the end you'll be able to construct a confusion matrix and evaluate a model's performance.

The confusion matrix is in tabular form, where each row represents an actual class and each column a predicted class. As the name suggests, it can be genuinely confusing for beginners. We create a table where each cell has a special meaning and tells the number of correct or incorrect predictions with respect to the actual values.



|                  | PREDICTED: POSITIVE | PREDICTED: NEGATIVE |
|------------------|---------------------|---------------------|
| ACTUAL: POSITIVE | TRUE POSITIVE       | FALSE NEGATIVE      |
| ACTUAL: NEGATIVE | FALSE POSITIVE      | TRUE NEGATIVE       |

Keep in mind: it is very important that you understand the above four terms, otherwise you won't be able to go further in the evaluation process.
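The four cells can be counted directly from lists of actual and predicted labels. Here is a minimal sketch; the function name and the use of 1 for the positive class and 0 for the negative class are illustrative assumptions, not part of any particular library:

```python
# Minimal sketch: count the four confusion-matrix cells from label lists.
# Assumption for illustration: 1 marks the positive class, 0 the negative class.

def confusion_counts(actual, predicted):
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)  # positive, predicted positive
    fn = sum(a == 1 and p == 0 for a, p in pairs)  # positive, predicted negative
    fp = sum(a == 0 and p == 1 for a, p in pairs)  # negative, predicted positive
    tn = sum(a == 0 and p == 0 for a, p in pairs)  # negative, predicted negative
    return tp, fn, fp, tn

# Usage: one misclassification in each direction.
print(confusion_counts([1, 1, 0, 0], [1, 0, 0, 1]))  # (1, 1, 1, 1)
```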

Understanding the Confusion Matrix

Problem Statement – Check the accuracy of a model that predicts whether a user is infected with coronavirus based on symptoms.

Experiment 1: 

Actual – 10 users were suspected of having coronavirus. After laboratory reports, 3 users are found to be infected and 7 users are not infected. 


Prediction – Our model observes the symptoms and predicts that 2 users are infected and 8 users are not infected.

In our problem, infected users are labeled as the positive class and non-infected users as the negative class.

TRUE POSITIVE

It's a correct classification. It tells how many positive instances are correctly classified. 

Calculation – Lab results report that 3 users are infected, and our model correctly identifies 2 of them as infected. So TRUE POSITIVE is 2.

The result – 2 out of 3 infected users are correctly classified. 

FALSE NEGATIVE

It's an incorrect classification. It tells how many positive instances are incorrectly classified. 

Calculation – Labs say 3 users are infected, but our model catches only 2, so 1 infected user is incorrectly classified. So FALSE NEGATIVE is 1.

The Result – 1 out of 3 infected users is incorrectly classified. 

Please note the difference between TRUE POSITIVE and FALSE NEGATIVE.

FALSE POSITIVE

It's an incorrect classification. It tells how many negative instances are incorrectly classified. 

Calculation – There are no wrong predictions about the negative class: none of the 7 non-infected users are flagged as infected. So FALSE POSITIVE is 0.

The Result – 0 out of 7 non-infected users are incorrectly classified. 

TRUE NEGATIVE

It's a correct classification. It tells how many negative instances are correctly classified. 

Calculation – Labs say 7 users are not infected, and all 7 of them fall in the model's not-infected group, so every non-infected user is correctly classified. So TRUE NEGATIVE is 7.

The Result – 7 out of 7 non-infected users are correctly classified. 
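Experiment 1 can be checked with a short sketch. Which individual users the model misclassifies is an assumption made for illustration; only the totals (2, 1, 0, 7) come from the walkthrough above:

```python
# Experiment 1 as label lists: 1 = infected (positive), 0 = not infected (negative).
# The first three users are the lab-confirmed infected ones; assuming the model
# misses the third of them (which user is missed is illustrative).
actual    = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

pairs = list(zip(actual, predicted))
tp = sum(a == 1 and p == 1 for a, p in pairs)  # infected, predicted infected
fn = sum(a == 1 and p == 0 for a, p in pairs)  # infected, predicted healthy
fp = sum(a == 0 and p == 1 for a, p in pairs)  # healthy, predicted infected
tn = sum(a == 0 and p == 0 for a, p in pairs)  # healthy, predicted healthy

print(tp, fn, fp, tn)  # 2 1 0 7
```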

Trick – the first word (TRUE or FALSE) denotes whether the model predicted correctly, and the second word (POSITIVE or NEGATIVE) is the predicted class. 

Still confusing, right? Let's work through two more examples so everything becomes clear. 

Experiment 2: 

Actual – 10 users were suspected of having coronavirus. After laboratory reports, 3 users are found to be infected and 7 users are not infected. 

Prediction – Our model observes the symptoms and predicts that 4 users are infected and 6 users are not infected.



|                  | PREDICTED: POSITIVE | PREDICTED: NEGATIVE |
|------------------|---------------------|---------------------|
| ACTUAL: POSITIVE | 3                   | 0                   |
| ACTUAL: NEGATIVE | 1                   | 6                   |

Experiment 3: 

Actual – 10 users were suspected of having coronavirus. After laboratory reports, 3 users are found to be infected and 7 users are not infected. 

Prediction – Our model observes the symptoms and predicts that 8 users are infected and 2 users are not infected.



|                  | PREDICTED: POSITIVE | PREDICTED: NEGATIVE |
|------------------|---------------------|---------------------|
| ACTUAL: POSITIVE | 3                   | 0                   |
| ACTUAL: NEGATIVE | 5                   | 2                   |
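The matrices for Experiments 2 and 3 can be reproduced the same way. Again, which specific users are misclassified is an assumption for illustration; only the totals match the experiments:

```python
# Experiments 2 and 3 as label lists (1 = infected, 0 = not infected).

def matrix(actual, predicted):
    """Return [[TP, FN], [FP, TN]]: rows = actual class, columns = predicted class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    fn = sum(a == 1 and p == 0 for a, p in pairs)
    fp = sum(a == 0 and p == 1 for a, p in pairs)
    tn = sum(a == 0 and p == 0 for a, p in pairs)
    return [[tp, fn], [fp, tn]]

actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # lab results: 3 infected, 7 not

# Experiment 2: model flags 4 users (all 3 infected plus 1 healthy user).
exp2 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(matrix(actual, exp2))  # [[3, 0], [1, 6]]

# Experiment 3: model flags 8 users (all 3 infected plus 5 healthy users).
exp3 = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
print(matrix(actual, exp3))  # [[3, 0], [5, 2]]
```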

Conclusion

The first step in evaluating a model is to construct the confusion matrix. This matrix measures the performance of your model, and the goal is to keep TRUE POSITIVE and TRUE NEGATIVE high and FALSE NEGATIVE and FALSE POSITIVE low. 

| Term                | Result                   | Meaning                               |
|---------------------|--------------------------|---------------------------------------|
| TRUE POSITIVE (TP)  | Correct classification   | Positive class identified as positive |
| FALSE POSITIVE (FP) | Incorrect classification | Negative class identified as positive |
| TRUE NEGATIVE (TN)  | Correct classification   | Negative class identified as negative |
| FALSE NEGATIVE (FN) | Incorrect classification | Positive class identified as negative |
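Since the problem statement asks for the model's accuracy, one common way to compute it from the four cells is (TP + TN) divided by the total number of predictions. Applied to the three experiments above, this is a quick sketch:

```python
# Accuracy = correct predictions / all predictions = (TP + TN) / (TP + FN + FP + TN).

def accuracy(tp, fn, fp, tn):
    return (tp + tn) / (tp + fn + fp + tn)

print(accuracy(2, 1, 0, 7))  # Experiment 1: 0.9
print(accuracy(3, 0, 1, 6))  # Experiment 2: 0.9
print(accuracy(3, 0, 5, 2))  # Experiment 3: 0.5
```

Note that Experiments 1 and 2 reach the same accuracy through different mistakes (a missed infection versus a false alarm), which is exactly why the full matrix is more informative than a single number.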