First, I'd like to explain what a confusion matrix is used for. Classification problems are solved using supervised machine learning algorithms. In these problems, our goal is to categorize an object using its features. For example, identify a fruit by its taste, color, and size, or check whether a patient has a disease based on symptoms. Building a model is not a one-time deal: we run many experiments, record the output, and check the model's performance on each experiment.

Confusion Matrix

The confusion matrix is a technique we use to measure the performance of classification models. This post is dedicated to explaining the confusion matrix with real-life examples, and by the end you'll be able to construct a confusion matrix and evaluate a model's performance.

The confusion matrix is in tabular form, where each row represents an actual class and each column a predicted class. As the name suggests, it can be genuinely confusing for beginners. We create a table where each cell has a special meaning and tells the number of correct or incorrect predictions with respect to the actual values.



|                  | PREDICTED: POSITIVE | PREDICTED: NEGATIVE |
|------------------|---------------------|---------------------|
| ACTUAL: POSITIVE | TRUE POSITIVE       | FALSE NEGATIVE      |
| ACTUAL: NEGATIVE | FALSE POSITIVE      | TRUE NEGATIVE       |

Keep in mind: it is very important that you understand the above four terms, otherwise you won't be able to go further in the evaluation process.
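The four cells can be counted directly from lists of actual and predicted labels. Here is a minimal sketch; the function name and the use of 1 for the positive class and 0 for the negative class are illustrative assumptions, not part of any particular library:

```python
# Minimal sketch: count the four confusion-matrix cells from label lists.
# Assumption for illustration: 1 marks the positive class, 0 the negative class.

def confusion_counts(actual, predicted):
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)  # positive, predicted positive
    fn = sum(a == 1 and p == 0 for a, p in pairs)  # positive, predicted negative
    fp = sum(a == 0 and p == 1 for a, p in pairs)  # negative, predicted positive
    tn = sum(a == 0 and p == 0 for a, p in pairs)  # negative, predicted negative
    return tp, fn, fp, tn

# Usage: one misclassification in each direction.
print(confusion_counts([1, 1, 0, 0], [1, 0, 0, 1]))  # (1, 1, 1, 1)
```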

Understanding the Confusion Matrix

Problem Statement – Check the accuracy of a model that predicts whether a user is infected with coronavirus based on symptoms.

Experiment 1: 

Actual – 10 users were suspected of having coronavirus. After laboratory reports, 3 users are found to be infected and 7 users are not infected. 


Prediction – Our model observes the symptoms and predicts that 2 users are infected and 8 users are not infected.

In our problem, infected users are labeled as the positive class and non-infected users as the negative class.

TRUE POSITIVE

It's a correct classification. It tells how many positive instances are correctly classified. 

Calculation – Lab results report that 3 users are infected, and our model correctly identifies 2 of them as infected. So TRUE POSITIVE is 2.

The result – 2 out of 3 infected users are correctly classified. 

FALSE NEGATIVE

It's an incorrect classification. It tells how many positive instances are incorrectly classified. 

Calculation – Labs say 3 users are infected, but our model catches only 2, so 1 infected user is incorrectly classified. So FALSE NEGATIVE is 1.

The Result – 1 out of 3 infected users is incorrectly classified. 

Please note the difference between TRUE POSITIVE and FALSE NEGATIVE.

FALSE POSITIVE

It's an incorrect classification. It tells how many negative instances are incorrectly classified. 

Calculation – There are no wrong predictions about the negative class: none of the 7 non-infected users are flagged as infected. So FALSE POSITIVE is 0.

The Result – 0 out of 7 non-infected users are incorrectly classified. 

TRUE NEGATIVE

It's a correct classification. It tells how many negative instances are correctly classified. 

Calculation – Labs say 7 users are not infected, and all 7 of them fall in the model's not-infected group, so every non-infected user is correctly classified. So TRUE NEGATIVE is 7.

The Result – 7 out of 7 non-infected users are correctly classified. 
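Experiment 1 can be checked with a short sketch. Which individual users the model misclassifies is an assumption made for illustration; only the totals (2, 1, 0, 7) come from the walkthrough above:

```python
# Experiment 1 as label lists: 1 = infected (positive), 0 = not infected (negative).
# The first three users are the lab-confirmed infected ones; assuming the model
# misses the third of them (which user is missed is illustrative).
actual    = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

pairs = list(zip(actual, predicted))
tp = sum(a == 1 and p == 1 for a, p in pairs)  # infected, predicted infected
fn = sum(a == 1 and p == 0 for a, p in pairs)  # infected, predicted healthy
fp = sum(a == 0 and p == 1 for a, p in pairs)  # healthy, predicted infected
tn = sum(a == 0 and p == 0 for a, p in pairs)  # healthy, predicted healthy

print(tp, fn, fp, tn)  # 2 1 0 7
```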

Trick – the first word (TRUE or FALSE) denotes whether the model predicted correctly, and the second word (POSITIVE or NEGATIVE) is the predicted class. 

Still confusing, right? Let's work through two more examples so everything becomes clear. 

Experiment 2: 

Actual – 10 users were suspected of having coronavirus. After laboratory reports, 3 users are found to be infected and 7 users are not infected. 

Prediction – Our model observes the symptoms and predicts that 4 users are infected and 6 users are not infected.



|                  | PREDICTED: POSITIVE | PREDICTED: NEGATIVE |
|------------------|---------------------|---------------------|
| ACTUAL: POSITIVE | 3                   | 0                   |
| ACTUAL: NEGATIVE | 1                   | 6                   |

Experiment 3: 

Actual – 10 users were suspected of having coronavirus. After laboratory reports, 3 users are found to be infected and 7 users are not infected. 

Prediction – Our model observes the symptoms and predicts that 8 users are infected and 2 users are not infected.



|                  | PREDICTED: POSITIVE | PREDICTED: NEGATIVE |
|------------------|---------------------|---------------------|
| ACTUAL: POSITIVE | 3                   | 0                   |
| ACTUAL: NEGATIVE | 5                   | 2                   |
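The matrices for Experiments 2 and 3 can be reproduced the same way. Again, which specific users are misclassified is an assumption for illustration; only the totals match the experiments:

```python
# Experiments 2 and 3 as label lists (1 = infected, 0 = not infected).

def matrix(actual, predicted):
    """Return [[TP, FN], [FP, TN]]: rows = actual class, columns = predicted class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    fn = sum(a == 1 and p == 0 for a, p in pairs)
    fp = sum(a == 0 and p == 1 for a, p in pairs)
    tn = sum(a == 0 and p == 0 for a, p in pairs)
    return [[tp, fn], [fp, tn]]

actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # lab results: 3 infected, 7 not

# Experiment 2: model flags 4 users (all 3 infected plus 1 healthy user).
exp2 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(matrix(actual, exp2))  # [[3, 0], [1, 6]]

# Experiment 3: model flags 8 users (all 3 infected plus 5 healthy users).
exp3 = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
print(matrix(actual, exp3))  # [[3, 0], [5, 2]]
```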

Conclusion

The first step in evaluating a model is to construct the confusion matrix. This matrix measures the performance of your model, and the goal is to keep TRUE POSITIVE and TRUE NEGATIVE high and FALSE NEGATIVE and FALSE POSITIVE low. 

| Term                | Result                   | Meaning                               |
|---------------------|--------------------------|---------------------------------------|
| TRUE POSITIVE (TP)  | Correct classification   | Positive class identified as positive |
| FALSE POSITIVE (FP) | Incorrect classification | Negative class identified as positive |
| TRUE NEGATIVE (TN)  | Correct classification   | Negative class identified as negative |
| FALSE NEGATIVE (FN) | Incorrect classification | Positive class identified as negative |
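Since the problem statement asks for the model's accuracy, one common way to compute it from the four cells is (TP + TN) divided by the total number of predictions. Applied to the three experiments above, this is a quick sketch:

```python
# Accuracy = correct predictions / all predictions = (TP + TN) / (TP + FN + FP + TN).

def accuracy(tp, fn, fp, tn):
    return (tp + tn) / (tp + fn + fp + tn)

print(accuracy(2, 1, 0, 7))  # Experiment 1: 0.9
print(accuracy(3, 0, 1, 6))  # Experiment 2: 0.9
print(accuracy(3, 0, 5, 2))  # Experiment 3: 0.5
```

Note that Experiments 1 and 2 reach the same accuracy through different mistakes (a missed infection versus a false alarm), which is exactly why the full matrix is more informative than a single number.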