
Machine Learning::Confusion Matrix


Another post starts with you beautiful people!
Thanks for your overwhelming response on my previous post about decision trees and random forests.
Today in this post we will continue our Machine Learning journey and discover how to interpret and use the confusion matrix in machine learning.
After reading this post we will know:




  • What the confusion matrix is and why we need it.
  • How to calculate a confusion matrix.
  • How to create a confusion matrix in Python.
A confusion matrix is a technique for summarizing the performance of a classification algorithm.
Classification accuracy (the ratio of correct predictions to total predictions made) alone can be misleading if we have an unequal number of observations in each class or if we have more than two classes in our dataset.
For a quick revision, remember the following formula -
error rate = (1 - (correct predictions / total predictions)) * 100
The main problem with classification accuracy is that it hides the detail we need to better understand the performance of our classification model.

There are two examples where we are most likely to encounter this problem:

  • When our data has more than 2 classes. With 3 or more classes we may get a classification accuracy of 80%, but we don’t know if that is because all classes are being predicted equally well or whether one or two classes are being neglected by the model.
  • When our data does not have an even distribution of classes (class imbalance). We may achieve an accuracy of 90% or more, but this is not a good score if 90 out of every 100 records belong to one class, because we can achieve that score by always predicting the most common class value.

But thankfully we can tease apart this detail by using a confusion matrix. Calculating a confusion matrix can give us a better idea of what our classification model is getting right and what types of errors it is making. A confusion matrix is a summary of prediction results on a classification problem. The number of correct and incorrect predictions are summarized with count values and broken down by each class. This is the key to the confusion matrix.

Below is the process for calculating a confusion matrix-
1. We need a test dataset or a validation dataset with expected outcome values.
2. Make a prediction for each row in our test dataset.
3. From the expected outcomes and predictions, count-
  • The number of correct predictions for each class.
  • The number of incorrect predictions for each class, organized by the class that was predicted.

4. These numbers are then organized into a table, or a matrix as follows:

  • Expected down the side: Each row of the matrix corresponds to an actual class.
  • Predicted across the top: Each column of the matrix corresponds to a predicted class.

5. The counts of correct and incorrect classifications are then filled into the table.
6. The total number of correct predictions for a class goes into the expected row for that class value and the predicted column for that same class value.
7. In the same way, the total number of incorrect predictions for a class goes into the expected row for that class value and the predicted column of the class value that was (incorrectly) predicted (see the sketch below).
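Here is that sketch: a minimal hand-counted confusion matrix built from two small example lists (the labels in `expected` and `predicted` are made up purely for illustration):

```python
from collections import defaultdict

# Made-up expected (actual) and predicted labels for a two-class problem
expected  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# Steps 3-7: count predictions, organised as matrix[actual class][predicted class]
matrix = defaultdict(lambda: defaultdict(int))
for actual, pred in zip(expected, predicted):
    matrix[actual][pred] += 1

# Expected down the side, predicted across the top
classes = sorted(set(expected) | set(predicted))
print('actual \\ predicted:', *classes)
for actual in classes:
    print(actual, ':', *[matrix[actual][pred] for pred in classes])
```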

Let's start with some hands-on work on a dataset (the Pima Indians Diabetes dataset) which contains information about patients, where we need to predict whether a patient has diabetes or not (a loading sketch follows the attribute list).
Attribute Information:
1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
9. Class variable (0 or 1)
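The post's original data-loading snippet is not shown above, so here is a minimal sketch. The file name 'pima-indians-diabetes.csv' (with no header row) and the column names (including 'skin' and 'label', which are referenced later) are my own assumptions:

```python
import pandas as pd

# Assumed column names; the walkthrough below refers to 'skin' and 'label'
col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin',
             'bmi', 'pedigree', 'age', 'label']

# Assumed local file name for the Pima Indians Diabetes dataset
pima = pd.read_csv('pima-indians-diabetes.csv', header=None, names=col_names)
```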


Checking the first 5 rows of data:- pima.head()

Checking the type of each attribute:- pima.info()

Spread of negative and positive cases:- pima.groupby('label')['skin'].count()
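Putting the three inspection steps above together (continuing with the `pima` DataFrame from the loading sketch):

```python
# First 5 rows of the data
print(pima.head())

# Data type and non-null count of each attribute
pima.info()

# Spread of negative (0) and positive (1) cases
print(pima.groupby('label')['skin'].count())
```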

Define X and y (where X holds the independent attributes, i.e. the features, and y is the dependent attribute, i.e. the target):-
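A sketch of this step; the original post does not show exactly which features were used, so taking all eight attributes is an assumption:

```python
# X holds the independent attributes (features), y the dependent attribute (target)
feature_cols = ['pregnant', 'glucose', 'bp', 'skin', 'insulin',
                'bmi', 'pedigree', 'age']
X = pima[feature_cols]
y = pima['label']
```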

Split X and y into training and testing sets:-
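A minimal sketch with scikit-learn; the test size and random_state are assumptions made only to keep the split reproducible:

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
```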

Train a logistic regression model on the training set:-
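A sketch of the training step; the original hyperparameters are not shown, so the defaults (plus a higher max_iter so the solver converges) are assumptions:

```python
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)
```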


Make class predictions for the testing set:-
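Continuing with the fitted model (the variable name `y_pred_class` is my own):

```python
# Predicted class labels (0 or 1) for the testing set
y_pred_class = logreg.predict(X_test)
```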

Calculate accuracy:-
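Classification accuracy is simply the fraction of testing-set predictions that match the true labels:

```python
from sklearn import metrics

# Correct predictions / total predictions on the testing set
print(metrics.accuracy_score(y_test, y_pred_class))
```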

Let's calculate the null accuracy (the accuracy that could be achieved by always predicting the most frequent class) and see how good our model is compared to this baseline.

Examine the class distribution of the testing set:-
Calculate the percentage of ones:-
Calculate the percentage of zeros:-
Calculate null accuracy (for binary classification problems coded as 0/1):-
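A sketch of these four checks for a binary 0/1 target, continuing with `y_test` from the split above:

```python
# Class distribution of the testing set
print(y_test.value_counts())

# Percentage of ones (the mean of a 0/1 column is the fraction of ones)
print(y_test.mean())

# Percentage of zeros
print(1 - y_test.mean())

# Null accuracy: always predict the most frequent class
print(max(y_test.mean(), 1 - y_test.mean()))
```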

So the null accuracy score is 0.677 and our model's accuracy is only a little better than the null accuracy.

Let's plot the confusion matrix:-
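The original plotting code is not shown, so here is a minimal sketch using scikit-learn's confusion_matrix and a seaborn heatmap (the styling choices are assumptions):

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import metrics

# Rows are the expected (actual) classes, columns the predicted classes
cm = metrics.confusion_matrix(y_test, y_pred_class)

sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['no diabetes', 'diabetes'],
            yticklabels=['no diabetes', 'diabetes'])
plt.xlabel('Predicted')
plt.ylabel('Expected')
plt.show()
```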


Let's apply a random forest:-
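A sketch of this step; n_estimators and random_state are assumptions, since the original settings are not shown:

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
```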

Make class predictions for the testing set and check accuracy:-
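Continuing with the fitted random forest (the name `y_pred_rf` is my own):

```python
# Predicted classes and accuracy for the random forest
y_pred_rf = rf.predict(X_test)
print(metrics.accuracy_score(y_test, y_pred_rf))
```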

Confusion matrix for random forest:-
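And its confusion matrix, in the same expected-by-predicted layout as before:

```python
cm_rf = metrics.confusion_matrix(y_test, y_pred_rf)
print(cm_rf)
```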


Basic terminology for the confusion matrix (see the sketch after this list):-

  • True Positives (TP): we correctly predicted that they do have diabetes
  • True Negatives (TN): we correctly predicted that they don't have diabetes
  • False Positives (FP): we incorrectly predicted that they do have diabetes (a "Type I error")
  • False Negatives (FN): we incorrectly predicted that they don't have diabetes (a "Type II error")
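With scikit-learn's label ordering of [0, 1], these four counts can be read straight out of the matrix, and precision and recall follow from them (continuing with `cm` from the logistic regression sketch above):

```python
# confusion_matrix orders labels as [0, 1], so:
# cm[0, 0] = TN, cm[0, 1] = FP, cm[1, 0] = FN, cm[1, 1] = TP
TN, FP, FN, TP = cm.ravel()

# Precision: of all predicted positives, how many were actually positive?
precision = TP / (TP + FP)
# Recall: of all actual positives, how many did the model catch?
recall = TP / (TP + FN)
print(precision, recall)
```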
Let's understand these metrics in terms of a business context:-
Suppose you are the owner of Ferrari and you are manufacturing a limited edition supercar.
The head of the marketing department has the details of 10,000 customers they are thinking of advertising to.
You have created a model which predicts whether a customer will buy the car or not.
According to the model, you will advertise only to those customers that the model identifies as buyers.
So in this case your model can make two kinds of mistakes-
1) Precision-related mistake: predicting a non-buyer as a buyer. This is a false positive (falsely predicting that the customer will buy).
2) Recall-related mistake: predicting a buyer as a non-buyer. This is a false negative (falsely predicting that the customer will not buy).

Now which metric do you think is more important?
For this case, if the model predicts a non-buyer as a buyer, the company will lose only a small amount by advertising to that non-buyer; the money spent on advertising to that person is low (at most $50). This is the precision side (falsely predicted as positive).

But on the other side of the coin, if the model predicts a buyer as a non-buyer, the company is not going to advertise the car to that buyer, and in the end the company loses a customer who had the potential to buy that car. This is the recall side (falsely predicted as negative).
So in this case recall is the metric to optimize.

What is F1-Score?
F1-Score is the harmonic mean of Precision and Recall. Therefore, this score takes both false positives and false negatives into account.
Intuitively it is not as easy to understand as accuracy, but F1 is usually more useful than accuracy, especially if we have an uneven class distribution.
Accuracy works best if false positives and false negatives have similar costs.
If the costs of false positives and false negatives are very different, it's better to look at both Precision and Recall.
If we have a specific goal in mind like 'Precision is the king. We don't care much about recall', then there's no problem: higher precision is better.
But if we don't have such a strong goal, we will want a combined metric. That's the F-measure. By using it, we can compare models that trade some precision for some recall and vice versa.
F1-Score = 2 * (Precision * Recall) / (Precision + Recall)
The closer it is to 1, the better.
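These three metrics are also available directly from scikit-learn (continuing with the logistic regression predictions from above):

```python
from sklearn import metrics

print(metrics.precision_score(y_test, y_pred_class))
print(metrics.recall_score(y_test, y_pred_class))
print(metrics.f1_score(y_test, y_pred_class))
```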


Say we have a precision of 80% and a recall of 15%. Now suppose we create a new model with a different algorithm, and the new model's precision is 70% but its recall is 20%.
The first model has an F-measure of 25.3%; the second has 31%. Even though the average of precision and recall goes down between the two, it is more important to increase our recall, so the precision drop is worth it.
The F-score allows us to judge just how much of a tradeoff is worthwhile. If we made our system have 30% precision and 20% recall, our F-measure would be 24%, and the tradeoff wouldn't be worth it.
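A quick check of the numbers above with a small helper (the function name is just illustrative):

```python
def f_measure(precision, recall):
    """F1: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(f_measure(0.80, 0.15))  # ~0.253 -> first model (25.3%)
print(f_measure(0.70, 0.20))  # ~0.311 -> second model (31%)
print(f_measure(0.30, 0.20))  # 0.240 -> the tradeoff that isn't worth it (24%)
```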

What is F-beta Score?
The beta parameter determines the weight of precision in the combined score. 
beta < 1 lends more weight to precision, while beta > 1 favors recall (beta -> 0 considers only precision, beta -> inf only recall).
If we are trying to decide between two different models where both have high precision but lower recall, which one should we choose?
One method is to choose the model with the higher area under the ROC curve; another method is to choose the model with the higher F-beta score.

Try different values to understand how changing the beta value affects the output:-
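A minimal sketch with scikit-learn's fbeta_score, continuing with the logistic regression predictions (the beta values tried here are arbitrary):

```python
from sklearn.metrics import fbeta_score

# beta < 1 weights precision more heavily, beta > 1 weights recall more heavily
for beta in [0.5, 1, 2, 5]:
    score = fbeta_score(y_test, y_pred_class, beta=beta)
    print(f'beta = {beta}: F-beta = {score:.3f}')
```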




That's it guys for today. Try the above learning on a different dataset and explore more!






