Another post starts with you beautiful people!
Thanks for your overwhelming response on my previous post about decision trees and random forests.
Today in this post we will continue our Machine Learning journey and learn how to interpret and use the confusion matrix in machine learning.
After reading this post we will know:
- What the confusion matrix is and why we need to use it?
- How to calculate a confusion matrix?
- How to create a confusion matrix?
Classification accuracy (the ratio of correct predictions to total predictions made) alone can be misleading if we have an unequal number of observations in each class or if we have more than two classes in our dataset.
For a quick revision remember the following formula -
error rate = (1 - (correct predictions / total predictions)) * 100
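As a quick sanity check of the formula, here is a tiny illustration with made-up counts (88 correct predictions out of 100):

```python
# Hypothetical counts, just to illustrate the accuracy and error rate formulas.
correct_predictions = 88
total_predictions = 100

accuracy = correct_predictions / total_predictions * 100           # 88.0 %
error_rate = (1 - correct_predictions / total_predictions) * 100   # 12.0 %
print(accuracy, error_rate)
```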
The main problem with classification accuracy is that it hides the detail we need to better understand the performance of our classification model.
There are two examples where we are most likely to encounter this problem:
- When our data has more than 2 classes. With 3 or more classes we may get a classification accuracy of 80%, but we don’t know if that is because all classes are being predicted equally well or whether one or two classes are being neglected by the model.
- When our data does not have an even distribution of classes (class imbalance). We may achieve an accuracy of 90% or more, but this is not a good score if 90 out of every 100 records belong to one class, because we can achieve the same score simply by always predicting the most common class value.
But thankfully we can tease apart this detail by using a confusion matrix. Calculating a confusion matrix can give us a better idea of what our classification model is getting right and what types of errors it is making. A confusion matrix is a summary of prediction results on a classification problem. The number of correct and incorrect predictions are summarized with count values and broken down by each class. This is the key to the confusion matrix.
Below is the process for calculating a confusion matrix -
1. We need a test dataset or a validation dataset with expected outcome values.
2. Make a prediction for each row in our test dataset.
3. From the expected outcomes and predictions, count-
- The number of correct predictions for each class.
- The number of incorrect predictions for each class, organized by the class that was predicted.
4. These numbers are then organized into a table, or a matrix as follows:
- Expected down the side: Each row of the matrix corresponds to an actual class.
- Predicted across the top: Each column of the matrix corresponds to a predicted class.
5. The counts of correct and incorrect classification are then filled into the table.
6. The total number of correct predictions for a class goes into the expected row for that class value and the predicted column for that same class value.
7. In the same way, the total number of incorrect predictions for a class goes into the expected row for that class value and the predicted column of the class value that was (wrongly) predicted.
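To make these steps concrete, here is a tiny hand-counted sketch with made-up two-class data (the labels and values are purely illustrative):

```python
from collections import Counter

# Tiny hypothetical two-class example to illustrate the counting procedure.
expected  = ['yes', 'yes', 'no', 'no', 'yes', 'no', 'yes', 'no', 'no', 'yes']
predicted = ['yes', 'no',  'no', 'yes', 'yes', 'no', 'yes', 'no', 'yes', 'yes']

counts = Counter(zip(expected, predicted))   # counts of (expected, predicted) pairs
labels = ['yes', 'no']

print('rows = expected, columns = predicted:', labels)
for actual in labels:
    print(actual, [counts[(actual, pred)] for pred in labels])
```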
Let's get some hands-on practice with the Pima Indians Diabetes dataset, which contains information about patients, where we need to predict whether a patient has diabetes or not.
Attribute Information:
1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
9. Class variable (0 or 1)
Checking the first 5 rows of data:- pima.head()
Checking the type of each attribute:- pima.info()
Distribution of negative and positive cases:- pima.groupby('label')['skin'].count()
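For reference, a minimal sketch of these exploration steps is shown below. The file name and the column names (ending in 'label') are assumptions based on the attribute list above, so adjust them to match your copy of the dataset.

```python
import pandas as pd

# Assumed file name and column names; adjust to your copy of the data.
col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin',
             'bmi', 'pedigree', 'age', 'label']
pima = pd.read_csv('pima-indians-diabetes.csv', header=None, names=col_names)

print(pima.head())                            # first 5 rows
pima.info()                                   # type of each attribute
print(pima.groupby('label')['skin'].count())  # cases per class (0 = negative, 1 = positive)
```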
Split X and y into training and testing sets:-
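A minimal sketch, assuming we use all eight attributes as features (the original post may use a different subset or a different random_state):

```python
from sklearn.model_selection import train_test_split

# Assumed feature set: all eight attributes.
feature_cols = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age']
X = pima[feature_cols]
y = pima['label']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
```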
Train a logistic regression model on the training set:-
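Something along these lines (the solver settings are an assumption, not necessarily what was used originally):

```python
from sklearn.linear_model import LogisticRegression

# max_iter is raised because the raw, unscaled features can be slow to converge.
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)
```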
Make class predictions for the testing set and calculate the accuracy:-
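A minimal sketch of both steps:

```python
from sklearn import metrics

y_pred_class = logreg.predict(X_test)                # predicted labels for the test set
print(metrics.accuracy_score(y_test, y_pred_class))  # fraction of correct predictions
```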
Let's calculate the null accuracy (the accuracy that could be achieved by always predicting the most frequent class) and see how good our model is compared to this baseline.
Examine the class distribution of the testing set:-
Calculate the percentage of ones:-
Calculate the percentage of zeros:-
Calculate null accuracy (for binary classification problems coded as 0/1):-
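A minimal sketch covering these steps (for a 0/1-coded target, the mean of y is the fraction of ones):

```python
# Class distribution of the testing set
print(y_test.value_counts())

# Percentage of ones and zeros
print(y_test.mean())        # fraction of 1s (has diabetes)
print(1 - y_test.mean())    # fraction of 0s (no diabetes)

# Null accuracy: accuracy of always predicting the most frequent class
null_accuracy = max(y_test.mean(), 1 - y_test.mean())
print(null_accuracy)
```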
So the null accuracy score is 0.677 and our model's accuracy is a little better than the null accuracy.
Let's plot the confusion matrix:-
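One way to do this (a seaborn heatmap is an assumption here; the original plot may have been produced differently):

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import metrics

cm = metrics.confusion_matrix(y_test, y_pred_class)   # rows = expected, columns = predicted
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['no diabetes', 'diabetes'],
            yticklabels=['no diabetes', 'diabetes'])
plt.xlabel('Predicted')
plt.ylabel('Expected')
plt.show()
```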
Let's apply a random forest:-
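A minimal sketch; the hyperparameters here are placeholders, not necessarily the settings used originally:

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
```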
Make class predictions for the testing set, check the accuracy, and compute the confusion matrix for the random forest:-
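Along these lines, reusing the test split from above:

```python
from sklearn import metrics

y_pred_rf = rf.predict(X_test)
print(metrics.accuracy_score(y_test, y_pred_rf))    # accuracy of the random forest
print(metrics.confusion_matrix(y_test, y_pred_rf))  # rows = expected, columns = predicted
```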
Basic terminology for confusion matrix:-
- True Positives (TP): we correctly predicted that they do have diabetes
- True Negatives (TN): we correctly predicted that they don't have diabetes
- False Positives (FP): we incorrectly predicted that they do have diabetes (a "Type I error")
- False Negatives (FN): we incorrectly predicted that they don't have diabetes (a "Type II error")
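To pull these four numbers out of the logistic regression confusion matrix (note that for 0/1 labels scikit-learn lays the matrix out as [[TN, FP], [FN, TP]]), and to turn them into precision and recall, we can do something like:

```python
from sklearn import metrics

cm = metrics.confusion_matrix(y_test, y_pred_class)
TN, FP, FN, TP = cm.ravel()

precision = TP / (TP + FP)   # of all predicted positives, how many were actually positive
recall    = TP / (TP + FN)   # of all actual positives, how many did we catch
print(TP, TN, FP, FN)
print(precision, recall)
```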
Let's understand these metrics in a business context:-
Suppose you are the owner of Ferrari and you are manufacturing a limited edition supercar.
The head of the marketing department has details of 10,000 customers they are thinking of advertising to.
You have created a model which predicts whether a customer will buy the car or not.
According to this plan, you will advertise only to those whom the model identifies as buyers.
So in this case your model can make two kinds of mistakes -
1) It predicts a non-buyer as a buyer - a false positive (falsely predicting that the customer will buy). This is the kind of error precision measures.
2) It predicts a buyer as a non-buyer - a false negative (falsely predicting that the customer will not buy). This is the kind of error recall measures.
Now which metric do you think is important?
For this case, if the model predicts a non-buyer as a buyer, the company loses only a small amount by advertising to that non-buyer - the money spent on advertising to that one person is low (at most $50). This is the precision side (falsely predicted as positive).
But on the other side of the coin, if the model predicts a buyer as a non-buyer, the company will not advertise the car to that buyer, and in the end the company loses a customer who had the potential to buy the car. This is the recall side (falsely predicted as negative).
So in this case recall is the metric to optimize.
What is the F1 Score?
F1 Score is the harmonic mean of Precision and Recall. Therefore, this score takes both false positives and false negatives into account.
Intuitively it is not as easy to understand as accuracy, but F1 is usually more useful than accuracy, especially if we have an uneven class distribution.
Accuracy works best if false positives and false negatives have similar cost.
If the cost of false positives and false negatives are very different, it’s better to look at both Precision and Recall.
If we have a specific goal in mind like 'Precision is king, we don't care much about recall', then there's no problem: higher precision is better. But if we don't have such a strong preference, we will want a combined metric. That's the F-measure. By using it, we can compare models that trade off precision and recall differently.
F1-Score = 2 * (Recall * Precision) / (Recall + Precision)
The closer to 1 the better.
Say we have a model with a precision of 80% and a recall of 15%, and we build a new model with a different algorithm whose precision is 70% but whose recall is 20%.
The first model has an F-measure of 25.3%; the second has about 31%. Even though the simple average goes down between the two, increasing recall matters more here, so the precision drop is worth it.
The F-score lets us judge just how much of a tradeoff is worthwhile. If we made our system have 30% precision and 20% recall, the F-measure would be 24%, and the tradeoff wouldn't be worth it.
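We can double-check those numbers with a couple of lines of Python:

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.80, 0.15), 3))  # 0.253 -> 25.3%
print(round(f1(0.70, 0.20), 3))  # 0.311 -> about 31%
print(round(f1(0.30, 0.20), 3))  # 0.24  -> 24%
```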
What is F-beta Score?
The beta parameter determines the relative weight of recall versus precision in the combined score:
F-beta = (1 + beta^2) * (Precision * Recall) / (beta^2 * Precision + Recall)
beta < 1 lends more weight to precision, while beta > 1 favors recall (beta -> 0 considers only precision, beta -> inf only recall).
If we are trying to decide between two different models that both have high precision but lower recall, which one should we choose?
One method is to choose the model with the higher area under the ROC curve; another is to choose the model with the higher F-beta score.
Try different values to understand how a change in the beta value affects the output:-
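For example, using scikit-learn's fbeta_score on the logistic regression predictions from above (the beta values are just illustrative):

```python
from sklearn.metrics import fbeta_score

# y_test and y_pred_class come from the logistic regression example above.
for beta in [0.5, 1, 2]:
    print(beta, fbeta_score(y_test, y_pred_class, beta=beta))
```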
That's it, guys, for today. Try the above on a different dataset and explore more!