Skip to main content

Machine Learning:Naive Bayes Classifier


Another post starts with you beautiful people!
Continuing our Machine Learning track today we will apply the Naive Bayes Classifier but before that we need to understand the Bayes Theorem. So let’s first understand the Bayes Theorem.

Bayes Theorem works on conditional probability. Conditional probability is the probability that something will happen, given that something else has already occurred. Using the conditional probability, we can calculate the probability of an event using its prior knowledge.
Below is the formula for calculating the conditional probability.
where
P(H) is the probability of hypothesis H being true. This is known as the prior probability.
P(E) is the probability of the evidence(regardless of the hypothesis).
P(E|H) is the probability of the evidence given that hypothesis is true.
P(H|E) is the probability of the hypothesis given that the evidence is there.

We can understand the above concept with a classic example of coin that I summarized as below picture-


Now understand the Naive Bayes Classifier in the following easiest way-

So you must be thinking in real world where we can apply this algo to solve a problem?
The answer is Email Classification ! To filter the Spam vs Ham.
Sound interesting right! let's start hands on to solve this email classification problem and build our model. Our goal is to train a Naive Bayes model to classify future SMS messages as either spam or ham.
We will follow below steps to achieve our goal-

  1. Convert the words ham and spam to a binary indicator variable(0/1)
  2. Convert the txt to a sparse matrix of TFIDF vectors
  3. Fit a Naive Bayes Classifier
  4. Measure your success using roc_auc_score
Importing required libraries-


I request you to please go through official document [sklearn.naive_bayes] of each library and read once.

Load our spam dataset-
Train the classifier if it is spam or ham based on the text:-

Convert the spam and ham to 1 and 0 values respectively for probability testing:-

Do some cleaning:-

Split the data into test and train:-


Check for null values in spam:-

Let's predict our model:-

Check our model accuracy:-

Looks great! with this model the success rate is 98.61%.
I hope with this real world example you can understand how easy is to apply Naive Bayes Classifier.

Meanwhile Friends! Go chase your dreams, have an awesome day, make every second count and see you later in my next post.

Comments

Popular posts from this blog

How to convert your YOLOv4 weights to TensorFlow 2.2.0?

Another post starts with you beautiful people! Thank you all for your overwhelming response in my last two posts about the YOLOv4. It is quite clear that my beloved aspiring data scientists are very much curious to learn state of the art computer vision technique but they were not able to achieve that due to the lack of proper guidance. Now they have learnt exact steps to use a state of the art object detection and recognition technique from my last two posts. If you are new to my blog and want to use YOLOv4 in your project then please follow below two links- How to install and compile Darknet code with GPU? How to train your custom data with YOLOv4? In my  last post we have trained our custom dataset to identify eight types of Indian classical dance forms. After the model training we have got the YOLOv4 specific weights file as 'yolo-obj_final.weights'. This YOLOv4 specific weight file cannot be used directly to either with OpenCV or with TensorFlow currently becau...

How to deploy your ML model as Fast API?

Another post starts with you beautiful people! Thank you all for showing so much interests in my last posts about object detection and recognition using YOLOv4. I was very happy to see many aspiring data scientists have learnt from my past three posts about using YOLOv4. Today I am going to share you all a new skill to learn. Most of you have seen my post about  deploying and consuming ML models as Flask API   where we have learnt to deploy and consume a keras model with Flask API  . In this post you are going to learn a new framework-  FastAPI to deploy your model as Rest API. After completing this post you will have a new industry standard skill. What is FastAPI? FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.6+ based on standard Python type hints. It is easy to learn, fast to code and ready for production . Yes, you heard it right! Flask is not meant to be used in production but with FastAPI you can use you...

Detecting Credit Card Fraud As a Data Scientist

Another post starts with you beautiful people! Hope you have learnt something from my previous post about  machine learning classification real world problem Today we will continue our machine learning hands on journey and we will work on an interesting Credit Card Fraud Detection problem. The goal of this exercise is to anonymize credit card transactions labeled as fraudulent or genuine. For your own practice you can download the dataset from here-  Download the dataset! About the dataset:  The datasets contains transactions made by credit cards in September 2013 by european cardholders. This dataset presents transactions that occurred in two days, where we have 492 frauds out of 284,807 transactions. The dataset is highly unbalanced, the positive class (frauds) account for 0.172% of all transactions. Let's start our analysis with loading the dataset first:- As per the  official documentation -  features V1, V2, ... V28 are the principal compo...