
Machine Learning: Naive Bayes Classifier


Another post starts with you beautiful people!
Continuing our Machine Learning track, today we will apply the Naive Bayes classifier. Before that, we need to understand Bayes' theorem.

Bayes' theorem works on conditional probability. Conditional probability is the probability that something will happen, given that something else has already occurred. Using conditional probability, we can calculate the probability of an event from prior knowledge of the conditions related to it.
Below is the formula for calculating the conditional probability:

P(H|E) = P(E|H) * P(H) / P(E)

where
P(H) is the probability of hypothesis H being true. This is known as the prior probability.
P(E) is the probability of the evidence (regardless of the hypothesis).
P(E|H) is the probability of the evidence given that the hypothesis is true.
P(H|E) is the probability of the hypothesis given that the evidence is there. This is known as the posterior probability.
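
To make the formula concrete, here is a small worked sketch in Python. The numbers are invented purely for illustration (they do not come from any dataset): assume 20% of messages are spam, the word "free" appears in 60% of spam messages and in 5% of ham messages.

```python
def posterior(prior_h, likelihood_e_given_h, evidence_e):
    """Return P(H|E) = P(E|H) * P(H) / P(E)."""
    if evidence_e <= 0:
        raise ValueError("P(E) must be positive")
    return likelihood_e_given_h * prior_h / evidence_e


# Hypothetical numbers, chosen only to illustrate the formula
p_spam = 0.2               # P(H): prior probability that a message is spam
p_free_given_spam = 0.6    # P(E|H): "free" appears, given spam
p_free_given_ham = 0.05    # "free" appears, given ham

# P(E): total probability of seeing "free" in any message
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

# P(H|E): probability a message is spam, given that it contains "free"
p_spam_given_free = posterior(p_spam, p_free_given_spam, p_free)
print(round(p_spam_given_free, 3))
```

With these made-up numbers the posterior works out to 0.12 / 0.16 = 0.75, so seeing the word "free" lifts the spam probability from 20% to 75%.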

We can understand the above concept with the classic coin example, which I have summarized in the picture below:


Now let's understand the Naive Bayes classifier in the easiest way:

You must be wondering where in the real world we can apply this algorithm to solve a problem.
The answer is email classification: filtering spam vs. ham!
Sounds interesting, right? Let's get hands-on, solve this classification problem, and build our model. Our goal is to train a Naive Bayes model to classify future SMS messages as either spam or ham.
We will follow the steps below to achieve our goal:

  1. Convert the labels ham and spam to a binary indicator variable (0/1)
  2. Convert the text to a sparse matrix of TF-IDF vectors
  3. Fit a Naive Bayes classifier
  4. Measure our success using roc_auc_score
Importing the required libraries:
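
The original import cell is not preserved here, so the following is only a plausible set of imports for the four steps listed above. The choice of MultinomialNB is my assumption; scikit-learn also ships other Naive Bayes variants such as BernoulliNB.

```python
import pandas as pd                                            # loading and handling the data
from sklearn.feature_extraction.text import TfidfVectorizer    # text -> sparse TF-IDF matrix
from sklearn.model_selection import train_test_split           # train/test split
from sklearn.naive_bayes import MultinomialNB                  # assumed Naive Bayes variant
from sklearn.metrics import roc_auc_score                      # evaluation metric
```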


I recommend reading through the official documentation of each library used here, for example [sklearn.naive_bayes], at least once.

Load our spam dataset:
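
The file name and layout of the dataset are not shown in this post, so the sketch below uses a tiny in-memory stand-in instead of a real file. The tab-separated "label, text" layout and the column names `label` and `text` are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: one "label<TAB>text" row per message
raw = io.StringIO(
    "ham\tAre we still meeting for lunch today?\n"
    "spam\tWIN a FREE prize now, reply YES to claim\n"
    "ham\tCall me when you get home\n"
)

# With a real file you would pass its path instead of `raw`
df = pd.read_csv(raw, sep="\t", header=None, names=["label", "text"])

print(df.shape)
print(df["label"].value_counts())
```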
We will train the classifier to tell whether a message is spam or ham based on its text:

Convert the spam and ham labels to 1 and 0 respectively for probability testing:
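
A minimal sketch of this conversion with pandas, on a toy DataFrame. The column names `label` and `spam` are assumed for illustration.

```python
import pandas as pd

# Toy data standing in for the real dataset
df = pd.DataFrame(
    {
        "label": ["ham", "spam", "ham", "spam"],
        "text": ["see you soon", "free prize", "on my way", "win cash now"],
    }
)

# Map the string labels to a binary indicator: spam -> 1, ham -> 0
df["spam"] = df["label"].map({"spam": 1, "ham": 0})

print(df[["label", "spam"]])
```

Any label that is neither "spam" nor "ham" would become NaN with this mapping, which is one reason to check for null values afterwards.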

Do some cleaning:
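
The exact cleaning steps are not preserved in this post. As one common possibility (my assumption, not necessarily what was done originally), you could lower-case the text, collapse repeated whitespace, and drop rows that end up empty:

```python
import pandas as pd

# Toy data with messy whitespace, mixed case, and one blank message
df = pd.DataFrame(
    {
        "spam": [0, 1, 0],
        "text": ["  Hello   THERE ", "FREE prize!!!", "   "],
    }
)

# Assumed cleaning: lower-case, collapse runs of whitespace, trim the ends
df["text"] = (
    df["text"]
    .str.lower()
    .str.replace(r"\s+", " ", regex=True)
    .str.strip()
)

# Drop messages that are empty after cleaning
df = df[df["text"] != ""].reset_index(drop=True)

print(df)
```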

Split the data into train and test sets:
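
A minimal sketch of the split using scikit-learn's train_test_split on toy data. The 80/20 ratio, the fixed random_state, and the stratified split are assumptions for illustration; the original settings are not shown.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data: 10 messages with alternating ham (0) and spam (1) labels
df = pd.DataFrame(
    {
        "text": [f"message {i}" for i in range(10)],
        "spam": [0, 1] * 5,
    }
)

# Hold out 20% for testing; stratify keeps the spam/ham ratio in both parts
X_train, X_test, y_train, y_test = train_test_split(
    df["text"],
    df["spam"],
    test_size=0.2,
    random_state=0,
    stratify=df["spam"],
)

print(len(X_train), len(X_test))
```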


Check for null values in the spam column:
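
A short sketch of the null check with pandas, on toy data that deliberately contains missing values. Dropping the affected rows is one simple option and is my assumption here.

```python
import numpy as np
import pandas as pd

# Toy data with one missing label and one missing text
df = pd.DataFrame(
    {
        "spam": [0, 1, np.nan, 0],
        "text": ["hi there", "free prize", "call me", None],
    }
)

# Count missing values per column
null_counts = df.isnull().sum()
print(null_counts)

# Assumed handling: drop rows where either the label or the text is missing
df = df.dropna(subset=["spam", "text"])
```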

Let's predict with our model:
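
Here is a minimal end-to-end sketch of vectorizing, fitting, and predicting on a handful of made-up messages. MultinomialNB and the default TfidfVectorizer settings are assumptions on my side; the toy messages are invented and far too few for a real model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented training messages: 1 = spam, 0 = ham
train_texts = [
    "win a free prize now",
    "free cash claim your prize",
    "are we meeting for lunch",
    "call me when you get home",
]
train_labels = [1, 1, 0, 0]

# Learn the vocabulary on the training text and build a sparse TF-IDF matrix
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# Fit the Naive Bayes classifier on the TF-IDF features
model = MultinomialNB()
model.fit(X_train, train_labels)

# New messages must be transformed with the SAME fitted vectorizer
new_texts = ["claim your free prize", "see you at lunch"]
X_new = vectorizer.transform(new_texts)

spam_prob = model.predict_proba(X_new)[:, 1]  # probability of class 1 (spam)
predictions = model.predict(X_new)            # hard 0/1 predictions

print(predictions)
print(spam_prob)
```

Note that the vectorizer is fitted only on the training text and then reused for new messages, so no information from unseen data leaks into the vocabulary.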

Check our model's accuracy:
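
Since the plan is to measure success with roc_auc_score, here is a small sketch of how that function is called. The labels and probabilities below are made up for illustration and are not the results of the model in this post.

```python
from sklearn.metrics import roc_auc_score

# Made-up true labels (1 = spam) and predicted spam probabilities
y_true = [0, 0, 1, 1, 0, 1]
spam_prob = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]

# ROC AUC: the fraction of (spam, ham) pairs where spam gets the higher score
auc = roc_auc_score(y_true, spam_prob)
print(round(auc, 3))
```

In this toy case 8 of the 9 spam/ham pairs are ranked correctly, so the score is 8/9. Keep in mind that ROC AUC measures how well the model ranks spam above ham, which is not the same thing as plain accuracy.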

Looks great! With this model the success rate is 98.61%.
I hope this real-world example shows you how easy it is to apply the Naive Bayes classifier.

Meanwhile Friends! Go chase your dreams, have an awesome day, make every second count and see you later in my next post.
