
Machine Learning - Cross Validation & ROC Curve


Another post starts with you beautiful people!
Hope you enjoyed my previous post about improving your model performance with the confusion matrix.
Today we will continue our performance improvement journey and learn about Cross Validation (k-fold cross validation) & the ROC curve in Machine Learning.

A common practice in data science competitions is to iterate over various models to find a better-performing one. However, it becomes difficult to tell whether an improvement in score comes from capturing the underlying relationship better or from simply over-fitting the data. To answer this question, we use the cross validation technique. This method helps us achieve more generalized relationships.

What is Cross Validation?
Cross Validation is a technique that involves reserving a particular sample of the data set on which we do not train the model. Later, we test the model on this sample before finalizing it.
Here are the steps involved in cross validation:

  • Reserve a sample of the data set.
  • Train the model using the remaining part of the data set.
  • Use the reserved sample as the test (validation) set. This helps us gauge the effectiveness of the model's performance. If the model delivers a positive result on the validation data, go ahead with the current model, as shown in the sketch below.
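Here is a minimal sketch of these steps in scikit-learn; the synthetic data created by make_classification simply stands in for whatever dataset you are actually working with:

```python
# A minimal sketch of reserving a validation sample; the synthetic data below
# stands in for a real dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=42)

# Reserve 25% of the data as the validation sample.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                                # train on the remaining part
print("Validation accuracy:", model.score(X_val, y_val))   # test on the reserved sample
```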
The k-fold cross validation method takes care of the three requirements below:

  • We should train the model on a large portion of the data set; otherwise we fail to capture the underlying trend of the data, which eventually results in higher bias.
  • We also need a good ratio of test data points; as we have seen, too few data points lead to variance error while testing the effectiveness of the model.
  • We should iterate the training and testing process multiple times, changing the train and test distribution each time. This helps validate the model's effectiveness well.
Here are the steps to implement the k-fold cross validation method (see the sketch after this list):
  • Randomly split the entire dataset into k "folds".
  • For each of the k folds, build the model on the other k - 1 folds of the data set, then test the model on the kth fold to check its effectiveness.
  • Record the error we see on each of the predictions.
  • Repeat this until each of the k folds has served as the test set.
  • The average of our k recorded errors is called the cross-validation error and serves as our performance metric for the model.
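A small sketch of these k-fold steps, again on synthetic data so it runs as-is and prints the averaged cross-validation error:

```python
# A sketch of k-fold cross-validation; synthetic data stands in for a real dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=500, n_features=8, random_state=42)

kf = KFold(n_splits=5, shuffle=True, random_state=42)       # randomly split into k folds
fold_errors = []
for train_idx, test_idx in kf.split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])                   # build the model on k - 1 folds
    preds = model.predict(X[test_idx])                      # test on the held-out kth fold
    fold_errors.append(1 - accuracy_score(y[test_idx], preds))  # record the error

print("Cross-validation error:", np.mean(fold_errors))      # average of the k errors
```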
In this exercise we will train our model on the same dataset and continue from the random forest step of the last post on the confusion matrix. Please revise those steps from the previous post.
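A rough sketch of what this step could look like, assuming the diabetes data from the previous post is saved as diabetes.csv with an Outcome label column (the file name, the column name and the choice of 10 folds are my assumptions here):

```python
# Hedged sketch: cross-validating logistic regression on the diabetes data.
# 'diabetes.csv' and the 'Outcome' column name are assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv('diabetes.csv')
X = df.drop('Outcome', axis=1)
y = df['Outcome']

logreg = LogisticRegression(max_iter=1000)
scores_lr = cross_val_score(logreg, X, y, cv=10, scoring='accuracy')
print("Logistic Regression cross-validation accuracy:", scores_lr.mean())
```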

Comparing the above result with Random Forest:
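Continuing the sketch above, the same 10-fold cross-validation can be run for a random forest classifier so the two models can be compared side by side:

```python
# Continuing from the previous sketch: X and y are reused from it, and the
# number of trees is an illustrative choice.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rf = RandomForestClassifier(n_estimators=100, random_state=42)
scores_rf = cross_val_score(rf, X, y, cv=10, scoring='accuracy')
print("Random Forest cross-validation accuracy:", scores_rf.mean())
```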


We can see that the accuracy has increased when cross-validation is performed for the random forest classifier as well as for logistic regression.

Now train the model on the whole data and predict the future data points:
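A hedged sketch of this step, reusing the X, y and rf objects from the snippets above; the last five rows of X merely stand in for genuinely new data points:

```python
# Fit the chosen model on the full dataset, then score "future" records.
rf.fit(X, y)                       # train on the whole data
new_patients = X.tail(5)           # placeholder for genuinely new data points
print(rf.predict(new_patients))    # predicted classes for the new records
```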


From the above results it is quite clear that the accuracy scores are:
  • Random Forest on train/test split: 75
  • Logistic Regression on train/test split: 75.5
  • Random Forest on cross-validation: 77.09
  • Logistic Regression on cross-validation: 76.8

So with cross-validation there is a high probability of improving model accuracy.

Adjusting the classification threshold:
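The graph discussed below is presumably a histogram of the predicted probabilities of diabetes; here is one way such a plot could be produced, reusing the X and y defined earlier:

```python
# Sketch: histogram of predicted probabilities from a train/test split.
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)
y_pred_prob = logreg.predict_proba(X_test)[:, 1]   # probability of class 1 (diabetes)

plt.hist(y_pred_prob, bins=10)
plt.xlabel('Predicted probability of diabetes')
plt.ylabel('Frequency')
plt.title('Histogram of predicted probabilities')
plt.show()
```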


From the above graph we find the following results for our dataset:
  • We can decrease the threshold for predicting diabetes in order to increase the sensitivity of the classifier.
  • A threshold of 0.5 is used by default (for binary problems) to convert predicted probabilities into class predictions.
  • The threshold can be adjusted to increase sensitivity or specificity; sensitivity and specificity have an inverse relationship (see the sketch below).
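A small sketch of what adjusting the threshold looks like in code, reusing y_test and y_pred_prob from the histogram sketch above; 0.3 is just an illustrative choice:

```python
# Lowering the threshold from the default 0.5 to 0.3 predicts "diabetes" more
# often, which raises sensitivity and lowers specificity.
from sklearn.metrics import confusion_matrix

y_pred_default = (y_pred_prob >= 0.5).astype(int)   # default threshold
y_pred_lowered = (y_pred_prob >= 0.3).astype(int)   # lowered threshold

print(confusion_matrix(y_test, y_pred_default))
print(confusion_matrix(y_test, y_pred_lowered))
```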
Wouldn't it be nice if we could see how sensitivity and specificity are affected by various thresholds, without actually changing the threshold?
Yes, we can, and the answer is to plot a ROC curve.
For more details about this curve, please visit here: what is ROC?
  • The ROC curve evaluates how well the model separates the classes across all threshold values.
  • The ROC curve can help us choose a threshold that balances sensitivity and specificity in a way that makes sense for our particular context.

Result:
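A sketch of how such a ROC curve could be drawn with scikit-learn, again reusing y_test and y_pred_prob from the earlier sketch:

```python
# Plot the ROC curve and report the area under it (AUC).
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)

plt.plot(fpr, tpr, label='AUC = %.3f' % roc_auc_score(y_test, y_pred_prob))
plt.plot([0, 1], [0, 1], linestyle='--')             # a random classifier, for reference
plt.xlabel('False Positive Rate (1 - Specificity)')
plt.ylabel('True Positive Rate (Sensitivity)')
plt.title('ROC curve for the diabetes classifier')
plt.legend()
plt.show()
```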

Define a function that accepts a threshold and prints sensitivity and specificity:
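One possible version of such a function, built on the fpr, tpr and thresholds arrays returned by roc_curve in the sketch above (roc_curve returns thresholds in decreasing order, which is what the boolean masking below relies on):

```python
# Print sensitivity and specificity for a given probability threshold,
# using the fpr, tpr and thresholds arrays from the ROC sketch above.
def evaluate_threshold(threshold):
    # Last True entry of the mask is the cut-off closest to the requested threshold.
    print('Sensitivity:', tpr[thresholds > threshold][-1])
    print('Specificity:', 1 - fpr[thresholds > threshold][-1])

evaluate_threshold(0.5)
evaluate_threshold(0.3)
```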


Conclusion of this exercise:
In this way a business can understand where the threshold should be set so as to maximize sensitivity or specificity.

In my next post we will learn about Principal Component Analysis (PCA).
