Machine Learning-Cross Validation & ROC curve

Another post starts with you beautiful people!
Hope you enjoyed my previous post about improving your model performance with the confusion matrix.
Today we will continue our performance improvement journey and learn about Cross Validation (k-fold cross validation) & the ROC curve in Machine Learning.

A common practice in data science competitions is to iterate over various models to find a better-performing one. However, it then becomes difficult to tell whether an improvement in score comes from capturing the underlying relationship better or just from over-fitting the data. To answer this question, we use the cross validation technique, which helps us find relationships that generalize to unseen data.

What is Cross Validation?
Cross Validation is a technique which involves reserving a particular sample of a data set on which we do not train the model. Later, we test the model on this sample before finalizing the model.
Here are the steps involved in cross validation:

We reserve a sample of the data set.
Train the model using the remaining part of the data set.
Use the reserved sample as the test (validation) set. This tells us how effective the model is. If our model delivers a positive result on the validation data, go ahead with the current model.
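The three steps above can be sketched with scikit-learn as follows. This is a minimal illustration, not the code from this series: the dataset is a synthetic stand-in generated with make_classification (the 768 rows and 8 features are assumptions), and the 25% hold-out size is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption): 768 rows, 8 features
X, y = make_classification(n_samples=768, n_features=8, random_state=0)

# Step 1: reserve 25% of the rows as a validation sample the model never trains on
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# Step 2: train the model on the remaining 75%
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Step 3: measure effectiveness on the reserved sample
val_accuracy = model.score(X_val, y_val)
print(f"Validation accuracy: {val_accuracy:.3f}")
```

A single hold-out split like this depends heavily on which rows happen to land in the validation sample, which is the weakness k-fold cross validation addresses.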

The k-fold method of cross validation takes care of the three requirements below:-

We should train the model on a large portion of the data set. Otherwise we will fail to capture the underlying trend of the data, which results in higher bias.
We also need a good proportion of testing data points. As we have seen, too few test data points can lead to variance error when testing the effectiveness of the model.
We should iterate on the training and testing process multiple times, changing the train and test data set distribution each time. This helps to validate the model's effectiveness well.

Here are the steps to implement the k-fold validation method:-

Randomly split the entire dataset into k "folds".
For each of the k folds, build the model on the other k - 1 folds of the data set, then test the model on the held-out kth fold to check its effectiveness.
Record the error we see on each of the predictions.
Repeat this until each of the k folds has served as the test set.
The average of the k recorded errors is called the cross-validation error and serves as our performance metric for the model.
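To make the steps concrete, here is a from-scratch sketch of k-fold cross validation that follows them one by one. The function name k_fold_cv_error, the use of misclassification rate as the "error", and the synthetic data are illustrative assumptions; in practice scikit-learn's KFold and cross_val_score do this bookkeeping for you.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def k_fold_cv_error(model, X, y, k=5, seed=0):
    """Hypothetical helper: return (mean error, per-fold errors) for k-fold CV."""
    rng = np.random.default_rng(seed)
    # Step 1: shuffle the row indices and split them into k folds
    folds = np.array_split(rng.permutation(len(X)), k)
    fold_errors = []
    for i in range(k):
        # Step 2: fold i is the test set, the other k - 1 folds form the training set
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        fitted = clone(model).fit(X[train_idx], y[train_idx])
        # Step 3: record the misclassification rate on the held-out fold
        fold_errors.append(float(np.mean(fitted.predict(X[test_idx]) != y[test_idx])))
    # Steps 4-5: every fold has served as the test set once; average the k errors
    return float(np.mean(fold_errors)), fold_errors


# Synthetic stand-in for the real dataset (assumption)
X, y = make_classification(n_samples=768, n_features=8, random_state=0)
cv_error, fold_errors = k_fold_cv_error(LogisticRegression(max_iter=1000), X, y, k=10)
print(f"10-fold CV error: {cv_error:.3f}")
```

clone gives each fold a fresh, unfitted copy of the model, so no fold's training can leak into another.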

In this exercise we will train our model on the same dataset and continue from the random forest step, as we did in the last post on the confusion matrix. Please review those steps in the previous post.
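The original code for this step is not reproduced here, so the following is only a sketch of how both models can be scored with 10-fold cross validation using scikit-learn's cross_val_score. The synthetic data, n_estimators=100 and cv=10 are assumptions, and the printed numbers will not match the diabetes results quoted in this post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the diabetes data (assumption): 768 rows, 8 features
X, y = make_classification(n_samples=768, n_features=8, random_state=0)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

cv_accuracy = {}
for name, model in models.items():
    # cv=10 runs 10-fold cross validation and returns one accuracy score per fold
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    cv_accuracy[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.4f} (std {scores.std():.4f})")
```

For classifiers, an integer cv makes scikit-learn use stratified folds, so each fold keeps roughly the same class balance as the full dataset.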

Comparing the above result with the random forest:-

We can see that the measured accuracy is higher with cross-validation for both the random forest classifier and logistic regression.

Now train the model on the whole dataset and predict future data points:-
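Once cross validation has helped us pick a model, a common approach is to refit it on every labelled row before predicting new data. The sketch below assumes logistic regression was chosen; the synthetic data and the two "future" rows are made up for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real dataset (assumption)
X, y = make_classification(n_samples=768, n_features=8, random_state=0)

# Refit on the whole dataset now that cross validation has done its job
final_model = LogisticRegression(max_iter=1000)
final_model.fit(X, y)

# Hypothetical "future" data points: two new rows with the same 8 features
rng = np.random.default_rng(1)
X_new = rng.normal(size=(2, 8))

predicted_classes = final_model.predict(X_new)
predicted_probs = final_model.predict_proba(X_new)[:, 1]  # probability of class 1
print(predicted_classes, predicted_probs.round(3))
```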

From the above results, the accuracy scores (%) are:

Random Forest, train/test split: 75
Logistic Regression, train/test split: 75.5
Random Forest, cross-validation: 77.09
Logistic Regression, cross-validation: 76.8

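So in our experiment, cross-validation reported a higher accuracy for both models. Strictly speaking, cross-validation does not make a model more accurate; it gives a more reliable estimate of its accuracy, because the score is averaged over k different test folds instead of depending on one lucky or unlucky split.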

Adjusting the classification threshold:-

From the above graph we find the following for our dataset:-

Decrease the threshold for predicting diabetes in order to increase the sensitivity of the classifier.
A threshold of 0.5 is used by default (for binary problems) to convert predicted probabilities into class predictions.
The threshold can be adjusted to increase sensitivity or specificity. Sensitivity and specificity have an inverse relationship.
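Adjusting the threshold just means comparing the predicted probabilities against a different cut-off. The sketch below (synthetic data and the 0.3 cut-off are assumptions) computes sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP) at the default 0.5 and at a lower 0.3, showing the inverse relationship described above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption)
X, y = make_classification(n_samples=768, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_prob = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1


def sensitivity_specificity(y_true, y_pred):
    # ravel() on a 2x2 confusion matrix yields tn, fp, fn, tp in that order
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tp / (tp + fn), tn / (tn + fp)


results = {}
for threshold in (0.5, 0.3):
    # Predict class 1 whenever the probability reaches the threshold
    y_pred = (y_prob >= threshold).astype(int)
    results[threshold] = sensitivity_specificity(y_test, y_pred)
    sens, spec = results[threshold]
    print(f"threshold={threshold}: sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Lowering the threshold can only turn predictions from 0 into 1, so sensitivity never drops and specificity never rises.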

Wouldn't it be nice if we could see how sensitivity and specificity are affected by various thresholds, without actually changing the threshold?

Yes, we can, by plotting the ROC curve.

For more details about this curve, please visit: what is ROC?

The ROC curve evaluates how well the model separates the classes across all threshold values.
The ROC curve can help us choose a threshold that balances sensitivity and specificity in a way that makes sense for our particular context.
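A minimal way to compute the ROC curve and the area under it (AUC) with scikit-learn is sketched below; the synthetic data is an assumption. roc_curve sweeps the distinct predicted scores as thresholds and returns the false positive rate (1 - specificity) and true positive rate (sensitivity) at each one.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption)
X, y = make_classification(n_samples=768, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_prob = model.predict_proba(X_test)[:, 1]

# One (fpr, tpr) point per candidate threshold
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
auc = roc_auc_score(y_test, y_prob)  # 0.5 = random guessing, 1.0 = perfect separation
print(f"AUC: {auc:.3f}")

# To draw the curve (requires matplotlib):
# import matplotlib.pyplot as plt
# plt.plot(fpr, tpr)
# plt.plot([0, 1], [0, 1], linestyle="--")  # diagonal = random classifier
# plt.xlabel("False Positive Rate (1 - Specificity)")
# plt.ylabel("True Positive Rate (Sensitivity)")
# plt.title("ROC curve")
# plt.show()
```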

Result:-

Define a function that accepts a threshold and prints sensitivity and specificity:-
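The original function is not reproduced here; below is one possible version, under the assumption that it reads sensitivity and specificity off the ROC arrays. The name evaluate_threshold and the synthetic data are illustrative. Passing drop_intermediate=False to roc_curve keeps one point per distinct score, so the lookup matches predicting 1 whenever the probability is at least the threshold.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption)
X, y = make_classification(n_samples=768, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_prob = model.predict_proba(X_test)[:, 1]

# Keep every ROC point so each distinct score has its own threshold entry
fpr, tpr, thresholds = roc_curve(y_test, y_prob, drop_intermediate=False)


def evaluate_threshold(threshold):
    """Print and return (sensitivity, specificity) at the given threshold."""
    # thresholds are sorted in decreasing order, so the last entry that is
    # still >= threshold is the ROC point for this cut-off
    idx = (thresholds >= threshold).nonzero()[0][-1]
    sensitivity, specificity = tpr[idx], 1 - fpr[idx]
    print(f"threshold={threshold}: sensitivity={sensitivity:.3f}, specificity={specificity:.3f}")
    return sensitivity, specificity


evaluate_threshold(0.5)
evaluate_threshold(0.3)
```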

Conclusion of this exercise:

In this way, a business can decide where the threshold should be set to maximize sensitivity or specificity.
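When sensitivity and specificity matter roughly equally, one common heuristic (not something this post prescribes) is Youden's J statistic, J = sensitivity + specificity - 1 = TPR - FPR, picking the threshold where J is largest. The sketch below uses synthetic data as an assumption; a business that, say, cannot afford missed diabetes cases would instead pick a lower threshold from the ROC curve.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption)
X, y = make_classification(n_samples=768, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_prob = model.predict_proba(X_test)[:, 1]

fpr, tpr, thresholds = roc_curve(y_test, y_prob)

# Youden's J = sensitivity + specificity - 1 = tpr - fpr at each ROC point
j_scores = tpr - fpr
best = int(np.argmax(j_scores))
best_threshold = float(thresholds[best])
print(f"threshold={best_threshold:.3f}: "
      f"sensitivity={tpr[best]:.3f}, specificity={1 - fpr[best]:.3f}")
```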

In my next post we will learn about Principal component analysis or PCA.
