
How to achieve maximum parallel processing capabilities with XGBoost-1.0.0?


Another post starts with you beautiful people!
XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides parallel tree boosting (also known as GBDT or GBM) that solves many data science problems in a fast and accurate way. Recently XGBoost released its newer version 1.0.0, which brings improvements like better performance scaling on multi-core CPUs, an improved installation experience on Mac OSX, availability of distributed XGBoost on Kubernetes, etc. In this post we are going to explore its multi-processing capabilities on a real-world ML problem, the Otto Group Product Classification Challenge. At the end of the post I will also share my Kaggle kernel link so that you can explore my complete code.

Once you go to the challenge link in Kaggle and start your kernel, you first need to enable the Internet option in the notebook, since the version of XGBoost currently installed in the kernel notebook is 0.90. To upgrade it, run the following command in your Anaconda prompt-
pip install --upgrade xgboost, or !pip install --upgrade xgboost in your Kaggle kernel. A screenshot of the Anaconda prompt is below-
And a screenshot of my kernel is below-
After running the above command you will see the following screen showing successful installation of the library-
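For reference, the same upgrade can be reproduced directly in a notebook cell; the version print is only a quick sanity check that the upgrade took effect-

!pip install --upgrade xgboost

# confirm that version 1.0.0 (or later) is now active
import xgboost
print(xgboost.__version__)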

Let's import the dataset-
This dataset describes 93 features of 61,878 products grouped into 9 product categories (e.g. fashion, electronics, etc.). Input attributes are counts of different events of some kind. The goal is to make predictions for new products as an array of probabilities for each of the 9 categories, and models are evaluated using multi-class logarithmic loss (also called cross entropy).
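As a rough sketch of the import step (the exact code is in my kernel), assuming the standard Kaggle input path for this competition and the usual train.csv layout with an id column, the feat_* count features and a target column-

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# assumed Kaggle input path for the Otto competition data; adjust if your kernel mounts it elsewhere
train = pd.read_csv('../input/otto-group-product-classification-challenge/train.csv')

# drop the id column, keep the count features as X and label-encode the target classes as y
X = train.drop(['id', 'target'], axis=1).values
y = LabelEncoder().fit_transform(train['target'].values)

print(X.shape, y.shape)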

Please note that we get multi-threading support by default with XGBoost. But depending on our Python environment (e.g. Python 3) we may need to explicitly enable multi-threading support for XGBoost. We can confirm that XGBoost multi-threading support is working fine by building a
number of different XGBoost models, specifying the number of threads and timing how long it takes to build each model. The trend will both show us that multi-threading support is enabled and give us an indication of the effect it has when building models. Below is a code snippet showing how you can check this-
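The snippet in the post is shown as a screenshot; a minimal sketch of the same timing experiment, assuming X and y from the loading step above, could look like this-

import time
from xgboost import XGBClassifier

# thread counts to try; adjust this list to match your machine's physical cores
num_threads = [1, 2, 3, 4]
results = []

for n in num_threads:
    start = time.time()
    model = XGBClassifier(nthread=n)
    model.fit(X, y)
    elapsed = time.time() - start
    results.append(elapsed)
    print('nthread=%d: %.2f seconds' % (n, elapsed))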

You can update the list of thread counts based on your system configuration. The recommended way to set the number of threads (nthread) is to make it equal to the number of physical CPU cores in your machine. After running the above code cell I got the following result-

We can also plot the above trend in the following way-
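The plotting code is also a screenshot in the original post; a minimal matplotlib sketch, reusing num_threads and results from the timing loop above, would be-

import matplotlib.pyplot as plt

# plot training time against the number of threads used
plt.plot(num_threads, results)
plt.xlabel('Number of Threads')
plt.ylabel('Training Time (seconds)')
plt.title('XGBoost Training Time vs Number of Threads')
plt.show()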
From the above plot we can see a nice trend: execution time decreases as the number of threads is increased. You can run the same code on a machine with many more cores and reduce the model training time further. Now you know how to configure the number of threads with XGBoost on your machine. But there is one more important thing we can tune. We always do cross validation to avoid overfitting in our model, and this step is also time consuming. So is there any way to tune this process? The answer is absolutely yes! We can enable multi-threading both in XGBoost and in cross validation.

The k-fold cross-validation also supports multi-threading. For example, the n_jobs argument on the cross_val_score() function allows us to specify the number of parallel jobs to run. By default this is set to 1, but it can be set to -1 to use all of the CPU cores on our system. In the following code snippet we will check three configurations of cross validation and XGBoost multi-threading and then compare the output of all three-
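Again, the exact snippet is an image in the post; a sketch of the three configurations (parallel cross validation with single-threaded XGBoost, single-threaded cross validation with parallel XGBoost, and parallelism in both), assuming X and y from above and using 4 as a stand-in for the number of physical cores, might look like this-

import time
from xgboost import XGBClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

n_cores = 4  # assumption: set this to the number of physical cores on your machine
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)

# 1) parallel cross validation, single-threaded XGBoost
start = time.time()
model = XGBClassifier(nthread=1)
cross_val_score(model, X, y, cv=kfold, scoring='neg_log_loss', n_jobs=-1)
print('Parallel CV, 1 XGBoost thread: %.2f seconds' % (time.time() - start))

# 2) single-threaded cross validation, parallel XGBoost
start = time.time()
model = XGBClassifier(nthread=n_cores)
cross_val_score(model, X, y, cv=kfold, scoring='neg_log_loss', n_jobs=1)
print('Single-threaded CV, parallel XGBoost: %.2f seconds' % (time.time() - start))

# 3) parallelism in both
start = time.time()
model = XGBClassifier(nthread=n_cores)
cross_val_score(model, X, y, cv=kfold, scoring='neg_log_loss', n_jobs=-1)
print('Parallel CV and parallel XGBoost: %.2f seconds' % (time.time() - start))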

In the above code snippet you can see that the only parameters required for multi-threading are nthread in XGBoost and n_jobs in cross validation. As an example I am using only 1 thread, but you can change it to your number of cores. After running the above cell, the best result is achieved by enabling multi-threading within XGBoost and not in cross-validation, as you can see in the screenshot below-

So if you are going to do cross validation with any library other than XGBoost 1.0.0, don't forget to enable the multi-threading feature in cross validation; and if you are going to use XGBoost (which you should), don't forget to check the number of threads and enable multi-threading within XGBoost. For the complete solution, you can find my kernel at the following link: my kernel
Fork my kernel and start experimenting with it, and if you would like to learn more about XGBoost, follow the tutorials of this amazing guy: Jason Brownlee PhD. Till then, go chase your dreams, have an awesome day, make every second count and see you later in my next post.

