
Machine Learning-Decision Trees and Random Forests


Another post starts with you beautiful people!
I hope that after reading my previous post about Linear and Logistic Regression your confidence level is up and you are ready to move one step ahead in the Machine Learning arena.
In this post we will be going over Decision Trees and Random Forests.
In order for you to understand this exercise completely there is some required reading.
I suggest you read the following blog post before going further: A Must Read!
After reading that blog post you should have a basic layman's (or laywoman's!) understanding of how decision trees and random forests work. A quick intro is below:

Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.
For instance, in the example below, decision trees learn from data to approximate a sine curve with a set of if-then-else decision rules. The deeper the tree, the more complex the decision rules and the fitter the model.
A Random Forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. The sub-sample size is always the same as the original input sample size, but the samples are drawn with replacement if bootstrap=True (default).

Let's see how we can implement them in Python!
Python provides a powerful library, sklearn, for decision trees and random forests, and we will use it in our exercise. You can find more details here: Decision Tree with Python and Random Forest with Python


Creating Decision Trees:-
We know from the A Must Read! blog post that Decision Trees utilize binary splitting to make decisions based on features (the questions we ask).
So let's go ahead and create some data using some built-in functions in Scikit-Learn:

Please note here-
make_blobs generates isotropic Gaussian blobs for clustering. Its main parameters are:

  • n_samples : int, optional (default=100). The total number of points, equally divided among clusters.
  • centers : int or array of shape [n_centers, n_features], optional (default=3). The number of centers to generate, or the fixed center locations.
  • cluster_std : float or sequence of floats, optional (default=1.0). The standard deviation of the clusters.
  • random_state : int, RandomState instance or None, optional (default=None). If int, random_state is the seed used by the random number generator; if a RandomState instance, it is the random number generator itself; if None, the random number generator is the RandomState instance used by np.random.
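The data-creation step can be sketched like this (the exact n_samples, centers, cluster_std, and random_state values used in the original notebook are assumptions):

```python
# Generate toy classification data with make_blobs; the parameter
# values here are illustrative, not the post's original ones.
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

X, y = make_blobs(n_samples=500, centers=4, cluster_std=1.0, random_state=8)

# Scatter the points, colored by their cluster label
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='jet')
plt.show()
```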

Output:-

Visualization Function:-
Before we begin implementing the Decision Tree, let's create a nice function to plot out the decision boundaries using a mesh grid (a technique common in the Scikit-Learn documentation):

  • Return coordinate matrices from coordinate vectors.
  • Make N-D coordinate arrays for vectorized evaluations of N-D scalar/vector fields over N-D grids, given one-dimensional coordinate arrays x1, x2,..., xn


If you need the above function's code, let me know in the comments section.
I will share it!
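In the meantime, here is a hedged sketch of such a helper, based on the meshgrid technique common in the Scikit-Learn documentation. The function name `visualize_classifier` and its signature are assumptions, not the original code:

```python
# Plot a classifier's decision boundaries by predicting the class at
# every point of a dense grid over the feature space.
import numpy as np
import matplotlib.pyplot as plt

def visualize_classifier(model, X, y, ax=None, cmap='jet'):
    ax = ax or plt.gca()
    # Plot the training points, colored by label
    ax.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=cmap, zorder=3)
    xlim = (X[:, 0].min() - 1, X[:, 0].max() + 1)
    ylim = (X[:, 1].min() - 1, X[:, 1].max() + 1)
    # Return coordinate matrices from coordinate vectors (np.meshgrid)
    xx, yy = np.meshgrid(np.linspace(*xlim, 200), np.linspace(*ylim, 200))
    # Predict the class for every grid point, then reshape to the grid
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    # Shade each predicted region to reveal the decision boundaries
    ax.contourf(xx, yy, Z, alpha=0.3, cmap=cmap,
                levels=np.arange(Z.max() + 2) - 0.5, zorder=1)
    ax.set(xlim=xlim, ylim=ylim)
```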

Let's plot out a Decision Tree boundary with a max depth of two branches:-
Output-
How about 4 levels deep?
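Fitting the two trees can be sketched as below, assuming the blob data from earlier (regenerated here so the snippet stands alone):

```python
# Fit two decision trees of different depths on the same data
from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier

X, y = make_blobs(n_samples=500, centers=4, cluster_std=1.0, random_state=8)

# max_depth limits how many levels of binary splits the tree may make
shallow = DecisionTreeClassifier(max_depth=2).fit(X, y)
deep = DecisionTreeClassifier(max_depth=4).fit(X, y)

print(shallow.get_depth(), deep.get_depth())  # depths are capped by max_depth
```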

Notice how changing the depth of the decision tree causes the boundaries to change substantially!
If we pay close attention to the second model we can begin to see evidence of over-fitting.
This basically means that if we were to try to predict a new point the result would be influenced more by the noise than the signal.

So how do we address this issue? 
The answer is to create an ensemble of decision trees: a Random Forest.

Random Forests:-
Ensemble Methods essentially average the results of many individual estimators which over-fit the data. The resulting estimates are much more robust and accurate than the individual estimates which make them up!
One of the most common ensemble methods is the Random Forest, in which the ensemble is made up of many decision trees which are in some way perturbed.
Let's see how we can use Scikit-Learn to create a random forest (it's actually very simple!):


Note that n_estimators stands for the number of trees to use. We might intuitively expect that using more decision trees is always better, but after a certain number of trees (somewhere between 100 and 400, depending on your data) the accuracy benefits of adding more estimators decrease significantly, and the extra trees just become a load on your CPU.
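A minimal sketch of the forest fit, again assuming the blob data from earlier (n_estimators=100 is an illustrative choice):

```python
# Fit a random forest of 100 trees on the blob data
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier

X, y = make_blobs(n_samples=500, centers=4, cluster_std=1.0, random_state=8)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

print(forest.score(X, y))  # training accuracy
```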


We can see that the random forest has been able to pick up features that the Decision Tree was not able to (although we must be careful of over-fitting with Random Forests too!)
While a visual is nice, a better way to evaluate our model would be a train/test split if we had real data!

Random Forest Regression:-
We can also use Random Forests for Regression! Let's see a quick example!

Let's imagine we have some sort of weather data that's sinusoidal in nature with some noise. It has a slow oscillation component, a fast oscillation component, and then a random noise component.
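Such data can be reconstructed roughly as below; the coefficients for the slow and fast oscillations and the noise level are assumptions, since the original values are not shown:

```python
# Toy "weather" signal: slow oscillation + fast oscillation + noise
import numpy as np

rng = np.random.RandomState(42)
x = 10 * rng.rand(200)

def make_signal(x, sigma=0.3):
    slow = np.sin(0.5 * x)            # slow oscillation component
    fast = 0.5 * np.sin(5 * x)        # fast oscillation component
    noise = sigma * rng.randn(len(x)) # random noise component
    return slow + fast + noise

y = make_signal(x)
```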


Now let's use a Random Forest Regressor to create a fitted regression; obviously a standard linear regression approach wouldn't work here. And if we didn't know anything about the true nature of the model, polynomial or sinusoidal regression would be tedious.

Output-

As you can see, the non-parametric random forest model is flexible enough to fit the multi-period data, without us even specifying a multi-period model!
This is a tradeoff between simplicity and thinking about what your data actually is.

Here are some more resources for Random Forests:
A whole webpage from the inventors themselves, Leo Breiman and Adele Cutler: Random Forests
It's strange to think that Random Forests is actually trademarked!

That's it for today! Please try the above functions with some modifications in your notebook and explore more.
