When & Where to use Linear or Logistic regression?

Another post starts with you beautiful people!
First of all, thank you everyone for visiting my blog and showing such keen interest in the Linear and Logistic Regression topics of the Machine Learning track!
Many of you have asked a common but very important question: how do you know when and where to apply Linear or Logistic regression?
In this post I will try to resolve that doubt.
Linear and Logistic regression are usually the first algorithms people learn in predictive modeling.
Each has its own importance and specific conditions under which it is best suited:

What is Regression Analysis?

  • Regression analysis is a predictive modelling technique which investigates the relationship between a dependent (target) variable and one or more independent (predictor) variables.
  • This technique is used for forecasting, time series modelling and exploring possible cause-and-effect relationships between variables.
  • For example, the relationship between rash driving and the number of road accidents by a driver is best studied through regression.
  • Regression analysis is an important tool for modelling and analyzing data.
  • Here, we fit a curve / line to the data points in such a manner that the distances of the data points from the curve or line are minimized.

Why do we use Regression Analysis?
As mentioned above, regression analysis estimates the relationship between two or more variables. Let’s understand this with an easy example:
Let’s say, you want to estimate growth in sales of a company based on current economic conditions. 
You have the recent company data which indicates that the growth in sales is around two and a half times the growth in the economy. 
Using this insight, we can predict future sales of the company based on current & past information.

There are multiple benefits of using regression analysis. They are as follows:
  • It indicates the significant relationships between the dependent variable and the independent variables.
  • It indicates the strength of impact of multiple independent variables on a dependent variable.
  • Regression analysis also allows us to compare the effects of variables measured on different scales, such as the effect of price changes and the number of promotional activities. 
  • These benefits help market researchers / data analysts / data scientists to eliminate weak variables and evaluate the best set of variables to be used for building predictive models.
1. Linear Regression-
  • It is one of the most widely known modeling techniques. Linear regression is usually among the first few topics which people pick while learning predictive modeling.
  • In this technique, the dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the nature of the regression line is linear.
  • Linear Regression establishes a relationship between a dependent variable (Y) and one or more independent variables (X) using a best fit straight line (also known as the regression line).
  • It is represented by the equation Y = a + b*X + e, where a is the intercept, b is the slope of the line and e is the error term.
  • This equation can be used to predict the value of the target variable based on the given predictor variable(s).
  • The difference between simple linear regression and multiple linear regression is that multiple linear regression has more than one independent variable, whereas simple linear regression has only one.
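To make the equation concrete, here is a minimal Python sketch of how a fitted line is used for prediction. The function name `predict` and the intercept and slope values are made up for illustration; the error term e is left out because it is unknown at prediction time.

```python
def predict(intercept, slopes, row):
    """Return a + b1*x1 + ... + bk*xk for one observation (error term e omitted)."""
    return intercept + sum(b * x for b, x in zip(slopes, row))

# Simple linear regression: a single independent variable per observation.
simple_pred = [predict(1.0, [2.0], [x]) for x in (0.0, 1.0, 2.0)]

# Multiple linear regression: more than one independent variable per observation.
multi_pred = [predict(1.0, [2.0, -0.5], row) for row in ([1.0, 2.0], [3.0, 4.0])]

print(simple_pred)  # [1.0, 3.0, 5.0]
print(multi_pred)   # [2.0, 5.0]
```

The only difference between the two calls is the number of slopes, which mirrors the difference between simple and multiple linear regression.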
How to obtain the best fit line (values of a and b)?
  • This task can be easily accomplished by the Least Squares Method. It is the most common method used for fitting a regression line.
  • It calculates the best-fit line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line. Because the deviations are first squared, when added, there is no cancelling out between positive and negative values.
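The least squares idea above can be sketched in plain Python for the simple (one predictor) case. This is only an illustration: the function names and the small data set are made up, and the closed-form formulas used are the standard ones for a single independent variable.

```python
def fit_least_squares(xs, ys):
    """Return (a, b) minimizing the sum of squared vertical deviations from y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line always passes through the point of means.
    a = mean_y - b * mean_x
    return a, b

def sum_squared_errors(a, b, xs, ys):
    """Sum of squared vertical deviations; squaring stops +/- errors cancelling."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up observations that lie close to a straight line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_least_squares(xs, ys)
print(round(a, 2), round(b, 2))
print(sum_squared_errors(a, b, xs, ys))
```

Nudging a or b away from the fitted values and calling `sum_squared_errors` again should never give a smaller number, which is exactly what "best fit" means here.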
Important Points about linear regression:
  • There must be a linear relationship between the independent and dependent variables.
  • Multiple regression can suffer from multicollinearity, autocorrelation and heteroskedasticity.
  • Linear Regression is very sensitive to outliers. They can terribly affect the regression line and eventually the forecasted values.
  • Multicollinearity can increase the variance of the coefficient estimates and make the estimates very sensitive to minor changes in the model. 
  • The result is that the coefficient estimates are unstable.
  • In case of multiple independent variables, we can use forward selection, backward elimination or a stepwise approach to select the most significant independent variables.
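The point about outliers is easy to see with a tiny experiment. The sketch below uses made-up data and a hypothetical helper `fit_line` (the usual least squares formulas for one predictor); it fits the same points twice, once as they are and once with the last value replaced by an outlier.

```python
def fit_line(xs, ys):
    """Least squares fit of y = a + b*x for a single predictor; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]      # made-up data lying exactly on y = 2x

a_clean, b_clean = fit_line(xs, ys)

ys_outlier = ys[:-1] + [40.0]              # replace the last value with an outlier
a_out, b_out = fit_line(xs, ys_outlier)

print("slope without outlier:", b_clean)
print("slope with outlier:   ", b_out)
```

A single bad observation out of six is enough to move the slope from 2 to 6 in this example, so every forecast made from the second line would be badly off.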
2. Logistic Regression-
  • Logistic regression is used to find the probability of event=Success and event=Failure. 
  • We should use logistic regression when the dependent variable is binary (0/ 1, True/ False, Yes/ No) in nature.
  • Since we are working here with a binomial distribution (dependent variable), we need to choose a link function which is best suited for this distribution. And, it is the logit function: logit(p) = ln(p/(1-p)) = b0 + b1*X1 + ... + bk*Xk, where p is the probability of the event of interest.
  • In the equation above, the parameters (b0, b1, ..., bk) are chosen to maximize the likelihood of observing the sample values rather than to minimize the sum of squared errors (as in ordinary regression).
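Here is a rough Python sketch of what "maximizing the likelihood" looks like for one predictor. Everything in it is an assumption made for illustration: the names `b0` and `b1` for the intercept and slope, the made-up study-hours data, and the choice of plain gradient ascent with a fixed step size (real libraries use more refined optimizers).

```python
import math

def sigmoid(z):
    """Inverse of the logit: turns a log-odds value into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Log of the probability of observing the 0/1 outcomes ys under the model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Climb the log-likelihood by gradient ascent, starting from b0 = b1 = 0."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient terms are (observed outcome - predicted probability).
        errors = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += learning_rate * sum(errors) / n
        b1 += learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1

# Made-up data: hours studied (x) and pass = 1 / fail = 0 (y), overlapping on purpose.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
print("log-likelihood at start:", log_likelihood(0.0, 0.0, xs, ys))
print("log-likelihood at fit:  ", log_likelihood(b0, b1, xs, ys))
print("P(pass | 4.5 hours):    ", sigmoid(b0 + b1 * 4.5))
```

Note that the output of the model is a probability between 0 and 1, not a continuous value, which is why this technique fits a binary dependent variable.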
Important Points about logistic regression:
  • It is widely used for classification problems.
  • Logistic regression doesn’t require a linear relationship between the dependent and independent variables.
  • It can handle various types of relationships because it applies a non-linear log transformation to the predicted odds. It does, however, assume a linear relationship between the independent variables and the log-odds.
  • To avoid overfitting and underfitting, we should include all significant variables.
  • A good approach to ensure this practice is to use a stepwise method to estimate the logistic regression.
  • It requires large sample sizes because maximum likelihood estimates are less powerful at low sample sizes than ordinary least squares.
  • The independent variables should not be correlated with each other, i.e. there should be no multicollinearity. However, we have the option to include interaction effects of categorical variables in the analysis and in the model.
  • If the values of the dependent variable are ordinal, it is called Ordinal Logistic regression.
  • If the dependent variable is multi-class, it is known as Multinomial Logistic regression.
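The whole post can be boiled down to one rule of thumb: look at the dependent variable first. The small helper below is a hypothetical sketch of that rule, not a replacement for judgement; the function name and its two flags are made up, and it treats any target with exactly two distinct values as binary.

```python
def suggest_model(target_values, is_categorical=False, is_ordered=False):
    """Suggest a regression type from the dependent variable (rough rule of thumb)."""
    distinct = set(target_values)
    if len(distinct) == 2:
        # Binary outcome such as 0/1, True/False, Yes/No.
        return "logistic regression"
    if is_categorical:
        # More than two classes: ordered classes vs. unordered classes.
        if is_ordered:
            return "ordinal logistic regression"
        return "multinomial logistic regression"
    # Otherwise assume a continuous dependent variable.
    return "linear regression"

print(suggest_model([0, 1, 1, 0]))
print(suggest_model([12.5, 7.1, 9.8, 15.0]))
print(suggest_model(["low", "medium", "high"], is_categorical=True, is_ordered=True))
print(suggest_model(["cat", "dog", "bird"], is_categorical=True))
```

The caller has to say whether a multi-valued target is categorical or ordered, because that cannot be reliably guessed from the values alone.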
I hope that after reading and understanding the above key points you will get it easily!
Take a printout of this post and keep it at your desk so that whenever you are going to build a predictive model, you have no doubt at all when choosing between linear and logistic regression.
