
Can you predict sales for a retail store?

Another post starts with you beautiful people!
Hope you enjoyed my last post about the Kaggle submission and that you also tried to build your own machine learning model.
To continue in the same spirit, today I will discuss my model submission for the Walmart Sales Forecasting challenge, where I got a score of 3077 (a rank of around 196) on Kaggle.

Challenge: In this challenge, we are provided with historical sales data for 45 Walmart stores located in different regions, from 2010-02-05 to 2012-11-01. Each store contains a number of departments, and we are tasked with predicting the department-wide sales for each store.

My Solution: To solve this machine learning regression problem, I followed the steps below-

1) load the datasets.

While loading the datasets, I ensured that only the required attributes were kept in the train and feature datasets.

The key feature that most affects a store's sales is 'MarkDown', because this feature carries the information about festival seasons.
So it is really important to do feature engineering on this attribute.
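Loading only the needed columns can be sketched as below. The inline CSV snippets here are hypothetical stand-ins for the competition's train.csv and features.csv files, which have many more rows and columns:

```python
import io

import pandas as pd

# Tiny stand-ins for the real Kaggle files (hypothetical rows).
train_csv = io.StringIO(
    "Store,Dept,Date,Weekly_Sales,IsHoliday\n"
    "1,1,2010-02-05,24924.50,False\n"
    "1,1,2010-02-12,46039.49,True\n"
)
features_csv = io.StringIO(
    "Store,Date,Temperature,MarkDown1,MarkDown2,MarkDown3,MarkDown4,MarkDown5,IsHoliday\n"
    "1,2010-02-05,42.31,,,,,,False\n"
    "1,2010-02-12,38.51,,,,,,True\n"
)

# Keep only the required attributes; parse dates up front.
train = pd.read_csv(train_csv, parse_dates=["Date"])
feature_cols = ["Store", "Date", "MarkDown1", "MarkDown2",
                "MarkDown3", "MarkDown4", "MarkDown5", "IsHoliday"]
features = pd.read_csv(features_csv, parse_dates=["Date"], usecols=feature_cols)
print(train.shape, features.shape)
```

With the real files you would pass the file paths instead of the `StringIO` objects; `usecols` drops columns such as Temperature that this sketch treats as not required.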

2) divide the markdown into groups
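One possible way to group the markdown values is to collapse the five MarkDown columns into a total and bucket it into coarse promotion levels. This is a hypothetical grouping for illustration; the exact grouping used in the original notebook may differ:

```python
import pandas as pd

# Toy MarkDown values (hypothetical).
df = pd.DataFrame({
    "MarkDown1": [0.0, 5000.0, 20000.0],
    "MarkDown2": [0.0, 1000.0, 15000.0],
})

# Collapse the markdown columns into one total, then bucket into groups.
df["MarkDownTotal"] = df.filter(like="MarkDown").sum(axis=1)
df["MarkDownLevel"] = pd.cut(
    df["MarkDownTotal"],
    bins=[-1, 0, 10000, float("inf")],
    labels=["none", "low", "high"],
)
print(df["MarkDownLevel"].tolist())
```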

3) combine the test, train, and feature datasets with the markdown grouping
It's always a good approach to combine the datasets so that you get more features in one combined dataset, and you can then split them according to your needs.
Please note that applying a model to a dataset is only about 20% of your task; around 60% of the task is taken up by the initial steps of your machine learning work, that is, the EDA of the problem.
For every column of your dataset, first visualize how it is correlated with the target variable, which in our case is sales.
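The combine-then-split approach can be sketched as below, using small hypothetical frames in place of the real test, train, and feature tables:

```python
import pandas as pd

# Hypothetical stand-ins for the train, test, and feature tables.
train = pd.DataFrame({"Store": [1], "Dept": [1],
                      "Date": pd.to_datetime(["2010-02-05"]),
                      "Weekly_Sales": [24924.50]})
test = pd.DataFrame({"Store": [1], "Dept": [1],
                     "Date": pd.to_datetime(["2012-11-02"])})
features = pd.DataFrame({"Store": [1, 1],
                         "Date": pd.to_datetime(["2010-02-05", "2012-11-02"]),
                         "Temperature": [42.31, 55.0]})

# Tag the rows, stack train and test, then attach the features once.
train["is_train"] = True
test["is_train"] = False
combined = pd.concat([train, test], ignore_index=True)
combined = combined.merge(features, on=["Store", "Date"], how="left")

# Split back whenever needed.
train_full = combined[combined["is_train"]]
test_full = combined[~combined["is_train"]]
print(combined.shape)
```

Engineering features once on `combined` guarantees that train and test get identical treatment.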

This dataset has a Date column, which means you are dealing with time series data, and handling it needs your attention.
To handle the dates together with holidays, we need to write our logic in such a way that it handles the markdown with respect to every store and its sales.
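A minimal sketch of the date handling, assuming we derive week-of-year and year features and flag holiday weeks against a known holiday list (the single holiday date below is hypothetical):

```python
import pandas as pd

# Hypothetical store/date rows.
df = pd.DataFrame({
    "Store": [1, 1],
    "Date": pd.to_datetime(["2010-02-12", "2010-03-05"]),
})
# Hypothetical holiday list; the dataset's IsHoliday flag can serve instead.
holidays = pd.to_datetime(["2010-02-12"])

# Derive calendar features and flag holiday weeks.
df["Week"] = df["Date"].dt.isocalendar().week.astype(int)
df["Year"] = df["Date"].dt.year
df["IsHolidayWeek"] = df["Date"].isin(holidays)
print(df[["Week", "Year", "IsHolidayWeek"]].to_dict("list"))
```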

4) handle the missing values
Always check how many missing values are present in each attribute of your dataset. To save time, people often simply remove them, but removing missing values affects your model's accuracy.
That's why it is important to analyze the data further and find a way to deal with the missing values.
For this problem, I dealt with the markdown's missing values by filling them with the mean of their values.
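The mean imputation described above looks like this in pandas (toy values in place of the real MarkDown column):

```python
import numpy as np
import pandas as pd

# Toy MarkDown column with a gap.
df = pd.DataFrame({"MarkDown1": [1000.0, np.nan, 3000.0]})

# Fill missing markdown values with the column mean.
df["MarkDown1"] = df["MarkDown1"].fillna(df["MarkDown1"].mean())
print(df["MarkDown1"].tolist())  # the NaN becomes 2000.0
```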

5) model the datasets
After doing the EDA, 80% of your task is done. The next task is to build the model for your problem. I chose linear regression and ExtraTreesRegressor to build the model.
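Fitting those two models is a few lines with scikit-learn. The data below is synthetic, standing in for the engineered Walmart features; hyperparameters are illustrative, not the ones from the original notebook:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import LinearRegression

# Synthetic regression data standing in for the engineered features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([3.0, -2.0, 0.5, 1.0]) + rng.normal(scale=0.1, size=200)

# Fit both models on the same training matrix.
linear = LinearRegression().fit(X, y)
trees = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, y)
print(round(linear.score(X, y), 3), round(trees.score(X, y), 3))
```

Comparing a simple linear baseline against a tree ensemble like this is a good way to see whether the relationship between your features and sales is mostly linear or not.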

6) save the output in kaggle format
Each competition on Kaggle requires its own submission format that we have to follow. For this challenge, they ask us to save the output in a CSV with two columns - Id and Weekly_Sales.
The first column, Id, is in the format store id_dept id_date, and the second column is our target variable, the sales.
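Building that Id column from the store, department, and date fields can be done like this (toy prediction rows for illustration):

```python
import pandas as pd

# Toy prediction rows (hypothetical values).
pred = pd.DataFrame({
    "Store": [1, 1],
    "Dept": [1, 2],
    "Date": ["2012-11-02", "2012-11-02"],
    "Weekly_Sales": [30000.0, 15000.0],
})

# Id in the competition's store_dept_date format.
pred["Id"] = (pred["Store"].astype(str) + "_"
              + pred["Dept"].astype(str) + "_" + pred["Date"])
submission = pred[["Id", "Weekly_Sales"]]
print(submission["Id"].tolist())
# submission.to_csv("submission.csv", index=False)  # write without the index
```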

Everyone who wishes to see my actual code can find it here- my code.
I request you all to download my notebook from the above URL, upload it to your Jupyter notebook, explore it, think about new approaches, apply different machine learning algorithms to improve the model, and share your inputs with me.

Meanwhile Friends! Go chase your dreams, have an awesome day, make every second count and see you later in my next post.





