
Can you predict sales for a retail store?

Another post starts with you beautiful people!
I hope you enjoyed my last post about Kaggle submissions and that you also tried to build your own machine learning model.
In the same spirit, today I will discuss my model submission for the Walmart Sales Forecasting challenge, where I got a score of 3077 (which would place at rank 196) on Kaggle.

Challenge: In this challenge, we are provided with historical sales data for 45 Walmart stores located in different regions, covering 2010-02-05 to 2012-11-01. Each store contains a number of departments, and we are tasked with predicting the department-wide sales for each store.

My Solution: To solve this machine learning regression problem I followed the steps below:

1) load the datasets.

While loading the datasets, I made sure that only the required attributes were kept in the train and feature datasets.
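A minimal sketch of loading only the needed columns with pandas. This is not the notebook's code: the helper name `load_columns` is hypothetical, the tiny in-memory CSV stands in for the real `features.csv`, and the column names are assumed to match the Kaggle files.

```python
import io
import pandas as pd

# Tiny in-memory stand-in for features.csv so the sketch runs anywhere.
# Column names are assumed to match the Kaggle download.
FEATURES_CSV = io.StringIO(
    "Store,Date,Temperature,Fuel_Price,MarkDown1,CPI,Unemployment,IsHoliday\n"
    "1,2010-02-05,42.31,2.572,,211.09,8.106,False\n"
    "1,2010-02-12,38.51,2.548,,211.24,8.106,True\n"
)

def load_columns(source, columns, date_col="Date"):
    """Read only the requested columns and parse the date column (hypothetical helper)."""
    return pd.read_csv(source, usecols=columns, parse_dates=[date_col])

# Keep just the attributes we plan to use; everything else is never loaded.
features = load_columns(FEATURES_CSV, ["Store", "Date", "MarkDown1", "IsHoliday"])
print(features.dtypes)
```

With the real files you would pass a path such as `"features.csv"` instead of the in-memory buffer.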

The key feature that affects a store's sales the most is 'markdown', because this feature carries the information about festival seasons.
So it is really important to do feature engineering on this attribute.

2) divide the markdown into groups
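The post does not say how the groups were formed, so the following is only one possible sketch: sum the markdown columns into a total and bin it with `pd.cut`. The bin edges, the labels, and the `MarkDownTotal` and `MarkDownGroup` column names are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "MarkDown1": [np.nan, 500.0, 4000.0, 12000.0],
    "MarkDown2": [np.nan, 100.0, np.nan, 3000.0],
})
markdown_cols = ["MarkDown1", "MarkDown2"]

# Total promotional markdown per row; missing values count as zero here.
df["MarkDownTotal"] = df[markdown_cols].fillna(0).sum(axis=1)

# Hypothetical bin edges, chosen only for illustration.
bins = [-np.inf, 0, 1000, 5000, np.inf]
labels = ["none", "low", "medium", "high"]
df["MarkDownGroup"] = pd.cut(df["MarkDownTotal"], bins=bins, labels=labels)
print(df[["MarkDownTotal", "MarkDownGroup"]])
```

On real data, quantile-based edges (for example via `pd.qcut` on the non-zero totals) may be a better starting point than fixed thresholds.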

3) combine the test, train and feature datasets with the markdown grouping
It's always a good approach to combine the datasets so that you get more features in one combined dataset, and then you can split them again according to your need.
Please note that applying a model to a dataset is only about 20% of your task; around 60% of the task is taken by the initial steps of your machine learning work, that is, the EDA of the problem.
For every column of your dataset, first visualize how it is correlated with the target variable, which in our case is sales.
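The combine-then-split idea can be sketched roughly as follows. The toy frames and the `is_train` flag column are assumptions for illustration, not the notebook's actual code; the join keys `Store` and `Date` are assumed to match the Kaggle files.

```python
import pandas as pd

train = pd.DataFrame({
    "Store": [1, 1],
    "Dept": [1, 1],
    "Date": pd.to_datetime(["2010-02-05", "2010-02-12"]),
    "Weekly_Sales": [24924.50, 46039.49],
})
test = pd.DataFrame({
    "Store": [1],
    "Dept": [1],
    "Date": pd.to_datetime(["2012-11-02"]),
})
features = pd.DataFrame({
    "Store": [1, 1, 1],
    "Date": pd.to_datetime(["2010-02-05", "2010-02-12", "2012-11-02"]),
    "MarkDown1": [float("nan"), float("nan"), 6766.44],
})

# Flag the origin of each row so the frames can be separated again later.
train["is_train"] = True
test["is_train"] = False

# Stack train and test, then attach the store-level weekly features.
combined = pd.concat([train, test], ignore_index=True, sort=False)
combined = combined.merge(features, on=["Store", "Date"], how="left")

# ... shared feature engineering on `combined` would go here ...

train_part = combined[combined["is_train"]].drop(columns="is_train")
test_part = combined[~combined["is_train"]].drop(columns=["is_train", "Weekly_Sales"])
```

Doing the feature engineering once on `combined` keeps the train and test columns identical, which is the main benefit of this pattern.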

This dataset has a Date column, which means you are dealing with time series data, and it needs your attention to handle properly.
To handle the dates together with the holidays, we need to write logic that handles the markdown with respect to every store and its sales.
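As a starting point for the date handling, here is a small sketch that turns the Date column into numeric features a regressor can use. The helper `add_date_features` and the new column names are hypothetical; the store-specific markdown logic described above is not reproduced here.

```python
import pandas as pd

df = pd.DataFrame({
    "Store": [1, 1, 2],
    "Date": pd.to_datetime(["2010-02-05", "2010-02-12", "2010-02-12"]),
    "IsHoliday": [False, True, True],
})

def add_date_features(frame, date_col="Date"):
    """Return a copy with calendar features derived from the date column."""
    out = frame.copy()
    out["Year"] = out[date_col].dt.year
    out["Month"] = out[date_col].dt.month
    # ISO week number, useful for lining up recurring holiday weeks across years.
    out["WeekOfYear"] = out[date_col].dt.isocalendar().week.astype(int)
    # Tree and linear models expect numbers, so encode the flag as 0/1.
    out["IsHolidayInt"] = out["IsHoliday"].astype(int)
    return out

df = add_date_features(df)
print(df)
```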

4) handle the missing values
Always check how many missing values are present in each attribute of your dataset. To save time, people often simply remove them, but removing the missing values affects your model's accuracy.
That's why it is important to analyze the data more and find a way to deal with the missing values.
For this problem I filled the markdown's missing values with the mean of its values.
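A minimal sketch of counting missing values and filling the markdown columns with their column means. The toy frame is an assumption for illustration; the `MarkDown` prefix is assumed to match the Kaggle column names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "MarkDown1": [np.nan, 100.0, 300.0],
    "MarkDown2": [50.0, np.nan, np.nan],
    "Temperature": [42.3, 38.5, 39.9],
})

# Count missing values per attribute before deciding how to treat them.
missing_counts = df.isna().sum()
print(missing_counts)

# Replace each missing markdown value with the mean of its own column.
markdown_cols = [c for c in df.columns if c.startswith("MarkDown")]
df[markdown_cols] = df[markdown_cols].fillna(df[markdown_cols].mean())
```

If you validate on held-out data, it is usually safer to compute the means on the training rows only and reuse them for the test rows.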

5) model the datasets
After doing the EDA, 80% of your task is done. The next task is to build the model for your problem. I chose linear regression and ExtraTreesRegressor to build the model.
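The sketch below fits both model types with scikit-learn on synthetic data and compares them. The data, the hyperparameters, and the `wmae` helper are assumptions for illustration; the helper follows my understanding of the competition metric (weighted mean absolute error, with holiday weeks weighted 5 and other weeks 1).

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the engineered features and weekly sales.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
is_holiday = rng.random(200) < 0.1

# Simple ordered split: first 150 rows for training, the rest for validation.
X_train, X_valid = X[:150], X[150:]
y_train, y_valid = y[:150], y[150:]
h_valid = is_holiday[150:]

def wmae(y_true, y_pred, holiday_mask):
    """Weighted MAE: holiday rows get weight 5, all other rows weight 1."""
    weights = np.where(holiday_mask, 5.0, 1.0)
    return float(np.sum(weights * np.abs(y_true - y_pred)) / np.sum(weights))

models = {
    "linear": LinearRegression(),
    "extra_trees": ExtraTreesRegressor(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = wmae(y_valid, model.predict(X_valid), h_valid)

print(scores)
```

On this toy data the target is linear by construction, so the ranking of the two models here says nothing about how they compare on the real sales data.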

6) save the output in Kaggle format
Each competition on Kaggle requires its own submission format that we have to follow. For this challenge they ask us to save the output in a CSV with two columns: Id and Weekly_Sales.
The first column, Id, is in the format store id_dept id_date, and the second column is our target variable, sales.
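Building the submission file can be sketched as below. The toy test frame and the prediction values are made up, and the exact header spelling `Weekly_Sales` is an assumption based on the Kaggle sample submission, so check it against the file you download.

```python
import pandas as pd

test = pd.DataFrame({
    "Store": [1, 1],
    "Dept": [1, 2],
    "Date": pd.to_datetime(["2012-11-02", "2012-11-02"]),
})
predictions = [20000.0, 45000.5]  # placeholder model output

# Id is store id, dept id and date joined with underscores.
submission = pd.DataFrame({
    "Id": (
        test["Store"].astype(str)
        + "_" + test["Dept"].astype(str)
        + "_" + test["Date"].dt.strftime("%Y-%m-%d")
    ),
    "Weekly_Sales": predictions,
})

# Without a path, to_csv returns the CSV text; pass a filename to write a file.
csv_text = submission.to_csv(index=False)
print(csv_text)
```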

Anyone who wishes to see my actual code can find it here: my code.
I request you all to please download my notebook from the above URL, upload it to your Jupyter notebook, explore it, think about new approaches, apply different machine learning algorithms to improve the model, and share your inputs with me as well.

Meanwhile Friends! Go chase your dreams, have an awesome day, make every second count and see you later in my next post.





