Learn Data Science using Python

Posts

Showing posts from December, 2017

Can you build a model to predict toxic comments?

Another post starts with you beautiful people! Hope you have learnt something new and very powerful machine learning model from my previous post- How to use LightGBM? Till now you must have an idea that there is no any area left that a machine learning model cannot be applied; yes it's everywhere! Continuing our journey today we will learn how to deal a problem which consists texts/sentences as feature. Examples of such kind of problems you see in internet sites, emails, posts , social media etc. Data Scientists sitting in industry giants like Quora, Twitter, Facebook, Google are working very smartly to build machine learning models to classify texts/sentences/words. Today we are going to do the same and believe me friends once you do some hand on, you will be also in the same hat. Challenge Link : jigsaw-toxic-comment-classification-challenge Problem : We’re challenged to build a multi-headed model that’s capable of detecting different types of toxicity like threats,

LightGBM and Kaggle's Mercari Price Suggestion Challenge

Another post starts with you beautiful people! I hope you have enjoyed and must learnt something from previous two posts about real world machine learning problems in Kaggle. As I said earlier Kaggle is a great platform to apply your machine learning skills and enhance your knowledge; today I will share again my learning from there with all of you! In this post we will work upon an online machine learning competition where we need to predict the the price of products for Japan’s biggest community-powered shopping app. The main attraction of this challenge is that this is a Kernels-only competition; it means the datasets are given for downloading only in stage 1.In next final stage it will be available only in Kernels. What kind of problem is this? Since our goal is to predict the price (which is a number), it will be a regression problem. Data: You can see the datasets here Exploring the datasets: The datasets provided are in the zip format of 'tsv'. So how can

Can you predict sales for a retail store?

Another post starts with you beautiful people! Hope you have enjoyed my last post about kaggle submission and you also tried to build your own machine learning model. To continue the same spirit today I will discuss about my model submission for the Wallmart Sales Forecasting where I got a score of 3077 (rank will be 196) in kaggle. Challenge : In this challenge, we are provided with historical sales data for 45 Walmart stores located in different regions since 2010-02-05 to 2012-11-01. Each store contains a number of departments, and we are tasked with predicting the department-wide sales for each store. My Sollution : To solve this machine learning regression problem I followed below steps- 1) load the datasets . While loading the datasets, I ensured that required attributes only should be there in train and feature datasets. The important key feature which affects the sales of a store mostly is 'markdown' because this feature contains the information