
Python Advanced- Visualizing the Titanic Disaster

Another post starts with you beautiful people!
Today we will work with the famous Titanic dataset from Kaggle.
This dataset describes the passengers aboard the Titanic and includes a column recording whether each passenger survived. Those who survived are represented as “1” while those who did not survive are represented as “0”.

The columns in the dataset are:
PassengerId: Passenger Identity
Survived: Whether passenger survived or not
Pclass: Class of ticket
Name: Name of passenger
Sex: Sex of passenger (Male or Female)
Age: Age of passenger
SibSp: Number of siblings and/or spouses travelling with the passenger
Parch: Number of parents and/or children travelling with the passenger
Ticket: Ticket number
Fare: Price of ticket
Cabin: Cabin number

Let's start with some hands-on work:
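A minimal sketch of loading the data with pandas. The file name train.csv is an assumption (use whatever name you saved the Kaggle file under), load_titanic is a hypothetical helper, and the two sample rows are toy values in the same column layout, not real passengers, so the snippet runs without the download.

```python
import io

import pandas as pd

# Two toy rows in the column layout listed above (not real passengers),
# so the snippet runs even without the Kaggle file.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin
1,0,3,Passenger A,male,30,0,0,T1,8.0,
2,1,1,Passenger B,female,25,1,0,T2,70.0,C1
"""


def load_titanic(source):
    """Read the passenger table from a CSV path or a file-like object."""
    return pd.read_csv(source)


titanic_df = load_titanic(io.StringIO(SAMPLE_CSV))
# With the real data (assumed file name):
# titanic_df = load_titanic("train.csv")

print(titanic_df.head())  # first rows, to check the columns loaded as expected
```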


Let's generate descriptive statistics:
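A minimal sketch using pandas' describe(), which summarizes the numeric columns (count, mean, std, min, quartiles, max). The four rows below are toy values for illustration only; with the real data you would call describe() on the loaded DataFrame instead.

```python
import pandas as pd

# Toy rows for illustration only (not real passengers).
titanic_df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 2, 3],
    "Age": [22.0, 38.0, float("nan"), 4.0],
    "Fare": [7.25, 71.28, 13.0, 21.08],
})

# describe() skips missing values, so the count for Age is lower
# than the count for the other columns.
summary = titanic_df.describe()
print(summary)
```

A quick look at the count row is a handy way to spot columns with missing values before plotting.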






Result:





Note: if you see the error ImportError: No module named 'seaborn', you need to install the seaborn library by running pip install seaborn at the command prompt.


Result:

Let's identify the children in the dataset:
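One possible way to do this is to add a column that says "child" for young passengers and keeps the Sex value for everyone else. The column name person and the age limit of 16 are assumptions made for this sketch, and the rows are toy values.

```python
import pandas as pd

# Toy rows for illustration only (not real passengers).
titanic_df = pd.DataFrame({
    "Sex": ["male", "female", "male", "female", "male"],
    "Age": [34.0, 8.0, 12.0, 29.0, float("nan")],
})

CHILD_AGE_LIMIT = 16  # assumed cut-off, chosen for this sketch

# Replace Sex with "child" wherever Age is below the limit.
# A missing Age compares as False, so those rows keep their Sex value.
titanic_df["person"] = titanic_df["Sex"].mask(
    titanic_df["Age"] < CHILD_AGE_LIMIT, "child"
)
print(titanic_df)
```

Passengers with no recorded age stay in the male or female group here; you may prefer to fill in or drop missing ages first.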


Let's count the people in each group (male, female, child):
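A minimal sketch of the count using value_counts(). It assumes a column named person holding "male", "female" or "child" for each passenger (a hypothetical name for this sketch); the rows are toy values.

```python
import pandas as pd

# Toy rows for illustration only; "person" is an assumed column name.
titanic_df = pd.DataFrame({
    "person": ["male", "child", "female", "male", "child", "male"],
})

# value_counts() returns one row per distinct value, largest count first.
person_counts = titanic_df["person"].value_counts()
print(person_counts)
```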


Now plot the number of males, females and children in each Pclass:
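A minimal sketch with seaborn's countplot, grouping the bars by ticket class and coloring them by person group. The person column name is an assumption and the rows are toy values; the crosstab gives the same numbers as a table.

```python
import pandas as pd
import seaborn as sns

# Toy rows for illustration only; "person" is an assumed column name.
titanic_df = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3, 3],
    "person": ["female", "male", "child", "male", "male", "child"],
})

# Table form: rows are ticket classes, columns are person groups.
class_table = pd.crosstab(titanic_df["Pclass"], titanic_df["person"])
print(class_table)

# Chart form: one bar per person group within each ticket class.
ax = sns.countplot(data=titanic_df, x="Pclass", hue="person")
ax.set_ylabel("Number of passengers")
```

In a notebook the chart appears under the cell; in a plain script, call matplotlib.pyplot.show() afterwards.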

Result:





People Who Survived and Who Didn't:
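A minimal sketch of this comparison: map the 0/1 codes to readable labels, count them, and draw one bar per outcome. The Outcome column and its labels are names introduced for this sketch, and the rows are toy values.

```python
import pandas as pd
import seaborn as sns

# Toy rows for illustration only (not real passengers).
titanic_df = pd.DataFrame({"Survived": [0, 1, 0, 0, 1]})

# "Outcome" is a hypothetical helper column with readable labels.
titanic_df["Outcome"] = titanic_df["Survived"].map(
    {0: "Did not survive", 1: "Survived"}
)

outcome_counts = titanic_df["Outcome"].value_counts()
print(outcome_counts)

# One bar per outcome.
ax = sns.countplot(data=titanic_df, x="Outcome")
ax.set_ylabel("Number of passengers")
```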




How many males and females survived:
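A minimal sketch with groupby. Because Survived is 0 or 1, its sum per group is the number of survivors and its mean is the survival rate. The rows are toy values, and the output column names survivors and survival_rate are chosen for this sketch.

```python
import pandas as pd

# Toy rows for illustration only (not real passengers).
titanic_df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Survived": [0, 1, 1, 0, 1, 0],
})

# Sum of a 0/1 column = number of survivors; mean = share who survived.
survival_by_sex = titanic_df.groupby("Sex")["Survived"].agg(
    survivors="sum", survival_rate="mean"
)
print(survival_by_sex)
```

To see the same split as a chart, seaborn's countplot with x="Sex" and hue="Survived" is one option.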
Result: more females survived than males.

Let's compute the pairwise correlation of columns, excluding NA/null values:
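A minimal sketch using DataFrame.corr(), which computes the Pearson correlation for each pair of columns and skips rows where either value is missing. Selecting the numeric columns first is a choice made here so that text columns such as Name do not get in the way; the rows are toy values.

```python
import pandas as pd

# Toy rows for illustration only (not real passengers).
titanic_df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Pclass": [1, 1, 2, 3, 3],
    "Age": [40.0, float("nan"), 30.0, 22.0, 5.0],
    "Fare": [80.0, 60.0, 20.0, 8.0, 7.0],
})

# Keep only numeric columns, then correlate every pair of them.
# For each pair, rows with a missing value in either column are ignored.
corr = titanic_df.select_dtypes("number").corr()
print(corr)
```

Passing this table to seaborn's heatmap is one common way to turn it into a picture.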




Result:

The visualizations above show how easily you can turn a dataset into a story.
Try it in your notebook and share your thoughts in the comments.
