
Python Advanced- Series

Today we will learn about one of the most important data structures in the pandas library: Series.
It is similar to a one-dimensional NumPy array.
In addition to the values specified by the programmer, pandas attaches a label to each value. If the programmer does not provide labels, pandas assigns integer labels (0 for the first element, 1 for the second element, and so on).
A benefit of labelling data values is that the dataset becomes easier to manipulate: it behaves much like a dictionary in which each value is associated with a label.
For more details about Series, please visit Series in pandas.
Let's understand Series and some of its operations through the code snippets below-

Series example:-
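Here is a minimal sketch of creating a Series from a plain Python list. The values are made up for illustration. Since no labels are passed, pandas assigns the default integer labels.

```python
import pandas as pd

# Create a Series from a list of illustrative values.
# No labels are given, so pandas uses 0, 1, 2, 3.
s = pd.Series([10, 20, 30, 40])

# Printing shows the labels on the left and the values on the right.
print(s)
```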


Getting the values and index of a Series:-
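A small sketch with made-up values: the `values` attribute returns the underlying NumPy array and the `index` attribute returns the labels.

```python
import pandas as pd

# Illustrative Series with default integer labels.
s = pd.Series([10, 20, 30, 40])

# The data as a NumPy array.
print(s.values)

# The labels attached to the data (a RangeIndex here).
print(s.index)
```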

Defining a custom index for your Series:-
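As a sketch, you can pass your own labels through the `index` argument of `pd.Series`. The labels and values below are only examples.

```python
import pandas as pd

# Attach our own labels instead of the default 0, 1, 2, 3.
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

print(s)

# A value can now be looked up by its label.
print(s["c"])
```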

Accessing elements of a Series works the same way as we saw in NumPy:-
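The sketch below shows a few common ways to access elements, using an example Series with string labels. Besides NumPy-style positional access and boolean masks, a Series also supports lookup by label.

```python
import pandas as pd

# Illustrative Series with string labels.
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

by_label = s["a"]          # single value by label
by_position = s.iloc[1]    # single value by position
subset = s[["a", "c"]]     # several values by a list of labels
sliced = s.iloc[1:3]       # positional slice, end excluded as in NumPy
large = s[s > 20]          # boolean mask, as in NumPy

print(by_label, by_position)
print(subset)
print(sliced)
print(large)
```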

Let's do some mathematical operations on our Series-
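A minimal sketch with example values: arithmetic and NumPy functions are applied element by element, and the labels are kept in the result.

```python
import numpy as np
import pandas as pd

# Illustrative Series of floats.
s = pd.Series([1.0, 4.0, 9.0], index=["a", "b", "c"])

doubled = s * 2       # element-wise multiplication
roots = np.sqrt(s)    # NumPy functions work on a Series and keep the labels
total = s.sum()       # reduction over all values

print(doubled)
print(roots)
print(total)
```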

If you have a dictionary, you can create a Series from it. Suppose you are interested in EPS values for firms, and the values come from different sources and are not clean. In that case pandas aligns the values by their labels, so you don't have to line them up yourself-
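Here is a sketch of building a Series from a dictionary. The firm names and EPS numbers are hypothetical and only for illustration. The dictionary keys become the labels.

```python
import pandas as pd

# Hypothetical EPS values keyed by firm name.
eps = {"FirmA": 2.5, "FirmB": 1.2, "FirmC": 3.1}

# Keys become the index labels, values become the data.
eps_series = pd.Series(eps)

print(eps_series)
```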


If an index label has no matching key in the dictionary, its value will show as NaN (not a number):-
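The sketch below passes both a dictionary and an explicit index. The firm names and numbers are hypothetical. Labels that are missing from the dictionary get NaN, and dictionary keys that are not in the index are left out.

```python
import pandas as pd

# Hypothetical EPS values keyed by firm name.
eps = {"FirmA": 2.5, "FirmB": 1.2, "FirmC": 3.1}

# "FirmD" is not a key in the dictionary, "FirmC" is not in the index.
firms = ["FirmA", "FirmB", "FirmD"]
eps_series = pd.Series(eps, index=firms)

# FirmD shows NaN; FirmC does not appear at all.
print(eps_series)
```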

Use the isnull() function to find out whether there are any missing values in the data structure-
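A small sketch with hypothetical firm data: `isnull()` returns a boolean Series, which can then be combined with `any()` or `sum()`.

```python
import pandas as pd

# Hypothetical data in which "FirmD" has no value.
eps_series = pd.Series(
    {"FirmA": 2.5, "FirmB": 1.2},
    index=["FirmA", "FirmB", "FirmD"],
)

missing = eps_series.isnull()   # True where the value is NaN
has_missing = missing.any()     # is there at least one missing value?
n_missing = missing.sum()       # how many values are missing?

print(missing)
print(has_missing, n_missing)
```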

A key feature of Series is that you don't have to worry about data alignment.
To understand this key feature, consider the example below: suppose we have run a word count program on two different files and have the following data structures-
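As a sketch, the two word-count results could look like this. The words and counts are invented for illustration. Note that the two Series share only some of their labels and list them in a different order.

```python
import pandas as pd

# Hypothetical word counts from the first file.
counts_file1 = pd.Series({"python": 5, "pandas": 3, "numpy": 2})

# Hypothetical word counts from the second file.
counts_file2 = pd.Series({"pandas": 4, "series": 6, "python": 1})

print(counts_file1)
print(counts_file2)
```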


Now if we want to calculate the sum for the words common to both files, we don't have to worry about data alignment. If we want to include all words, we can take care of the NaN values and compute the sum. By default, adding two Series gives NaN for any label that appears in only one of them, while reductions such as sum() skip NaN values-
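The sketch below uses invented word counts. The `+` operator aligns the two Series by label, and `add()` with `fill_value=0` is one way to include the words that appear in only one file.

```python
import pandas as pd

# Hypothetical word counts from two files.
counts_file1 = pd.Series({"python": 5, "pandas": 3, "numpy": 2})
counts_file2 = pd.Series({"pandas": 4, "series": 6, "python": 1})

# Labels are matched automatically; words found in only one file become NaN.
combined = counts_file1 + counts_file2

# Keep only the words common to both files.
common_words = combined.dropna()

# Include all words by treating a missing count as 0.
all_words = counts_file1.add(counts_file2, fill_value=0)

# sum() skips NaN values by default.
total_common = combined.sum()

print(combined)
print(common_words)
print(all_words)
print(total_common)
```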

So keep practicing on your own with the above examples in your notebook, and comment if you face any issue.
