Skip to main content

Principal Component Analysis or PCA in machine learning


Another post starts with you beautiful people!
I hope you have enjoyed and must learn something from my previous post about Cross Validation & ROC.
In this post we are going to learn Principal Component Analysis or POC.
Principal Component Analysis (PCA) is a simple yet popular and useful linear transformation technique that is used in numerous applications, such as stock market predictions, the analysis of gene expression data, and many more.
The main idea of principal component analysis (PCA) is to reduce the dimensionality of a data set consisting of many variables correlated with each other, either heavily or lightly, while retaining the variation present in the dataset, up to the maximum extent.

Importantly, the dataset on which PCA technique is to be used must be scaled. The results are also sensitive to the relative scaling. As a layman, it is a method of summarizing data.
Imagine some wine bottles on a dining table. Each wine is described by its attributes like colour, strength, age, etc. But redundancy will arise because many of them will measure related properties. So what PCA will do in this case is summarize each wine in the stock with less characteristics.

In other words, Principal component analysis is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.

  • With PCA we can reduce the dimentions with out losing much information
  • PCA also helps to remove the multicollinearity between the variables

Dataset information:-
We will use SPECTF dataset from UCI machine learning repository[download here]
The dataset describes diagnosing of cardiac Single Proton Emission Computed Tomography (SPECT) images. Each of the patients is classified into two categories: normal and abnormal. 
The database of 267 SPECT image sets (patients) was processed to extract features that summarize the original SPECT images. As a result, 44 continuous feature pattern was created for each patient.





Let's train a LogisticRegression model and record the time taken to train before applying PCA:-

Standardising the variables:-


Result:-


This cumilative explained variance graph helps us to choose the number of desired principal components.
90% variation in the data is explaining by the first 15 principal components.

Result:-

PCA transforms a set of correlated variables into a set of linearly uncorrelated variables called principal components, we can check the correlarion with a heat map of correlation matrix.



Check the performance after considering the first 15 principal components:-

We can conclude that the computational time is reduced by several times after applying PCA and selecting 15 principal components, And the variables are transformed to a new set of linearly uncorrelated variables.

Comments

  1. I like your blog, I read this blog please update more content on hacking,Nice post
    Data Science online Training

    ReplyDelete
  2. propidduozo Shannon Romero download
    wirkruptbunsper

    ReplyDelete

Post a Comment

Popular posts from this blog

How to use opencv-python with Darknet's YOLOv4?

Another post starts with you beautiful people 😊 Thank you all for messaging me your doubts about Darknet's YOLOv4. I am very happy to see in a very short amount of time my lovely aspiring data scientists have learned a state of the art object detection and recognition technique. If you are new to my blog and to computer vision then please check my following blog posts one by one- Setup Darknet's YOLOv4 Train custom dataset with YOLOv4 Create production-ready API of YOLOv4 model Create a web app for your YOLOv4 model Since now we have learned to use YOLOv4 built on Darknet's framework. In this post, I am going to share with you how can you use your trained YOLOv4 model with another awesome computer vision and machine learning software library-  OpenCV  and of course with Python 🐍. Yes, the Python wrapper of OpenCV library has just released it's latest version with support of YOLOv4 which you can install in your system using below command- pip install opencv-pyt...

How to convert your YOLOv4 weights to TensorFlow 2.2.0?

Another post starts with you beautiful people! Thank you all for your overwhelming response in my last two posts about the YOLOv4. It is quite clear that my beloved aspiring data scientists are very much curious to learn state of the art computer vision technique but they were not able to achieve that due to the lack of proper guidance. Now they have learnt exact steps to use a state of the art object detection and recognition technique from my last two posts. If you are new to my blog and want to use YOLOv4 in your project then please follow below two links- How to install and compile Darknet code with GPU? How to train your custom data with YOLOv4? In my  last post we have trained our custom dataset to identify eight types of Indian classical dance forms. After the model training we have got the YOLOv4 specific weights file as 'yolo-obj_final.weights'. This YOLOv4 specific weight file cannot be used directly to either with OpenCV or with TensorFlow currently becau...

Detecting Credit Card Fraud As a Data Scientist

Another post starts with you beautiful people! Hope you have learnt something from my previous post about  machine learning classification real world problem Today we will continue our machine learning hands on journey and we will work on an interesting Credit Card Fraud Detection problem. The goal of this exercise is to anonymize credit card transactions labeled as fraudulent or genuine. For your own practice you can download the dataset from here-  Download the dataset! About the dataset:  The datasets contains transactions made by credit cards in September 2013 by european cardholders. This dataset presents transactions that occurred in two days, where we have 492 frauds out of 284,807 transactions. The dataset is highly unbalanced, the positive class (frauds) account for 0.172% of all transactions. Let's start our analysis with loading the dataset first:- As per the  official documentation -  features V1, V2, ... V28 are the principal compo...