
Principal Component Analysis or PCA in machine learning


Another post starts with you beautiful people!
I hope you enjoyed my previous post about Cross Validation & ROC and learned something from it.
In this post we are going to learn Principal Component Analysis, or PCA.
Principal Component Analysis (PCA) is a simple yet popular and useful linear transformation technique that is used in numerous applications, such as stock market predictions, the analysis of gene expression data, and many more.
The main idea of principal component analysis (PCA) is to reduce the dimensionality of a data set consisting of many variables that are correlated with each other, either heavily or lightly, while retaining as much of the variation present in the dataset as possible.

Importantly, the dataset on which PCA is to be used must be scaled, because the results are sensitive to the relative scaling of the variables. In layman's terms, PCA is a method of summarizing data.
Imagine some wine bottles on a dining table. Each wine is described by attributes such as colour, strength, age, etc. But redundancy will arise because many of these attributes measure related properties. So what PCA does in this case is summarize each wine in the stock with fewer characteristics.

In other words, Principal component analysis is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.

  • With PCA we can reduce the dimensions without losing much information
  • PCA also helps to remove the multicollinearity between the variables
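
To make the "orthogonal transformation" idea concrete, here is a minimal sketch using plain NumPy. The toy data (three features, two of them strongly correlated) and all variable names are made up for illustration; this is not the dataset used later in the post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 samples, 3 features, two of them strongly correlated.
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

# Standardise each column to zero mean and unit variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigen-decomposition of the covariance matrix gives the principal axes.
cov = np.cov(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order; reorder them to descending.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project the data onto the principal axes (an orthogonal transformation).
components = Z @ eigvecs

# The covariance of the projected data is diagonal: the components are uncorrelated.
component_cov = np.cov(components, rowvar=False)
explained_ratio = eigvals / eigvals.sum()
print(np.round(component_cov, 3))
print(np.round(explained_ratio, 3))
```

The off-diagonal entries of `component_cov` come out as (numerically) zero, which is what "linearly uncorrelated" means here, and `explained_ratio` shows how much of the total variance each component carries.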

Dataset information:-
We will use the SPECTF dataset from the UCI Machine Learning Repository [download here]
The dataset describes the diagnosis of cardiac Single Photon Emission Computed Tomography (SPECT) images. Each patient is classified into one of two categories: normal and abnormal.
The database of 267 SPECT image sets (patients) was processed to extract features that summarize the original SPECT images. As a result, a pattern of 44 continuous features was created for each patient.
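
A minimal loading sketch, assuming the data file is comma-separated with the class label in the first column followed by the 44 feature values. The `load_spectf` helper is a hypothetical name, and the sample rows below are made-up numbers in that assumed layout, not real patient data.

```python
import io
import numpy as np

def load_spectf(source):
    """Load a SPECTF-style CSV: first column is the class label, the rest are features."""
    data = np.atleast_2d(np.loadtxt(source, delimiter=","))
    y = data[:, 0].astype(int)   # class label (assumed to be 0 or 1)
    X = data[:, 1:]              # the 44 continuous features
    return X, y

# Tiny made-up sample in the assumed layout (label followed by 44 feature values).
rng = np.random.default_rng(0)
rows = []
for label in (1, 0, 1):
    features = rng.integers(0, 100, size=44)
    rows.append(",".join(str(v) for v in [label, *features]))
sample = io.StringIO("\n".join(rows))

X, y = load_spectf(sample)
print(X.shape, y)
```

With the real files, you would pass the path of the downloaded file to `load_spectf` instead of the in-memory sample.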





Let's train a LogisticRegression model and record the time taken to train before applying PCA:-
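
A minimal sketch of this baseline step. As an assumption, it uses a synthetic stand-in from `make_classification` with the same shape as SPECTF (267 samples, 44 features) so that it runs without the data file. The split ratio and `max_iter` are illustrative choices, and the timings and accuracy it prints will differ from those on the real data.

```python
import time
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as SPECTF (267 samples, 44 features).
X, y = make_classification(n_samples=267, n_features=44, n_informative=10,
                           n_redundant=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000)

# Time only the call to fit().
start = time.perf_counter()
model.fit(X_train, y_train)
baseline_seconds = time.perf_counter() - start

baseline_accuracy = model.score(X_test, y_test)
print(f"training time: {baseline_seconds:.4f} s, accuracy: {baseline_accuracy:.3f}")
```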

Standardising the variables:-
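
A minimal sketch of the standardisation step with scikit-learn's `StandardScaler`, again on a synthetic stand-in for the SPECTF data (an assumption, so that the snippet is runnable on its own). The scaler is fitted on the training split only and then reused on the test split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as SPECTF (267 samples, 44 features).
X, y = make_classification(n_samples=267, n_features=44, n_informative=10,
                           n_redundant=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Learn mean and standard deviation from the training data only.
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)

# Apply the same transformation to the test data.
X_test_std = scaler.transform(X_test)

print(np.round(X_train_std.mean(axis=0)[:3], 6))
print(np.round(X_train_std.std(axis=0)[:3], 6))
```

After scaling, every training column has mean 0 and standard deviation 1, so no single variable dominates the principal components just because of its units.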


Result:-


This cumulative explained variance graph helps us choose the desired number of principal components.
90% of the variation in the data is explained by the first 15 principal components.
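
A minimal sketch of how the cumulative explained variance can be computed and used to pick the number of components. It runs on a synthetic stand-in for the SPECTF data (an assumption), so the number it prints will not necessarily be 15.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as SPECTF (267 samples, 44 features).
X, _ = make_classification(n_samples=267, n_features=44, n_informative=10,
                           n_redundant=20, random_state=0)
X_std = StandardScaler().fit_transform(X)

# Fit PCA with all components and accumulate the explained variance ratios.
pca = PCA().fit(X_std)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative explained variance reaches 90%.
n_components_90 = int(np.searchsorted(cumulative, 0.90) + 1)
print(n_components_90, round(float(cumulative[n_components_90 - 1]), 4))
```

Plotting `cumulative` against the component index gives the kind of graph described above. scikit-learn can also make this choice itself when a fraction is passed, as in `PCA(n_components=0.90)`.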

Result:-

PCA transforms a set of correlated variables into a set of linearly uncorrelated variables called principal components. We can check the correlation with a heat map of the correlation matrix.
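
A minimal sketch of this check, computing the correlation matrix before and after PCA on a synthetic stand-in for the SPECTF data (an assumption). The `max_abs_off_diagonal` helper is a hypothetical name used only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as SPECTF (267 samples, 44 features).
X, _ = make_classification(n_samples=267, n_features=44, n_informative=10,
                           n_redundant=20, random_state=0)
X_std = StandardScaler().fit_transform(X)

# Keep the first 15 principal components.
components = PCA(n_components=15).fit_transform(X_std)

# Correlation matrices of the original variables and of the components.
corr_before = np.corrcoef(X_std, rowvar=False)
corr_after = np.corrcoef(components, rowvar=False)

def max_abs_off_diagonal(corr):
    """Largest absolute correlation between two different variables."""
    off_diagonal = corr - np.diag(np.diag(corr))
    return float(np.abs(off_diagonal).max())

print("before PCA:", round(max_abs_off_diagonal(corr_before), 4))
print("after PCA: ", round(max_abs_off_diagonal(corr_after), 4))
```

Either matrix can be passed to a plotting function such as `seaborn.heatmap` to draw the heat map. For the components, everything off the diagonal should be essentially zero.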



Check the performance after considering the first 15 principal components:-
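
A minimal sketch of this comparison, assuming a synthetic stand-in for the SPECTF data and a hypothetical `fit_and_time` helper. The timing for the reduced model includes fitting the scaler and PCA as well as the classifier, and the printed numbers are not the results from the original experiment.

```python
import time
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as SPECTF (267 samples, 44 features).
X, y = make_classification(n_samples=267, n_features=44, n_informative=10,
                           n_redundant=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

def fit_and_time(model, X_fit, y_fit):
    """Fit the model and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    model.fit(X_fit, y_fit)
    return time.perf_counter() - start

# Baseline: all 44 standardised variables.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
# Reduced: standardise, keep the first 15 principal components, then classify.
reduced = make_pipeline(StandardScaler(), PCA(n_components=15),
                        LogisticRegression(max_iter=1000))

baseline_seconds = fit_and_time(baseline, X_train, y_train)
reduced_seconds = fit_and_time(reduced, X_train, y_train)

baseline_accuracy = baseline.score(X_test, y_test)
reduced_accuracy = reduced.score(X_test, y_test)
print(f"44 variables:  {baseline_seconds:.4f} s, accuracy {baseline_accuracy:.3f}")
print(f"15 components: {reduced_seconds:.4f} s, accuracy {reduced_accuracy:.3f}")
```

On a dataset this small, single timings are noisy, so it is worth repeating the measurement a few times before reading much into the difference.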

We can conclude that, in this experiment, the computational time is reduced several-fold after applying PCA and selecting 15 principal components, and that the variables are transformed into a new set of linearly uncorrelated variables.
