
Detecting Credit Card Fraud As a Data Scientist

Another post starts with you beautiful people!
I hope you learnt something from my previous post about a real-world machine learning classification problem.
Today we will continue our hands-on machine learning journey and work on an interesting Credit Card Fraud Detection problem.
The dataset contains anonymized credit card transactions labeled as fraudulent or genuine, and the goal of this exercise is to recognize the fraudulent ones.
For your own practice you can download the dataset from here: Download the dataset!

About the dataset: The dataset contains transactions made with credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.

Let's start our analysis by loading the dataset:
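A minimal sketch of the loading step, assuming pandas is installed. The filename `creditcard.csv` mentioned in the comment is an assumption (the post does not name the file), and so that the sketch runs without the download it parses a tiny made-up sample with the same column layout instead.

```python
import io

import pandas as pd

# Tiny made-up sample with the same column layout as the real file
# (only V1 and V2 are shown here instead of V1..V28).
SAMPLE_CSV = """Time,V1,V2,Amount,Class
0,-1.36,-0.07,149.62,0
1,1.19,0.27,2.69,0
406,-2.31,1.95,0.00,1
"""


def load_transactions(source):
    """Read transactions from a path or file-like object into a DataFrame."""
    return pd.read_csv(source)


# With the real download you would pass its path instead,
# e.g. load_transactions("creditcard.csv") (hypothetical filename).
data = load_transactions(io.StringIO(SAMPLE_CSV))
print(data.shape)
print(data.head())
```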

As per the official documentation, features V1, V2, ... V28 are the principal components obtained with PCA. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset, feature 'Amount' is the transaction amount, and feature 'Class' is the response variable, which takes the value 1 in case of fraud and 0 otherwise.

Let's check our target variable, Class: how balanced is it?


From the plot it's pretty clear that our target variable is highly unbalanced!
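The balance check can be sketched as below. This is only an illustration: instead of the real file it rebuilds a 'Class' column from the two counts quoted in the dataset description (492 frauds out of 284,807), and the plotting call is left as a comment because it needs matplotlib.

```python
import pandas as pd

# Class counts quoted in the dataset description: 492 frauds out of 284,807.
n_total, n_fraud = 284_807, 492
classes = pd.Series([0] * (n_total - n_fraud) + [1] * n_fraud, name="Class")

# Count how many transactions fall into each class.
counts = classes.value_counts().sort_index()
fraud_share = counts.loc[1] / counts.sum() * 100

print(counts)
print(f"fraud share: {fraud_share:.3f}%")

# On the real DataFrame the same check would be data["Class"].value_counts(),
# and counts.plot(kind="bar") would draw the bar chart (requires matplotlib).
```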

Now, to handle this type of highly unbalanced classification problem, we should test the model both on resampled data and on the original skewed data so that we can compare the results.
But before going for the resampling approach we need to normalize the 'Amount' feature. For normalization we will use the StandardScaler class from the sklearn library:

After this operation the amount is rescaled to zero mean and unit variance.
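Here is a small sketch of that scaling step, assuming scikit-learn and pandas. The amounts are made up, and the column name `normAmount` is a hypothetical choice for the scaled feature.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up amounts standing in for the real 'Amount' column.
data = pd.DataFrame({"Amount": [149.62, 2.69, 378.66, 123.50, 69.99]})

# StandardScaler expects a 2-D input, hence the double brackets.
scaler = StandardScaler()
data["normAmount"] = scaler.fit_transform(data[["Amount"]]).ravel()

print(data)
# StandardScaler uses the population standard deviation (ddof=0).
print(data["normAmount"].mean(), data["normAmount"].std(ddof=0))
```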

Now we will use traditional under-sampling to create a 50/50 ratio between the classes. This is done by randomly selecting "x" samples from the majority class, where "x" is the total number of records in the minority class:

Let's check the number of data points in the minority class:

Next, pick the indices of the normal class:

Now, randomly select "x" of the indices we picked above:

Let's concatenate the two sets of indices and prepare the under-sampled data:
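The four under-sampling steps above can be sketched as follows. It assumes NumPy and pandas and runs on a synthetic stand-in (1,000 rows, 20 of them fraud) rather than the real data; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in: 1,000 transactions, 20 of them fraudulent (Class == 1).
data = pd.DataFrame({
    "V1": rng.normal(size=1000),
    "Amount": rng.exponential(scale=50.0, size=1000),
    "Class": [1] * 20 + [0] * 980,
})

# Number of data points in the minority class, and their indices.
fraud_indices = data[data["Class"] == 1].index.to_numpy()
n_fraud = len(fraud_indices)

# Indices of the normal class.
normal_indices = data[data["Class"] == 0].index.to_numpy()

# Randomly pick n_fraud normal indices, without replacement.
random_normal_indices = rng.choice(normal_indices, size=n_fraud, replace=False)

# Concatenate both index sets and build the under-sampled data.
under_sample_indices = np.concatenate([fraud_indices, random_normal_indices])
under_sample_data = data.loc[under_sample_indices]

print(under_sample_data["Class"].value_counts())
```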


Next, we will split both the whole data and the under-sampled data into train and test sets:


Let's check the number of transactions in both datasets:
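A sketch of the split, assuming scikit-learn's `train_test_split` with a 70/30 ratio (the ratio is an assumption; the post does not state it). The data is synthetic and the helper `split` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the whole (skewed) data.
data = pd.DataFrame({
    "V1": rng.normal(size=1000),
    "V2": rng.normal(size=1000),
    "Class": [1] * 20 + [0] * 980,
})

# Under-sampled data: all frauds plus an equal number of random normal rows.
fraud = data[data["Class"] == 1]
normal = data[data["Class"] == 0].sample(n=len(fraud), random_state=0)
under_sample_data = pd.concat([fraud, normal])


def split(frame, test_size=0.3, seed=0):
    """Separate features from the target and split into train and test sets."""
    X = frame.drop(columns="Class")
    y = frame["Class"]
    return train_test_split(X, y, test_size=test_size, random_state=seed)


X_train, X_test, y_train, y_test = split(data)
X_train_u, X_test_u, y_train_u, y_test_u = split(under_sample_data)

print("whole data:", len(X_train), "train,", len(X_test), "test")
print("under-sampled data:", len(X_train_u), "train,", len(X_test_u), "test")
```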

Now, our next step is to capture as many fraudulent transactions as possible. Since we are dealing with highly unbalanced data (a model that labels every transaction as genuine would already be about 99.8% accurate), we will use Recall here instead of the Accuracy and Precision metrics:
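To see why recall is the more telling metric here, consider this small worked example with made-up labels, assuming scikit-learn's metric functions. The two "models" are just hand-written prediction lists.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# 1,000 made-up labels: 10 frauds (1) and 990 genuine transactions (0).
y_true = [1] * 10 + [0] * 990

# A useless model that labels every transaction as genuine.
y_all_genuine = [0] * 1000

# A model that catches 8 of the 10 frauds but raises 30 false alarms.
y_model = [1] * 8 + [0] * 2 + [1] * 30 + [0] * 960

print("always genuine: accuracy", accuracy_score(y_true, y_all_genuine),
      "recall", recall_score(y_true, y_all_genuine))
print("model: accuracy", accuracy_score(y_true, y_model),
      "recall", recall_score(y_true, y_model),
      "precision", precision_score(y_true, y_model))
```

The first model scores higher on accuracy even though it never catches a single fraud, which is exactly what recall exposes.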

Now let's find the best model using the k-fold cross-validation score:
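One way this search could look is sketched below. The post does not say which classifier or parameter grid was used, so logistic regression, the grid of `C` values and the 5 folds are all assumptions, and the data is a synthetic balanced stand-in for the under-sampled training set.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, balanced stand-in for the under-sampled training set.
X_train_u, y_train_u = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)

# Candidate regularization strengths (hypothetical grid).
c_values = [0.01, 0.1, 1.0, 10.0, 100.0]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

mean_recall = {}
for c in c_values:
    model = LogisticRegression(C=c, max_iter=1000)
    # Score every fold by recall, then average over the folds.
    scores = cross_val_score(model, X_train_u, y_train_u, cv=cv, scoring="recall")
    mean_recall[c] = scores.mean()
    print(f"C={c}: mean recall {mean_recall[c]:.3f}")

# Keep the parameter value with the highest mean recall.
best_c = max(mean_recall, key=mean_recall.get)
print("best C:", best_c)
```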



Let's predict with the model on the under-sampled test data,
compute the confusion matrix,
and look at the result:
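The prediction and confusion matrix steps can be sketched like this, assuming a logistic regression model (the post does not name the classifier) and synthetic balanced data in place of the under-sampled set.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, balanced stand-in for the under-sampled data.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train_u, X_test_u, y_train_u, y_test_u = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LogisticRegression(C=1.0, max_iter=1000)
model.fit(X_train_u, y_train_u)
y_pred_u = model.predict(X_test_u)

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_test_u, y_pred_u)
tn, fp, fn, tp = cm.ravel()

print(cm)
print("recall:", tp / (tp + fn))
```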

Now apply the model we fitted and test it on the whole data; it will give us-

That is a very decent recall when applying the model to a much larger and skewed dataset!
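A sketch of testing an under-sample-trained model on skewed data follows, with a few assumptions: logistic regression as the classifier, synthetic data with roughly 2% positives, and under-sampling applied only to the training split so that no test row is seen during training (a deliberate variation on the steps described above).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split

# Synthetic skewed data: roughly 2% of the samples belong to class 1.
X, y = make_classification(
    n_samples=5000, n_features=10, n_informative=5, weights=[0.98], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Under-sample the training part: all class-1 rows plus as many class-0 rows.
rng = np.random.default_rng(0)
pos = np.flatnonzero(y_train == 1)
neg = rng.choice(np.flatnonzero(y_train == 0), size=len(pos), replace=False)
idx = np.concatenate([pos, neg])

model = LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])

# Evaluate on the untouched, skewed test split.
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
recall = recall_score(y_test, y_pred)

print(cm)
print("recall on skewed test data:", recall)
```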

Now, to check whether the model also predicts well overall and does not make many errors, we will use the ROC curve and its AUC:

Once we plot the above ROC curve, we get the AUC as 0.95.
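Computing the ROC curve and its AUC can be sketched as below, again assuming logistic regression and synthetic data, so the number printed here will not match the 0.95 quoted above. The plotting lines are left as comments because they need matplotlib.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The ROC curve needs continuous scores, not hard 0/1 predictions.
scores = model.decision_function(X_test)
fpr, tpr, thresholds = roc_curve(y_test, scores)
roc_auc = auc(fpr, tpr)
print(f"AUC: {roc_auc:.3f}")

# To draw the curve (requires matplotlib):
#   import matplotlib.pyplot as plt
#   plt.plot(fpr, tpr)
#   plt.plot([0, 1], [0, 1], "--")
#   plt.show()
```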

We should also try the above approach on the skewed data, then build the final model with the whole training dataset and predict the classes in the test set. I leave that to you as an exercise. Remember, this step lets you confirm whether, by undersampling the data, our algorithm does a much better job at detecting fraud.

So friends, what more can you try after the above approach? Here are some ideas: change the classification threshold, investigate the Precision-Recall curve, test SVMs and decision trees, and share your experience with me.
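As a starting point for the first two ideas, here is a hedged sketch that moves the classification threshold and computes the precision-recall curve. It assumes logistic regression and synthetic data with about 10% positives; the list of thresholds is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5, weights=[0.9], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]  # estimated probability of class 1

# Raising the threshold can only keep or lower recall; precision usually
# moves the other way.
recalls = []
for threshold in [0.1, 0.3, 0.5, 0.7, 0.9]:
    y_pred = (proba >= threshold).astype(int)
    recalls.append(recall_score(y_test, y_pred))
    precision = precision_score(y_test, y_pred, zero_division=0)
    print(f"threshold={threshold}: recall={recalls[-1]:.3f} precision={precision:.3f}")

# Full precision-recall trade-off over every possible threshold.
precision_curve, recall_curve, pr_thresholds = precision_recall_curve(y_test, proba)
```

Picking a threshold then becomes a business decision: how many false alarms are acceptable for each extra fraud caught.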

Meanwhile Friends! Go chase your dreams, have an awesome day, make every second count and see you later in my next post.






