
Detecting Credit Card Fraud As a Data Scientist

Another post starts with you beautiful people!
I hope you learned something from my previous post about a real-world machine learning classification problem.
Today we will continue our hands-on machine learning journey and work on an interesting credit card fraud detection problem.
The goal of this exercise is to detect fraud in anonymized credit card transactions that are labeled as fraudulent or genuine.
For your own practice you can download the dataset from here- Download the dataset!

About the dataset: The dataset contains transactions made by credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.

Let's start our analysis by loading the dataset-
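A minimal sketch of the loading step, assuming pandas is installed. The helper name `load_transactions` and the file name `creditcard.csv` are assumptions, and the three rows below are made up so that the snippet runs without the real download.

```python
import io
import pandas as pd

def load_transactions(source):
    """Read the transactions CSV from a file path or file-like object."""
    return pd.read_csv(source)

# Made-up rows standing in for the real file. With the download in place you
# would call load_transactions("creditcard.csv") (hypothetical path) instead.
sample_csv = io.StringIO(
    "Time,V1,V2,Amount,Class\n"
    "0,0.1,-0.2,10.0,0\n"
    "5,1.3,0.4,25.5,0\n"
    "9,-2.0,1.7,1.0,1\n"
)
data = load_transactions(sample_csv)
print(data.shape)
print(data.head())
```

The real file has the 'Time', V1 to V28, 'Amount' and 'Class' columns; only a few are mimicked here.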

As per the official documentation, features V1, V2, ... V28 are the principal components obtained with PCA. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset, feature 'Amount' is the transaction amount, and feature 'Class' is the response variable, which takes value 1 in case of fraud and 0 otherwise.

Let's check our target variable, Class. How balanced is it?
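One way to check the balance is to count the values of 'Class'. The sketch below rebuilds the label column from the counts quoted above (492 frauds out of 284,807 transactions) instead of reading the real file, so it runs on its own.

```python
import pandas as pd

# Labels rebuilt from the published counts: 492 frauds out of 284,807 rows.
n_total, n_fraud = 284_807, 492
classes = pd.Series([0] * (n_total - n_fraud) + [1] * n_fraud, name="Class")

counts = classes.value_counts().sort_index()  # index 0 = genuine, 1 = fraud
fraud_share = counts.loc[1] / counts.sum() * 100
print(counts)
print(f"fraud share: {fraud_share:.4f}%")

# counts.plot(kind="bar") would draw the bar chart (requires matplotlib).
```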


From the plot it's pretty clear that our target variable is highly unbalanced!

To handle such a highly unbalanced classification problem, we should test the model both with and without resampling so that we can compare the results.
But before resampling we need to normalize the 'Amount' feature. For normalization we will use StandardScaler from the sklearn library-

This operation will normalize the amount as below-
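The scaling step can be sketched as follows. The column name `normAmount` and the four amounts are assumptions made for illustration; `StandardScaler` subtracts the mean and divides by the standard deviation.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up amounts; the real column comes from the loaded dataset.
data = pd.DataFrame({"Amount": [10.0, 20.0, 30.0, 40.0], "Class": [0, 0, 0, 1]})

scaler = StandardScaler()
# fit_transform expects a 2-D input, hence the double brackets.
data["normAmount"] = scaler.fit_transform(data[["Amount"]]).ravel()
data = data.drop(columns=["Amount"])
print(data)
```

After this step the new column has mean 0 and unit variance, which keeps 'Amount' on a scale comparable to the PCA features.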

Now we will use traditional under-sampling to create a 50/50 class ratio. We do this by randomly selecting "x" samples from the majority class, where "x" is the total number of records in the minority class-

Let's check number of data points in the minority class-

Next, pick the indices of the normal class-

Now, randomly select "x" of the indices we picked above-

Let's combine the two sets of indices and prepare the under-sampled data-
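The under-sampling steps above can be sketched as follows. The data frame is synthetic (20 frauds in 1,000 rows) and the variable names are assumptions; the logic is the same as described: count the minority class, pick that many majority indices at random, and combine both.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-in for the real data: 20 frauds and 980 genuine rows.
data = pd.DataFrame({
    "normAmount": rng.normal(size=1000),
    "Class": [1] * 20 + [0] * 980,
})

# Number of data points in the minority class and their indices.
fraud_indices = data.index[data["Class"] == 1].to_numpy()
n_fraud = len(fraud_indices)

# Indices of the normal class.
normal_indices = data.index[data["Class"] == 0].to_numpy()

# Randomly select as many normal rows as there are frauds (no repeats).
random_normal_indices = rng.choice(normal_indices, size=n_fraud, replace=False)

# Combine both sets of indices and build the under-sampled data.
under_sample_indices = np.concatenate([fraud_indices, random_normal_indices])
under_sample_data = data.loc[under_sample_indices]

X_undersample = under_sample_data.drop(columns=["Class"])
y_undersample = under_sample_data["Class"]
print(y_undersample.value_counts())
```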


Next, we will split both the whole dataset and the under-sampled data into train and test sets-


Let's check the number of transactions in both datasets-
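A sketch of both splits with `train_test_split`. The 70/30 ratio and `random_state=0` are assumptions (the post does not state them), and the arrays are synthetic stand-ins for the whole and under-sampled data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-ins: 1,000 "whole" rows (20 frauds first) and a balanced
# 40-row sample made of the 20 frauds plus 20 genuine rows.
X = rng.normal(size=(1000, 3))
y = np.array([1] * 20 + [0] * 980)
X_undersample, y_undersample = X[:40], y[:40]

# Whole dataset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
# Under-sampled dataset.
X_train_u, X_test_u, y_train_u, y_test_u = train_test_split(
    X_undersample, y_undersample, test_size=0.3, random_state=0
)

print("whole data, train/test:", len(X_train), len(X_test))
print("under-sampled, train/test:", len(X_train_u), len(X_test_u))
```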

Our next step is to capture as many fraudulent transactions as possible. Since we are dealing with highly unbalanced data, we will use recall instead of accuracy or precision. Recall is the share of actual frauds that the model catches, TP / (TP + FN)-
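To see why accuracy is misleading here, consider a "model" that never flags fraud. The sketch below uses the class counts quoted earlier (492 frauds out of 284,807) and sklearn's `accuracy_score` and `recall_score`; the all-zero predictions are an illustration, not a model from this post.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

n_total, n_fraud = 284_807, 492
y_true = np.array([0] * (n_total - n_fraud) + [1] * n_fraud)
y_pred = np.zeros_like(y_true)  # always predicts "genuine"

accuracy = accuracy_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
print(f"accuracy: {accuracy:.4f}, recall: {recall:.4f}")
```

The accuracy is above 99.8% even though not a single fraud is caught, which is why recall is the better guide for this problem.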

Now let's find the best model using the k-fold cross-validation recall score-
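The model search can be sketched like this. The post does not name the classifier or the parameters searched, so logistic regression and the candidate values of `C` below are assumptions, and the balanced data comes from `make_classification` rather than the real under-sampled set.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic balanced data standing in for the under-sampled training set.
X, y = make_classification(
    n_samples=400, n_features=10, weights=[0.5, 0.5], random_state=0
)

candidate_c = [0.01, 0.1, 1, 10, 100]  # assumed regularization strengths
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

mean_recalls = {}
for c in candidate_c:
    model = LogisticRegression(C=c, max_iter=1000)
    # scoring="recall" evaluates each fold by recall on the positive class.
    scores = cross_val_score(model, X, y, cv=kfold, scoring="recall")
    mean_recalls[c] = scores.mean()
    print(f"C={c}: mean recall = {scores.mean():.3f}")

best_c = max(mean_recalls, key=mean_recalls.get)
print("best C:", best_c)
```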



Let's use the model to predict on the under-sampled test data-
And compute the confusion matrix-
And the result is-
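A sketch of the prediction and confusion matrix steps, again assuming logistic regression (with an assumed `C=0.01`) and synthetic balanced data in place of the under-sampled set.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split

# Synthetic balanced data standing in for the under-sampled set.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LogisticRegression(C=0.01, max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()
recall = tp / (tp + fn)
print(cm)
print(f"recall: {recall:.3f}")
```

Reading recall straight from the matrix makes it clear which cell matters most here: the false negatives, that is, frauds the model missed.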

Now apply the model we fitted and test it on the whole data; it will give us-
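A sketch of this step on synthetic skewed data (about 2% positives, an assumption). Note one deliberate difference from the description above: here only the training split is under-sampled, so no test row is seen during fitting. Logistic regression is again an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic skewed data standing in for the whole dataset.
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.98, 0.02], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Under-sample the training split to a 50/50 ratio.
rng = np.random.default_rng(0)
pos = np.flatnonzero(y_train == 1)
neg = rng.choice(np.flatnonzero(y_train == 0), size=len(pos), replace=False)
idx = np.concatenate([pos, neg])

model = LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])

# Evaluate on the untouched, skewed test split.
recall_whole = recall_score(y_test, model.predict(X_test))
print(f"recall on the skewed test set: {recall_whole:.3f}")
```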

That's a very decent recall when applying the model to a much larger and skewed dataset!

To check whether the model also predicts well overall and does not make too many errors, we will use the ROC curve and its AUC-

Once we plot the above ROC curve, we get an AUC of 0.95.
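The ROC computation can be sketched with `roc_curve` and `auc` from sklearn. The data and the logistic regression model are synthetic stand-ins, so the AUC printed here will not match the 0.95 reported above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic balanced data standing in for the under-sampled set.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# The ROC curve needs continuous scores, not hard 0/1 predictions.
y_score = model.decision_function(X_test)

fpr, tpr, thresholds = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)
print(f"AUC: {roc_auc:.3f}")

# With matplotlib installed, plt.plot(fpr, tpr) draws the curve.
```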

We should also try the same approach on the skewed data: build the final model with the whole training dataset and predict the classes in the test set. I leave that to you as an exercise. This comparison should show you that, with under-sampled data, our algorithm does a much better job at detecting fraud.

So friends, what more can you try after the above approach? Here are some ideas: change the classification threshold, investigate the precision-recall curve, test SVMs and decision trees, and share your experience with me.
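As a starting point for the first two ideas, here is a minimal sketch that varies the classification threshold and computes the precision-recall curve. The thresholds, the logistic regression model, and the synthetic data are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Probability of the positive (fraud) class for each test row.
proba = model.predict_proba(X_test)[:, 1]

thresholds = [0.1, 0.3, 0.5, 0.7, 0.9]
recalls = []
for t in thresholds:
    # Flag a transaction as fraud only when its probability exceeds t.
    y_pred = (proba > t).astype(int)
    recalls.append(recall_score(y_test, y_pred))
    print(f"threshold={t}: recall={recalls[-1]:.3f}")

# Precision and recall over all thresholds, ready for plotting.
precision, recall, pr_thresholds = precision_recall_curve(y_test, proba)
```

A lower threshold catches more frauds at the cost of more false alarms, so recall can only fall or stay flat as the threshold rises.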

Meanwhile Friends! Go chase your dreams, have an awesome day, make every second count and see you later in my next post.






