💳 Credit Card Fraud Detection in R

📌 Goal:
Develop a robust fraud detection system using machine learning techniques on highly imbalanced financial transaction data to accurately identify fraudulent activity.

🧠 Domain: Financial Analytics & Machine Learning
🎯 Task: Classification (Fraud vs. Genuine)
📂 Dataset: Kaggle Credit Card Fraud Dataset (284,807 transactions, 492 fraud cases, 0.172%)


The Goal:

Catch the Fraud!

Financial fraud detection remains a critical challenge for banks and payment systems because fraudulent transactions are rare and attack patterns evolve constantly. This project builds and compares multiple machine learning models in R to detect fraudulent transactions from anonymized, PCA-transformed features. The primary goal is to improve recall and F1-score for the minority (fraudulent) class while minimizing false negatives, which carry the greatest financial risk.


Methodology & Process

  • Data Preprocessing:

    • Removed non-informative features (e.g., Time).

    • Standardized Amount for consistent scale.

    • Addressed severe class imbalance using multiple strategies:

      • Down-sampling of majority class

      • Up-sampling of minority class

      • ROSE (Random Over-Sampling Examples) for synthetic balance

  • Exploratory Analysis:

    • Distribution and imbalance visualization.

    • t-SNE and PCA clustering to assess feature separability.

  • Model Development:

    • Implemented Decision Tree (CART), Random Forest, K-Nearest Neighbors (KNN), and Support Vector Machine (SVM) classifiers.

    • Evaluated across original and ROSE-balanced datasets.

  • Evaluation Metrics: Accuracy, Precision, Recall, F1-Score, ROC-AUC.
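
The preprocessing and resampling steps above might look like the following R sketch. The file name `creditcard.csv`, the 70/30 split, and the use of caret and ROSE are assumptions about the original setup; the column layout (`Time`, `V1`–`V28`, `Amount`, `Class`) follows the Kaggle dataset.

```r
# Sketch of preprocessing and the three balancing strategies (assumptions noted above)
library(caret)   # createDataPartition(), downSample(), upSample()
library(ROSE)    # ROSE() synthetic balancing

data <- read.csv("creditcard.csv")
data$Class <- factor(data$Class, levels = c(0, 1), labels = c("genuine", "fraud"))

# Drop the non-informative Time column and standardize Amount
data$Time   <- NULL
data$Amount <- as.numeric(scale(data$Amount))

# Stratified 70/30 train-test split
set.seed(42)
idx   <- createDataPartition(data$Class, p = 0.7, list = FALSE)
train <- data[idx, ]
test  <- data[-idx, ]

# Three balancing strategies applied to the training set only
down_train <- downSample(x = train[, -ncol(train)], y = train$Class, yname = "Class")
up_train   <- upSample(x = train[, -ncol(train)],   y = train$Class, yname = "Class")
rose_train <- ROSE(Class ~ ., data = train, seed = 42)$data

table(rose_train$Class)  # roughly 50/50 after ROSE
```

Resampling only the training split keeps the test set at the true 0.172% fraud rate, so the reported metrics reflect real-world imbalance.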

Results & Findings

| Model                | Dataset    | Accuracy | Precision | Recall   | F1-Score | AUC   |
|----------------------|------------|----------|-----------|----------|----------|-------|
| Decision Tree (CART) | Imbalanced | 99.8%    | High      | Low      | Low      | 0.912 |
| Decision Tree (ROSE) | Balanced   | -        | -         | -        | -        | 0.968 |
| KNN (k = 3)          | ROSE       | 99.9%    | 0.92      | 0.54     | 0.68     | -     |
| SVM                  | ROSE       | Strong   | Moderate  | Balanced | -        | -     |
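
Metrics like those in the table can be computed with caret's `confusionMatrix()` and the pROC package. The sketch below fits a CART tree for illustration; `rose_train` and `test` are assumed to hold the ROSE-balanced training split and the held-out test split, with a two-level `Class` factor (`genuine`/`fraud`).

```r
library(caret)  # confusionMatrix()
library(pROC)   # roc(), auc()
library(rpart)  # CART decision tree

# Illustrative model: CART tree trained on the ROSE-balanced data
tree_fit <- rpart(Class ~ ., data = rose_train, method = "class")

pred_class <- predict(tree_fit, newdata = test, type = "class")
pred_prob  <- predict(tree_fit, newdata = test, type = "prob")[, "fraud"]

# Precision, recall, and F1 for the minority (fraud) class
cm <- confusionMatrix(pred_class, test$Class, positive = "fraud")
cm$byClass[c("Precision", "Recall", "F1")]

# ROC-AUC from the predicted fraud probabilities
roc_obj <- roc(response = test$Class, predictor = pred_prob,
               levels = c("genuine", "fraud"))
auc(roc_obj)
```

Setting `positive = "fraud"` matters: with a 0.172% fraud rate, accuracy is dominated by the majority class, so precision, recall, and F1 on the positive class are the metrics that actually discriminate between models.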


The Result

✅ Key Insights:

  • Resampling with ROSE substantially improved detection of minority (fraudulent) cases.

  • SVM achieved the best trade-off between precision and recall, while KNN exhibited strong accuracy but weaker recall—highlighting the difficulty of identifying rare fraud cases.

  • Models with overly high precision but low recall can miss critical frauds, underscoring the need for cost-sensitive approaches.

🔮 Future Scope:

  • Integrate ensemble methods (XGBoost, LightGBM, CatBoost) to enhance generalization.

  • Apply cost-sensitive learning to penalize missed fraud detections.

  • Explore real-time deployment pipelines for live transaction monitoring.
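
As a sketch of the cost-sensitive direction, many R learners accept misclassification costs or class weights directly. The 10:1 penalty below is illustrative, not taken from this project; `train` is assumed to be the training split with `Class` levels `genuine`/`fraud`.

```r
library(rpart)  # CART with a loss matrix
library(e1071)  # SVM with class weights

# Loss matrix: rows are true classes (genuine, fraud), columns are predictions.
# Misclassifying a true fraud as genuine (a missed fraud) costs 10x a false alarm.
loss <- matrix(c(0,  1,    # true genuine: false alarm costs 1
                 10, 0),   # true fraud:   missed fraud costs 10
               nrow = 2, byrow = TRUE)
tree_cs <- rpart(Class ~ ., data = train, method = "class",
                 parms = list(loss = loss))

# Same idea for an SVM: up-weight the fraud class
svm_cs <- svm(Class ~ ., data = train,
              class.weights = c(genuine = 1, fraud = 10))
```

Cost-sensitive fitting shifts the decision boundary toward catching more frauds without resampling the data, so it can complement ROSE rather than replace it.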


Let's Connect

Let's Work Together

Project Collaboration

Projects in generative AI, machine learning, and imaging using advanced computational methods.

Mentorship and Guidance

Open to joining ongoing publications, supervision, and interdisciplinary projects in deep learning and scientific computing.
