🫁 Lung Cancer Detection Using Deep Learning

📌 Goal:
Develop an automated model for early lung cancer detection using CT scans and patient attributes to assist clinicians in rapid, accurate diagnosis.

🧠 Domain: Medical Imaging & Predictive Analytics
🎯 Task: Classification (Cancer vs. Non-Cancer)
📂 Dataset: Kaggle – Lung Cancer Prediction Dataset (284 samples, 16 attributes)

Project Domain: Machine Learning
Task: Classification and Prediction

The Goal:

Why? and What?

Early detection of lung cancer greatly improves survival rates, but manual diagnosis from CT scans and patient history is time-intensive. This project applies classical and ensemble machine learning methods to predict lung cancer risk from demographic, behavioral, and symptom-based features. The objective was to design a system that automatically classifies patients as cancerous or non-cancerous, supporting radiologists with data-driven decision-making.

The Challenge:

Methodology & Process

  • Data Source: 284 patient records (16 clinical attributes: age, smoking, coughing, chest pain, shortness of breath, etc.).

  • Preprocessing: Duplicate removal, categorical encoding, outlier filtering (IQR), and class rebalancing via SMOTE to address the class imbalance (Yes: 238 / No: 38, roughly 6:1).

  • Modeling Approaches:

    • Classical ML: KNN, SVC, Decision Tree, Random Forest, XGBoost, LightGBM, Gradient Boosting.

    • Advanced ensemble: CatBoost Classifier tuned with GridSearchCV and 5-fold cross-validation.

  • Evaluation Metrics: Accuracy, Precision, Recall, F1-Score, AUC.

  • Tools: Python (scikit-learn, CatBoost, LightGBM, Matplotlib, Seaborn).
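
The preprocessing steps listed above could look roughly like the sketch below. It is not the project's actual code: the column names (`GENDER`, `AGE`, `SMOKING`, `LUNG_CANCER`), the toy rows, and the helper names `clean_records` and `oversample_minority` are assumptions made for illustration. The oversampling function is a simplified SMOTE-style interpolation written in NumPy, standing in for a library SMOTE implementation.

```python
import numpy as np
import pandas as pd


def clean_records(df, numeric_cols=("AGE",)):
    """Drop duplicates, integer-encode categorical columns, filter IQR outliers."""
    out = df.drop_duplicates().copy()
    for col in out.columns:
        if not pd.api.types.is_numeric_dtype(out[col]):
            # Categories are sorted, so e.g. "NO" -> 0 and "YES" -> 1.
            out[col] = out[col].astype("category").cat.codes
    for col in numeric_cols:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        # Keep rows within 1.5 * IQR of the quartiles.
        out = out[out[col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
    return out.reset_index(drop=True)


def oversample_minority(X, y, k=3, seed=0):
    """Simplified SMOTE-style oversampling for a binary target (illustrative only)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels, counts = np.unique(y, return_counts=True)
    minority = labels[np.argmin(counts)]
    X_min = X[y == minority]
    n_new = int(counts.max() - counts.min())
    k = min(k, len(X_min) - 1)
    if n_new == 0 or k < 1:
        return X, y
    # Pairwise distances within the minority class; ignore self-matches.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    mate = neighbours[base, rng.integers(0, k, size=n_new)]
    # Each synthetic row lies on the segment between a minority row and a neighbour.
    synthetic = X_min[base] + rng.random((n_new, 1)) * (X_min[mate] - X_min[base])
    return np.vstack([X, synthetic]), np.concatenate([y, np.full(n_new, minority)])


# Toy data (made up): one exact duplicate row and one implausible age.
toy = pd.DataFrame({
    "GENDER": ["M", "F", "M", "F", "M", "M", "F"],
    "AGE": [60, 62, 65, 63, 61, 60, 5],
    "SMOKING": [2, 1, 2, 1, 2, 2, 1],
    "LUNG_CANCER": ["YES", "NO", "YES", "NO", "YES", "YES", "NO"],
})

clean = clean_records(toy)
X = clean.drop(columns="LUNG_CANCER").to_numpy()
y = clean["LUNG_CANCER"].to_numpy()
X_bal, y_bal = oversample_minority(X, y)
```

Two caveats on this sketch: interpolating integer-coded categorical features produces fractional values, and oversampling is normally applied to the training split only, so that synthetic rows do not leak into the evaluation data.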

| Model | Accuracy | Precision | Recall | F1-Score | AUC |
|---|---|---|---|---|---|
| KNN | 0.93 | 0.93 | 0.92 | 0.93 | 0.93 |
| SVC | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 |
| Decision Tree / Random Forest | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 |
| XGBoost / LightGBM | 0.94 – 0.95 | 0.94 | 0.94 | 0.94 | 0.95 |
| CatBoost (Best) | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 |
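
For readers who want to reproduce scores of the kind listed above, the five metrics can be computed with scikit-learn roughly as follows. This is a sketch under assumptions: the labels and probabilities are made up, the helper name `summarize` is hypothetical, and the metrics are taken for the positive class at a 0.5 threshold, which may differ from the averaging used in the project.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def summarize(y_true, y_prob, threshold=0.5):
    """Return accuracy, precision, recall, F1 and AUC for a binary classifier."""
    # Threshold-based metrics use hard predictions; AUC uses the raw scores.
    y_pred = [int(p >= threshold) for p in y_prob]
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_prob),
    }


# Made-up example: 5 positive and 3 negative cases.
y_true = [1, 1, 1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.6, 0.4, 0.55, 0.3, 0.1]
report = summarize(y_true, y_prob)
# accuracy = 6/8, precision = recall = F1 = 4/5, AUC = 14/15
```

On a dataset this imbalanced, recall for the minority (non-cancer) class and AUC are usually more informative than accuracy alone.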

The Result

✅ Key Insight:
The CatBoost Classifier achieved the best performance (96.9% accuracy, F1 ≈ 0.97), with generalization checked via 5-fold cross-validation (mean ≈ 94.6%).
Its robust handling of categorical features and reduced overfitting make it a strong candidate for clinical decision-support scenarios.
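
The tuning and validation procedure mentioned above (grid search plus 5-fold cross-validation) could be sketched as follows. This is an illustration, not the project's code: scikit-learn's `GradientBoostingClassifier` stands in for CatBoost so the example has no extra dependency, the data is synthetic, and the parameter grid and the helper name `tune_with_cv` are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score


def tune_with_cv(estimator, param_grid, X, y, seed=42):
    """Grid-search hyperparameters, then report 5-fold accuracy of the best model."""
    # Stratified folds keep the class ratio similar in every split.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(estimator, param_grid, scoring="f1", cv=cv)
    search.fit(X, y)
    scores = cross_val_score(search.best_estimator_, X, y, cv=cv, scoring="accuracy")
    return search.best_params_, scores


# Synthetic, imbalanced stand-in data (not the Kaggle dataset).
X, y = make_classification(
    n_samples=276, n_features=15, weights=[0.14, 0.86], random_state=0
)

# Hypothetical grid; with CatBoost one would pass CatBoostClassifier(verbose=0)
# and grid keys such as "depth", "learning_rate" and "iterations" instead.
grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}
best_params, fold_scores = tune_with_cv(
    GradientBoostingClassifier(random_state=0), grid, X, y
)
print(best_params, round(fold_scores.mean(), 3))
```

Because the same folds are used both to pick the hyperparameters and to score the winner, the reported mean is somewhat optimistic; a held-out test set or nested cross-validation gives a less biased estimate.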

📊 Visuals:
Age vs. Cancer Density Plots • Correlation Heatmap • ROC Curves for All Models • Confusion Matrix Analysis.

🏁 Conclusion:
This study demonstrates how ensemble-based machine learning methods, especially CatBoost, can effectively model patient-level cancer risk and support interpretable, automated detection pipelines for medical diagnostics.

Let's Connect

Let's Work Together

Project Collaboration

Projects in Generative AI, ML and Imaging using advanced computational methods

Mentorship and Guidance

Open to join ongoing publications, supervision, and interdisciplinary projects exploring deep learning and scientific computing
