🩺 Multiclass Obesity Prediction using LightGBM

📌 Goal:
Predict individual obesity levels (7 classes) from lifestyle, demographic, and dietary features using advanced machine learning classifiers, with LightGBM achieving the best overall accuracy.

🧠 Domain: Health Informatics & Predictive Analytics
🎯 Task: Multiclass Classification
📂 Dataset: UCI/Kaggle Obesity Dataset (~21K records, 17 features)

Project Domain: Machine Learning
Task: Classification and Prediction


The Goal:

Gain or Cut?


This study explores obesity prediction as a complex health analytics problem integrating behavioral, demographic, and physiological factors. Traditional measures like BMI often fail to represent multidimensional risk. The goal was to design a robust ML pipeline that classifies individuals into seven obesity levels — from Insufficient Weight to Obesity Class III — supporting precision healthcare and personalized intervention design.


The Challenge:

Methodology & Process

  • Data Preparation:
    Cleaned 21,758 samples from the UCI/Kaggle datasets, handled missing data and outliers (IQR method), and applied SMOTE to balance the classes.
    Derived BMI and normalized numerical features with Min-Max scaling.

  • Feature Engineering:
    Encoded categorical variables (gender, activity, diet, transport) using one-hot and label encoders.

  • Models Implemented:
    Logistic Regression, KNN, SVM, Naive Bayes, Decision Tree, Random Forest, XGBoost, CatBoost, and LightGBM.

  • Optimization:
    Hyperparameter tuning via Optuna; evaluated using Accuracy, Precision, Recall, F1-score, and AUC metrics.

  • Implementation Stack: Python | Scikit-learn | Optuna | LightGBM | Matplotlib
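
The data preparation steps listed above (BMI derivation, IQR-based outlier handling, Min-Max scaling) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names `height_m` and `weight_kg`, the toy records, and the choice to drop outliers rather than clip them are all assumptions. It uses only the standard library so it runs on its own; in the stated stack, Scikit-learn's `MinMaxScaler` would likely handle the scaling step.

```python
from statistics import quantiles


def add_bmi(rows):
    """Return copies of the rows with BMI = weight (kg) / height (m) squared."""
    return [{**r, "bmi": r["weight_kg"] / r["height_m"] ** 2} for r in rows]


def iqr_bounds(values, k=1.5):
    """Tukey fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(rows, column):
    """Keep only rows whose value in `column` lies inside the IQR fences.

    Assumption: outliers are dropped; the source only says "IQR method".
    """
    lo, hi = iqr_bounds([r[column] for r in rows])
    return [r for r in rows if lo <= r[column] <= hi]


def min_max_scale(rows, column):
    """Rescale `column` linearly to the range [0, 1]."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [{**r, column: (r[column] - lo) / span} for r in rows]


# Hypothetical toy records; the last weight is an obvious data-entry outlier.
people = [
    {"height_m": 1.60, "weight_kg": 50.0},
    {"height_m": 1.65, "weight_kg": 60.0},
    {"height_m": 1.70, "weight_kg": 70.0},
    {"height_m": 2.00, "weight_kg": 80.0},
    {"height_m": 1.80, "weight_kg": 90.0},
    {"height_m": 1.75, "weight_kg": 400.0},
]

with_bmi = add_bmi(people)
cleaned = drop_outliers(with_bmi, "weight_kg")
scaled = min_max_scale(cleaned, "weight_kg")
```

In a real pipeline the fences and the min/max would be computed on the training split only and then reused on validation and test data, so that no information leaks from held-out samples.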


The Result

Results & Findings

Model                | Accuracy | Key Insight
Logistic Regression  | 86.7%    | Strong baseline, interpretable
SVM                  | 88.6%    | Captured non-linear relations
Random Forest        | 89.2%    | Robust to noise, feature ranking
LightGBM             | 90%+     | Best balance of speed, scalability, and accuracy
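
The table reports accuracy only, while the study also tracked precision, recall, and F1-score. As a rough sketch of why those extra metrics matter in a multiclass setting, the snippet below computes macro-averaged versions by hand. The function name `macro_scores` and the toy labels are made up for illustration and are not results from this study.

```python
def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging computes each metric per class and then takes the
    unweighted mean, so rare classes count as much as common ones.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Toy example: a classifier that always predicts the majority class.
y_true = ["Insufficient Weight"] * 8 + ["Obesity Class III"] * 2
y_pred = ["Insufficient Weight"] * 10
scores = macro_scores(y_true, y_pred)
```

Here accuracy looks respectable because the majority class dominates, while macro recall and macro F1 drop sharply because the minority class is never predicted. That gap is the reason to read accuracy figures alongside per-class metrics.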


✅ Key Outcomes:

  • LightGBM achieved the highest overall accuracy and cross-validation score, excelling in multiclass generalization.

  • The model revealed BMI, age, and activity frequency as dominant predictors.

  • Demonstrated feasibility of scalable ML systems for personalized obesity risk assessment.
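
The source does not say how the predictor ranking was obtained. One model-agnostic option is permutation importance: shuffle a single feature column and measure how much accuracy drops. The sketch below is only an illustration of that idea, with a hypothetical threshold rule (`toy_model`) standing in for the trained classifier and made-up data.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, column, n_repeats=20, seed=0):
    """Mean drop in accuracy when `column` is shuffled across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)
        # Rebuild each row with the shuffled value in place of the original.
        X_perm = [row[:column] + [value] + row[column + 1:]
                  for row, value in zip(X, shuffled)]
        drops.append(baseline - accuracy(model, X_perm, y))
    return sum(drops) / n_repeats


def toy_model(row):
    """Hypothetical stand-in classifier: uses BMI (column 0) only."""
    return "obese" if row[0] >= 30 else "not obese"


# Made-up rows: [bmi, activity_days_per_week]
X = [[18.0, 1], [21.0, 3], [24.0, 0], [27.0, 2], [29.0, 1],
     [31.0, 0], [34.0, 2], [37.0, 3], [40.0, 1], [45.0, 0]]
y = [toy_model(row) for row in X]

bmi_importance = permutation_importance(toy_model, X, y, column=0)
activity_importance = permutation_importance(toy_model, X, y, column=1)
```

Because the stand-in model ignores the activity column, shuffling it changes nothing and its importance is zero, while shuffling BMI breaks the predictions. With a real LightGBM model the same procedure would be run on held-out data.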

🔮 Future Work:
Enhance interpretability via SHAP/LIME, extend to real-time wellness dashboards, and explore hybrid stacking with neural networks for improved multi-class sensitivity.


Let's Connect

Let's Work Together

Project Collaboration

Projects in Generative AI, ML and Imaging using advanced computational methods

Mentorship and Guidance

Open to joining ongoing publications, supervision, and interdisciplinary projects exploring deep learning and scientific computing
