🌍 Life Expectancy Prediction Using Regression Analysis

📌 Goal:
Predict life expectancy across 193 countries (2000–2015) using regression-based machine learning models to identify the key health, economic, and social factors influencing global longevity.

🧠 Domain: Global Health Analytics & Predictive Modeling
🎯 Task: Regression (Continuous Prediction)
📂 Dataset: WHO Global Health Observatory & UN Data (193 countries × 22 features, 2000–2015)

Project Domain: Data Analysis and Regression
Task: Prediction

The Goal:

Healthy Long Life? Yes!

Life expectancy is a core indicator of a nation’s health, shaped by the interplay of medical, economic, and social determinants. This project aimed to model life expectancy across countries and time using multivariate regression techniques, evaluating which indicators most strongly influence human longevity. The analysis sought to aid policymakers by quantifying how factors such as education, healthcare expenditure, and disease prevalence affect global health outcomes.

The Challenge:

Methodology & Process

  • Data Compilation:
    Combined WHO and UN datasets (193 countries, 16 years) totaling 2,938 entries and 22 attributes.
    Missing values (e.g., GDP, Hepatitis B, Schooling) were handled by imputation and country-level exclusion when necessary.

  • Preprocessing:

    • Outlier detection via Boxplots, Z-Scores, and IQR

    • Correlation and multicollinearity checks (VIF < 10)

    • Feature selection using Backward Elimination

  • Models Evaluated:

    • Linear Regression

    • Ridge Regression (L2 regularization)

    • Lasso Regression (L1 regularization)

    • Elastic Net Regression (combined L1 + L2)

  • Model Validation:
    Applied 5-fold cross-validation (mean RMSE ≈ 3.71 ± 0.23) to assess how well the model generalizes to unseen data.
    Added interaction terms (e.g., Adult Mortality × HIV/AIDS, Income × Schooling) to capture joint effects.
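
The validation step above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn (the source does not name a library), uses synthetic stand-in data instead of the WHO/UN dataset, and the variable names, coefficients, and alpha / l1_ratio settings are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): NOT the WHO/UN dataset.
rng = np.random.default_rng(0)
n = 500
adult_mortality = rng.uniform(50, 500, n)
hiv_aids = rng.uniform(0.1, 10, n)
income = rng.uniform(0.3, 0.9, n)
schooling = rng.uniform(4, 18, n)

# Hypothetical target with made-up coefficients, for illustration only.
y = (
    60
    - 0.03 * adult_mortality
    - 0.5 * hiv_aids
    + 10 * income
    + 0.8 * schooling
    + rng.normal(0, 2, n)
)

# Base features plus the two interaction terms named in the text.
X = np.column_stack([
    adult_mortality,
    hiv_aids,
    income,
    schooling,
    adult_mortality * hiv_aids,  # Adult Mortality x HIV/AIDS
    income * schooling,          # Income x Schooling
])

# Hyperparameters are placeholders, not the values used in the project.
models = {
    "Linear": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_rmse = {}
for name, model in models.items():
    # Scaling inside the pipeline keeps each fold's statistics separate.
    pipeline = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(
        pipeline, X, y, cv=cv, scoring="neg_root_mean_squared_error"
    )
    rmse = -scores  # scikit-learn reports negated RMSE
    cv_rmse[name] = (rmse.mean(), rmse.std())
    print(f"{name:<11} RMSE = {rmse.mean():.2f} ± {rmse.std():.2f}")
```

Putting the scaler inside the pipeline matters for penalized models: the penalty depends on feature scale, and fitting the scaler on the full data before splitting would leak information from the held-out fold.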

The Result

Results & Findings

Model                MAE    RMSE   Adjusted R²   Remarks
Linear Regression    1.40   2.29   0.745         Baseline, strong linear fit
Lasso Regression     1.60   2.29   0.74          Sparse but slightly less stable
Ridge Regression     5.00   7.07   Low           Over-regularized
Elastic Net (Best)   0.50   0.71   0.82          Best balance & lowest error
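
For reference, the three metrics reported above can be computed as in the sketch below. It uses the standard definitions of MAE, RMSE, and adjusted R²; the function name and the sample values are hypothetical and are not taken from the project.

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """Return MAE, RMSE, and adjusted R² for paired observations and predictions."""
    n = len(y_true)
    if n != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if n - n_features - 1 <= 0:
        raise ValueError("need more observations than features + 1")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)

    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(ss_res / n)

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot

    # Adjusted R² penalizes R² for the number of predictors used.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return {"mae": mae, "rmse": rmse, "adj_r2": adj_r2}


# Made-up life expectancy values (years), for illustration only.
y_true = [70.0, 65.0, 80.0, 75.0, 60.0, 72.0]
y_pred = [69.0, 66.0, 78.0, 76.0, 61.0, 72.0]
metrics = regression_metrics(y_true, y_pred, n_features=2)
print({k: round(v, 3) for k, v in metrics.items()})
# {'mae': 1.0, 'rmse': 1.155, 'adj_r2': 0.947}
```

MAE and RMSE are in the units of the target (years of life expectancy), while adjusted R² is unitless, so the two kinds of column in the table are not directly comparable.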

✅ Key Predictors:
Adult Mortality ↓ | Infant Deaths ↓ | Schooling ↑ | Healthcare Expenditure ↑ | HIV/AIDS ↓ | BMI ↑ | Immunization (Polio / Diphtheria) ↑

🏁 Conclusion:
The Elastic Net Regression model achieved the most accurate and stable predictions (MAE ≈ 0.5, RMSE ≈ 0.71), effectively balancing bias and variance.
Findings highlight education, healthcare spending, and infectious disease control as the strongest determinants of life expectancy, reinforcing the value of data-driven policy for global health improvement.
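
As background on how Elastic Net trades off the two penalties, its objective can be written as below. The notation is an assumption that follows the parameterization used by scikit-learn's ElasticNet: α is the overall penalty strength and ρ is the L1 ratio. The source does not report the values of either.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
  + \alpha\,\rho\,\lVert\beta\rVert_1
  + \frac{\alpha\,(1-\rho)}{2}\,\lVert\beta\rVert_2^2
```

Setting ρ = 1 leaves only the L1 term (Lasso), and ρ = 0 leaves only the L2 term (a ridge-style penalty). Intermediate values keep some of Lasso's sparsity while sharing weight across correlated predictors.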

Let's Connect

Let's Work Together

Project Collaboration

Projects in Generative AI, ML and Imaging using advanced computational methods

Mentorship and Guidance

Open to joining ongoing publications, supervision, and interdisciplinary projects exploring deep learning and scientific computing
