🌍 Life Expectancy Prediction Using Regression Analysis
📌 Goal:
Predict life expectancy across 193 countries (2000–2015) using regression-based machine learning models to identify the key health, economic, and social factors influencing global longevity.
🧠 Domain: Global Health Analytics & Predictive Modeling
🎯 Task: Regression (Continuous Prediction)
📂 Dataset: WHO Global Health Observatory & UN Data (193 countries × 22 features, 2000–2015)
The Goal:
Life expectancy is a core indicator of a nation’s health, shaped by the interplay of medical, economic, and social determinants. This project aimed to model life expectancy across countries and time using multivariate regression techniques, evaluating which indicators most strongly influence human longevity. The analysis sought to aid policymakers by quantifying how factors such as education, healthcare expenditure, and disease prevalence affect global health outcomes.
The Challenge:
Methodology & Process
Data Compilation:
Combined WHO and UN datasets (193 countries, 16 years) totaling 2,938 entries and 22 attributes.
Missing values (e.g., GDP, Hepatitis B, Schooling) were handled by imputation and country-level exclusion when necessary.
Preprocessing:
Outlier detection via Boxplots, Z-Scores, and IQR
Correlation and multicollinearity checks (VIF < 10)
Feature selection using Backward Elimination
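The outlier and multicollinearity checks listed above can be sketched as follows. This is a minimal illustration on synthetic data, not the project's actual code: the function names `iqr_outlier_mask` and `vif`, the 1.5 × IQR fence, and the toy columns are assumptions. Only the VIF < 10 threshold comes from the text.

```python
import numpy as np

def iqr_outlier_mask(x, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumed default)."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

def vif(X):
    """Variance inflation factor per column: 1 / (1 - R^2) of that column
    regressed on all the other columns (with an intercept).
    Assumes no column is constant."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

# Toy column with one extreme value
values = [1, 2, 3, 4, 5, 100]
mask = iqr_outlier_mask(values)

# Synthetic features: column 2 is almost a copy of column 0
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.01 * rng.normal(size=200)
X = np.column_stack([a, b, c])
scores = vif(X)

# Columns that break the VIF < 10 rule are candidates for removal
flagged = [j for j, s in enumerate(scores) if s >= 10]
print(mask, scores, flagged)
```

In this toy setup the two near-duplicate columns are flagged while the independent one is kept. In practice one would typically drop one flagged column at a time and recompute.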
Models Evaluated:
Linear Regression
Ridge Regression (L2 regularization)
Lasso Regression (L1 regularization)
Elastic Net Regression (combined L1 + L2)
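The four models differ only in the penalty added to the least-squares loss. One common way to write them is shown below. The parametrization is an assumption, since the source does not state its exact form or hyperparameter values: $\lambda \ge 0$ is the penalty strength and $\alpha \in [0, 1]$ is the L1 mixing ratio.

```latex
\begin{aligned}
\text{Linear:}      \quad & \min_{\beta_0,\,\beta}\; \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2 \\
\text{Ridge:}       \quad & \min_{\beta_0,\,\beta}\; \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2 + \frac{\lambda}{2}\,\lVert\beta\rVert_2^2 \\
\text{Lasso:}       \quad & \min_{\beta_0,\,\beta}\; \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2 + \lambda\,\lVert\beta\rVert_1 \\
\text{Elastic Net:} \quad & \min_{\beta_0,\,\beta}\; \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2 + \lambda\Bigl(\alpha\,\lVert\beta\rVert_1 + \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\Bigr)
\end{aligned}
```

Setting $\alpha = 1$ recovers the Lasso penalty and $\alpha = 0$ recovers the Ridge penalty, which is why Elastic Net can trade off sparsity against coefficient shrinkage.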
Model Validation:
Applied 5-fold cross-validation (mean RMSE ≈ 3.71 ± 0.23) to ensure generalizability.
Enhanced the model with interaction terms (e.g., Adult Mortality × HIV/AIDS, Income × Schooling) to capture joint effects.
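The validation steps above can be illustrated with a small sketch of 5-fold cross-validation, with and without an interaction term. Everything here is an assumption made for illustration: the data are synthetic stand-ins (not the WHO/UN dataset), plain least squares replaces the regularized models, and the helper names `add_interaction`, `fit_ols`, `predict_ols`, and `kfold_rmse` are hypothetical.

```python
import numpy as np

def add_interaction(X, i, j):
    """Append the product of columns i and j as an extra feature."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([X, X[:, i] * X[:, j]])

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns [intercept, coefficients...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]

def kfold_rmse(X, y, k=5, seed=0):
    """Return the held-out RMSE of each of the k folds."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for f in range(k):
        test = folds[f]
        train = np.concatenate([folds[g] for g in range(k) if g != f])
        coef = fit_ols(X[train], y[train])
        err = y[test] - predict_ols(coef, X[test])
        scores.append(np.sqrt(np.mean(err ** 2)))
    return np.array(scores)

# Synthetic target that depends on the product of the two features
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 2))
y = (70 + 2 * X[:, 0] - X[:, 1] + 1.5 * X[:, 0] * X[:, 1]
     + rng.normal(scale=0.5, size=300))

base = kfold_rmse(X, y)
with_inter = kfold_rmse(add_interaction(X, 0, 1), y)
print(f"main effects only: {base.mean():.2f} +/- {base.std():.2f}")
print(f"with interaction:  {with_inter.mean():.2f} +/- {with_inter.std():.2f}")
```

Reporting the mean and standard deviation across folds mirrors the "mean ± spread" form quoted above. When the target truly depends on a product of two features, adding that product as a column lowers the held-out error.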
The Result
Results & Findings
| Model | MAE | RMSE | Adjusted R² | Remarks |
|---|---|---|---|---|
| Linear Regression | 1.40 | 2.29 | 0.745 | Baseline, strong linear fit |
| Lasso Regression | 1.60 | 2.29 | 0.74 | Sparse but slightly less stable |
| Ridge Regression | 5.00 | 7.07 | Low | Over-regularized |
| Elastic Net (Best) | 0.50 | 0.71 | 0.82 | Best balance & lowest error |
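For reference, the three metrics in the table can be computed as in the sketch below. The function name `regression_metrics` and the sample values are hypothetical and chosen only to show the arithmetic. They are not the project's predictions.

```python
import math

def regression_metrics(y_true, y_pred, n_features):
    """Return MAE, RMSE, and adjusted R^2 for a fitted regression.

    n_features is the number of predictors (excluding the intercept),
    which is needed for the adjusted R^2 correction.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return {"mae": mae, "rmse": rmse, "adj_r2": adj_r2}

# Made-up life expectancies (years) and predictions from a 2-feature model
y_true = [60, 65, 70, 75, 80]
y_pred = [61, 64, 70, 76, 79]
metrics = regression_metrics(y_true, y_pred, n_features=2)
print(metrics)
```

MAE and RMSE are in the same unit as the target (years), and RMSE is never smaller than MAE for the same predictions. Adjusted R² penalizes plain R² for each additional predictor.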
✅ Key Predictors:
Adult Mortality ↓ | Infant Deaths ↓ | Schooling ↑ | Healthcare Expenditure ↑ | HIV/AIDS ↓ | BMI ↑ | Immunization (Polio / Diphtheria) ↑
(↑ = positively associated with life expectancy, ↓ = negatively associated)
🏁 Conclusion:
The Elastic Net Regression model achieved the most accurate and stable predictions (MAE ≈ 0.5, RMSE ≈ 0.71), effectively balancing bias and variance.
Findings highlight education, healthcare spending, and infectious disease control as the strongest determinants of life expectancy—reinforcing the value of data-driven policy for global health improvement.