This project aims to predict whether a health insurance claim will be approved or not, using machine learning techniques and SHAP-based model interpretability.
Insurance companies process thousands of health insurance claims. Identifying potentially fraudulent or rejected claims early can reduce losses and improve operational efficiency.
This project predicts whether a claim will be approved (1) or rejected (0) based on various features like patient age, diagnosis code, claim amount, procedure details, etc.
- File: `enhanced_health_insurance_claims.csv`
- Columns Used: `ClaimID`, `ClaimDate`, `ClaimAmount`, `PatientAge`, `PatientGender`, `DiagnosisCode`, `ProcedureCode`, `ClaimType`, `ClaimSubmissionMethod`, etc.
- Target: `ClaimApproved` (1 = Approved, 0 = Rejected)

| Tool | Purpose |
|---|---|
| Python | Core Programming Language |
| Pandas, NumPy | Data Processing |
| Matplotlib, Seaborn | Data Visualization |
| XGBoost | ML Model (Gradient Boosted Trees) |
| Scikit-learn | Train-test split, metrics |
| SHAP | Model Interpretability |
- Load Dataset
- Handle Missing Values
- Convert Dates & Feature Engineering
- Encode Categorical Variables
- Split Dataset (Train/Test)
- Train XGBoost Classifier
- Model Evaluation (Accuracy, Report)
- SHAP Analysis (Explainability)
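
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the imputation strategy (median / `"Unknown"`), the derived `ClaimMonth` and `ClaimDayOfWeek` columns, the 80/20 stratified split, and the XGBoost hyperparameters are all assumptions, and the helper names `prepare_features` and `train_and_evaluate` are hypothetical.

```python
import pandas as pd

TARGET = "ClaimApproved"
NUMERIC = ["ClaimAmount", "PatientAge"]
CATEGORICAL = ["PatientGender", "DiagnosisCode", "ProcedureCode",
               "ClaimType", "ClaimSubmissionMethod"]


def prepare_features(df):
    """Missing values, date features, and categorical encoding."""
    # Rows without a label cannot be used for supervised training.
    df = df.dropna(subset=[TARGET]).copy()

    # Assumption: median for numeric gaps, "Unknown" for categorical gaps.
    for col in NUMERIC:
        df[col] = df[col].fillna(df[col].median())
    df[CATEGORICAL] = df[CATEGORICAL].fillna("Unknown")

    # Turn the claim date into numeric parts the trees can split on.
    claim_date = pd.to_datetime(df["ClaimDate"], errors="coerce")
    df["ClaimMonth"] = claim_date.dt.month
    df["ClaimDayOfWeek"] = claim_date.dt.dayofweek

    # One-hot encode; this yields columns such as DiagnosisCode_<code>.
    df = pd.get_dummies(df, columns=CATEGORICAL, dtype=int)

    y = df[TARGET].astype(int)
    X = df.drop(columns=[TARGET, "ClaimID", "ClaimDate"])
    return X, y


def train_and_evaluate(X, y, seed=42):
    """Stratified split, XGBoost fit, accuracy and classification report."""
    # Imported here so prepare_features works without the modelling stack.
    from sklearn.metrics import accuracy_score, classification_report
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)

    # Hyperparameters are illustrative, not tuned values from this project.
    model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1,
                          eval_metric="logloss", random_state=seed)
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    print(classification_report(y_test, y_pred))
    return model, X_test, accuracy_score(y_test, y_pred)
```

Typical usage would be `X, y = prepare_features(pd.read_csv("enhanced_health_insurance_claims.csv"))` followed by `model, X_test, acc = train_and_evaluate(X, y)`.
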
- Model Used: XGBoost Classifier
- Accuracy: ~46.33%
- Top Features: `ClaimAmount`, `PatientAge`, `DiagnosisCode_X`, etc.
- Explainability: SHAP summary plot included
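
A SHAP summary plot orders features by their mean absolute SHAP value, so a "top features" list can be derived from the same matrix. The sketch below assumes a 2-D array of SHAP values with shape `(n_samples, n_features)`; the helper name `rank_features_by_shap` is hypothetical and not part of the SHAP library.

```python
import numpy as np


def rank_features_by_shap(shap_values, feature_names, top_n=10):
    """Rank features by mean absolute SHAP value (global importance)."""
    shap_values = np.asarray(shap_values, dtype=float)
    if shap_values.ndim != 2 or shap_values.shape[1] != len(feature_names):
        raise ValueError("expected shape (n_samples, n_features)")

    # Average magnitude of each feature's contribution across all rows.
    importance = np.abs(shap_values).mean(axis=0)

    # Indices of the largest importances first, truncated to top_n.
    order = np.argsort(importance)[::-1][:top_n]
    return [(feature_names[i], float(importance[i])) for i in order]
```

With a fitted XGBoost model and a test set `X_test`, the input matrix would typically come from `shap.TreeExplainer(model).shap_values(X_test)`, and `shap.summary_plot(shap_values, X_test)` draws the corresponding plot.
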
