Forecasting Next Year's Health Insurance Claims Using Machine Learning Models
Abstract
This study examines how big data analytics can support risk management in health insurance. Using data from Highmark Health covering 2015 to 2018, it evaluates how well data manipulation techniques and machine learning models predict the following year's claims. The analysis covers Health Maintenance Organization (HMO) and Preferred Provider Organization (PPO) plans, and the data were prepared through cleaning, aggregation, feature engineering, and outlier handling to make them suitable for modeling. Four models were developed: an initial model using raw data without outlier treatment, a model built after outlier treatment that includes both HMO and PPO members, a model restricted to HMO members, and a model restricted to PPO members. Predictive accuracy improved substantially after outlier treatment, with Random Forest and Multivariate Adaptive Regression Splines performing best. The Random Forest model achieved a Root Mean Square Error (RMSE) of 630.04 and an R-squared value of 0.757, and the Multivariate Adaptive Regression Splines model also fit the data well. The HMO-focused model produced promising results, with an RMSE of 675.85 and an R-squared value of 0.68. The PPO-focused model performed poorly, which points to potential data quality issues and dataset limitations. These findings indicate that machine learning techniques can play an important role in health insurance analytics by supporting proactive risk management and better-informed decision-making.
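For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how RMSE and R-squared are conventionally computed. The function names and the four claim amounts are hypothetical illustrations and are not taken from the study's data or code.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared residual."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical annual claim amounts per member (illustrative only).
actual_claims = [1200.0, 800.0, 1500.0, 400.0]
predicted_claims = [1000.0, 900.0, 1400.0, 700.0]

print(f"RMSE: {rmse(actual_claims, predicted_claims):.2f}")
print(f"R-squared: {r_squared(actual_claims, predicted_claims):.3f}")
```

RMSE is expressed in the same units as the target variable, so it reads as a typical prediction error per member, while R-squared is the share of variance in the actual claims that the predictions explain.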
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.