Date of Award

Summer 8-24-2024

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Prerna Dua

Abstract

This thesis systematically optimizes and compares state-of-the-art supervised classification models on Louisiana Medicaid data for three target variables: clinical services, COVID-19 infection, and tobacco use. These targets matter because they capture key health outcomes and behaviors among Louisiana's Medicaid enrollees, a population often characterized by poverty and limited access to education. The study applies machine learning techniques to identify the best model for both multinomial and binary classification tasks. The models compared include Logistic Regression, XGBoost, AdaBoost, Random Forest, Decision Tree, Artificial Neural Networks, and Naïve Bayes, and each classifier's hyperparameters were extensively tuned to obtain its best achievable performance. The results show that XGBoost achieved the highest accuracy, recall, and F1 score across all target variables; Random Forest performed strongly overall, especially on the binary classification tasks; and the simpler Naïve Bayes models performed poorly in general, although they proved useful in specific cases. These results indicate that tree-based ensemble learners are well suited to large, heterogeneous datasets. The findings underscore the importance of feature engineering, exploratory data analysis, and careful hyperparameter selection, and they show that ensemble methods can robustly handle a variety of complex datasets. More broadly, the study illustrates how machine learning models can strengthen predictive analytics for healthcare decision-making, particularly within Medicaid. It contributes to the growing literature on healthcare analytics by detailing how machine learning models, combined with a structured preprocessing pipeline, perform on a large, real-world healthcare dataset.
The thesis offers insights for both academic research and practical applications in health informatics, particularly efforts to improve health outcomes for vulnerable populations in Louisiana.
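The abstract ranks the classifiers by accuracy, recall, and F1 score but does not state how recall and F1 are averaged for the multinomial targets. The following is a minimal sketch, assuming macro averaging (an unweighted mean over classes), of how these three metrics can be computed from true and predicted labels. The function name classification_metrics and the toy labels are hypothetical illustrations, not code or data from the thesis.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (accuracy, macro recall, macro F1) for single-label predictions.

    Macro averaging is an assumption: the abstract does not say which
    averaging scheme was used for the multinomial targets.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter()  # correctly predicted instances of each class
    fp = Counter()  # instances wrongly predicted as class c
    fn = Counter()  # instances of class c predicted as some other class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1

    accuracy = sum(tp.values()) / len(y_true)

    recalls, f1s = [], []
    for c in labels:
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        recalls.append(recall)
        f1s.append(f1)

    # Unweighted mean over classes, so every class counts equally.
    return accuracy, sum(recalls) / len(labels), sum(f1s) / len(labels)


if __name__ == "__main__":
    # Hypothetical three-class example (not thesis data).
    y_true = ["a", "a", "b", "b", "c", "c"]
    y_pred = ["a", "b", "b", "b", "c", "a"]
    acc, rec, f1 = classification_metrics(y_true, y_pred)
    print(round(acc, 3), round(rec, 3), round(f1, 3))  # prints 0.667 0.667 0.656
```

Macro averaging weights every class equally, so a model that neglects a rare class is penalized even when its overall accuracy is high; a support-weighted or micro average could rank the same models differently on imbalanced targets.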
