Date of Award

Spring 5-24-2025

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

William Bradley Glisson

Abstract

The common factor among current implementations of Artificial Intelligence (AI) is data. Companies are constantly looking for new ways to analyze data, but it arrives in various formats: text, Comma Separated Values (CSV), JavaScript Object Notation (JSON), Extensible Markup Language (XML), and Excel. How can AI be adapted to standardize these formats efficiently for data analysis, integration, and ingestion? Published research acknowledges that Machine Learning (ML) and AI can provide an automated method to speed up this process and limit human decision-making error. Advances in AI Application Programming Interfaces (APIs) prompt the idea that an AI platform can take raw data as input and standardize it for Large Language Model (LLM) algorithms. This research aims to standardize raw data using AI APIs, gather Open-Source Intelligence (OSINT) data to augment the raw data, and use both the standardized data and the OSINT for LLM ingestion. Results show that AI APIs can standardize raw data through several methods. The research also demonstrates OSINT techniques for gathering data and feeding the results to LLM algorithms. An LLM that has ingested a standardized dataset and OSINT data can then be prompted with specific questions to generate a response; in this research, the results list specific cyber-attacks that could be viable based on the ingested data. Access to this information, combined with knowledge of AI APIs and OSINT, provides an opportunity to create a process for standardizing raw data and exploiting the intelligence of LLMs.
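The standardization step described above can be sketched in miniature. The thesis delegates format conversion to an AI API; the sketch below instead uses the Python standard library to show the same idea, converting CSV and JSON inputs into one common record format (JSON Lines) suitable for LLM ingestion. The function names `standardize` and `to_llm_corpus` are hypothetical illustrations, not from the thesis.

```python
import csv
import io
import json

def standardize(raw: str, fmt: str) -> list[dict]:
    """Normalize raw text in a known format into a list of records.
    (Hypothetical helper; the thesis performs this step via an AI API.)"""
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(raw)))
    if fmt == "json":
        data = json.loads(raw)
        return data if isinstance(data, list) else [data]
    raise ValueError(f"unsupported format: {fmt}")

def to_llm_corpus(records: list[dict]) -> str:
    """Serialize standardized records as JSON Lines for LLM ingestion."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)

# Two inputs in different formats reduce to the same standardized corpus.
csv_raw = "host,port\nexample.com,443"
json_raw = '[{"host": "example.com", "port": "443"}]'
corpus = to_llm_corpus(standardize(csv_raw, "csv"))
```

In the thesis's actual pipeline, an AI API would replace the per-format branches, letting the model infer the structure of arbitrary raw input instead of requiring a known format label.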
