Author

Sangam Mulmi

Date of Award

Spring 5-2020

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computational Analysis and Modeling

First Advisor

Sumeet Dua

Abstract

Intrusion Detection Systems (IDSs) monitor network traffic and system activities to identify any unauthorized or malicious behaviors. These systems usually leverage the principles of data science and machine learning to detect any deviations from normalcy by learning from the data associated with normal and abnormal patterns. The IDSs continue to suffer from issues like distributed high-dimensional data, inadequate robustness, slow detection, and high false-positive rates (FPRs). We investigate these challenges, determine suitable strategies, and propose relevant solutions based on the appropriate mathematical and computational concepts.

To handle high-dimensional data in a distributed network, we optimize the feature space in a distributed manner using the PCA-based feature extraction method. The experimental results display that the classifiers built upon the features so extracted perform well by giving a similar level of accuracy as given by the ones that use the centrally extracted features. This method also significantly reduces the cumulative time needed for extraction. By utilizing the extracted features, we construct a distributed probabilistic classifier based on Naïve Bayes. Each node counts the local frequencies and passes those on to the central coordinator. The central coordinator accumulates the local frequencies and computes the global frequencies, which are used by the nodes to compute the required prior probabilities to perform classifications. Each node, being evenly trained, is capable of detecting intrusions individually to improve the overall robustness of the system.

We also propose a similarity measure-based classification (SMC) technique that works by computing the cosine similarities between the class-specific frequential weights of the values in an observed instance and the average frequency-based data centroid. An instance is classified into the class whose weights for the values in it share the highest level of similarity with the centroid. SMC contributes alongside Naïve Bayes in a multi-model classification approach, which we introduce to reduce the FPR and improve the detection accuracy. This approach utilizes the similarities associated with each class label determined by SMC and the probabilities associated with each class label determined by Naïve Bayes. The similarities and probabilities are aggregated, separately, to form new features that are used to train and validate a tertiary classifier. We demonstrate that such a multi-model approach can attain a higher level of accuracy compared with the single-model classification techniques.

The contributions made by this dissertation to enhance the scalability, robustness, and accuracy can help improve the efficacy of IDSs.

Share

COinS