Date of Award

Spring 2003

Document Type

Dissertation

Degree Name

Doctor of Business Administration (DBA)

Department

Marketing and Analysis

First Advisor

James J. Cochran

Abstract

In this study, I experimentally investigate two-stage clustering procedures (hybrid models), both in simulated environments where conditions such as collinearity and cluster structure are controlled and in real-life problems where they are not. The first hybrid model (NK) integrates a neural network (NN) with the k-means algorithm (KM): the NN screens seeds and passes them to KM. The second hybrid (GK) uses a genetic algorithm (GA) in place of the neural network. Both the NN and the GA used in this study are in their simplest possible forms.
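The two-stage idea described above (a first stage screens seeds, a second stage runs k-means from them) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `screen_seeds`, `k_means`, and `two_stage_cluster` are hypothetical, and the screening stage is a simple stand-in (random candidate seed sets scored by within-cluster sum of squares), not the neural network or genetic algorithm used in the study.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def wcss(points, centers):
    """Within-cluster sum of squares, using each point's nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def screen_seeds(points, k, n_candidates=20, rng=None):
    """Stage 1 (stand-in, NOT the study's NN or GA): draw candidate seed
    sets at random and keep the one with the lowest WCSS."""
    if rng is None:
        rng = random.Random(0)
    candidates = [rng.sample(points, k) for _ in range(n_candidates)]
    return min(candidates, key=lambda seeds: wcss(points, seeds))


def k_means(points, seeds, max_iter=100):
    """Stage 2: standard Lloyd-style k-means started from the given seeds."""
    centers = [tuple(s) for s in seeds]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New center is the coordinate-wise mean of the members.
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                # Keep the old center if a cluster ends up empty.
                new_centers.append(old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def two_stage_cluster(points, k, rng=None):
    """Screen seeds first, then pass them to k-means."""
    seeds = screen_seeds(points, k, rng=rng)
    return k_means(points, seeds)
```

Calling `two_stage_cluster(points, k)` on a list of numeric tuples returns the final centers and one cluster label per point. Replacing `screen_seeds` with a different screening procedure leaves the second stage unchanged, which is the design point of a hybrid of this kind.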

In the simulated data sets, I investigate two properties: the comparative clustering performance of the five approaches (KM, NN, NK, GA, GK) and the effects of five factors (scale, sample size, density, number of clusters, and number of variables) on them. Density, number of clusters, and dimension influence the clustering performance of all five approaches. KM, NK, and GK classify well when all clusters contain a similar number of observations, with NK and GK outperforming KM. NN performs well when one cluster contains more observations than any other cluster. The two hybrid models perform at least as well as KM, even though the environments favor KM. The most crucial information, the true number of clusters, is provided to KM only. In addition, the cluster structures are simple: the clusters are well separated; the variances and cluster sizes are uniform; correlations between pairs of variables and collinearity are not significant; and the observations are normally distributed.

The real-life problems comprise three with a known natural cluster structure and one with an unknown natural cluster structure. Overall results indicate that GK performs better than KM, while NK performs worst among the five approaches. The two machine learning approaches generate better results than KM in an environment that does not favor KM.

GK proves to be the best or among the best in both simulated environments and real-life situations. Furthermore, GK detects firms with promising financial prospects, such as acquisition targets and firms with “buy” recommendations, better than all other approaches.
