Date of Award

Spring 2005

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computational Analysis and Modeling

First Advisor

Vir Phoha

Abstract

The objective of this dissertation research is to help a Web user achieve a search objective at a host Web site by organizing a strongly connected neighborhood of Web pages that are thematically and spatially related to the user's search interest. To this end, methods were developed to (1) find all Web pages at a given Web site that are thematically similar to the user's initial choice of Web page (selected from the set of pages returned by any popular search engine in response to a query), and (2) organize these pages hierarchically by their relevance to that initial page. The selection and organization of pages are adjusted dynamically so that the methods respond to the pages the user chooses in defining the search agenda.

The methods developed in this work construct a bipartite clique (biclique) graph structure to model both the spatial and the thematic relatedness of Web pages. The user's initial page choice is ranked as the most relevant page, the authority page, and link analysis is used to identify the set of pages with out-links to this authority page and assemble them into a hub set of relevant pages. The authority set (initially containing only the user's initial page choice) is then expanded to include other pages with in-links from the set of hub pages. The authority-hub relationship signified by Web page links defines the two partite sets of the biclique graph. The partite set of authority pages contains the user's initial page choice and other thematically and spatially similar pages. The partite set of hub pages contains pages whose out-links to the authority pages serve as validation of their thematic relevance to the user's search objective.
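
The hub and authority expansion described above can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the function names `expand_hub_authority` and `seed_biclique`, the representation of links as `(source, target)` pairs, and the choice to keep the two partite sets disjoint are all assumptions made for illustration. The second function keeps only the authorities linked from every hub, which yields one biclique containing the initial page.

```python
from collections import defaultdict


def expand_hub_authority(links, seed):
    """Return (hubs, authorities, out_links) for a seed page.

    links: iterable of (source, target) page pairs (assumed input format).
    seed:  the user's initial page choice, treated as the first authority.
    """
    out_links = defaultdict(set)
    in_links = defaultdict(set)
    for src, dst in links:
        out_links[src].add(dst)
        in_links[dst].add(src)

    # Hubs: pages with an out-link to the seed authority page.
    hubs = set(in_links[seed]) - {seed}

    # Authorities: the seed plus every page with an in-link from a hub.
    authorities = {seed}
    for hub in hubs:
        authorities |= out_links[hub]

    # Assumption: keep the partite sets disjoint by dropping hub pages
    # from the authority side.
    authorities -= hubs
    return hubs, authorities, out_links


def seed_biclique(links, seed):
    """Return (hubs, authorities) in which every hub links to every authority."""
    hubs, authorities, out_links = expand_hub_authority(links, seed)
    # Keep only authorities that are linked from all hubs.
    common = {a for a in authorities if all(a in out_links[h] for h in hubs)}
    return hubs, common


if __name__ == "__main__":
    # Hypothetical link data, for illustration only.
    example_links = [
        ("h1", "a"), ("h2", "a"),
        ("h1", "b"), ("h2", "b"),
        ("h1", "c"),
        ("x", "y"),
    ]
    print(seed_biclique(example_links, "a"))
```

In this toy example, page `c` joins the expanded authority set because `h1` links to it, but it is excluded from the biclique because `h2` does not.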

Two maximal biclique neighborhoods of Web pages specific to the user's interest, containing eight and five pages respectively, were successfully extracted from Web server access logs containing 47,635 entries and 1,140 distinct requested pages. Applying these methods iteratively, together with three Web page metrics introduced in this research, made it possible to extend a neighborhood dynamically to include nine additional relevant pages.
