Date of Award
Spring 2012
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Computational Analysis and Modeling
First Advisor
Mihaela Paun
Abstract
This dissertation introduces a new metric in the area of High Performance Computing (HPC) application reliability and performance modeling. Derived via the time-dependent implementation of an existing inequality measure, the Failure Index (FI) generates a coefficient representing the level of volatility of the failures incurred by an application running on a given HPC system in a given time interval. This coefficient provides a normalized, cross-system representation of the failure volatility of applications running on failure-rich HPC platforms. Further, the origin and ramifications of application failures are investigated, and the resulting mathematical conclusions yield greater insight into the behavior of these applications in failure-rich system environments.
This work also includes background information on the problems facing HPC applications at the highest scale, the lack of standardized application-specific metrics within this arena, and a means of generating such metrics in a low-latency manner. A case study with detailed analysis demonstrating the benefits of the FI is also included.
Recommended Citation
Chandler, Clayton F., "" (2012). Dissertation. 373.
https://digitalcommons.latech.edu/dissertations/373