Date of Award

Winter 2-2022

Document Type


Degree Name

Doctor of Philosophy (PhD)


Computational Analysis and Modeling

First Advisor

Chokchai (Box) Leangsuksun


Over the past decades, High-Performance Computing (HPC) has been widely used across industries. In particular, the exponential growth of the GPU (graphics processing unit) has been a key technology driving the adoption of artificial intelligence in real-world use cases. When GPUs are used to accelerate parallel applications, programmability, resource management, and scheduling are non-trivial tasks for obtaining optimized performance. How to effectively exploit GPU resources and improve program performance has therefore become an active research topic.

Benchmarks do not always provide a good picture of the performance and details of parallel applications. The wide variety of hardware devices and constantly evolving parallel programs make performance analysis and modeling even more difficult.

In this dissertation, there are four main contributions. First, we conduct a study on GPU analytical performance modeling, aiming to estimate a suitable number of threads per block for performance improvement.
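To illustrate the flavor of such an estimate, the sketch below scores candidate threads-per-block values by a simple occupancy-style metric. The SM limits used here (2048 resident threads, 16 resident blocks per SM) are illustrative assumptions, not the dissertation's actual model or any particular GPU's figures.

```python
# Illustrative occupancy-style estimate of a good threads-per-block value.
# The per-SM limits below are assumed for illustration only.

def occupancy(block_size, max_threads_per_sm=2048, max_blocks_per_sm=16,
              warp_size=32):
    """Fraction of an SM's thread slots kept busy by a given block size."""
    if block_size % warp_size != 0:
        return 0.0  # sizes that are not warp multiples waste lanes; skip them
    # Resident blocks are capped both by thread capacity and by the block limit.
    blocks = min(max_threads_per_sm // block_size, max_blocks_per_sm)
    return blocks * block_size / max_threads_per_sm

def best_block_size(candidates=(64, 128, 256, 512, 1024)):
    """Pick the candidate block size with the highest estimated occupancy."""
    return max(candidates, key=occupancy)
```

Under these assumed limits, 64 threads per block leaves half the SM idle (16 blocks x 64 threads = 1024 of 2048 slots), while 128 and above reach full occupancy; a real model would also account for register and shared-memory pressure.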

Second, we propose a novel method to alleviate a limitation of the GPU. This method offers a new way to optimize GPU performance at the block-scheduling level.

Third, we propose two abstract models of parallel computing, namely a computational model and a programming model, which represent various computing paradigms based on Flynn's taxonomy and simplify workload-distribution characteristics. This framework provides a general way to create an analytical performance model.

Finally, we validate the proposed abstract models and demonstrate their usefulness with real-world AI (Artificial Intelligence) applications on a distributed GPU system. The resulting analytical performance model for a CNN (Convolutional Neural Network) application analyzes performance characteristics on multiple GPUs, enabling users to evaluate their techniques before running applications on the target machines.
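As a rough sense of what such an analytical CNN model predicts, the sketch below estimates per-iteration time for data-parallel training on N GPUs as a compute term that shrinks with N plus a gradient all-reduce communication term. All constants (peak throughput, efficiency, bandwidth) and the ring all-reduce cost form are illustrative assumptions, not the dissertation's model.

```python
# Minimal analytical sketch (assumed constants, not the dissertation's model):
# per-iteration time = compute split across GPUs + gradient all-reduce cost.

def conv_flops(batch, c_in, c_out, h_out, w_out, k):
    """Multiply-accumulate FLOPs for one conv layer's forward pass
    (2 ops per MAC, k x k kernel)."""
    return 2 * batch * c_out * h_out * w_out * c_in * k * k

def iteration_time(total_flops, n_gpus, peak_flops=15e12, efficiency=0.5,
                   param_bytes=0, bandwidth=10e9):
    """Estimated seconds per iteration on n_gpus GPUs.

    Communication follows the standard ring all-reduce volume,
    2 * (N-1)/N * message size, divided by per-link bandwidth."""
    compute = total_flops / (n_gpus * peak_flops * efficiency)
    comm = 0.0 if n_gpus == 1 else (
        2 * (n_gpus - 1) / n_gpus * param_bytes / bandwidth)
    return compute + comm
```

A model of this shape lets a user ask, before touching the target machine, at what GPU count the communication term starts to dominate the shrinking compute term.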