Abstract

For most distributed computing systems (DCS), distributed system reliability (DSR) and the completion time of an application are the two most important requirements. To meet these requirements, it is essential that appropriate algorithms are developed for proper program and file allocation and scheduling. This dissertation focuses on the development of algorithms to maximize DSR and/or minimize the completion time based on more practical DCS models. In almost all current reliability-oriented allocation models program and file allocation has been considered separately, rather than simultaneously. In this study a reliability–oriented allocation model was proposed, which considered the program and file allocation together so as to obtain the highest possible DSR. Certain constraints were also taken into account to make the model more practical. The model is very comprehensive and can be reduced to some other existing models under certain conditions. To solve the NP-hard problem of simultaneous program and file allocation formulated herein, a Genetic Algorithm (GA) was proposed. To gauge the suitability of Tabu Search (TS) and GA for solving this problem, a TS was proposed and the results of TS were compared with those of GA. GA and TS were both found to be capable of finding the optimal solutions in most cases when the solution space was small. However TS outperformed GA with shorter computing time and better solution quality for both small and large solution space. Further improvements in performance over that of the TS were obtained by using a parallel TS (PTS). Simulation results showed that the solution quality did not change significantly with increased number of processors whereas the speedup of the PTS basically grew linearly when the number of processor was not very large. Extensive algorithms have been proposed for the NP-hard problem of scheduling a parallel program to a DCS with the objective of minimizing the completion time of the program. Most of these, however, assumed that the DCS was homogeneous. An iterative list algorithm was proposed in this dissertation to solve the scheduling problem for the more difficult heterogeneous computing systems. Simulation results showed that the proposed algorithm outperformed most existed scheduling algorithms for heterogeneous computing in terms of the completion time of the application. To consider DSR and completion time simultaneously, a multi-objective optimization problem was formulated and a Tabu Search algorithm proposed to solve the problem. Two “lateral interference” schemes were adopted to distribute the Pareto optimal solutions along the Pareto-front uniformly. Simulation results showed that “lateral interference” could improve the “uniform distribution of non-dominated solutions” and was not sensitive to the different computation schemes of distances between the solutions. In addition, a general centralized heterogeneous distributed system model was formulated and a solution algorithm developed to compute the distributed service reliability.