Abstract |
The Grid is an emerging infrastructure that supports the discovery, access, and use of distributed computational resources. Grids abstract over platform- or protocol-specific mechanisms for authentication, file access, data transfer, application invocation, and so on, and allow dynamic deployment of applications on diverse hardware and software platforms. Scheduling computations and managing resources for Grid-aware applications is a challenging problem because resources are distributed, heterogeneous, owned by different individuals or organizations with their own policies, subject to different access and cost models, and exposed to dynamically varying loads and availability. A high-performance scheduler improves the performance of individual applications by optimizing performance measures such as execution time. Devising a strategy for efficient, optimized query execution is a challenging research problem, and a priori resource allocation and management is particularly hard as well. Researchers and distributed database designers need to know in advance which, and how many, resources of the Grid architecture are involved in the execution of a given query. The selection of the proper query plan depends on factors such as communication and computation costs. This work explores this aspect of service-based computing and resource management. We study the various data replication policies that can be followed in distributed database systems. In addition, we focus on how to optimize query processing over computational Grids and how to make resource allocation more efficient and effective. In particular, for the case in which no data replication takes place, we designed and implemented a high-performance application scheduler for relational join queries over a Grid-aware architecture. We transform given join expressions into directed acyclic graphs (DAGs) that contain all possible plans for the execution of the join. 
For that purpose, we developed the Query Plan Graph Constructor (QuPGC) algorithm. Once the query plan graph is constructed, we select the execution plan that yields optimal performance. To that end, we developed the Heuristic Query Path Selector (HQuPaS) algorithm, which uses two heuristic functions to estimate the communication and the computation cost of each plan in the graph. The output is a query execution plan with optimal computation and communication cost.
|
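The two-step idea summarized in the abstract (build a graph of candidate join plans, then pick the cheapest path through it) can be illustrated with a minimal sketch. This is not the QuPGC or HQuPaS algorithm, whose details the abstract does not give. It assumes left-deep join orders only, one site per relation with no replication, and two made-up cost functions. The names `build_plan_graph`, `comm_cost`, `comp_cost`, `best_plan`, and the `selectivity` parameter are all hypothetical, and the path search is a plain dynamic program over the DAG rather than the authors' heuristics.

```python
from itertools import combinations
from math import prod


def build_plan_graph(relations):
    """Hypothetical plan graph: a node is the set of relations joined so far,
    an edge adds one more relation (left-deep plans only)."""
    rels = list(relations)
    graph = {}
    for size in range(1, len(rels)):
        for subset in combinations(rels, size):
            node = frozenset(subset)
            graph[node] = [(r, node | {r}) for r in rels if r not in node]
    graph[frozenset(rels)] = []  # final node: every relation joined
    return graph


def comm_cost(node, rel, sizes, sites):
    """Assumed communication cost: ship `rel` in full if its site
    is not already involved in the partial plan, otherwise nothing."""
    involved = {sites[r] for r in node}
    return 0 if sites[rel] in involved else sizes[rel]


def comp_cost(node, rel, sizes, selectivity):
    """Assumed computation cost: estimated intermediate rows plus rows of `rel`."""
    est_rows = prod(sizes[r] for r in node) * selectivity ** (len(node) - 1)
    return est_rows + sizes[rel]


def best_plan(relations, sizes, sites, selectivity=0.01):
    """Return (total_cost, join_order) of the cheapest path through the plan graph."""
    graph = build_plan_graph(relations)
    # Cheapest known (cost, join order) for each node; single relations cost nothing.
    best = {frozenset([r]): (0.0, [r]) for r in relations}
    # Visiting nodes by increasing size is a topological order of the DAG.
    for node in sorted(graph, key=len):
        cost, order = best[node]
        for rel, nxt in graph[node]:
            step = comm_cost(node, rel, sizes, sites) + comp_cost(node, rel, sizes, selectivity)
            if nxt not in best or cost + step < best[nxt][0]:
                best[nxt] = (cost + step, order + [rel])
    return best[frozenset(relations)]


if __name__ == "__main__":
    sizes = {"A": 1000, "B": 10, "C": 100}      # rows per relation (made up)
    sites = {"A": "s1", "B": "s1", "C": "s2"}   # Grid node holding each relation (made up)
    print(best_plan(["A", "B", "C"], sizes, sites))
```

Because the graph here is enumerated exhaustively, the sketch only scales to a handful of relations; a heuristic selector would typically be used to avoid exploring every path.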