Abstract |
Increasing the number of cores in modern CPUs is emerging as the main approach for improving system performance. A central challenge in this area is the runtime support that multi-core systems ought to use for sustaining high performance and scalability without disproportionately increasing the effort required of the programmer. In this work we present Tagged Procedure Calls
(TPC), a runtime system for supporting task-based programming models on
architectures that require explicit data access specification by the programmer, such as the Cell processor. We present the design and
implementation of TPC for the Cell and we examine how the runtime system
can support task management functions with on-chip communication only, without requiring accesses to off-chip memory. By minimizing off-chip transactions in the runtime, we achieve sub-microsecond task initiation latency (an order-of-magnitude improvement over existing task-parallel programming frameworks on the Cell) and a minimum null-task initiation/completion latency of 385 ns. We evaluate TPC with several
kernels and applications, demonstrating that TPC achieves scalable on-chip
execution of codes previously parallelized and optimized for shared-memory multiprocessors, can exploit additional fine-grain parallelism in codes previously parallelized at coarse levels of granularity, and performs
competitively with existing task-based parallel programming frameworks that
statically optimize data layout and task placement.
|