Abstract
At the beginning of the 21st century, the processor industry made a fundamental
shift towards multicore architectures, in order to address the diminishing
returns in single-thread performance despite increasing transistor counts, and
to overcome the severe power problems of clock-frequency scaling. Semiconductor
technology trends indicate that the era of power- and energy-constrained
manycore architectures has now arrived. Technology projections show that the energy
consumed by data movement and communication will dominate the corresponding
budget of future computing systems; unnecessary data movement will therefore
subtract a significant energy margin from computation.
The most popular communication model for multicore and manycore architectures
is shared memory: threads or processes that run concurrently on different
cores communicate and exchange data by accessing the same global memory locations.
However, accesses to off-chip memory are slow, so processor designs
employ a hierarchy of faster on-chip memories to speed up memory operations.
Memory hierarchies today are based on two dominant schemes: (i) multilevel
coherent caches, and (ii) software-managed local memories (scratchpads).
Caches manage the memory hierarchy transparently, using hardware replacement
policies, and communication happens implicitly, through cache-coherence protocols
that trigger data transfers between caches. Scratchpad memories are controlled
by the programmer or the runtime software, and communication happens explicitly,
through programmable DMA engines that perform the data transfers.
This thesis proposes architectural support in the memory hierarchy to enable
the software to control data locality; we design programmable hardware primitives
that allow runtime software to orchestrate communication and reduce the associated
energy consumption.
We demonstrate a hybrid cache/scratchpad memory hierarchy that provides
unified hardware support for both implicit communication, via cache-coherence,
and explicit communication, via fast virtualized inter-processor communication
hardware primitives. We also introduce Epoch-based Cache Management
(ECM), which allows software to assign priorities to cache lines in order to guide
the cache replacement policy and, in effect, to manage locality. Moreover, we
design the Explicit Bulk Prefetcher (EBP), a programmable prefetch engine that
allows software to prefetch data accurately and ahead of time, in order to hide
memory latency and improve cache locality. Furthermore, we propose a set of hardware
primitives for Software Guided Coherence (SGC) in non-cache-coherent systems,
which allow runtime software to orchestrate the fetching of the most up-to-date
version of data from the appropriate cache(s) and to maintain coherence at
software-object granularity.
We evaluate our proposed hardware primitives by comparing them against
directory-based cache-coherence with hardware prefetching. Our experimental results
for explicit communication show that we can improve performance by 10% to
40% and, at the same time, reduce the energy consumption of on-chip communication
by 35% to 70%, owing to a significant reduction in on-chip traffic, by factors of
2 to 4. Moreover, we exploit a task-based programming system to guide the hardware,
and show that our proposed hardware primitives in cache-coherent systems (ECM,
EBP) improve performance by an average of 20%, inject 25% less on-chip traffic
on average, and reduce the energy consumption in the components of the memory
hierarchy by an average of 28%. Our hardware support for non-cache-coherent systems
(ECM, SGC) improves performance by an average of 14%, injects 41% less
on-chip traffic on average, and reduces the energy consumption in the components
of the memory hierarchy by an average of 44%.