Doctoral theses

Identifier 000383532
Title Architectural support for software-guided energy reduction of manycore communication
Alternative Title Αρχιτεκτονική υποστήριξη για μείωση κατανάλωσης ενέργειας στην επικοινωνία πολυπύρηνων επεξεργαστών υπό την καθοδήγηση λογισμικού
Author Παπαευσταθίου, Βασίλειος
Thesis advisor Κατεβαίνης, Μανόλης
Reviewer Μπίλας, Άγγελος
Νικολόπουλος, Δημήτριος
Abstract At the beginning of the 21st century, the processor industry made a fundamental shift towards multicore architectures, in order to address the diminishing returns in single-thread performance with increasing transistor counts, and in order to overcome the severe power problems of clock frequency scaling. Semiconductor technology trends indicate that the era of power- and energy-constrained manycore architectures has now arrived. Technology projections show that the energy consumed by data movement and communication will dominate the corresponding budget of future computing systems; thus, unnecessary data movements will take a significant share of the energy budget away from computation. The most popular communication model for multicore and manycore architectures is shared memory. Threads or processes that run concurrently on different cores communicate and exchange data by accessing the same global memory locations. However, accesses to off-chip memory are slow and, thus, processor designs utilize a hierarchy of faster on-chip memories to improve the speed of memory operations. Memory hierarchies today are based on two dominant schemes: (i) multilevel coherent caches, and (ii) software-managed local memories (scratchpads). Caches manage the memory hierarchy transparently, using hardware replacement policies, and communication happens implicitly, with cache-coherence protocols that provoke data transfers between caches. Scratchpad memories are controlled by the programmer or the runtime software, and communication happens explicitly, through programmable DMA engines that perform the data transfers. This thesis proposes architectural support in the memory hierarchy to enable the software to control data locality; we design programmable hardware primitives that allow runtime software to orchestrate communication and reduce the associated energy consumption.
We demonstrate a hybrid cache/scratchpad memory hierarchy that provides unified hardware support for both implicit communication, via cache-coherence, and explicit communication, via fast virtualized inter-processor communication hardware primitives. We also introduce the Epoch-based Cache Management (ECM), which allows software to assign priorities to cache-lines, in order to guide the cache replacement policy, and, in effect, to manage locality. Moreover, we design the Explicit Bulk Prefetcher (EBP), a programmable prefetch engine that allows software to accurately prefetch data ahead of time, in order to hide memory latency and improve cache locality. Furthermore, we propose a set of hardware primitives for Software Guided Coherence (SGC) in non-cache-coherent systems, in order to allow runtime software to orchestrate the fetching of the most up-to-date version of data from the appropriate cache(s) and maintain coherence at the software object granularity. We evaluate our proposed hardware primitives by comparing them against directory-based cache-coherence with hardware prefetching. Our experimental results for explicit communication show that we can improve performance by 10% to 40%, and at the same time reduce the energy consumption of on-chip communication by 35% to 70% owing to significant reduction in on-chip traffic, by factors of 2 to 4. Moreover, we exploit a task-based programming system to guide hardware, and show that our proposed hardware primitives in cache-coherent systems (ECM, EBP) improve performance by an average of 20%, inject 25% less on-chip traffic on average, and reduce the energy consumption in the components of the memory hierarchy by an average of 28%. Our hardware support for non-cache-coherent systems (ECM, SGC) improves performance by an average of 14%, injects 41% less on-chip traffic on average, and reduces the energy consumption in the components of the memory hierarchy by an average of 44%.
Language English
Subject Cache coherence
Data movement
Energy consumption
Manycore processors
Memory hierarchies
Runtime software
Ιεραρχίες μνήμης
Κατανάλωση ενέργειας
Λογισμικό χρόνου εκτέλεσης
Μετακίνηση δεδομένων
Πολυπύρηνοι επεξεργαστές
Συνοχή κρυφών μνημών
Issue date 2014-03-06
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Doctoral theses
  Type of Work--Doctoral theses
Permanent Link https://elocus.lib.uoc.gr//dlib/9/f/5/metadata-dlib-1396952781-544042-21263.tkl
