E-Locus - Institutional Repository of the University of Crete - Design and Implementation of a Directory based Cache Coherence Protocol

Identifier

000368505

Title

Design and Implementation of a Directory based Cache Coherence Protocol

Alternative Title

Σχεδίαση και υλοποίηση ενός πρωτοκόλλου συνέπειας κρυφών μνημών τύπου καταλόγου

Author

Τσαλιαγκός, Δημήτριος Μιχαήλ

Thesis advisor

Κατεβαίνης, Μανώλης

Abstract

As the number of processors per chip increases, so does the need for efficient, high-speed communication support, so that applications can exploit the numerous cores available in today's chip multiprocessors. Although explicit communication mechanisms such as RDMA can be used, implicit migration of data among the cores significantly simplifies programming in large-scale systems by providing a simple and intuitive programming model. This approach, however, introduces the problem known as cache coherence: multiple copies of the same data must be kept consistent. One solution is to use directory-based coherence protocols, which offer better scalability than broadcast protocols by reducing the volume of messages exchanged. In this thesis, a directory-based cache coherence protocol is implemented in a four-core FPGA-based prototype developed at the CARV (Computer Architecture and VLSI Systems) laboratory of FORTH (Foundation for Research and Technology - Hellas). The implemented protocol supports up to 16 processors and is integrated with the existing system, which also provides RDMA and special hardware support for synchronization and explicit management of cache memories. We evaluate the protocol using custom software micro-benchmarks that emulate common operations found in parallel applications, such as locks and barriers; a matrix multiplication algorithm and a producer-consumer benchmark were also developed for the evaluation. Our main finding is that the area overhead of the coherent system relative to a non-coherent one is only 4% in terms of logic. Our results also show that the design scales for the matrix multiplication algorithm, achieving a speedup that ranges between 1.96 and 3.74.

Language

English

Subject

Cache Coherence

Caches

Directory Protocols

Multiprocessors

Memory coherence directories (Κατάλογοι συνέπεια μνήμης)

Multicore processors (Πολυπύρηνοι επεξεργαστές)

Cache memory coherence (Συνέπεια κρυφών μνημών)

Issue date

2011-07-15