E-Locus - Institutional Repository of the University of Crete - Design and Implementation of a Coherent Memory Sub-System for Shared Memory Multiprocessors

Identifier

uch.csd.msc//2006vlachos

Title

Σχεδίαση και Υλοποίηση ενός Υποσυστήματος Μνήμης Συνοχής Δεδομένων για Πολυεπεξεργαστές Διαμοιραζόμενης Μνήμης

Alternative Title

Design and Implementation of a Coherent Memory Sub-System for Shared Memory Multiprocessors

Creator

Vlachos, Evaggelos

Abstract

Recent technology advances in integrated electronics make it possible to fit more and more transistors into modern chips. Chip Multiprocessors (CMPs) are architectures that feature multiple processing cores on a single chip. They offer higher processing power, easier design scalability, and a greater performance/power ratio. CMPs appear to be one of the dominant architectural approaches in high-performance architectures for the years to come. The purpose of this work is to design and implement a shared memory multi-core system that matches the needs of future CMPs. Specifically, an FPGA-based prototype has been implemented, which constitutes a two-node processing system. The design takes advantage of the two PowerPC cores that are embedded in the FPGA fabric. We have implemented external coherent caches equipped with a MESI protocol, and a bus-based coherent memory interconnect to connect the two processors. Shared memory resides in external DDR memory, accessible through the interconnect and the DDR controller. We find that the area overhead of our coherent memory system is 33.4% of a medium-size FPGA. We evaluate the performance of the system using both simulations and custom software benchmarks running on the two processors. Our simulations show that the implemented system is more efficient than systems based exclusively on Xilinx soft-cores that offer the same type of memory coherence. Our custom benchmarks simulate basic operations commonly found in parallel programs. Our results show that our design scales well relative to a single processor for the Merge-Sort algorithm and the Producer-Consumer benchmark, which do not require a great amount of synchronization traffic. The measured speedup ranges from 1.89 to 1.92 and from 1.89 to 3.45, respectively. On the other hand, the Shared-Counter benchmark slows down by 3 to 10 times due to excessive synchronization traffic.
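
For readers unfamiliar with MESI, the following is a minimal sketch of the textbook protocol for a single cache line shared by two snooping caches. It is not the thesis's hardware design: the class name `CacheLineBus` and its methods are hypothetical, and write-backs and data transfer are omitted so that only the state transitions are shown.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


class CacheLineBus:
    """Hypothetical model: tracks the MESI state of ONE line in each cache."""

    def __init__(self, n_caches=2):
        self.states = [State.I] * n_caches

    def read(self, cpu):
        """Processor `cpu` reads the line; returns its resulting state."""
        if self.states[cpu] is not State.I:
            return self.states[cpu]  # read hit: no bus transaction needed
        shared = False
        for other in range(len(self.states)):
            if other != cpu and self.states[other] is not State.I:
                # A snooping cache holding the line downgrades to Shared
                # (a Modified copy would also be written back here).
                self.states[other] = State.S
                shared = True
        self.states[cpu] = State.S if shared else State.E
        return self.states[cpu]

    def write(self, cpu):
        """Processor `cpu` writes the line; returns its resulting state."""
        if self.states[cpu] in (State.M, State.E):
            self.states[cpu] = State.M  # silent upgrade, no bus traffic
            return self.states[cpu]
        # From Shared or Invalid: broadcast an invalidation to all others.
        for other in range(len(self.states)):
            if other != cpu:
                self.states[other] = State.I
        self.states[cpu] = State.M
        return self.states[cpu]


bus = CacheLineBus()
bus.read(0)   # first reader gets the line Exclusive
bus.read(1)   # second reader forces both copies to Shared
bus.write(0)  # writer invalidates the other copy
```

In this model, every write to a line held by the other cache triggers an invalidation, which is consistent with the abstract's observation that a benchmark dominated by updates to one shared counter generates heavy synchronization traffic.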

Issue date

2006-09-01

Date available

2006-11-23

Collection

School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses

Type of Work--Post-graduate theses
