Identifier uch.csd.msc//2006vlachos
Title Σχεδίαση και Υλοποίηση ενός Υποσυστήματος Μνήμης Συνοχής Δεδομένων για Πολυεπεξεργαστές Διαμοιραζόμενης Μνήμης
Alternative Title Design and Implementation of a Coherent Memory Sub-System for Shared Memory Multiprocessors
Creator Vlachos, Evaggelos
Abstract Recent technology advances in integrated electronics make it possible to fit more and more transistors onto modern chips. Chip Multiprocessors (CMPs) are architectures that feature multiple processing cores on a single chip. They offer higher processing power, easier design scalability, and a better performance-to-power ratio. CMPs appear to be one of the dominant architectural approaches for high-performance architectures in the years to come. The purpose of this work is to design and implement a shared memory multi-core system that matches the needs of future CMPs. Specifically, an FPGA-based prototype has been implemented, which constitutes a two-node processing system. The design takes advantage of the two PowerPC cores that are embedded in the FPGA fabric. We have implemented external coherent caches equipped with a MESI protocol, and a bus-based coherent memory interconnect to connect the two processors. Shared memory resides in external DDR memory accessible through the interconnect and the DDR controller. We find that the area overhead of our coherent memory system is 33.4% of a medium-size FPGA. We evaluate the performance of the system by using both simulations and custom software benchmarks running on the two processors. Our simulations show that the implemented system is more efficient than systems based exclusively on Xilinx soft cores that offer the same type of memory coherence. Our custom benchmarks simulate basic operations commonly found in parallel programs. Our results show that our design scales well with respect to a single processor for the Merge-Sort algorithm and the Producer-Consumer benchmark, which do not require a great amount of synchronization traffic. The measured speedup ranges from 1.89 to 1.92 and from 1.89 to 3.45, respectively. On the other hand, the Shared-Counter benchmark slows down by 3 to 10 times due to excessive synchronization traffic.
Issue date 2006-09-01
Date available 2006-11-23
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses
Views 451
