E-Locus - Institutional Repository of the University of Crete

Post-graduate theses

Identifier

000449722

Title

Stream communication across RISC-V coherence islands, with read-invalidate and write-through-combine cache policies

Alternative Title

Επικοινωνία ροών μεταξύ νησιών συνοχής RISC-V, με πολιτικές κρυφής μνήμης ανάγνωσης-ακύρωσης και εγγραφής-δια μέσου-συνδυασμού

Author

Μουσούρος, Ορέστης Δ.

Thesis advisor

Κατεβαίνης, Μανόλης

Reviewer

Πρατικάκης, Πολύβιος
Παπαευσταθίου, Βασίλειος

Abstract

In recent decades, technology scaling has slowed, mainly because of limits imposed by rising power consumption. To keep improving performance, hardware architects have turned to energy-efficient processors, including some based on the open-source RISC-V Instruction Set Architecture (ISA), which promise energy efficiency and high performance on multi-core chips. This thesis contributes the design and implementation of a new approach to interprocessor stream communication across RISC-V Coherence Islands. Traditionally, coherence islands use memory-to-memory communication over TCP/IP or Remote Direct Memory Access (RDMA) interconnects. Writing and reading data to and from memory at the endpoints increases latency and consumes processor cycles. In our work, by contrast, communication takes place directly between a core and another (remote) node, which can be either a core or a memory. In particular, we propose a new Streaming Cache that resides next to the Level 1 Cache (L1 Cache) and uses the same fast interface to communicate with the core. We split the Streaming Cache into two logical parts: a) the producer, an outgoing streaming cache that handles streaming data departing from the node; b) the consumer, an incoming streaming cache that handles streaming data arriving at the node. Effectively, in the proposed streaming framework, instead of moving data through the main memory of the endpoints, both the producer and the consumer can access their data with the same latency as the L1 Cache. To improve performance, the Streaming Cache uses read-once/store-once cache policies, which immediately recycle the space of streaming data that has already been accessed. Furthermore, a Prefetcher fetches data from the (remote) node before it is needed, thus reducing the cost of read accesses, while write accesses take advantage of a Write-Combiner, which combines neighboring data and sends it to the (remote) node.
In our work, accesses to streaming data are recognized by their virtual addresses, without extending the ISA. We implemented the proposed system in SystemVerilog, as an extension of the CVA6 (formerly ARIANE) single-core RISC-V CPU. We built the Incoming and Outgoing schemes of the Streaming Cache, each with four (4) contexts (hardware streams) to support virtualization, and we tightly coupled them with the Load/Store Unit (LSU) of ARIANE. We also built communication logic at the edges that sends and receives data over an AXI-4 interconnect. We synthesized our design for a Xilinx Zynq UltraScale+ MPSoC Field Programmable Gate Array (FPGA). The Incoming logic of our design utilizes 16839 Look-Up Tables (LUTs), 7506 Registers and 8 Block Random Access Memories (BRAMs), and operates at 275 MHz, while the Outgoing logic utilizes 23606 LUTs, 8615 Registers and 8 BRAMs, and operates at 210 MHz. We performed behavioral simulations of our RTL design in order 1) to verify the streaming functionality when coupled with the RISC-V cores and 2) to evaluate its performance. In our preliminary evaluations, we stream data from/to the main memory of the ARIANE core, first using the traditional memory hierarchy and second using our optimized streaming cache. The promising results underline the performance gains due to the stream-optimized cache policies of our design, which almost completely eliminate the latency of the network interconnect in our indicative hand-written bare-metal benchmark programs.
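
As an illustration of the read-once policy and the Write-Combiner described above, the following is a minimal behavioral sketch in Python. It is not the SystemVerilog implementation from the thesis. All names (`ReadOnceBuffer`, `WriteCombiner`, `WORD_BYTES`, `max_words`), the 8-byte word size and the 4-word burst size are assumptions chosen for illustration only.

```python
from collections import deque

WORD_BYTES = 8  # assumed word size, not taken from the thesis


class ReadOnceBuffer:
    """Incoming side: a slot is recycled as soon as its word has been read."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = deque()

    def free_slots(self):
        # Number of words a prefetcher could still place in the buffer.
        return self.capacity - len(self.slots)

    def fill(self, word):
        # Called on behalf of the prefetcher when data arrives from the remote node.
        if len(self.slots) >= self.capacity:
            raise BufferError("no free slot in the incoming buffer")
        self.slots.append(word)

    def load(self):
        # Read-once: the word is handed to the core and its slot is freed at once.
        return self.slots.popleft()


class WriteCombiner:
    """Outgoing side: merges stores to adjacent addresses into one burst."""

    def __init__(self, send, max_words=4):
        self.send = send            # callback taking (start_address, list_of_words)
        self.max_words = max_words  # assumed burst length
        self.start = None
        self.words = []

    def store(self, addr, word):
        # A store that is not adjacent to the pending burst forces a flush first.
        if self.words and addr != self.start + len(self.words) * WORD_BYTES:
            self.flush()
        if not self.words:
            self.start = addr
        self.words.append(word)
        # A full burst is sent immediately.
        if len(self.words) == self.max_words:
            self.flush()

    def flush(self):
        # Send whatever is pending as a single burst and reset the combiner.
        if self.words:
            self.send(self.start, list(self.words))
            self.words = []
            self.start = None
```

In this sketch, five sequential word stores leave the combiner as one full four-word burst followed by a one-word burst on the final `flush()`, and each `load()` on the incoming buffer frees exactly one slot for further prefetching.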

Language

English

Subject

HPC

Hardware

IOT

RDMA

Issue date

2022-07-29

Collection

School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses

Type of Work--Post-graduate theses