Identifier 000423010
Title Design and implementation of the send part of an Advanced RDMA Engine
Alternative Title Σχεδίαση και κατασκευή του κομματιού αποστολής, μιας προηγμένης μηχανής Άμεσης Προσπέλασης Μνήμης (RDMA)
Author Ξηρουχάκης, Παντελής Μ.
Thesis advisor Κατεβαίνης, Μανόλης
Reviewer Μπίλας, Άγγελος
Πρατικάκης, Πολύβιος
Abstract In High Performance Computing (HPC), low-latency communication between remote processes is crucial to application performance. InfiniBand and other off-the-shelf networks can reduce latency, but they require special, costly network interface cards that are loosely coupled with the CPU. In this work, we describe the design and implementation of an advanced RDMA engine developed within the ExaNeSt EU project, which has a number of advantages over InfiniBand: i) We segment RDMA transfers into blocks and support multipathing on a per-block basis. ii) We perform selective end-to-end retransmissions. iii) We do not need to pin the memory regions of RDMA transfers, while still supporting access to the full virtual address space of processes by using the ARM SMMU. Additionally, we provide a number of virtual channels that can work simultaneously with many outstanding transfers. Our advanced RDMA engine is designed to support multipathing so that it can utilize the rich parallel links found in HPC networks. We describe the hardware implementation of the RDMA engine on the Zynq UltraScale+. The hardware design has been optimized to meet timing at up to 200 MHz while consuming few resources, leaving plenty of space for other logic such as accelerators. We have also designed and integrated the required interconnect, as well as the Network Interface (NI), in order to utilize the large Global Virtual Address Space (GVAS) provided by our hardware prototype. We have implemented our advanced RDMA engine on multiple interconnected FPGAs and have run HPC benchmarks and applications to verify and evaluate our design. The results show a substantial improvement over 10G Ethernet, as well as over our previous RDMA implementations. Finally, our RDMA engine has been designed to accommodate further features, such as congestion management, with little to no change.
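The block segmentation, per-block multipathing, and selective retransmission named in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the thesis's hardware design: the `Block` record, the round-robin path choice, and the function names `segment` and `blocks_to_retransmit` are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One block of a segmented transfer (hypothetical record layout)."""
    seq: int     # block index within the transfer
    offset: int  # byte offset of the block in the source buffer
    length: int  # bytes in this block (the last block may be shorter)
    path: int    # path id assigned to this block (assumed round-robin)


def segment(total_bytes, block_size, num_paths):
    """Split a transfer into blocks and spread them round-robin over paths."""
    if total_bytes < 0 or block_size <= 0 or num_paths <= 0:
        raise ValueError("sizes and path count must be positive")
    blocks = []
    for seq, offset in enumerate(range(0, total_bytes, block_size)):
        length = min(block_size, total_bytes - offset)
        blocks.append(Block(seq, offset, length, seq % num_paths))
    return blocks


def blocks_to_retransmit(blocks, acked_seqs):
    """Selective retransmission: return only the blocks not yet acknowledged."""
    return [b for b in blocks if b.seq not in acked_seqs]


if __name__ == "__main__":
    transfer = segment(total_bytes=40_000, block_size=16_384, num_paths=2)
    # Suppose blocks 0 and 2 were acknowledged end to end; only block 1 is resent.
    for block in blocks_to_retransmit(transfer, acked_seqs={0, 2}):
        print(block)
```

Because each block carries its own sequence number and offset, a lost block can be resent on its own, possibly over a different path, without restarting the whole transfer.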
Language English
Subject FPGA
HPC
Network
User-level
Zero-copy
Issue date 2019-07-26
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses