
Identifier: 000434155
Title: Implementing convolutional neural networks in a cluster of interconnected FPGAs using Vivado HLS
Alternative Title (Greek): Υλοποίηση συνελικτικών νευρωνικών δικτύων σε ομάδες διασυνδεδεμένων FPGAs χρησιμοποιώντας το Vivado HLS
Author: Μαγειρόπουλος, Ευάγγελος Ε.
Thesis advisor: Κατεβαίνης, Εμμανουήλ
Reviewers: Δόλλας, Απόστολος; Πρατικάκης, Πολύβιος
Abstract: Convolutional Neural Networks (CNNs) are extensively used to augment our everyday experience of the world by automatically labeling and categorizing digital data, such as images, voice recordings, and video, thus helping with web search and with comprehension of the data available in the digital world. In this thesis, we explore the possibility of mapping complete CNNs onto clusters of multiple interconnected computing devices, such as ASICs or FPGAs. We seek to define a scalable architecture in which a cluster of FPGAs (a segment of the ExaNeSt-based HPC prototype) works concurrently on user streams of inference requests. We base our work on existing tools that simplify the mapping of arbitrary networks: Keras to define the convolutional neural network, and hls4ml, a tool developed at CERN, which translates a convolutional neural network into RTL using the Vivado High Level Synthesis (HLS) framework. We introduce a number of code and directive-based optimizations that achieve speedups in excess of 700x for the individual kernels, and we manage to fit all parameters inside FPGA BRAMs. Furthermore, we split and redesign the network in order to minimize data transfers and balance the work across FPGAs. Finally, we design custom RTL blocks, which we integrate with HLS directives in order to use an HPC network for inter-FPGA communication. Our final implementation of the SqueezeNet CNN, which requires 800 million operations per inference, runs on 5 FPGAs and offers a throughput of 303 image classifications per second (CPS) with a total inference latency of 24 ms, one order of magnitude smaller than typical user Service Level Agreements (SLAs).
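The performance figures quoted in the abstract can be cross-checked with a short arithmetic sketch. All constants below are taken directly from the abstract (800 million operations per inference, 303 classifications per second, 5 FPGAs, 24 ms latency); the Little's-law in-flight estimate is an added illustration, not a figure from the thesis:

```python
# Back-of-the-envelope check of the SqueezeNet-on-5-FPGAs figures
# reported in the abstract.

OPS_PER_INFERENCE = 800e6   # operations per SqueezeNet inference (from abstract)
THROUGHPUT_CPS = 303        # image classifications per second (from abstract)
NUM_FPGAS = 5               # FPGAs in the cluster (from abstract)
LATENCY_S = 24e-3           # end-to-end inference latency (from abstract)

# Aggregate compute rate the cluster must sustain.
aggregate_gops = OPS_PER_INFERENCE * THROUGHPUT_CPS / 1e9

# Average compute rate per FPGA, assuming the split balances the work.
per_fpga_gops = aggregate_gops / NUM_FPGAS

# Inferences concurrently in flight, via Little's law (throughput x latency).
in_flight = THROUGHPUT_CPS * LATENCY_S

print(f"aggregate: {aggregate_gops:.1f} GOPS")   # ~242.4 GOPS
print(f"per FPGA:  {per_fpga_gops:.1f} GOPS")    # ~48.5 GOPS
print(f"in flight: {in_flight:.2f} inferences")  # ~7.27
```

The in-flight count above 1 is consistent with the abstract's description of the FPGAs working concurrently on streams of inference requests rather than one image at a time.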
Language: English
Subject: SqueezeNet
Συνελικτικά νευρωνικά δίκτυα (Convolutional neural networks)
Issue date: 2020-11-27
Collection: Faculty/Department--Faculty of Sciences and Engineering--Department of Computer Science--Post-graduate theses

Digital Documents

No permission to view the document; it will not be available until 2020-11-26.