E-Locus - Institutional Repository of the University of Crete

Doctoral theses

Identifier

uch.csd.phd//2007chrysos

Title

Request-Grant Scheduling for Congestion Elimination in Multistage Networks

Alternative Title

Congestion Elimination in Multistage Networks via a Request Protocol

Author

Χρυσός, Νικόλαος Ι

Thesis advisor

Κατεβαίνης, Μ.

Abstract

This thesis considers buffered multistage interconnection networks (fabrics), and investigates methods to reduce their buffer size requirements. Our contribution is a novel flow and congestion control scheme that achieves performance close to that of per-flow queueing while requiring much less buffer space than what per-flow queues would need. The new scheme utilizes a request-grant preapproval phase, as many contemporary bufferless networks do, but its operation is much simpler and its performance is remarkably better. Traditionally, the role of requests in bufferless networks is to reserve an available time slot on each link along a packet's route, where these time slots are contiguous in time along the path, so as to guarantee non-conflicting packet transmission. These requirements impose a very heavy toll on the scheduling unit of such bufferless fabrics. By contrast, our requests do not reserve links for a specific time duration, but instead only reserve space in the buffers at their entry points; effectively, the scheduling decisions that concern different links are decoupled among themselves, leading to a much simpler admission process. The proposed scheduling subsystem comprises independent single-resource schedulers, operating in a pipeline; they operate asynchronously to each other. In this thesis we show that the reservation of buffers in front of critical network links (links that are unable to carry the potential aggregate demand) eliminates congestion, in the sense that traffic flows seamlessly through the network: it neither gets dropped, nor is excessively blocked waiting for downstream buffers to become available. First, we apply request-grant scheduling to a single-stage switch, with small, shared output queues, which serves as a model for the more challenging multistage case.
We demonstrate that, in principle, a very small number of fabric buffers suffices to reach high performance levels: with 12-cell buffer space per output, performance is better than in buffered crossbars, which consume N cells of buffer space per output, where N is the number of ports. In this single-stage setting, we study the impact of input contention on scheduler performance, and the related synchronization phenomena. During this work, we have introduced a novel scheduling scheme for buffered crossbar switches that makes buffer size independent of the round-trip time between the linecards and the switch. We then proceed to the multistage case. Our main motivation and our primary benchmark is an example next-generation fabric challenge: a 1024x1024, 3-stage, non-blocking Clos/Benes fabric, running with no internal speedup, made of 96 single-chip 32x32 buffered crossbar switching elements (3 stages of 32 switch chips each). To eliminate congestion in the fabric, we carefully apply our request-grant scheduling protocol. We demonstrate that it is feasible to implement all schedulers centrally, in a single chip. Besides congestion elimination, our scheduler can guarantee 100 percent in-order delivery, using very small reorder buffers, which can easily fit in on-chip memory. Simulation results indicate very good delay performance, and throughput that exceeds 95 percent under unbalanced traffic. Most prominent is the result that, under hotspot traffic, with almost all output ports being congested, the non-congested outputs experience negligible delay degradation. The proposed system can directly operate on variable-size packets, eliminating the padding overhead and the associated internal speedup. We also discuss a possible distributed version of the scheduling subsystem.
Our scheme is suited to heavy congestion; in systems that need to provide very low latency under light (uncongested) traffic, one would apply this scheme only when the load exceeds a given threshold. Lastly, we consider some blocking network topologies, such as the banyan. In a banyan network, internal links can cause congestion as well as output ports. We show a fully distributed scheduler for this network that eliminates congestion from both internal and output-port links.
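
The buffer-reservation idea in the abstract can be illustrated with a toy model of a single hotspot output. The sketch below is not the thesis's scheduler: the class `OutputScheduler`, the function `simulate`, the FIFO grant order, and the zero-delay path between a grant and the cell's arrival are all assumptions made for illustration. It only shows that, when a request is granted solely while an unreserved buffer cell exists, the output buffer never overflows and the output link stays busy, while excess demand waits at the inputs.

```python
from collections import deque


class OutputScheduler:
    """Hypothetical per-output scheduler: grants reserve buffer space, not time slots."""

    def __init__(self, buffer_cells):
        self.credits = buffer_cells   # unreserved cells in this output's buffer
        self.pending = deque()        # outstanding requests (input ids), FIFO order

    def request(self, input_id):
        self.pending.append(input_id)

    def grant(self):
        """Return one granted input id, reserving a buffer cell, or None."""
        if self.credits > 0 and self.pending:
            self.credits -= 1
            return self.pending.popleft()
        return None

    def release(self):
        """Called when a cell leaves the buffer, freeing its reservation."""
        self.credits += 1


def simulate(num_inputs=4, buffer_cells=2, slots=20):
    """All inputs target one output; returns (max buffer occupancy, cells delivered)."""
    sched = OutputScheduler(buffer_cells)
    buffer = deque()
    max_occupancy = 0
    delivered = 0
    for t in range(slots):
        # Hotspot traffic: every input issues a request each time slot.
        for i in range(num_inputs):
            sched.request(i)
        # Grant for as long as buffer space can be reserved.
        # Assumption: a granted cell reaches the buffer within the same slot.
        while True:
            granted = sched.grant()
            if granted is None:
                break
            buffer.append((granted, t))
        max_occupancy = max(max_occupancy, len(buffer))
        # The output link drains one cell per slot and returns the credit.
        if buffer:
            buffer.popleft()
            sched.release()
            delivered += 1
    return max_occupancy, delivered


if __name__ == "__main__":
    occupancy, cells = simulate(num_inputs=4, buffer_cells=2, slots=20)
    print("max buffer occupancy:", occupancy, "cells delivered:", cells)
```

In this model the backlog accumulates as ungranted requests at the inputs rather than as cells inside the fabric, which is the property the abstract attributes to reserving buffers in front of critical links.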

Language

English

Issue date

2007-05-01

Date available

2007-10-11

Collection

School/Department--School of Sciences and Engineering--Department of Computer Science--Doctoral theses

Type of Work--Doctoral theses

Permanent Link

https://elocus.lib.uoc.gr//dlib/9/0/f/metadata-dlib-2007chrysos.tkl