Abstract |
As semiconductor technology advances, more area becomes available for integrating memory
and logic onto a single chip. This area can be partitioned into either a few complex
processor elements (PEs) or many simple PEs. The current trend is to integrate as
many simple PEs as possible. To connect many PEs, Network-on-Chip
(NoC) solutions are increasingly favoured for their modularity.
However, NoC utilization is adversely affected as the PE count increases, which
reduces the energy efficiency of both the network and the system.
Various proposals have appeared for tackling energy-utilization inefficiencies
at the network architecture level. These proposals revolve around three architectural
parameters: network partitioning (P), concentration (C) and express physical links (X).
However, these efforts assume a small design space among the P, C and X parameters,
use unscalable schemes for express physical links, or distribute buffer
space and bisection bandwidth unfairly across the network architectures. As a consequence,
researchers reach different conclusions about how a network's energy
efficiency is affected by the P, C and X architectural parameters.
This work evaluates the area, performance and energy of the network architectures
derived by applying each of the three architectural parameters
(P, C and X), either separately or in combination, to a baseline single 2D mesh
network. The P parameter covers homogeneous partitioning and two types of
heterogeneous partitioning; the C parameter explores a single concentration degree
of 4 PEs per network node; and the X parameter assumes two express
intervals (2-hop or 4-hop). This results in a design space of 20 and 24 network
configurations for 64 and 256 PEs, respectively. All network instances
were simulated using various traffic patterns that exhibit diverse communication
behaviors and a range of ratios of control packets per data packet. To enforce
strong fairness, we kept each network's buffer space allocation and bisection
bandwidth almost equal to those of the baseline by adjusting the respective
router micro-architecture without degrading performance. In some cases, the router
micro-architecture adjustments improved performance.
Drawing on insights from our analysis, we observe that in future NoCs with hundreds
of PEs, the exclusive use of express physical links with an express interval
of 2, without concentration or network partitioning, is the best approach in
terms of energy-area savings and energy-area efficiency. Furthermore, we demonstrate
that network partitioning under a fair buffer space and bisection bandwidth
allocation decreases area efficiency rather than increasing it, regardless of the
partitioning scheme. Energy consumption and energy efficiency also worsen, except for a
particular type of heterogeneous scheme that gives slight improvements. However,
this happens only for a specific range of ratios of control packets per data
packet. Finally, through this work, one can determine the best-suited
network architecture for each use case and PE count.
|