### The Mythical IP Block: An Investigation of Contemporary IP Characteristics

By Evriklis Kounalakis

SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE AT UNIVERSITY OF CRETE, HERAKLIO, GREECE, OCTOBER 2005


#### Abstract

This work attempts to demystify the characteristics and properties of the "mythical" IP block, *i.e.* an average-case, typical IP, as it goes through the implementation phases of contemporary EDA flows. By making these characteristics apparent, EDA developers and researchers will be able to enhance current EDA practices and algorithms, whereas designers will be able to clearly understand the tradeoffs and properties of the implementation process. A total of 7 IPs from the Open IP domain have been implemented and analysed using two types of EDA flows: a Synthesis, P&R flow and a Physically Knowledgeable Synthesis, P&R flow. The experiments and measurements have been performed for two technology libraries, a $0.13\,\mu$m and a $0.25\,\mu$m one, and for both the typical and worst-case technology library corners.

The experiments investigate the area-speed and power-speed tradeoffs, *i.e.* what effect the timing constraints have on the area and power consumption of the IP block. For the pipelined designs, the pipeline balance is examined. The critical paths are analyzed in terms of their topology on the final layout, their margin from the most critical path, and whether the most critical paths remain critical as they go through the steps of the flows. Moreover, the post-synthesis versus post-P&R gap is studied. Finally, the origin of most of the switching activity is investigated.

The characteristics of the "mythical" IP are derived from the average characteristics of the IP blocks that are used as benchmarks.

Master Thesis Supervisors: Prof. Manolis Katevenis, Dr. Christos Sotiriou.


#### Abstract (Greek version, translated)

This work aims to demystify the characteristics and properties of the "mythical" IP, *i.e.* the typical-case IP, as it passes through the implementation phases of contemporary automated circuit implementation flows. By revealing these characteristics, developers of automated implementation flows and researchers will be able to improve the existing practices and algorithms of these flows, while designers will be able to understand what the automated flows can offer for IP circuits. In total, 7 open-source IP circuits were implemented and analysed using two automated design flows. The first flow comprises logic synthesis followed by placement and routing, while the second comprises logic synthesis with knowledge of the physical placement in the first step and routing in the second step. The experiments and measurements were performed for two libraries, a $0.13\,\mu$m and a $0.25\,\mu$m one, for both typical and worst-case operating conditions.

The experiments investigate the speed-area relationship and the power-speed relationship, *i.e.* how relaxing the timing constraints affects area and power consumption. Also investigated are the balance of the pipeline stages, the critical paths with respect to their location, and the influence of the flows on the structure of the most critical paths. In addition, the difference between the post-synthesis results and the post-placement-and-routing results is studied, as well as the power consumption of the circuit.

The characteristics of the "mythical" IP are the average of the characteristics of the circuits studied.

Master Thesis Supervisors: Prof. Manolis Katevenis, Dr. Christos Sotiriou.

#### Acknowledgements

I would like to thank ICS-FORTH, Greece for the financial support it provided me during the period I studied for this thesis. I would also like to thank my instructor, Dr. Christos Sotiriou, who gave me valuable directions and helped me with his continuous support. I would also like to thank Prof. Manolis Katevenis for his valuable advice regarding my thesis. I also thank Prof. Apostolos Traganitis and Prof. Ioannis Tollis, who agreed to be members of the evaluation committee of this thesis.

I would also like to thank N. Andrikos, P. Matthaiakis, V. Zebilis and O. Dokianaki, members of the Asynchronous Group of the Computer Architecture and VLSI Group of ICS-FORTH, for their valuable advice. I would also like to thank V. Vlachos for providing me with one of the designs I used as testbenches.

I would like to offer my thanks to my friends George, Stelios, Ritsa, Maria and Eleni for their support while I was working on my thesis. Special thanks to Ioanna for her patience and support.

Finally, I would like to thank my family, my brother Dimitris, my father Manolis and my mother Aspasia, for their support.


### Contents

- 1 Introduction
- 2 EDA Design and Methodology
  - 2.1 Experiment Motivation
    - 2.1.1 Area-Speed Analysis
    - 2.1.2 Pipeline Balancing
    - 2.1.3 Critical Path Analysis
    - 2.1.4 Critical Path Topology
    - 2.1.5 Synthesis versus Placement and Routing, *i.e.* Physical Design
    - 2.1.6 Typical Corner versus Worst-Case Corner
    - 2.1.7 Power-Speed Analysis
    - 2.1.8 Switching activity
  - 2.2 Design Flow
  - 2.3 Experimental Framework
    - 2.3.1 Maximum Period Determination
    - 2.3.2 Critical Path Results
    - 2.3.3 Physical Critical Path Topology
    - 2.3.4 Power Consumption Results
    - 2.3.5 Total Cell Area Measurement
    - 2.3.6 Pipeline Balancing Results
    - 2.3.7 Switching Activity Results
- 3 IP Block Benchmarks
  - 3.1 Open IP Core Suite
  - 3.2 IP Cores Description
    - 3.2.1 Aemb
    - 3.2.2 DES3
    - 3.2.3 DLX
    - 3.2.4 Huffman
    - 3.2.5 Reed-Solomon
    - 3.2.6 RISC
    - 3.2.7 VGA-LCD
- 4 Experimental Results
  - 4.1 Area-Speed Results
  - 4.2 Pipeline Balance Results
    - 4.2.1 Post-Synthesis Results
    - 4.2.2 Post-Place and Route Results
  - 4.3 Critical Path Results
    - 4.3.1 Synthesis Results
    - 4.3.2 Place and Route Results
  - 4.4 Power Analysis Results
  - 4.5 Maximum Frequency Comparison
    - 4.5.1 SYN-P&R flow Post-synthesis versus SYN-P&R flow Post-P&R
    - 4.5.2 SYN-P&R flow Post-P&R WC versus PKS-R flow
    - 4.5.3 Typical Corner Versus Worst Corner
  - 4.6 Critical Path Analysis
    - 4.6.1 Synthesis versus Place and Route
    - 4.6.2 SYN-P&R flow WC versus PKS-R flow
  - 4.7 Critical Path Topology
    - 4.7.1 UMC-0.13 WC
    - 4.7.2 IHP-0.25 WC
    - 4.7.3 Typical Corner
  - 4.8 Switching Activity Analysis
- 5 The "Mythical" IP Block
  - 5.1 Post-Synthesis vs. Post-P&R Analysis
  - 5.2 Area vs. Speed Analysis
  - 5.3 Power vs. Speed Analysis
  - 5.4 Power Distribution Analysis
  - 5.5 Critical Path Timing Distribution Analysis
  - 5.6 Critical Path Physical Distribution Analysis
  - 5.7 Design Balance Analysis
- 6 Conclusion and Future Work
  - 6.1 Future Work
    - 6.1.1 Expansion of the Benchmark Suite
    - 6.1.2 Optimizing the IP Blocks
    - 6.1.3 Optimizing the EDA Flows
- Bibliography


### List of Figures

- 2.1 An unbalanced pipeline
- 2.2 A pipeline stage with varying length paths
- 2.3 SYN-P&R flow block diagram
- 2.4 PKS-R flow block diagram
- 2.5 Maximum period determination
- 2.6 Critical path topology determination
- 2.7 Power consumption estimation
- 2.8 Pipeline balance estimation
- 2.9 Switching activity estimation
- 3.1 Block diagram of Aemb
- 3.2 Pipeline diagram of Aemb
- 3.3 Encryption operation of the DES algorithm
- 3.4 The triple DES encoding operation
- 3.5 The triple DES decoding operation
- 3.6 Block diagram of DLX
- 3.7 Huffman encoding in the JPEG encoding
- 3.8 Huffman decoding in the JPEG decoding
- 3.9 Reed-Solomon encoding-decoding
- 3.10 Reed-Solomon encoder
- 3.11 RISC block diagram
- 3.12 VGA-LCD block diagram
- 4.1 Area-speed results for Aemb in UMC-0.13
- 4.2 Area-speed results for Aemb in IHP-0.25
- 4.3 Area-speed results for DES3 in UMC-0.13
- 4.4 Area-speed results for DES3 in IHP-0.25
- 4.5 Area-speed results for DLX in UMC-0.13
- 4.6 Area-speed results for DLX in IHP-0.25
- 4.7 Area-speed results for Huffman in UMC-0.13
- 4.8 Area-speed results for Huffman in IHP-0.25
- 4.9 Area-speed results for RISC in UMC-0.13
- 4.10 Area-speed results for RISC in IHP-0.25
- 4.11 Area-speed results for Reed-Solomon in UMC-0.13
- 4.12 Area-speed results for Reed-Solomon in IHP-0.25
- 4.13 Area-speed results for VGA-LCD in UMC-0.13
- 4.14 Area-speed results for VGA-LCD in IHP-0.25
- 4.15 Numbers of cells in most critical paths post-synthesis TYP IHP
- 4.16 Numbers of cells in most critical paths post-synthesis TYP UMC
- 4.17 Numbers of cells in most critical paths post-synthesis WC IHP
- 4.18 Numbers of cells in most critical paths post-synthesis WC UMC
- 4.19 Numbers of cells in most critical paths post-P&R TYP IHP
- 4.20 Numbers of cells in most critical paths post-P&R TYP UMC
- 4.21 Numbers of cells in most critical paths post-P&R WC IHP
- 4.22 Numbers of cells in most critical paths post-P&R WC UMC
- 4.23 Numbers of cells in most critical paths PKS-R IHP
- 4.24 Numbers of cells in most critical paths PKS-R UMC
- 4.25 Power-speed results for Aemb in UMC-0.13
- 4.26 Power-speed results for Aemb in IHP-0.25
- 4.27 Power-speed results for DES in UMC-0.13
- 4.28 Power-speed results for DES in IHP-0.25
- 4.29 Power-speed results for DLX in UMC-0.13
- 4.30 Power-speed results for DLX in IHP-0.25
- 4.31 Power-speed results for Huffman in UMC-0.13
- 4.32 Power-speed results for Huffman in IHP-0.25
- 4.33 Power-speed results for RISC in UMC-0.13
- 4.34 Power-speed results for RISC in IHP-0.25
- 4.35 Power-speed results for Reed-Solomon in UMC-0.13
- 4.36 Power-speed results for Reed-Solomon in IHP-0.25
- 4.37 Power-speed results for VGA-LCD in UMC-0.13
- 4.38 Power-speed results for VGA-LCD in IHP-0.25
- 4.39 Topology of critical paths for Aemb SYN-P&R flow
- 4.40 Topology of critical paths for Aemb PKS-R flow
- 4.41 Topology of critical paths for DES3 SYN-P&R flow
- 4.42 Topology of critical paths for DES3 PKS-R flow
- 4.43 Topology of critical paths for DLX SYN-P&R flow
- 4.44 Topology of critical paths for DLX PKS-R flow
- 4.45 Topology of critical paths for Huffman SYN-P&R flow
- 4.46 Topology of critical paths for Huffman PKS-R flow
- 4.47 Topology of critical paths for Reed-Solomon SYN-P&R flow
- 4.48 Topology of critical paths for Reed-Solomon PKS-R flow
- 4.49 Topology of critical paths for RISC SYN-P&R flow
- 4.50 Topology of critical paths for RISC PKS-R flow
- 4.51 Topology of critical paths for VGA-LCD SYN-P&R flow
- 4.52 Topology of critical paths for VGA-LCD PKS-R flow
- 4.53 Topology of critical paths for Aemb SYN-P&R flow
- 4.54 Topology of critical paths for Aemb PKS-R flow
- 4.55 Topology of critical paths for DES3 SYN-P&R flow
- 4.56 Topology of critical paths for DES3 PKS-R flow
- 4.57 Topology of critical paths for DLX SYN-P&R flow
- 4.58 Topology of critical paths for DLX PKS-R flow
- 4.59 Topology of critical paths for Huffman SYN-P&R flow
- 4.60 Topology of critical paths for Huffman PKS-R flow
- 4.61 Topology of critical paths for Reed-Solomon SYN-P&R flow
- 4.62 Topology of critical paths for Reed-Solomon PKS-R flow
- 4.63 Topology of critical paths for RISC SYN-P&R flow
- 4.64 Topology of critical paths for RISC PKS-R flow
- 4.65 Topology of critical paths for VGA-LCD SYN-P&R flow
- 4.66 Topology of critical paths for VGA-LCD PKS-R flow
- 4.67 Topology of critical paths for Aemb in UMC-0.13
- 4.68 Topology of critical paths for Aemb in IHP-0.25
- 4.69 Topology of critical paths for DES3 in UMC-0.13
- 4.70 Topology of critical paths for DES3 in IHP-0.25
- 4.71 Topology of critical paths for DLX in UMC-0.13
- 4.72 Topology of critical paths for DLX in IHP-0.25
- 4.73 Topology of critical paths for Huffman in UMC-0.13
- 4.74 Topology of critical paths for Huffman in IHP-0.25
- 4.75 Topology of critical paths for Reed-Solomon in UMC-0.13
- 4.76 Topology of critical paths for Reed-Solomon in IHP-0.25
- 4.77 Topology of critical paths for RISC in UMC-0.13
- 4.78 Topology of critical paths for RISC in IHP-0.25
- 4.79 Topology of critical paths for VGA-LCD in UMC-0.13
- 4.80 Topology of critical paths for VGA-LCD in IHP-0.25
- 5.1 Average Area-Speed Results - $0.25\,\mu$m Process
- 5.2 Average Area-Speed Results - $0.13\,\mu$m Process
- 5.3 Average Power-Speed Results - $0.25\,\mu$m Process
- 5.4 Average Power-Speed Results - $0.13\,\mu$m Process

### List of Tables

- 4.1 Area-speed results for Aemb in UMC-0.13
- 4.2 Area-speed results for Aemb in IHP-0.25
- 4.3 Area-speed results for DES3 in UMC-0.13
- 4.4 Area-speed results for DES3 in IHP-0.25
- 4.5 Area-speed results for DLX in UMC-0.13
- 4.6 Area-speed results for DLX in IHP-0.25
- 4.7 Area-speed results for Huffman in UMC-0.13
- 4.8 Area-speed results for Huffman in IHP-0.25
- 4.9 Area-speed results for RISC in UMC-0.13
- 4.10 Area-speed results for RISC in IHP-0.25
- 4.11 Area-speed results for Reed-Solomon in UMC-0.13
- 4.12 Area-speed results for Reed-Solomon in IHP-0.25
- 4.13 Area-speed results for VGA-LCD in UMC-0.13
- 4.14 Area-speed results for VGA-LCD in IHP-0.25
- 4.15 Pipeline balance results post-synthesis for the typical corner of IHP-0.25
- 4.16 Pipeline balance results post-synthesis for the typical corner of UMC-0.13
- 4.17 Pipeline balance results post-synthesis for the worst corner of IHP-0.25
- 4.18 Pipeline balance results post-synthesis for the worst corner of UMC-0.13
- 4.19 Pipeline balance results post-P&R for the typical corner of IHP-0.25
- 4.20 Pipeline balance results post-P&R for the typical corner of UMC-0.13
- 4.21 Pipeline balance results post-P&R for the worst corner of IHP-0.25
- 4.22 Pipeline balance results post-P&R for the worst corner of UMC-0.13
- 4.23 Pipeline balance results for PKS-R flow in IHP-0.25
- 4.24 Pipeline balance results for PKS-R in UMC-0.13
- 4.25 Numbers of cells in most critical paths post-synthesis TYP IHP
- 4.26 Numbers of cells in most critical paths post-synthesis TYP UMC
- 4.27 Numbers of cells in most critical paths post-synthesis WC IHP
- 4.28 Numbers of cells in most critical paths post-synthesis WC UMC
- 4.29 Numbers of cells in most critical paths post-P&R TYP IHP
- 4.30 Numbers of cells in most critical paths post-P&R TYP UMC
- 4.31 Numbers of cells in most critical paths post-P&R WC IHP
- 4.32 Numbers of cells in most critical paths post-P&R WC UMC
- 4.33 Numbers of cells in most critical paths PKS-R IHP
- 4.34 Numbers of cells in most critical paths PKS-R UMC
- 4.35 Power-speed results for Aemb UMC-0.13
- 4.36 Power-speed results for Aemb IHP-0.25
- 4.37 Power-speed results for DES3 UMC-0.13
- 4.38 Power-speed results for DES3 IHP-0.25
- 4.39 Power-speed results for DLX UMC-0.13
- 4.40 Power-speed results for DLX IHP-0.25
- 4.41 Power-speed results for Huffman UMC-0.13
- 4.42 Power-speed results for Huffman IHP-0.25
- 4.43 Power-speed results for RISC UMC-0.13
- 4.44 Power-speed results for RISC IHP-0.25
- 4.45 Power-speed results for Reed-Solomon UMC-0.13
- 4.46 Power-speed results for Reed-Solomon IHP-0.25
- 4.47 Power-speed results for VGA-LCD UMC-0.13
- 4.48 Power-speed results for VGA-LCD IHP-0.25
- 4.49 Post-synthesis and post-P&R max frequency for IHP-0.25
- 4.50 Post-synthesis and post-P&R max frequency for UMC-0.13
- 4.51 Startpoints and Endpoints for UMC-0.13 typical corner
- 4.52 Startpoints and Endpoints for IHP-0.25 typical corner
- 4.53 Startpoints and Endpoints for UMC-0.13 worst corner
- 4.54 Startpoints and Endpoints for IHP-0.25 worst corner
- 4.55 Startpoints and Endpoints for IHP-0.25 for the two flows
- 4.56 Startpoints and Endpoints for UMC-0.13 for the two flows
- 4.57 Switching activity for AeMB
- 4.58 Switching activity for DES3
- 4.59 Switching activity for DLX
- 4.60 Switching activity for Huffman
- 4.61 Switching activity for RISC
- 4.62 Switching activity for Reed-Solomon
- 4.63 Switching activity for VGA-LCD
- 5.1 Average Maximum Speed
- 5.2 Average Clock Network Power Consumption
- 5.3 Average percentage of cells in path margins

### Chapter 1

### Introduction

Electronic Design Automation (EDA) is the most common approach to contemporary digital electronic design, as it enables designers to think at a higher level of abstraction, dividing a problem into its constituent parts, *i.e.* an SOC into several IP blocks, and either implementing each IP from an RTL specification or re-using an existing design. EDA flows that transform RTL code to mask layout are not as efficient as custom design [CK02]; however, they reduce time to market and greatly speed up the implementation process, from years to months [Mar98, BL00, BC02].

However, as transistors continue to shrink, EDA is facing the significant problem of process variability [Nas01]. In the traditional process corner model, libraries are characterized in best, typical and worst-case conditions. Due to device variations, the process corner model is reaching its limits, because the timing gap between typical and worst-case conditions keeps growing. In addition, timing in EDA flows suffers from an accumulation of worst-case approximations and assumptions, including worst-case conditions being regarded as worst-case voltage and worst-case temperature, false paths in STA [DYG89], worst-case rounding of gate delays in library lookup tables, conservative estimations in SI analysis [cel04, Phy03] and extra delay margins for processing variations in the latest technologies [Nas01].

In fact, as speed-binning is not used in EDA, design will typically take place in the worst-case process corner. The best corner is used to detect hold violations, where the minimum arrival time is required, whereas the typical corner is often not used at all in the design process. Several research possibilities exist towards optimizing current EDA flows and practices for DSM processes and taking variability and the gap between typical and worst-case conditions into account [BCK<sup>+</sup>04, SBY03, DWD91, GK98, GK00].

The motivation of this work is a clear understanding of IP block implementation characteristics in contemporary EDA flows. In order to improve upon existing practices, it is necessary to have a very clear understanding of their benefits and drawbacks. For example, in terms of timing and area, it is desirable to know how far the typical corner lies from the worst-case corner, to see how much of a loss is incurred by worst-case design. In terms of power, it is interesting to know how much switching activity contemporary IP circuits present, to see whether alternative implementation approaches, such as approaches based on dual-rail or larger encodings, can compare favourably in terms of power against contemporary single-rail design.

The focus of this study is on the implementation properties, including area-speed comparison, power-speed comparison, pipeline balancing, critical path analysis and physical topology, post-synthesis versus post-P&R comparison and switching activity analysis. This study is based on 7 Open IP blocks, which have been implemented using two types of EDA flows, a Synthesis, P&R flow (SYN-P&R) and a Physically Knowledgeable Synthesis, P&R flow (PKS-R). A set of characterization experiments and measurements has been performed for two technology libraries, a $0.13\,\mu$m one provided by UMC [umc] and a $0.25\,\mu$m one provided by IHP [ihp], and for both the typical and worst-case technology library corners.

The experimental data of these 7 designs are used to create a non-existent "typical" IP block, which is referred to as the "mythical" IP. By making the implementation characteristics of the "mythical" IP apparent, EDA developers and researchers will be able to enhance current EDA practices and perhaps propose new additional algorithms, whereas designers will be able to clearly understand what the tradeoffs and properties of the implementation process are.

The outline of this document is as follows. In chapter 2, the EDA design and methodology are presented. Chapter 3 offers a description of the IP blocks that are used as benchmarks. The results of the experiments are presented in chapter 4 and the characteristics of the "mythical" IP block are presented in chapter 5. The conclusions and the future work are presented in chapter 6.

### Chapter 2

### EDA Design and Methodology

In this chapter, the EDA design methodology used is presented, along with a detailed description of the experimental framework of this work.

#### 2.1 Experiment Motivation

#### 2.1.1 Area-Speed Analysis

This analysis examines how much area can be saved if the timing constraints are relaxed, how much area and speed can be gained when the design is implemented in the typical corner instead of the worst-case corner, and how aggressively the tools optimize for area.

#### 2.1.2 Pipeline Balancing

This analysis investigates the asymmetry of the pipeline stages for the pipelined designs. Figure 2.1 illustrates an unbalanced pipeline, where two of the stages have approximately the same delay and a third stage (in the middle) has significantly larger delay than the two others. The middle stage will dominate the delay of the circuit and thus, the clock frequency of the pipeline must be as slow as the combinational logic in the middle of the pipeline. In this case, the whole circuit could run at a much faster clock frequency if the dominating stage could be optimized so that its delay would be minimized. In addition, a circuit of this type can benefit through the application of register retiming [HE96].
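The effect of an unbalanced pipeline on the clock period can be sketched as follows. This is a minimal illustration and not part of the thesis flow: the stage delays are hypothetical values, and the function names are chosen only for this example.

```python
def pipeline_clock_period(stage_delays_ns):
    """The clock period of a pipeline is dictated by its slowest stage."""
    return max(stage_delays_ns)


def stage_slack(stage_delays_ns):
    """Idle time of each stage per cycle when clocked at the slowest stage's delay."""
    period = pipeline_clock_period(stage_delays_ns)
    return [period - delay for delay in stage_delays_ns]


# Hypothetical three-stage pipeline with a dominating middle stage.
delays_ns = [2.0, 5.0, 2.1]
period_ns = pipeline_clock_period(delays_ns)
slacks_ns = stage_slack(delays_ns)
print("clock period (ns):", period_ns)
print("per-stage slack (ns):", slacks_ns)
```

A large slack in most stages, as in this hypothetical case, indicates that optimizing or retiming the dominating stage would allow a shorter clock period.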



Figure 2.1: An unbalanced pipeline

#### 2.1.3 Critical Path Analysis

This analysis studies the range of paths that are within a certain delay margin of the most critical path. The delay of a combinational logic cloud is defined as the delay of the longest path from its inputs to its outputs. One of the following scenarios may hold.

- All of the paths of the combinational logic cloud have approximately the same delay.
- A relatively small number of paths have a delay close to that of the longest path, while the rest of the paths have a much smaller delay than the delay of the combinational cloud. This notion is illustrated in Figure 2.2. In this case, if the longest paths can be optimized, the whole combinational cloud can operate at a faster clock frequency.

An additional goal of this analysis is to determine the influence of the tools, the flow steps and the libraries on the criticality of the paths, *i.e.* how the critical paths are affected by the tools, or whether they always remain the same no matter what the libraries or the operating conditions are.



Figure 2.2: A pipeline stage with varying length paths.

#### 2.1.4 Critical Path Topology

This experiment studies the topology of the paths on the circuits' layout. The key question for this experiment is whether the critical paths are clustered in relatively small regions on the floorplan or if they show no important topological behaviour. This analysis can show whether telescopic units can be used for the optimization of the execution time [BMP97].

#### 2.1.5 Synthesis versus Placement and Routing, *i.e.* Physical Design

This experiment tries to determine whether the place and route tools affect the nature of the circuit and if so, how important this effect is. Area, maximum frequency and power consumption are three of the most important comparisons made. Based on this analysis, the overhead of the placement and routing can be estimated.

#### 2.1.6 Typical Corner versus Worst-Case Corner

This experiment studies the area, maximum frequency, power consumption, pipeline balancing, switching activity and the criticality of critical paths when the design is implemented using typical corner libraries and when it is implemented using worst-case corner libraries. The results can show what losses can be expected if the designs are implemented using worst-case corner libraries instead of typical corner libraries.

#### 2.1.7 Power-Speed Analysis

This analysis examines how much power can be saved if the timing constraints are relaxed, and how the power consumption differs when the design is implemented in the typical corner instead of the worst-case corner.

#### 2.1.8 Switching activity

This experiment identifies the parts of an IP block which have high switching activity, thus consuming the largest portion of its power. This analysis can give directions regarding which parts of the IP block should be the target for power optimization.

#### 2.2 Design Flow

In this section, the design flow is presented in detail.

Two different design flows have been used. Both of them implement the benchmarks in two different libraries, in their typical and worst-case corners. The first library is the IHP  $0.25\mu m$  CMOS technology and the second is the UMC  $0.13\mu m$ CMOS technology. The typical corner of the IHP-0.25 technology means a voltage of 2.5V and a temperature of 25 degrees Celsius, while the worst-case corner means a voltage of 2.25V and a temperature of 125 degrees Celsius. The typical corner voltage for UMC-0.13 is 1.2V and the typical corner temperature is 25 degrees Celsius, whereas the worst-case corner values are 1.08V and 125 degrees Celsius respectively.

The first design flow, which will be referred to as SYN-P&R flow, is comprised of two main phases, as shown in Figure 2.3. During the first phase, the design undergoes synthesis and technology mapping. Timing constraints are given to the synthesis tool so that the circuit is optimized for maximum speed. After the maximum operating speed has been determined, the synthesized netlist is passed over to the second phase of the flow. During the second phase of the flow, the synthesized circuit is placed and routed with in-place optimization (IPO). After placement and routing is completed, a netlist file and a standard delay format (SDF) file are extracted to be used for simulation.



Figure 2.3: SYN-P&R flow block diagram.

The second design flow, which will be referred to as PKS-R flow, is comprised of two distinct phases which differ from those of SYN-P&R flow. During the first phase, the design is synthesized without a wireload model, but with physically knowledgeable synthesis (PKS) with timing constraints. After synthesis is complete, the circuit is placed in an area which targets a utilization of 70%. The tool is directed to perform the placement so that the speed of the circuit is optimized. After the circuit has been placed, a netlist and a placement information file (PIF) are passed over to the second phase of the flow. During the second phase of the flow, the placed circuit is routed by the same routing tool that has been used in SYN-P&R flow. The main difference in this step is that in PKS-R flow, the place and route tool does not perform any placement, but only applies the steps that follow placement. After routing is completed, a netlist and an SDF file are extracted to be used for simulation. A block diagram of PKS-R flow is shown in Figure 2.4.



Figure 2.4: PKS-R flow block diagram.

#### 2.3 Experimental Framework

In this section, the methodology that has been followed to obtain the results is presented.

#### 2.3.1 Maximum Period Determination

In SYN-P&R flow, the maximum period is determined by synthesizing the circuit iteratively with tighter timing constraints. The circuit is synthesized until a maximum period is obtained for the given library. This process is shown in Figure 2.5. The value of the maximum clock period that is determined post-synthesis is the reference clock period constraint given to the place and route tool. Typically, the clock period that the place and route tool can reach is worse than the clock period reported by the synthesis tool. This is because the place and route tool adds buffering elements for the clock tree and has a more realistic view of the wiring delay. In order to determine the maximum clock frequency post-P&R, the same iterative procedure that is used for synthesis is repeated.

In PKS-R flow, the same procedure as in SYN-P&R flow is followed. The first determination of maximum clock frequency is performed after the circuit is placed in the first step of the flow. As in the second step of the SYN-P&R flow, the procedure is repeated after the circuit has been routed.
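The iterative procedure described above can be sketched as a simple loop. This is only an illustration under stated assumptions: the term "maximum period" is taken to mean the tightest clock period constraint that the tool can still meet, `meets_timing` is a hypothetical stand-in for a complete synthesis (or place and route) run, and the fixed step size is an arbitrary choice.

```python
from typing import Callable


def find_tightest_period(meets_timing: Callable[[float], bool],
                         start_ns: float,
                         step_ns: float = 0.1) -> float:
    """Tighten the clock period constraint step by step and return the
    last constraint for which the (hypothetical) tool run met timing."""
    if not meets_timing(start_ns):
        raise ValueError("the starting constraint must be achievable")
    steps = 0
    while True:
        # Next, tighter constraint; rounding avoids floating-point drift.
        candidate = round(start_ns - (steps + 1) * step_ns, 6)
        if candidate <= 0 or not meets_timing(candidate):
            break
        steps += 1
    return round(start_ns - steps * step_ns, 6)


def stand_in_tool(period_ns: float) -> bool:
    """Hypothetical stand-in for a tool run: timing is met down to 2.8 ns."""
    return period_ns >= 2.8


tightest_ns = find_tightest_period(stand_in_tool, start_ns=5.0)
print("tightest achievable period (ns):", tightest_ns)
```

In the actual flows each call to `meets_timing` would correspond to a full tool run, so a coarser step or a bisection over the constraint may be preferable in practice.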



Figure 2.5: Maximum period determination.

For both flows, the circuit is synthesized and placed and routed for four different values of the target clock period, *i.e.* the 100%, the 75%, the 50% and the 25% timing points are studied.
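As a rough illustration of how the four timing points relate to one another, the sketch below derives the clock period constraint of each point from the period obtained at maximum speed. It assumes, as the discussion of the results in chapter 4 suggests, that the N% point targets N% of the maximum clock frequency. The function name and the rounding to one decimal are illustrative choices.

```python
def timing_points(period_at_max_speed_ns, fractions=(1.0, 0.75, 0.5, 0.25)):
    """Map each timing point (in percent of the maximum clock frequency)
    to the corresponding clock period constraint in ns."""
    return {int(f * 100): round(period_at_max_speed_ns / f, 1) for f in fractions}


# Example with a 2.8 ns period at maximum speed.
points = timing_points(2.8)
for percent, period_ns in points.items():
    print(f"{percent}% point: {period_ns} ns")
```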

#### 2.3.2 Critical Path Results

This experiment measures the number of cells which are in the paths with delay within 95%, 90%, 85%, 80% and 70% of the delay of the most critical path. The delay of the paths is determined using static timing analysis (STA).
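The measurement can be sketched as follows, assuming that the STA results are available as a list of paths, each given by its delay and the names of the cells it traverses. This data layout and the example values are hypothetical and do not reflect the report format of any particular tool.

```python
def cells_within_margin(paths, fraction):
    """Return the set of cells lying on paths whose delay is at least
    `fraction` times the delay of the most critical path.

    `paths` is a list of (delay_ns, [cell names]) tuples."""
    worst_delay = max(delay for delay, _ in paths)
    threshold = fraction * worst_delay
    cells = set()
    for delay, path_cells in paths:
        if delay >= threshold:
            cells.update(path_cells)
    return cells


# Hypothetical STA result with four paths.
example_paths = [
    (10.0, ["u1", "u2", "u3"]),
    (9.6, ["u2", "u4"]),
    (7.5, ["u5"]),
    (6.0, ["u6", "u1"]),
]
for fraction in (0.95, 0.90, 0.85, 0.80, 0.70):
    count = len(cells_within_margin(example_paths, fraction))
    print(f"within {int(fraction * 100)}% of the critical delay: {count} cells")
```

A cell shared by several near-critical paths is counted once, which is why a set is used.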

#### 2.3.3 Physical Critical Path Topology

The target of this experiment is to show the topology of the 30% most critical paths on the layout. In order to determine how many paths belong to the 30% most critical paths, STA is used for the placed and routed design. After STA is completed for all the paths, the place and route tool highlights the 30% most critical paths. This method is illustrated in Figure 2.6.

#### 2.3.4 Power Consumption Results

Figure 2.7 shows the process used for obtaining the power consumption results. Using the testbench, which tests the typical operation of the IP block, the switching activity is extracted for the typical operation. Next, the switching activity file (SAIF) can be used in order to create a power consumption report.



Figure 2.6: Critical path topology determination.



Figure 2.7: Power consumption estimation.

In SYN-P&R flow, there are two power consumption measurements, one post-synthesis and another post-P&R. The results are extracted using the same methodology in both steps.

In PKS-R flow the procedure for power measurement is the same as in SYN-P&R flow, but only one measurement is performed after the circuit has been routed. The tool that is used as the estimator is the place and route tool.

In order to derive the power consumed by the most critical paths, two files are needed. The first file, the SAIF, contains information about the power consumed by each signal of the circuit. The second file is a file which contains the names of the cells that are in the most critical paths whose power consumption is under question.

#### 2.3.5 Total Cell Area Measurement

In SYN-P&R flow, there are two cell area measurements, one post-synthesis and one post-P&R. In PKS-R flow there is only one measurement after the circuit has been routed.

#### 2.3.6 Pipeline Balancing Results

Figure 2.8 shows the pipeline balance estimation procedure. STA determines the delay of each pipeline stage, which depends on the longest path of that stage.



Figure 2.8: Pipeline balance estimation.

In SYN-P&R flow, there are two pipeline balancing measurements, one post-synthesis and the second post-P&R. In PKS-R flow the pipeline balancing of the circuit is measured only after the circuit is routed.

#### 2.3.7 Switching Activity Results

Figure 2.9 shows the process for evaluating the switching activity of the cells of the design. The tool that is used for the estimation of the switching activity of the cells is the place and route tool in both flows. The tool accepts as input the placed and routed netlist file, which is described in Verilog. Additionally, the corresponding SDF file is given to the place and route tool. The tool produces a switching activity report for the design. This report is then forwarded to a script which calculates the switching power consumption of the logic blocks of the design. Moreover, the switching power consumption of the clock tree is calculated by the same script.



Figure 2.9: Switching activity estimation
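The kind of aggregation performed by such a script can be sketched as follows. The sketch does not reproduce the actual script or the report format of the tool: it assumes that the report has already been parsed into (hierarchical cell name, switching power) pairs, that the first level of the hierarchy identifies the logic block, and that clock tree cells can be recognized by a name prefix. All of these are hypothetical conventions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical naming convention for clock tree buffers.
CLOCK_TREE_PREFIX = "clk_buf"


def aggregate_switching_power(records):
    """Sum switching power per top-level logic block and for the clock tree.

    `records` is an iterable of (hierarchical_cell_name, power) pairs,
    e.g. ("alu/add_0/u12", 0.5)."""
    per_block = defaultdict(float)
    clock_tree = 0.0
    for name, power in records:
        leaf = name.rsplit("/", 1)[-1]
        if leaf.startswith(CLOCK_TREE_PREFIX):
            clock_tree += power
        else:
            block = name.split("/", 1)[0]
            per_block[block] += power
    return dict(per_block), clock_tree


# Hypothetical records (power in arbitrary units).
example_records = [
    ("alu/add_0/u12", 0.5),
    ("alu/u3", 0.25),
    ("regfile/u7", 1.0),
    ("clk_buf_1", 2.0),
    ("alu/clk_buf_2", 0.5),
]
blocks, clock_tree_power = aggregate_switching_power(example_records)
print("per block:", blocks)
print("clock tree:", clock_tree_power)
```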

In the next chapter, the IP benchmarks are described in detail. The description is focused on their architecture and their special characteristics.

### Chapter 3

### **IP Block Benchmarks**

In this chapter, the open IP core suite is described. After a small summary of the benchmarks, their architecture and special characteristics are presented.

#### 3.1 Open IP Core Suite

All of the IP cores that have been used in this project have been downloaded from http://www.opencores.org [ope]. They have been chosen so that they form as much as possible a representative sample of the contemporary IP cores. Typical components of an SoC include a CPU, a crypto core, various image or video processing cores, a bus interconnect and interface circuits with external peripherals [BC02, BL00]. The circuits that have been chosen are the following.

- Aemb: Aemb [aem] is a four-stage pipelined CPU based on the architecture of the Microblaze microprocessor originally developed by Xilinx, Inc. [xil]
- DES3: DES3 [des] is a triple DES crypto core. The core has three stages of DES encryption-decryption modules. Each of these three stages is implemented as a 16-stage pipeline. Thus the complete core is implemented as a 48-stage pipeline circuit.
- DLX: DLX [dlx] is a five-stage pipelined full-DLX CPU based on the architecture suggested by Hennessy and Patterson. [HP90]

- RISC: RISC is a 2-way superscalar five-stage pipeline CPU, which supports forwarding and is based on the RISC architecture.
- Huffman: Huffman [huf] is an entropy encoder-decoder designed to be used for video applications. This design is not pipelined.
- Reed\_Solomon: A Reed-Solomon encoder [rs], implementing the Reed-Solomon encoding algorithm. This design is also not pipelined.
- VGA-LCD: A VGA/LCD controller [vga]. This design includes a Wishbone [wis] slave and master interface. This design is also not pipelined.

#### 3.2 IP Cores Description

#### 3.2.1 Aemb

Aemb is an implementation of the Microblaze processor, which was designed by Xilinx, Inc. and is suitable for FPGAs. It has a four-stage pipeline. A basic block diagram is shown in Figure 3.1.



Figure 3.1: Block diagram of Aemb

The Aemb instruction set includes:

- Arithmetic operations such as ADD, SUB, MUL, DIV, CMP.
- Logical operations such as AND, OR.
- Barrel shift operations such as BSRA, BSRL.
- Branch instructions such as BEQ.
- Specific instructions for the architecture such as PUT, GET.

Figure 3.2 is an illustration of the pipeline of Aemb showing the most important logic blocks.



Figure 3.2: Pipeline diagram of Aemb

The size of Aemb is about 11500 2-input NAND gates with a drive strength of 4.

#### 3.2.2 DES3

DES3 is a triple DES encryption/decryption core. It consists of a 16-stage pipeline for each of the three stages of the encryption/decryption. The core performs one encoding/decoding every cycle. Figure 3.3 shows the encryption operation of the DES algorithm. The encryption procedure goes as follows. The algorithm accepts as input a 64-bit word and a 64-bit key. In the first step, the 64-bit word is permuted using a predefined permutation table. The permuted word is divided into two words of 32 bits each. These words are given the names "Up0" and "Down0" in Figure 3.3. The 32-bit words exchange positions after applying an encoding function to the "Down0" block. The procedure is iterated 16 times, then a final permutation takes place and the final encrypted block is obtained. The decoding procedure is the inverse of the encoding procedure.



Figure 3.3: Encryption operation of the DES algorithm

In the case of triple DES, the DES algorithm is applied to the input 64-bit block three times. Every time the algorithm is applied, a *key* is used. In triple DES encryption, the first step is a DES encryption with key $K_1$. The second step is a DES decryption with key $K_2$. The third step is a DES encryption with key $K_3$. This operation is illustrated in Figure 3.4. In the case of triple DES decryption, the first step is a DES decryption with key $K_3$. The second step is a DES encryption with key $K_2$. The third step is a DES decryption with key $K_1$. This operation is illustrated in Figure 3.5. The three keys used in triple DES can be independent, or they can be correlated.



Figure 3.4: The triple DES encoding operation
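The 16-round structure and the key ordering of triple DES described above can be sketched as follows. This is a toy model and not an implementation of DES: the initial and final permutations are omitted, and the round function and the key schedule are hypothetical placeholders chosen only so that the example is self-contained. What the sketch does show is the exchange of the two 32-bit halves in every round and the encrypt-decrypt-encrypt composition with keys $K_1$, $K_2$ and $K_3$.

```python
MASK32 = 0xFFFFFFFF


def toy_round_function(half, subkey):
    """Placeholder mixing function (NOT the DES round function)."""
    x = (half ^ subkey) & MASK32
    return ((x << 3) | (x >> 29)) & MASK32  # rotate left by 3 bits


def toy_subkeys(key):
    """Hypothetical key schedule producing 16 round keys (NOT the DES one)."""
    return [((key >> i) ^ (key * (i + 1))) & MASK32 for i in range(16)]


def toy_des(block, key, decrypt=False):
    """16-round Feistel network on a 64-bit block split into 'up' and 'down'."""
    up, down = (block >> 32) & MASK32, block & MASK32
    keys = toy_subkeys(key)
    if decrypt:
        keys.reverse()  # decryption applies the round keys in reverse order
    for subkey in keys:
        # The halves exchange positions after the function is applied to 'down'.
        up, down = down, up ^ toy_round_function(down, subkey)
    return (down << 32) | up  # final swap makes the structure self-inverse


def triple_encrypt(block, k1, k2, k3):
    """Encrypt with K1, decrypt with K2, encrypt with K3."""
    return toy_des(toy_des(toy_des(block, k1), k2, decrypt=True), k3)


def triple_decrypt(block, k1, k2, k3):
    """Decrypt with K3, encrypt with K2, decrypt with K1."""
    return toy_des(toy_des(toy_des(block, k3, decrypt=True), k2), k1, decrypt=True)


plaintext = 0x0123456789ABCDEF
keys = (0x133457799BBCDFF1, 0x0E329232EA6D0D73, 0x1F1F1F1F0E0E0E0E)
ciphertext = triple_encrypt(plaintext, *keys)
print(hex(ciphertext))
print(triple_decrypt(ciphertext, *keys) == plaintext)
```

Because every round only XORs one half with a function of the other, the same network run with the reversed key order undoes the encryption, which is why a single pipeline structure can serve for both directions.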

The size of DES3 is about 75300 2-input NAND gates with a drive strength of 4.



Figure 3.5: The triple DES decoding operation

#### 3.2.3 DLX

DLX is an implementation of the RISC DLX processor suggested by Hennessy and Patterson. Figure 3.6 shows a block diagram of the DLX processor. DLX has an instruction memory and a data memory interface, which communicate using 32-bit words.



Figure 3.6: Block diagram of DLX

The instruction set of DLX includes:

- Arithmetic operations such as ADD, SUB, ADDI.
- Logical operations such as AND, OR, XOR.
- Branch and Jump operations such as BEQZ, J, JAL.
- Data movement operations such as MOV, SB, LH.
- Special operations such as TRAP, INT.

The size of DLX is about 12900 2-input NAND gates with a drive strength of 4.

#### 3.2.4 Huffman

Huffman is a simple entropy encoder/decoder. It is not pipelined and performs one encoding/decoding operation per clock cycle. The size of Huffman is about 2700 2-input NAND gates with a drive strength of 4. Figure 3.7 shows the steps of JPEG encoding in which the Huffman encoder is involved. Huffman is involved in the last step of the JPEG encoding algorithm by producing the coded words using predefined tables.



Figure 3.7: Huffman encoding in the JPEG encoding.

Figure 3.8 shows the steps of JPEG decoding in which the Huffman decoder is involved. Huffman decodes the JPEG words in the first step of the JPEG decoding algorithm using predefined tables.

#### 3.2.5 Reed-Solomon

Reed-Solomon codes are block-based error-correcting codes. They can be applied to a wide range of digital transmission applications. Reed-Solomon encoding operates on a block of data, adding redundant error-correcting bits to the data block. The "Reed-Solomon" benchmark is a design which implements a Reed-Solomon encoder. As shown in Figure 3.9, the transmitted block may, in the presence of noise, contain errors when it arrives at the receiver. The receiver applies the Reed-Solomon decoding algorithm to the received block and recovers the initial block using the extra error-correcting bits. The number of errors that can be detected or corrected depends on the type of Reed-Solomon code that is being used. The design studied implements a (255, 239) code, which means that it can correct up to 8 symbol errors per block and operates on 8-bit symbols.



Figure 3.8: Huffman decoding in the JPEG decoding.



Figure 3.9: Reed-Solomon encoding-decoding.

Figure 3.10 shows the block diagram of the Reed-Solomon encoder. The (255,239) Reed-Solomon encoder uses 16 polynomials. Every "MUL ADD" block carries out a multiplication and an addition on the polynomial and on the 8-bit symbol. Every register stores an 8-bit symbol. After the encoding has been completed, the circuit will have generated 16 parity symbols with size of 8 bits each.
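The numbers quoted for the (255, 239) code follow from the standard relation between the block length $n$, the number of data symbols $k$ and the correction capability of a Reed-Solomon code. The short sketch below only restates this arithmetic; the function name is illustrative.

```python
def rs_code_parameters(n, k):
    """Return (number of parity symbols, correctable symbol errors)
    for an (n, k) Reed-Solomon code."""
    parity_symbols = n - k
    correctable_errors = parity_symbols // 2
    return parity_symbols, correctable_errors


parity, correctable = rs_code_parameters(255, 239)
print("parity symbols:", parity)
print("correctable symbol errors:", correctable)
```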

The size of Reed-Solomon is about 4000 2-input NAND gates with a drive strength of 4.



Figure 3.10: Reed-Solomon encoder.

#### 3.2.6 RISC

RISC is a pipelined 2-way superscalar RISC processor that supports forwarding. Figure 3.11 shows the basic block diagram of this design.



Figure 3.11: RISC block diagram.

The operations that can be executed include:

- Arithmetic operations such as ADD, SUB, ADDI.
- Logical operations such as AND, OR.
- Branch and Jump operations such as BEQZ, J, JAL.
- Data movement operations such as STB, LB.
- Special operations such as CALL.

RISC supports forwarding, which enables the data to bypass Stage3 if the instruction does not need the functional units of this stage. RISC fetches two 32-bit instructions from the instruction memory and dispatches them both to the functional units if there are no dependencies involved. The dependencies that are checked are read-after-write, write-after-write and write-after-read. If both instructions can be executed in parallel, then they can use both of the ALUs, which are located in Stage3.
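The dependency check between the two fetched instructions can be sketched as follows. This is an illustration of the three hazard types named above and not a description of the actual dispatch logic of the core: the `Instr` record and the register names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: registers written and registers read."""
    dest: frozenset
    srcs: frozenset


def can_dual_issue(first, second):
    """Return True if `second` has no register dependency on `first`."""
    read_after_write = first.dest & second.srcs    # second reads what first writes
    write_after_write = first.dest & second.dest   # both write the same register
    write_after_read = first.srcs & second.dest    # second overwrites a source of first
    return not (read_after_write or write_after_write or write_after_read)


add = Instr(dest=frozenset({"r1"}), srcs=frozenset({"r2", "r3"}))
sub_dependent = Instr(dest=frozenset({"r4"}), srcs=frozenset({"r1", "r5"}))
sub_independent = Instr(dest=frozenset({"r4"}), srcs=frozenset({"r5", "r6"}))

print(can_dual_issue(add, sub_dependent))    # read-after-write on r1
print(can_dual_issue(add, sub_independent))  # no shared registers
```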

The size of RISC is about 21600 2-input NAND gates with a drive strength of 4.

#### 3.2.7 VGA-LCD

Figure 3.12 shows a basic block diagram of VGA-LCD. VGA-LCD supports SVGA resolutions up to 1024x768 and has three color modes: 32bpp, 16bpp and 8bpp. It also supports up to two hardware cursors of a maximum 64x64 pixel resolution. For 3D cursors, VGA-LCD supports Alpha blending and there is also support for triple displays. The communication interface of VGA-LCD is a 32-bit Wishbone interface [wis].



Figure 3.12: VGA-LCD block diagram.

The size of VGA-LCD is about 67300 2-input NAND gates with a drive strength of 4.

### Chapter 4

### **Experimental Results**

This chapter presents the results of the IP experiments and their analysis. These results will be used in order to derive the "mythical" IP block, which will be presented in chapter 5.

#### 4.1 Area-Speed Results

This section presents the area-speed tradeoff results. The experiment has been run four times for four speed percentages for SYN-P&R TYP flow and SYN-P&R WC flow and one time for PKS-R flow.

Figure 4.1 shows the area-speed tradeoff for Aemb in UMC-0.13 and Figure 4.2 shows the area-speed tradeoff for Aemb in IHP-0.25. Table 4.1 shows the data that has been plotted in Figure 4.1 and Table 4.2 shows the data that has been plotted in Figure 4.2.

In the case of UMC-0.13, the area occupied by the Aemb implementation at maximum frequency is about the same for both the typical corner and the worst-case corner operating conditions of SYN-P&R flow. However, the maximum frequency in the worst-case corner is about 40% lower than the maximum frequency in the typical corner. When the design is implemented with the target clock frequency set at 25% of the maximum clock frequency, the area requirement is larger than when the design is implemented with the target clock frequency set at 50% of the maximum clock frequency, which is unexpected.



Figure 4.1: Area-speed results for Aemb in UMC-0.13.

| Design        | Clock period (ns) | Area ($\mu m^2$) | % of max area |
|---------------|-------------------|------------------|---------------|
| Aemb(TYP_100) | 2.8               | 178341           | 96.3058%      |
| Aemb(TYP_75)  | 3.7               | 167387           | 90.3905%      |
| Aemb(TYP_50)  | 5.6               | 162342           | 87.6662%      |
| Aemb(TYP_25)  | 9.2               | 165055           | 89.1312%      |
| Aemb(WC_100)  | 4.6               | 178613           | 96.4527%      |
| Aemb(WC_75)   | 6.1               | 170845           | 92.2579%      |
| Aemb(WC_50)   | 11.2              | 161184           | 87.0409%      |
| Aemb(WC_25)   | 18.4              | 164173           | 88.6549%      |
| Aemb(PKS-R)   | 3.8               | 185182           | 100%          |

Table 4.1: Area-speed results for Aemb in UMC-0.13
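The derived figures quoted for Aemb in UMC-0.13 can be reproduced from three rows of Table 4.1, as sketched below. The clock period and area values are copied from the table; the function names are illustrative, and the largest area in the table (the PKS-R implementation) is used as the 100% reference.

```python
# (clock period in ns, area in um^2), copied from Table 4.1.
AEMB_UMC013 = {
    "TYP_100": (2.8, 178341),
    "WC_100": (4.6, 178613),
    "PKS-R": (3.8, 185182),
}


def percent_of_max_area(area, all_areas):
    """Area expressed as a percentage of the largest area in the table."""
    return 100.0 * area / max(all_areas)


def frequency_drop_percent(fast_period_ns, slow_period_ns):
    """Relative loss in clock frequency when moving from the fast to the slow period."""
    return 100.0 * (1.0 - fast_period_ns / slow_period_ns)


areas = [area for _, area in AEMB_UMC013.values()]
typ_share = percent_of_max_area(AEMB_UMC013["TYP_100"][1], areas)
wc_drop = frequency_drop_percent(AEMB_UMC013["TYP_100"][0], AEMB_UMC013["WC_100"][0])
print(f"TYP_100 area: {typ_share:.4f}% of max")
print(f"worst-case frequency loss: {wc_drop:.1f}%")
```

Since the frequency is the reciprocal of the clock period, the frequency loss is computed from the ratio of the two periods rather than from their difference.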

Unlike UMC-0.13, in IHP-0.25, the design implemented in the worst-case corner is slower but smaller than the design implemented in the typical corner.

For both technologies, PKS-R produces a faster but larger design than the worst-case corner of SYN-P&R. In IHP-0.25, the area penalty is about 30%.

Figure 4.2: Area-speed results for Aemb in IHP-0.25.

| Design        | Clock period (ns) | Area ($\mu m^2$) | % of max area |
|---------------|-------------------|------------------|---------------|
| Aemb(TYP_100) | 7.2               | 1924406          | 79.0739%      |
| Aemb(TYP_75)  | 9.5               | 1840921          | 75.6435%      |
| Aemb(TYP_50)  | 14.4              | 1748154          | 71.8317%      |
| Aemb(TYP_25)  | 28.8              | 1677832          | 68.9421%      |
| Aemb(WC_100)  | 13                | 1856034          | 76.2645%      |
| Aemb(WC_75)   | 17.3              | 1757518          | 72.2164%      |
| Aemb(WC_50)   | 26                | 1612589          | 66.2613%      |
| Aemb(WC_25)   | 52                | 1553276          | 63.8241%      |
| Aemb(PKS-R)   | 11.7              | 2433681          | 100%          |

Table 4.2: Area-speed results for Aemb in IHP-0.25

Figure 4.3 shows the area-speed tradeoff for DES3 in UMC-0.13 and Figure 4.4 shows the area-speed tradeoff for DES3 in IHP-0.25. The actual values of the data are shown in Tables 4.3 and 4.4.

Unlike Aemb, for DES3, the worst-case corner implementation is slower and larger than the typical corner one. The difference between the two implementations is larger for IHP-0.25, where the area overhead in the worst-case corner is about 15% and the speed overhead is about 78% compared to the typical corner. PKS-R flow yields better results than the worst-case corner of SYN-P&R flow both in terms of area and speed.



Figure 4.3: Area-speed results for DES3 in UMC-0.13.



Figure 4.4: Area-speed results for DES3 in IHP-0.25.

| Design        | Clock period (ns) | Area ($\mu m^2$) | % of max area |
|---------------|-------------------|------------------|---------------|
| DES3(TYP_100) | 1.7               | 767330           | 94.8921%      |
| DES3(TYP_75)  | 2.3               | 709503           | 87.7409%      |
| DES3(TYP_50)  | 3.4               | 700118           | 86.5803%      |
| DES3(TYP_25)  | 6.8               | 700313           | 86.6044%      |
| DES3(WC_100)  | 2.3               | 808634           | 100%          |
| DES3(WC_75)   | 3.1               | 735086           | 90.9047%      |
| DES3(WC_50)   | 4.6               | 708927           | 87.6697%      |
| DES3(WC_25)   | 9.2               | 708080           | 87.565%       |
| DES3(PKS-R)   | 2.2               | 780380           | 96.506%       |

Table 4.3: Area-speed results for DES3 in UMC-0.13

| Design        | Clock period (ns) | Area ($\mu m^2$) | % of max area |
|---------------|-------------------|------------------|---------------|
| DES3(TYP_100) | 4                 | 11002773         | 86.6759%      |
| DES3(TYP_75)  | 5.3               | 10222245         | 80.5272%      |
| DES3(TYP_50)  | 8                 | 8635636          | 68.0284%      |
| DES3(TYP_25)  | 16                | 7807739          | 61.5066%      |
| DES3(WC_100)  | 7.1               | 12694155         | 100%          |
| DES3(WC_75)   | 9.5               | 10513730         | 82.8234%      |
| DES3(WC_50)   | 14.2              | 8910891          | 70.1968%      |
| DES3(WC_25)   | 28.4              | 7999245          | 63.0152%      |
| DES3(PKS-R)   | 7.2               | 11344288         | 89.3662%      |

Table 4.4: Area-speed results for DES3 in IHP-0.25

Figure 4.5 shows the area-speed tradeoff for DLX in UMC-0.13 and Figure 4.6 shows the area-speed tradeoff for DLX in IHP-0.25. Table 4.5 shows the data that has been plotted in Figure 4.5 and Table 4.6 shows the data that has been plotted in Figure 4.6. The results for DLX differ between the two technologies. In UMC-0.13, the worst-case corner and the typical corner implementations occupy about the same area and have a 30% difference in speed, whereas in IHP-0.25 the worst-case corner implementation is 6% smaller but 55% slower. The PKS-R flow yields a 15% larger design than the worst-case corner of SYN-P&R flow in IHP-0.25, whereas in UMC-0.13 the area required by the PKS-R flow is about the same as for both operating conditions of the SYN-P&R flow.



Figure 4.5: Area-speed results for DLX in UMC-0.13.

| Design       | Clock period (ns) | Area ($\mu m^2$) | % of max area |
|--------------|-------------------|------------------|---------------|
| DLX(TYP_100) | 2.5               | 166599           | 99.705%       |
| DLX(TYP_75)  | 3.3               | 159826           | 95.6515%      |
| DLX(TYP_50)  | 5                 | 150949           | 90.3389%      |
| DLX(TYP_25)  | 10                | 145532           | 87.0969%      |
| DLX(WC_100)  | 3.6               | 166831           | 99.8438%      |
| DLX(WC_75)   | 4.8               | 154431           | 92.4227%      |
| DLX(WC_50)   | 7.2               | 142382           | 85.2117%      |
| DLX(WC_25)   | 14.4              | 135827           | 81.2888%      |
| DLX(PKS-R)   | 3.2               | 167092           | 100%          |

Table 4.5: Area-speed results for DLX in UMC-0.13

Figure 4.6: Area-speed results for DLX in IHP-0.25.

| Design       | Clock period (ns) | Area ($\mu m^2$) | % of max area |
|--------------|-------------------|------------------|---------------|
| DLX(TYP_100) | 5.3               | 2249000          | 89.841%       |
| DLX(TYP_75)  | 7.1               | 2044351          | 81.6658%      |
| DLX(TYP_50)  | 10.6              | 1728979          | 69.0677%      |
| DLX(TYP_25)  | 21.2              | 1637857          | 65.4276%      |
| DLX(WC_100)  | 11.8              | 2113861          | 84.4426%      |
| DLX(WC_75)   | 15.7              | 2012643          | 80.3992%      |
| DLX(WC_50)   | 23.6              | 1739582          | 69.4912%      |
| DLX(WC_25)   | 47.2              | 1660526          | 66.3332%      |
| DLX(PKS-R)   | 9.2               | 2503312          | 100%          |

Table 4.6: Area-speed results for DLX in IHP-0.25

Figure 4.7 shows the area-speed tradeoff for Huffman in UMC-0.13 and Figure 4.8 shows the area-speed tradeoff for Huffman in IHP-0.25. Tables 4.7 and 4.8 show the data that has been plotted in Figures 4.7 and 4.8. In UMC-0.13, the worst-case corner implementation of Huffman introduces a 20% area overhead compared to the typical corner implementation, which is not present in IHP-0.25. A significant result is that for this design, in IHP-0.25, the PKS-R point seems to lie on the worst-case corner curve, if this curve is extended to the frequency of the PKS-R implementation. This is not the case in UMC-0.13, where the PKS-R implementation is faster and smaller than the worst-case corner of SYN-P&R flow.

Figure 4.9 shows the area-speed tradeoff for RISC in UMC-0.13 and Figure 4.10 shows the area-speed tradeoff for RISC in IHP-0.25. The corresponding data are shown in Tables 4.9 and 4.10. For RISC, in UMC-0.13, the typical corner curve indicates that there is little area gain when relaxing the timing constraint from 75% to 50%. This is an exception, as for almost all of the experiments, the smallest area gain is obtained when the timing constraints are relaxed from 50% to 25%. For the UMC-0.13 implementation, the PKS-R flow manages to produce a smaller design than even the typical corner of SYN-P&R flow.



Figure 4.7: Area-speed results for Huffman in UMC-0.13

| Design           | Clock period (ns) | Area ($\mu m^2$) | % of max area |
|------------------|-------------------|------------------|---------------|
| Huffman(TYP_100) | 1.7               | 24922            | 79.7938%      |
| Huffman(TYP_75)  | 2.3               | 23461            | 75.1161%      |
| Huffman(TYP_50)  | 3.4               | 22819            | 73.0605%      |
| Huffman(TYP_25)  | 6.8               | 22818            | 73.0573%      |
| Huffman(WC_100)  | 2.3               | 31233            | 100%          |
| Huffman(WC_75)   | 3.1               | 26538            | 84.9678%      |
| Huffman(WC_50)   | 4.6               | 23264            | 74.4853%      |
| Huffman(WC_25)   | 9.2               | 23032            | 73.7425%      |
| Huffman(PKS-R)   | 2                 | 30343            | 97.1504%      |

Table 4.7: Area-speed results for Huffman in UMC-0.13



Figure 4.8: Area-speed results for Huffman in IHP-0.25

| Design           | Clock period (ns) | Area ($\mu m^2$) | % of max area |
|------------------|-------------------|------------------|---------------|
| Huffman(TYP_100) | 4.7               | 407225           | 81.0382%      |
| Huffman(TYP_75)  | 6.2               | 320673           | 63.8143%      |
| Huffman(TYP_50)  | 9.4               | 298817           | 59.4649%      |
| Huffman(TYP_25)  | 18.8              | 287239           | 57.1609%      |
| Huffman(WC_100)  | 7.3               | 421098           | 83.7989%      |
| Huffman(WC_75)   | 9.7               | 339950           | 67.6504%      |
| Huffman(WC_50)   | 14.6              | 312975           | 62.2823%      |
| Huffman(WC_25)   | 29.2              | 289839           | 57.6783%      |
| Huffman(PKS-R)   | 6.7               | 502510           | 100%          |

Table 4.8: Area-speed results for Huffman in IHP-0.25

Figure 4.11 shows the area-speed tradeoff for Reed-Solomon in UMC-0.13 and Figure 4.12 shows it for IHP-0.25; Tables 4.11 and 4.12 list the data plotted in these figures. For Reed-Solomon, the PKS-R flow produces better results than the SYN-P&R flow, which is more evident for IHP-0.25: there, compared to the worst-case corner of SYN-P&R, the PKS-R implementation is about 10% smaller and about 25% faster.
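
As a quick check of these figures, the ratios can be worked out from the IHP-0.25 values reported in Table 4.12, taking the WC_100 row as the worst-case reference (an assumption about which row the comparison refers to). The symbols $A$ (area) and $T$ (clock period) are introduced here only for illustration.

```latex
\[
\frac{A_{\text{PKS-R}}}{A_{\text{WC}}} = \frac{654875}{724750} \approx 0.904
\quad\Rightarrow\quad \text{about } 10\% \text{ smaller},
\qquad
\frac{T_{\text{PKS-R}}}{T_{\text{WC}}} = \frac{6.2}{8.4} \approx 0.738
\quad\Rightarrow\quad \text{about } 26\% \text{ shorter clock period}.
\]
```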



Figure 4.9: Area-speed results for RISC in UMC-0.13.



Figure 4.10: Area-speed results for RISC in IHP-0.25.

| Design        | Period (ns) | Area ($\mu m^2$) | % of max area |
|---------------|-------------|------------------|---------------|
| Risc(TYP_100) | 2.8         | 270160           | 95.3887%      |
| Risc(TYP_75)  | 3.7         | 236165           | 83.3857%      |
| Risc(TYP_50)  | 5.6         | 228438           | 80.6574%      |
| Risc(TYP_25)  | 11.2        | 191858           | 67.7417%      |
| Risc(WC_100)  | 4.3         | 283220           | 100%          |
| Risc(WC_75)   | 5.7         | 240551           | 84.9343%      |
| Risc(WC_50)   | 8.6         | 227373           | 80.2814%      |
| Risc(WC_25)   | 17.2        | 216029           | 76.276%       |
| Risc(PKS-R)   | 3.9         | 252963           | 89.3168%      |

Table 4.9: Area-speed results for RISC in UMC-0.13

| Design        | Period (ns) | Area ($\mu m^2$) | % of max area |
|---------------|-------------|------------------|---------------|
| Risc(TYP_100) | 6.8         | 3631448          | 98.0336%      |
| Risc(TYP_75)  | 9           | 3346768          | 90.3485%      |
| Risc(TYP_50)  | 13.6        | 2958028          | 79.8541%      |
| Risc(TYP_25)  | 27.2        | 2757076          | 74.4293%      |
| Risc(WC_100)  | 13.1        | 3704289          | 100%          |
| Risc(WC_75)   | 17.5        | 2906536          | 78.4641%      |
| Risc(WC_50)   | 26.2        | 2746087          | 74.1326%      |
| Risc(WC_25)   | 52.4        | 2725226          | 73.5695%      |
| Risc(PKS-R)   | 11.6        | 3672886          | 99.1523%      |

Table 4.10: Area-speed results for RISC in IHP-0.25

| Design                | Period (ns) | Area ($\mu m^2$) | % of max area |
|-----------------------|-------------|------------------|---------------|
| Reed-Solomon(TYP_100) | 1.4         | 45876            | 97.4696%      |
| Reed-Solomon(TYP_75)  | 1.9         | 41351            | 87.8556%      |
| Reed-Solomon(TYP_50)  | 2.8         | 34594            | 73.4995%      |
| Reed-Solomon(TYP_25)  | 5.6         | 32534            | 69.1227%      |
| Reed-Solomon(WC_100)  | 2.3         | 47067            | 100%          |
| Reed-Solomon(WC_75)   | 3           | 41605            | 88.3953%      |
| Reed-Solomon(WC_50)   | 4.6         | 33711            | 71.6234%      |
| Reed-Solomon(WC_25)   | 9.2         | 32571            | 69.2014%      |
| Reed-Solomon(PKS-R)   | 2.1         | 46408            | 98.5999%      |

Table 4.11: Area-speed results for Reed-Solomon in UMC-0.13

Figure 4.11: Area-speed results for Reed-Solomon in UMC-0.13.

Figure 4.12: Area-speed results for Reed-Solomon in IHP-0.25.

| Design                | Period (ns) | Area ($\mu m^2$) | % of max area |
|-----------------------|-------------|------------------|---------------|
| Reed-Solomon(TYP_100) | 4.1         | 719245           | 99.2404%      |
| Reed-Solomon(TYP_75)  | 5.4         | 716848           | 98.9097%      |
| Reed-Solomon(TYP_50)  | 8.2         | 578825           | 79.8655%      |
| Reed-Solomon(TYP_25)  | 16.4        | 568810           | 78.4836%      |
| Reed-Solomon(WC_100)  | 8.4         | 724750           | 100%          |
| Reed-Solomon(WC_75)   | 11.2        | 712014           | 98.2427%      |
| Reed-Solomon(WC_50)   | 16.8        | 620039           | 85.5521%      |
| Reed-Solomon(WC_25)   | 33.6        | 572426           | 78.9825%      |
| Reed-Solomon(PKS-R)   | 6.2         | 654875           | 90.3587%      |

Table 4.12: Area-speed results for Reed-Solomon in IHP-0.25

Figure 4.13 shows the area-speed tradeoff for VGA-LCD in UMC-0.13 and Figure 4.14 shows it for IHP-0.25; Tables 4.13 and 4.14 list the data plotted in these figures. VGA-LCD follows the general observation that the maximum speed obtained with the PKS-R flow is higher than that of the worst-case corner of the SYN-P&R flow. However, for both libraries, the PKS-R implementation is larger than the worst-case corner implementation; this is clearest for IHP-0.25, where the PKS-R implementation is about 20% larger.



Figure 4.13: Area-speed results for VGA-LCD in UMC-0.13.



Figure 4.14: Area-speed results for VGA-LCD in IHP-0.25.

| Design           | Period (ns) | Area ($\mu m^2$) | % of max area |
|------------------|-------------|------------------|---------------|
| Vga-Lcd(TYP_100) | 2.3         | 1040976          | 91.819%       |
| Vga-Lcd(TYP_75)  | 3.1         | 1001407          | 88.3288%      |
| Vga-Lcd(TYP_50)  | 4.6         | 966591           | 85.2579%      |
| Vga-Lcd(TYP_25)  | 9.2         | 961804           | 84.8357%      |
| Vga-Lcd(WC_100)  | 3.7         | 1035979          | 91.3783%      |
| Vga-Lcd(WC_75)   | 4.9         | 1002170          | 88.3961%      |
| Vga-Lcd(WC_50)   | 7.4         | 976821           | 86.1602%      |
| Vga-Lcd(WC_25)   | 14.8        | 977476           | 86.218%       |
| Vga-Lcd(PKS-R)   | 3.5         | 1133726          | 100%          |

Table 4.13: Area-speed results for VGA-LCD in UMC-0.13

## 4.2 Pipeline Balance Results

In this section, the pipeline balance results are presented. The three designs that are not pipelined (Huffman, Reed-Solomon and VGA-LCD) are omitted from this experiment. DES3 is also excluded: although it has a 48-stage pipeline, all of its stages are identical and therefore have the same delay, so there is no imbalance to analyse.

| Design           | Period (ns) | Area ($\mu m^2$) | % of max area |
|------------------|-------------|------------------|---------------|
| Vga-Lcd(TYP_100) | 6.3         | 10389234         | 81.4662%      |
| Vga-Lcd(TYP_75)  | 8.4         | 10258421         | 80.4405%      |
| Vga-Lcd(TYP_50)  | 12.6        | 9729179          | 76.2905%      |
| Vga-Lcd(TYP_25)  | 25.2        | 9494042          | 74.4467%      |
| Vga-Lcd(WC_100)  | 12.8        | 10501704         | 82.3482%      |
| Vga-Lcd(WC_75)   | 17.1        | 10247960         | 80.3584%      |
| Vga-Lcd(WC_50)   | 25.6        | 9883656          | 77.5018%      |
| Vga-Lcd(WC_25)   | 51.2        | 9673970          | 75.8576%      |
| Vga-Lcd(PKS-R)   | 12.2        | 12752810         | 100%          |

Table 4.14: Area-speed results for VGA-LCD in IHP-0.25

## 4.2.1 Post-Synthesis Results

#### SYN-P&R Flow, TYP

Tables 4.15 and 4.16 show the pipeline balance results post-synthesis for the IHP-0.25 and UMC-0.13 libraries respectively.

| Design | Stage                           | 100%            | 75%             | 50%              | 25%              |
|--------|---------------------------------|-----------------|-----------------|------------------|------------------|
| DLX    | Period                          | 4.70ns          | 6.30ns          | 9.40ns           | 18.80ns          |
|        | Inputs $\rightarrow$ IF         | 0.23ns (4.89%)  | 0.23ns (3.65%)  | 0.23ns (2.45%)   | 0.23ns (1.22%)   |
|        | IF $\rightarrow$ ID             | 4.49ns (95.53%) | 5.96ns (94.60%) | 8.98ns (95.53%)  | 11.33ns (60.27%) |
|        | ID $\rightarrow$ EX             | 4.42ns (94.04%) | 6.08ns (96.51%) | 9.12ns (97.02%)  | 15.38ns (81.81%) |
|        | EX $\rightarrow$ MEM            | 2.11ns (44.89%) | 2.11ns (33.49%) | 2.56ns (27.23%)  | 2.11ns (11.22%)  |
|        | MEM $\rightarrow$ outputs       | 0.49ns (10.43%) | 0.46ns (7.30%)  | 0.47ns (5%)      | 0.47ns (2.50%)   |
| AeMB   | Period                          | 5.60ns          | 7.50ns          | 11.20ns          | 22.40ns          |
|        | Inputs $\rightarrow$ IF         | 1.56ns (27.86%) | 1.70ns (22.67%) | 2.53ns (22.59%)  | 2.53ns (11.29%)  |
|        | IF $\rightarrow$ ID             | 3.77ns (67.32%) | 6.07ns (80.93%) | 7.41ns (66.16%)  | 7.63ns (34.06%)  |
|        | ID $\rightarrow$ EX             | 5.32ns (95%)    | 7.11ns (94.80%) | 10.66ns (95.18%) | 13.94ns (62.23%) |
|        | EX $\rightarrow$ outputs        | 1.67ns (29.82%) | 1.59ns (21.20%) | 1.36ns (12.14%)  | 1.36ns (6.07%)   |
| Risc   | Period                          | 5.80ns          | 7.70ns          | 11.60ns          | 23.20ns          |
|        | Inputs $\rightarrow$ Stage 1    | 3.85ns (66.38%) | 4.59ns (59.61%) | 5.27ns (45.43%)  | 5.22ns (22.50%)  |
|        | Stage 1 $\rightarrow$ Stage 2   | 5.50ns (94.83%) | 7.32ns (95.06%) | 10.98ns (94.66%) | 18.04ns (77.76%) |
|        | Stage 2 $\rightarrow$ Stage 3   | 5.49ns (94.66%) | 7.31ns (94.94%) | 10.98ns (94.66%) | 18.04ns (77.76%) |
|        | Stage 3 $\rightarrow$ Stage 4   | 4.69ns (80.86%) | 6.71ns (87.14%) | 7.29ns (62.84%)  | 7.37ns (31.77%)  |
|        | Stage 4 $\rightarrow$ Outputs   | 1.31ns (22.59%) | 1.31ns (17.01%) | 1.30ns (11.21%)  | 1.30ns (5.60%)   |

Table 4.15: Pipeline balance results post-synthesis for the typical corner of IHP-0.25

In the IHP-0.25 implementations, DLX and RISC each have two pipeline stages that are the slowest, while the other stages have a delay that is 20% to 80% less than the delay of the slowest stages. In UMC-0.13, DLX's results are similar, whereas RISC has three pipeline stages with nearly identical delay. AeMB has one stage in UMC-0.13 and two stages in IHP-0.25 with a delay within 90% of the period.

| Design | Stage                           | 100%            | 75%             | 50%             | 25%             |
|--------|---------------------------------|-----------------|-----------------|-----------------|-----------------|
| DLX    | Period                          | 1.80ns          | 2.40ns          | 3.60ns          | 7.20ns          |
|        | Inputs $\rightarrow$ IF         | 0.12ns (6.67%)  | 0.15ns (6.25%)  | 0.15ns (4.17%)  | 0.16ns (2.22%)  |
|        | IF $\rightarrow$ ID             | 1.68ns (93.33%) | 2.28ns (95.00%) | 3.36ns (93.33%) | 5.28ns (73.33%) |
|        | ID $\rightarrow$ EX             | 1.68ns (93.33%) | 2.35ns (97.92%) | 3.48ns (96.67%) | 5.48ns (76.11%) |
|        | EX $\rightarrow$ MEM            | 1.19ns (66.11%) | 1.19ns (49.58%) | 1.19ns (33.06%) | 1.47ns (20.42%) |
|        | MEM $\rightarrow$ outputs       | 0.42ns (23.33%) | 0.18ns (7.50%)  | 0.42ns (11.67%) | 0.28ns (3.89%)  |
| AeMB   | Period                          | 2.40ns          | 3.20ns          | 4.80ns          | 9.60ns          |
|        | Inputs $\rightarrow$ IF         | 1.46ns (60.83%) | 1.14ns (35.62%) | 1.14ns (23.75%) | 1.14ns (11.87%) |
|        | IF $\rightarrow$ ID             | 2.25ns (93.75%) | 2.51ns (78.44%) | 2.44ns (50.83%) | 2.65ns (27.60%) |
|        | ID $\rightarrow$ EX             | 2.29ns (95.42%) | 3.05ns (95.31%) | 3.98ns (82.92%) | 5.95ns (61.98%) |
|        | EX $\rightarrow$ outputs        | 0.78ns (32.50%) | 0.78ns (24.38%) | 0.78ns (16.25%) | 0.78ns (8.12%)  |
| Risc   | Period                          | 2.40ns          | 3.20ns          | 4.80ns          | 9.60ns          |
|        | Inputs $\rightarrow$ Stage 1    | 1.76ns (73.33%) | 1.89ns (59.06%) | 2.30ns (47.92%) | 2.57ns (26.77%) |
|        | Stage 1 $\rightarrow$ Stage 2   | 2.17ns (90.42%) | 2.99ns (93.44%) | 4.50ns (93.75%) | 5.30ns (55.21%) |
|        | Stage 2 $\rightarrow$ Stage 3   | 2.17ns (90.42%) | 2.98ns (93.12%) | 4.50ns (93.75%) | 5.30ns (55.21%) |
|        | Stage 3 $\rightarrow$ Stage 4   | 2.16ns (90%)    | 2.48ns (77.50%) | 2.89ns (60.21%) | 3.16ns (32.92%) |
|        | Stage 4 $\rightarrow$ Outputs   | 0.52ns (21.67%) | 0.42ns (13.12%) | 0.49ns (10.21%) | 0.49ns (5.10%)  |

Table 4.16: Pipeline balance results post-synthesis for the typical corner of UMC-0.13

As shown in Tables 4.15 and 4.16, the delay of the most critical pipeline stages is not the same as the period. This is because the STA does not take into account the delay of the registers that separate the pipeline stages. This is observed in all measurements.
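
The percentages in the pipeline balance tables relate each stage delay to the clock period. The following is a minimal sketch in Python that recomputes them for one column, using the DLX values at the 100% constraint from Table 4.15. The names `percent_of_period`, `critical_stage` and `gap_ns` are illustrative and do not come from the original flow scripts.

```python
# DLX, typical corner of IHP-0.25, 100% constraint (values from Table 4.15).
period_ns = 4.70
stage_delay_ns = {
    "Inputs->IF": 0.23,
    "IF->ID": 4.49,
    "ID->EX": 4.42,
    "EX->MEM": 2.11,
    "MEM->outputs": 0.49,
}


def percent_of_period(delays, period):
    """Express every stage delay as a percentage of the clock period."""
    return {stage: 100.0 * delay / period for stage, delay in delays.items()}


utilisation = percent_of_period(stage_delay_ns, period_ns)

# The slowest stage is the one that limits the clock period.
critical_stage = max(utilisation, key=utilisation.get)

# Part of the period that the slowest stage's reported delay does not cover.
gap_ns = period_ns - stage_delay_ns[critical_stage]

for stage, pct in utilisation.items():
    print(f"{stage:14s} {stage_delay_ns[stage]:5.2f}ns ({pct:.2f}%)")
print(f"critical stage: {critical_stage}, gap to period: {gap_ns:.2f}ns")
```

For this column the slowest stage reaches about 95.5% of the period, which leaves a gap of roughly 0.21ns between the reported stage delay and the period.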

#### SYN-P&R Flow, WC

Table 4.17 shows the pipeline balance results post-synthesis in IHP-0.25. Table 4.18 shows the pipeline balance results post-synthesis in UMC-0.13.

The worst-case corner results for the SYN-P&R flow are similar to the typical corner results of the SYN-P&R flow for DLX and RISC. AeMB is more balanced in the worst-case corner than in the typical corner for IHP-0.25.

| Design | Stage                           | 100%             | 75%              | 50%              | 25%              |
|--------|---------------------------------|------------------|------------------|------------------|------------------|
| DLX    | Period                          | 10.20ns          | 13.60ns          | 20.40ns          | 40.80ns          |
|        | Inputs $\rightarrow$ IF         | 0.50ns (4.90%)   | 0.50ns (3.68%)   | 0.50ns (2.45%)   | 0.49ns (1.20%)   |
|        | IF $\rightarrow$ ID             | 9.49ns (93.04%)  | 12.74ns (93.68%) | 19.75ns (96.81%) | 25.04ns (61.37%) |
|        | ID $\rightarrow$ EX             | 9.65ns (94.61%)  | 13.18ns (96.91%) | 19.86ns (97.35%) | 32.98ns (80.83%) |
|        | EX $\rightarrow$ MEM            | 4.27ns (41.86%)  | 4.27ns (31.40%)  | 4.27ns (20.93%)  | 4.27ns (10.47%)  |
|        | MEM $\rightarrow$ outputs       | 0.99ns (9.71%)   | 1ns (7.35%)      | 1ns (4.90%)      | 1ns (2.45%)      |
| AeMB   | Period                          | 11ns             | 14.70ns          | 22ns             | 44ns             |
|        | Inputs $\rightarrow$ IF         | 10.20ns (92.73%) | 10.08ns (68.57%) | 14.40ns (65.45%) | 16.45ns (37.39%) |
|        | IF $\rightarrow$ ID             | 10.20ns (92.73%) | 13.39ns (91.09%) | 17.01ns (77.32%) | 19.06ns (43.32%) |
|        | ID $\rightarrow$ EX             | 10.02ns (91.09%) | 14.01ns (95.31%) | 10.30ns (46.82%) | 10.63ns (24.16%) |
|        | EX $\rightarrow$ outputs        | 3.53ns (32.09%)  | 3.11ns (21.16%)  | 3.09ns (14.05%)  | 3.09ns (7.02%)   |
| Risc   | Period                          | 11ns             | 14.70ns          | 22ns             | 44ns             |
|        | Inputs $\rightarrow$ Stage 1    | 8.52ns (77.45%)  | 9.20ns (62.59%)  | 12.70ns (57.73%) | 11.92ns (27.09%) |
|        | Stage 1 $\rightarrow$ Stage 2   | 10.40ns (94.55%) | 13.83ns (94.08%) | 21.27ns (96.68%) | 35.82ns (81.41%) |
|        | Stage 2 $\rightarrow$ Stage 3   | 10.40ns (94.55%) | 13.83ns (94.08%) | 21.27ns (96.68%) | 35.82ns (81.41%) |
|        | Stage 3 $\rightarrow$ Stage 4   | 9.81ns (89.18%)  | 12.70ns (86.39%) | 12.38ns (56.27%) | 15.37ns (34.93%) |
|        | Stage 4 $\rightarrow$ Outputs   | 2.87ns (26.09%)  | 2.87ns (19.52%)  | 2.86ns (13%)     | 2.86ns (6.50%)   |

Table 4.17: Pipeline balance results post-synthesis for the worst corner of IHP-0.25

| Design | Stage                           | 100%             | 75%              | 50%              | 25%              |
|--------|---------------------------------|------------------|------------------|------------------|------------------|
| DLX    | Period                          | 10.20ns          | 13.60ns          | 20.40ns          | 40.80ns          |
|        | Inputs $\rightarrow$ IF         | 0.50ns (4.90%)   | 0.50ns (3.68%)   | 0.50ns (2.45%)   | 0.49ns (1.20%)   |
|        | IF $\rightarrow$ ID             | 9.49ns (93.04%)  | 12.74ns (93.68%) | 19.75ns (96.81%) | 25.04ns (61.37%) |
|        | ID $\rightarrow$ EX             | 9.65ns (94.61%)  | 13.18ns (96.91%) | 19.86ns (97.35%) | 32.98ns (80.83%) |
|        | EX $\rightarrow$ MEM            | 4.27ns (41.86%)  | 4.27ns (31.40%)  | 4.27ns (20.93%)  | 4.27ns (10.47%)  |
|        | MEM $\rightarrow$ outputs       | 0.99ns (9.71%)   | 1ns (7.35%)      | 1ns (4.90%)      | 1ns (2.45%)      |
| AeMB   | Period                          | 11ns             | 14.70ns          | 22ns             | 44ns             |
|        | Inputs $\rightarrow$ IF         | 10.20ns (92.73%) | 10.08ns (68.57%) | 14.40ns (65.45%) | 16.45ns (37.39%) |
|        | IF $\rightarrow$ ID             | 10.20ns (92.73%) | 13.39ns (91.09%) | 17.01ns (77.32%) | 19.06ns (43.32%) |
|        | ID $\rightarrow$ EX             | 10.02ns (91.09%) | 14.01ns (95.31%) | 10.30ns (46.82%) | 10.63ns (24.16%) |
|        | EX $\rightarrow$ outputs        | 3.53ns (32.09%)  | 3.11ns (21.16%)  | 3.09ns (14.05%)  | 3.09ns (7.02%)   |
| Risc   | Period                          | 11ns             | 14.70ns          | 22ns             | 44ns             |
|        | Inputs $\rightarrow$ Stage 1    | 8.52ns (77.45%)  | 9.20ns (62.59%)  | 12.70ns (57.73%) | 11.92ns (27.09%) |
|        | Stage 1 $\rightarrow$ Stage 2   | 10.40ns (94.55%) | 13.83ns (94.08%) | 21.27ns (96.68%) | 35.82ns (81.41%) |
|        | Stage 2 $\rightarrow$ Stage 3   | 10.40ns (94.55%) | 13.83ns (94.08%) | 21.27ns (96.68%) | 35.82ns (81.41%) |
|        | Stage 3 $\rightarrow$ Stage 4   | 9.81ns (89.18%)  | 12.70ns (86.39%) | 12.38ns (56.27%) | 15.37ns (34.93%) |
|        | Stage 4 $\rightarrow$ Outputs   | 2.87ns (26.09%)  | 2.87ns (19.52%)  | 2.86ns (13%)     | 2.86ns (6.50%)   |

Table 4.18: Pipeline balance results post-synthesis for the worst corner of UMC-0.13

## 4.2.2 Post-Place and Route Results

#### SYN-P&R Flow, TYP

Table 4.19 shows the pipeline balance results post-P&R in IHP-0.25. Table 4.20 shows the pipeline balance results post-P&R in UMC-0.13.

| Design | Stage                           | 100%            | 75%             | 50%              | 25%              |
|--------|---------------------------------|-----------------|-----------------|------------------|------------------|
| DLX    | Period                          | 5.30ns          | 7.10ns          | 10.60ns          | 21.20ns          |
|        | Inputs $\rightarrow$ IF         | 4.50ns (84.91%) | 5.30ns (74.65%) | 9ns (84.91%)     | 10.20ns (48.11%) |
|        | IF $\rightarrow$ ID             | 5.10ns (96.23%) | 6.40ns (90.14%) | 9.90ns (93.40%)  | 12.80ns (60.38%) |
|        | ID $\rightarrow$ EX             | 5.20ns (98.11%) | 6.60ns (92.96%) | 10ns (94.34%)    | 14.50ns (68.40%) |
|        | EX $\rightarrow$ MEM            | 3.10ns (58.49%) | 3.40ns (47.89%) | 3.90ns (36.79%)  | 3.20ns (15.09%)  |
|        | MEM $\rightarrow$ outputs       | 4ns (75.47%)    | 4.90ns (69.01%) | 7ns (66.04%)     | 6.40ns (30.19%)  |
| AeMB   | Period                          | 7.20ns          | 9.50ns          | 14.40ns          | 28.80ns          |
|        | Inputs $\rightarrow$ IF         | 5.20ns (72.22%) | 7.60ns (80%)    | 8.90ns (61.81%)  | 9.30ns (32.29%)  |
|        | IF $\rightarrow$ ID             | 1.90ns (26.39%) | 1.90ns (20%)    | 1.70ns (11.81%)  | 1.70ns (5.90%)   |
|        | ID $\rightarrow$ EX             | 6ns (83.33%)    | 6.90ns (72.63%) | 11.60ns (80.56%) | 14.50ns (50.35%) |
|        | EX $\rightarrow$ outputs        | 5.70ns (79.17%) | 6.50ns (68.42%) | 11.10ns (77.08%) | 14ns (48.61%)    |
| Risc   | Period                          | 6.80ns          | 9ns             | 13.60ns          | 27.20ns          |
|        | Inputs $\rightarrow$ Stage 1    | 6.80ns (100%)   | 8.30ns (92.22%) | 10.90ns (80.15%) | 18.30ns (67.28%) |
|        | Stage 1 $\rightarrow$ Stage 2   | 6.40ns (94.12%) | 7.40ns (82.22%) | 10ns (73.53%)    | 10.80ns (39.71%) |
|        | Stage 2 $\rightarrow$ Stage 3   | 5.70ns (83.82%) | 6.60ns (73.33%) | 9ns (66.18%)     | 15.80ns (58.09%) |
|        | Stage 3 $\rightarrow$ Stage 4   | 2.70ns (39.71%) | 3.80ns (42.22%) | 4.60ns (33.82%)  | 4.70ns (17.28%)  |
|        | Stage 4 $\rightarrow$ Outputs   | 6.20ns (91.18%) | 6.90ns (76.67%) | 7.20ns (52.94%)  | 9.20ns (33.82%)  |

Table 4.19: Pipeline balance results post-P&R for the typical corner of IHP-0.25

| Design | Stage                           | 100%            | 75%             | 50%             | 25%             |
|--------|---------------------------------|-----------------|-----------------|-----------------|-----------------|
| DLX    | Period                          | 2.50ns          | 3.30ns          | 5ns             | 10ns            |
|        | Inputs $\rightarrow$ IF         | 2.43ns (97.20%) | 1.93ns (58.48%) | 2.44ns (48.80%) | 2.33ns (23.30%) |
|        | IF $\rightarrow$ ID             | 2.35ns (94%)    | 2.70ns (81.82%) | 3.26ns (65.20%) | 5.51ns (55.10%) |
|        | ID $\rightarrow$ EX             | 2.29ns (91.60%) | 2.39ns (72.42%) | 3.64ns (72.80%) | 5.88ns (58.80%) |
|        | EX $\rightarrow$ MEM            | 1.05ns (42%)    | 1.06ns (32.12%) | 1.24ns (24.80%) | 1.16ns (11.60%) |
|        | MEM $\rightarrow$ outputs       | 1.43ns (57.20%) | 1.41ns (42.73%) | 1.67ns (33.40%) | 1.69ns (16.90%) |
| AeMB   | Period                          | 2.80ns          | 3.70ns          | 5.60ns          | 11.20ns         |
|        | Inputs $\rightarrow$ IF         | 2.56ns (91.43%) | 2.91ns (78.65%) | 3.01ns (53.75%) | 3.12ns (27.86%) |
|        | IF $\rightarrow$ ID             | 0.80ns (28.57%) | 0.52ns (14.05%) | 0.78ns (13.93%) | 0.81ns (7.23%)  |
|        | ID $\rightarrow$ EX             | 2.26ns (80.71%) | 2.81ns (75.95%) | 3.75ns (66.96%) | 6.10ns (54.46%) |
|        | EX $\rightarrow$ outputs        | 2.17ns (77.50%) | 2.69ns (72.70%) | 3.62ns (64.64%) | 5.96ns (53.21%) |
| Risc   | Period                          | 2.80ns          | 3.70ns          | 5.60ns          | 11.20ns         |
|        | Inputs $\rightarrow$ Stage 1    | 2.67ns (95.36%) | 3.35ns (90.54%) | 4.54ns (81.07%) | 4.71ns (42.05%) |
|        | Stage 1 $\rightarrow$ Stage 2   | 2.52ns (90%)    | 3.17ns (85.68%) | 4.27ns (76.25%) | 3.54ns (31.61%) |
|        | Stage 2 $\rightarrow$ Stage 3   | 2.03ns (72.50%) | 2.29ns (61.89%) | 2.31ns (41.25%) | 3.16ns (28.21%) |
|        | Stage 3 $\rightarrow$ Stage 4   | 1.02ns (36.43%) | 1.24ns (33.51%) | 1.06ns (18.93%) | 1.18ns (10.54%) |
|        | Stage 4 $\rightarrow$ Outputs   | 2.52ns (90%)    | 2.57ns (69.46%) | 2.62ns (46.79%) | 3.53ns (31.52%) |

Table 4.20: Pipeline balance results post-P&R for the typical corner of UMC-0.13

AeMB's pipeline stages are fairly well balanced post-P&R, as three out of four stages have approximately the same delay, which differs from the post-synthesis results. RISC and DLX are also better balanced in the post-P&R results than in the post-synthesis results, mainly because the first pipeline stage becomes critical post-P&R. This can be explained by a poor placement of the input/output pins of the design on the layout.
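
One rough way to quantify this observation is to count how many stages lie close to the slowest one. The sketch below does this for the AeMB 100% columns of Tables 4.15 and 4.19 (typical corner of IHP-0.25). The 15% tolerance and the function name `stages_near_slowest` are assumptions chosen for illustration; they are not a metric used in the measurements.

```python
def stages_near_slowest(delays_ns, tolerance=0.15):
    """Count the stages whose delay is within `tolerance` of the slowest stage.

    The tolerance is a hypothetical threshold for "approximately the same delay".
    """
    slowest = max(delays_ns)
    return sum(1 for delay in delays_ns if delay >= (1.0 - tolerance) * slowest)


# AeMB stage delays in ns at the 100% constraint, typical corner of IHP-0.25.
aemb_post_synthesis = [1.56, 3.77, 5.32, 1.67]  # Table 4.15
aemb_post_pnr = [5.20, 1.90, 6.00, 5.70]        # Table 4.19

near_post_synthesis = stages_near_slowest(aemb_post_synthesis)
near_post_pnr = stages_near_slowest(aemb_post_pnr)

print(f"post-synthesis: {near_post_synthesis} of {len(aemb_post_synthesis)} stages")
print(f"post-P&R:       {near_post_pnr} of {len(aemb_post_pnr)} stages")
```

With this tolerance, one of the four stages is near the slowest post-synthesis and three of the four are post-P&R, which matches the qualitative statement in the text.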

#### SYN-P&R Flow, WC

Table 4.21 shows the pipeline balance results post-P&R in IHP-0.25 and Table 4.22 shows the pipeline balance results post-P&R in UMC-0.13.

| Design | Stage                           | 100%             | 75%              | 50%              | 25%              |
|--------|---------------------------------|------------------|------------------|------------------|------------------|
| DLX    | Period                          | 11.80ns          | 15.70ns          | 23.60ns          | 47.20ns          |
|        | Inputs $\rightarrow$ IF         | 9.80ns (83.05%)  | 11.60ns (73.89%) | 17.90ns (75.85%) | 22.40ns (47.46%) |
|        | IF $\rightarrow$ ID             | 11ns (93.22%)    | 13.60ns (86.62%) | 22ns (93.22%)    | 28.10ns (59.53%) |
|        | ID $\rightarrow$ EX             | 10.90ns (92.37%) | 14.30ns (91.08%) | 22.40ns (94.92%) | 30.20ns (63.98%) |
|        | EX $\rightarrow$ MEM            | 6.70ns (56.78%)  | 7.10ns (45.22%)  | 6.80ns (28.81%)  | 6.30ns (13.35%)  |
|        | MEM $\rightarrow$ outputs       | 7.90ns (66.95%)  | 9ns (57.32%)     | 9.60ns (40.68%)  | 10.30ns (21.82%) |
| AeMB   | Period                          | 12.90ns          | 17.30ns          | 25.80ns          | 51.60ns          |
|        | Inputs $\rightarrow$ IF         | 6.15ns (47.67%)  | 7.65ns (44.22%)  | 9.87ns (38.26%)  | 11.43ns (22.15%) |
|        | IF $\rightarrow$ ID             | 11.38ns (88.22%) | 8.37ns (48.38%)  | 14.23ns (55.16%) | 13.36ns (25.89%) |
|        | ID $\rightarrow$ EX             | 11.38ns (88.22%) | 12.50ns (72.25%) | 8.03ns (31.12%)  | 9.46ns (18.33%)  |
|        | EX $\rightarrow$ outputs        | 5.53ns (42.87%)  | 6.11ns (35.32%)  | 5.37ns (20.81%)  | 4.93ns (9.55%)   |
| Risc   | Period                          | 13.10ns          | 17.50ns          | 26.20ns          | 52.40ns          |
|        | Inputs $\rightarrow$ Stage 1    | 13ns (99.24%)    | 13.30ns (76%)    | 13.10ns (50%)    | 18.70ns (35.69%) |
|        | Stage 1 $\rightarrow$ Stage 2   | 13ns (99.24%)    | 13.50ns (77.14%) | 11.20ns (42.75%) | 10ns (19.08%)    |
|        | Stage 2 $\rightarrow$ Stage 3   | 9.90ns (75.57%)  | 6.50ns (37.14%)  | 10.10ns (38.55%) | 14.70ns (28.05%) |
|        | Stage 3 $\rightarrow$ Stage 4   | 5.20ns (39.69%)  | 5.30ns (30.29%)  | 4.70ns (17.94%)  | 4.70ns (8.97%)   |
|        | Stage 4 $\rightarrow$ Outputs   | 13ns (99.24%)    | 9.50ns (54.29%)  | 8.80ns (33.59%)  | 9.60ns (18.32%)  |

Table 4.21: Pipeline balance results post-P&R for the worst corner of IHP-0.25
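The percentages in the pipeline balance tables are each stage's delay divided by the clock period of that column. The following is a minimal sketch of that arithmetic for the DLX 100% column of Table 4.21; the function names and the slowest-to-fastest ratio used as a balance indicator are assumptions made for illustration and are not part of the flows used in this work.

```python
# Stage delays (ns) for DLX at the 100% point, copied from Table 4.21.
period_ns = 11.80
stage_delay_ns = {
    "Inputs -> IF": 9.80,
    "IF -> ID": 11.00,
    "ID -> EX": 10.90,
    "EX -> MEM": 6.70,
    "MEM -> Outputs": 7.90,
}


def stage_utilisation(delays, period):
    """Return each stage delay as a percentage of the clock period."""
    return {stage: round(100.0 * d / period, 2) for stage, d in delays.items()}


def balance_spread(delays):
    """Hypothetical balance indicator: slowest stage delay / fastest stage delay."""
    return max(delays.values()) / min(delays.values())


utilisation = stage_utilisation(stage_delay_ns, period_ns)
spread = balance_spread(stage_delay_ns)

for stage, pct in utilisation.items():
    print(f"{stage:15s} {pct:6.2f}%")
print(f"slowest/fastest stage ratio: {spread:.2f}")
```

A ratio close to 1 would indicate a well balanced pipeline; for this column the three front-end stages are close to the period while EX $\rightarrow$ MEM uses little more than half of it.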

The post-P&R WC results are similar to the post-P&R typical corner results for DLX and RISC in IHP-0.25, and for Aemb and RISC in UMC-0.13. Aemb in IHP-0.25 and DLX in UMC-0.13 are not as well balanced in the worst-case corner as they are in the typical corner, because the first stage is not critical in the worst-case corner, while it is critical in the typical corner. Compared with the post-synthesis results, DLX and RISC are better balanced post-P&R for IHP-0.25, while in UMC-0.13 only DLX is better balanced post-P&R.

An observation for the SYN-P&R flow is that as the target clock frequency is lowered from 100% to 75% and 50% (*i.e.* as the clock period is relaxed), the most critical stages increase their delay accordingly, while the delay of the non-critical stages is practically unaffected. For the 25% measurement, the delay of the pipeline stages does not increase as much as the clock period.

| Design | Stage                         | 100%            | 75%             | 50%             | 25%              |
|--------|-------------------------------|-----------------|-----------------|-----------------|------------------|
| DLX    | Period                        | 3.60ns          | 4.80ns          | 7.20ns          | 14.40ns          |
|        | Inputs $\rightarrow$ IF       | 1.70ns (47.22%) | 1.90ns (39.58%) | 2ns (27.78%)    | 2.30ns (15.97%)  |
|        | IF $\rightarrow$ ID           | 2.30ns (63.89%) | 2.70ns (56.25%) | 3.30ns (45.83%) | 5.50ns (38.19%)  |
|        | ID $\rightarrow$ EX           | 2.30ns (63.89%) | 2.40ns (50%)    | 3.60ns (50%)    | 5.90ns (40.97%)  |
|        | EX $\rightarrow$ MEM          | 1.10ns (30.56%) | 1.10ns (22.92%) | 1.20ns (16.67%) | 1.20ns (8.33%)   |
|        | MEM $\rightarrow$ Outputs     | 1.40ns (38.89%) | 1.40ns (29.17%) | 1.70ns (23.61%) | 1.70ns (11.81%)  |
| Aemb   | Period                        | 4.60ns          | 6.10ns          | 9.20ns          | 18.40ns          |
|        | Inputs $\rightarrow$ IF       | 4.40ns (95.65%) | 4.70ns (77.05%) | 5ns (54.35%)    | 5.20ns (28.26%)  |
|        | IF $\rightarrow$ ID           | 1.20ns (26.09%) | 1.10ns (18.03%) | 1.50ns (16.30%) | 1.40ns (7.61%)   |
|        | ID $\rightarrow$ EX           | 3.70ns (80.43%) | 4.90ns (80.33%) | 6.90ns (75%)    | 10.70ns (58.15%) |
|        | EX $\rightarrow$ Outputs      | 3.50ns (76.09%) | 4.60ns (75.41%) | 6.40ns (69.57%) | 10.10ns (54.89%) |
| RISC   | Period                        | 4.30ns          | 5.70ns          | 8.60ns          | 17.20ns          |
|        | Inputs $\rightarrow$ Stage 1  | 4.10ns (95.35%) | 5.20ns (91.23%) | 7.20ns (83.72%) | 7ns (40.70%)     |
|        | Stage 1 $\rightarrow$ Stage 2 | 4ns (93.02%)    | 5.10ns (89.47%) | 6.60ns (76.74%) | 4.40ns (25.58%)  |
|        | Stage 2 $\rightarrow$ Stage 3 | 3.10ns (72.09%) | 3.80ns (66.67%) | 5.50ns (63.95%) | 5.60ns (32.56%)  |
|        | Stage 3 $\rightarrow$ Stage 4 | 1.50ns (34.88%) | 2ns (35.09%)    | 1.80ns (20.93%) | 1.10ns (6.40%)   |
|        | Stage 4 $\rightarrow$ Outputs | 3.70ns (86.05%) | 4.30ns (75.44%) | 4.90ns (56.98%) | 2.70ns (15.70%)  |

Table 4.22: Pipeline balance results post-P&R for the worst corner of UMC-0.13

#### **PKS-R** flow Results

Table 4.23 shows the pipeline balance results in IHP-0.25 and Table 4.24 shows the pipeline balance results in UMC-0.13.

For DLX and RISC in IHP-0.25, the PKS-R flow results are similar to the post-P&R worst-case corner results, while Aemb is a little more balanced with the PKS-R flow. For UMC-0.13, only RISC's results are similar in both flows. Aemb has the same three pipeline stages as its most critical ones in both flows, but the most critical stage differs. DLX is not as well balanced with the PKS-R flow as it is with the SYN-P&R flow: with the PKS-R flow, three stages have identical delay and the other two have about half the delay of the most critical stages.

| Design | Stage                         | 100%             |
|--------|-------------------------------|------------------|
| DLX    | Period                        | 9.20ns           |
|        | Inputs $\rightarrow$ IF       | 7.90ns (85.87%)  |
|        | IF $\rightarrow$ ID           | 8.60ns (93.48%)  |
|        | ID $\rightarrow$ EX           | 8.80ns (95.65%)  |
|        | EX $\rightarrow$ MEM          | 5.60ns (60.87%)  |
|        | MEM $\rightarrow$ Outputs     | 6.50ns (70.65%)  |
| Aemb   | Period                        | 11.70ns          |
|        | Inputs $\rightarrow$ IF       | 8.50ns (72.65%)  |
|        | IF $\rightarrow$ ID           | 4.40ns (37.61%)  |
|        | ID $\rightarrow$ EX           | 10.70ns (91.45%) |
|        | EX $\rightarrow$ Outputs      | 9.70ns (82.91%)  |
| RISC   | Period                        | 11.60ns          |
|        | Inputs $\rightarrow$ Stage 1  | 11.50ns (99.14%) |
|        | Stage 1 $\rightarrow$ Stage 2 | 10.60ns (91.38%) |
|        | Stage 2 $\rightarrow$ Stage 3 | 8.70ns (75.00%)  |
|        | Stage 3 $\rightarrow$ Stage 4 | 5.70ns (49.14%)  |
|        | Stage 4 $\rightarrow$ Outputs | 11.50ns (99.14%) |

Table 4.23: Pipeline balance results for PKS-R flow in IHP-0.25

| Design | Stage                         | 100%            |
|--------|-------------------------------|-----------------|
| DLX    | Period                        | 3.20ns          |
|        | Inputs $\rightarrow$ IF       | 3.07ns (95.9%)  |
|        | IF $\rightarrow$ ID           | 3.07ns (95.9%)  |
|        | ID $\rightarrow$ EX           | 3.07ns (95.9%)  |
|        | EX $\rightarrow$ MEM          | 1.50ns (46.88%) |
|        | MEM $\rightarrow$ Outputs     | 2ns (62.50%)    |
| Aemb   | Period                        | 3.80ns          |
|        | Inputs $\rightarrow$ IF       | 2.90ns (76.32%) |
|        | IF $\rightarrow$ ID           | 1ns (26.32%)    |
|        | ID $\rightarrow$ EX           | 3.50ns (92.11%) |
|        | EX $\rightarrow$ Outputs      | 3.30ns (86.84%) |
| RISC   | Period                        | 3.90ns          |
|        | Inputs $\rightarrow$ Stage 1  | 3.84ns (98.5%)  |
|        | Stage 1 $\rightarrow$ Stage 2 | 3.80ns (97.44%) |
|        | Stage 2 $\rightarrow$ Stage 3 | 3.20ns (82.05%) |
|        | Stage 3 $\rightarrow$ Stage 4 | 1.70ns (43.59%) |
|        | Stage 4 $\rightarrow$ Outputs | 3.40ns (87.18%) |

Table 4.24: Pipeline balance results for PKS-R in UMC-0.13

# 4.3 Critical Path Results

In this section, the results from the critical path analysis are presented. For each design, the number of cells that are in the paths with delay within 30%, 20%, 15%, 10% and 5% of the delay of the most critical path is measured. For every measurement, there is a table which presents the number of cells that are within each path margin and a corresponding graph, where the numbers are plotted.
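The path margin measurement described above can be sketched as follows. This is an illustrative reconstruction only: it assumes that a path is "within margin $m$" when its delay is at least $(1 - m)$ times the delay of the most critical path, and that each cell is counted once even if it lies on several such paths. The function name and the toy data are hypothetical and are not taken from the timing reports used in this work.

```python
def cells_within_margin(paths, margin):
    """Count distinct cells on paths whose delay is within `margin` of the worst path.

    paths  : list of (delay_ns, iterable_of_cell_names)
    margin : fraction of the worst delay, e.g. 0.05 for the 5% column
    """
    worst = max(delay for delay, _ in paths)
    threshold = (1.0 - margin) * worst
    cells = set()
    for delay, path_cells in paths:
        if delay >= threshold:
            cells.update(path_cells)  # a cell shared by several paths counts once
    return len(cells)


# Hypothetical toy design: four timing paths over seven cells.
toy_paths = [
    (10.0, ["u1", "u2", "u3"]),
    (9.6, ["u2", "u4"]),
    (8.7, ["u5"]),
    (7.2, ["u6", "u7"]),
]
total_cells = 7

counts = {m: cells_within_margin(toy_paths, m) for m in (0.05, 0.10, 0.15, 0.20, 0.30)}
for margin, count in counts.items():
    print(f"{margin:4.0%}: {count} cells ({100.0 * count / total_cells:.2f}%)")
```

Because every wider margin includes the paths of the narrower ones, the counts are non-decreasing from the 5% column to the 30% column, as in the tables that follow.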

## 4.3.1 Synthesis Results

### SYN-P&R flow TYP

Figures 4.15 and 4.16 show the path margin results for the typical corner of SYN-P&R flow and Tables 4.25 and 4.26 present the actual data.



Figure 4.15: Numbers of cells in most critical paths post-synthesis TYP IHP

| Design       | 5%             | 10%            | 15%            | 20%            | 30%            | Total |
|--------------|----------------|----------------|----------------|----------------|----------------|-------|
| Aemb         | 1474 (23.64%)  | 2087 (33.47%)  | 2311 (37.06%)  | 2764 (44.33%)  | 2915 (46.75%)  | 6235  |
| DES3         | 13593 (21.04%) | 13593 (21.04%) | 13593 (21.04%) | 13593 (21.04%) | 13593 (21.04%) | 64593 |
| DLX          | 1875 (16.97%)  | 2476 (22.41%)  | 2961 (26.80%)  | 3388 (30.66%)  | 3663 (33.15%)  | 11050 |
| Huffman      | 453 (16.23%)   | 567 (20.32%)   | 620 (22.21%)   | 644 (23.07%)   | 707 (25.33%)   | 2791  |
| RISC         | 403 (2.62%)    | 705 (4.59%)    | 862 (5.61%)    | 1178 (7.67%)   | 1435 (9.34%)   | 15360 |
| Reed-Solomon | 1222 (29.33%)  | 1349 (32.38%)  | 1360 (32.65%)  | 1360 (32.65%)  | 1360 (32.65%)  | 4166  |
| VGA-LCD      | 15595 (54.68%) | 16231 (56.90%) | 16240 (56.94%) | 19540 (68.51%) | 19540 (68.51%) | 28523 |

Table 4.25: Numbers of cells in most critical paths post-synthesis TYP IHP

A comparison between the two libraries shows that in UMC-0.13 the percentage of cells that are within a certain margin is about the same as in IHP-0.25, or a little lower (no more than 10% lower).

Figure 4.16: Numbers of cells in most critical paths post-synthesis TYP UMC

| Design       | 5%             | 10%            | 15%            | 20%            | 30%            | Total |
|--------------|----------------|----------------|----------------|----------------|----------------|-------|
| Aemb         | 517 (7.43%)    | 2376 (34.14%)  | 2839 (40.80%)  | 3243 (46.60%)  | 3508 (50.41%)  | 6959  |
| DES3         | 14236 (19.94%) | 14236 (19.94%) | 14236 (19.94%) | 14236 (19.94%) | 14236 (19.94%) | 71399 |
| DLX          | 1065 (8.47%)   | 1327 (10.55%)  | 2256 (17.94%)  | 2333 (18.55%)  | 2909 (23.14%)  | 12574 |
| Huffman      | 540 (15.42%)   | 619 (17.67%)   | 659 (18.81%)   | 676 (19.30%)   | 715 (20.41%)   | 3503  |
| RISC         | 525 (3.15%)    | 784 (4.70%)    | 1027 (6.16%)   | 1224 (7.34%)   | 1695 (10.16%)  | 16677 |
| Reed-Solomon | 1143 (31.81%)  | 1155 (32.15%)  | 1155 (32.15%)  | 1155 (32.15%)  | 1155 (32.15%)  | 3593  |
| VGA-LCD      | 16412 (52.13%) | 16412 (52.13%) | 16412 (52.13%) | 18571 (58.99%) | 18571 (58.99%) | 31482 |

Table 4.26: Numbers of cells in most critical paths post-synthesis TYP UMC

### SYN-P&R flow WC

As is shown in Figures 4.17 and 4.18 and Tables 4.27 and 4.28, there is a drop in the percentage of cells which are in a path margin when moving from IHP-0.25 to UMC-0.13, with the exception of DES3. Comparing between the operating conditions, there seems to be no significant difference between the typical corner and the worst-case corner, with the exception of Aemb, which shows much lower percentages in the UMC-0.13 worst-case corner than in the UMC-0.13 typical corner.



Figure 4.17: Numbers of cells in most critical paths post-synthesis WC IHP



Figure 4.18: Numbers of cells in most critical paths post-synthesis WC UMC

## 4.3.2 Place and Route Results

### SYN-P&R flow TYP

As in the results after synthesis, the results after placement and routing, which are presented in Figures 4.19 and 4.20 and Tables 4.29 and 4.30, show no significant difference in the percentages between the two libraries.

| Design       | 5%             | 10%            | 15%            | 20%            | 30%            | Total |
|--------------|----------------|----------------|----------------|----------------|----------------|-------|
| Aemb         | 1153 (19.54%)  | 2814 (47.69%)  | 2980 (50.50%)  | 3171 (53.74%)  | 3255 (55.16%)  | 5901  |
| DES3         | 13197 (19.40%) | 13197 (19.40%) | 13197 (19.40%) | 13197 (19.40%) | 13197 (19.40%) | 68041 |
| DLX          | 1555 (14.46%)  | 3173 (29.51%)  | 3464 (32.22%)  | 3566 (33.17%)  | 3629 (33.75%)  | 10752 |
| Huffman      | 492 (17.15%)   | 614 (21.41%)   | 641 (22.35%)   | 695 (24.23%)   | 703 (24.51%)   | 2868  |
| RISC         | 771 (4.57%)    | 931 (5.51%)    | 1080 (6.40%)   | 1446 (8.56%)   | 1731 (10.25%)  | 16887 |
| Reed-Solomon | 1127 (27.71%)  | 1326 (32.60%)  | 1355 (33.32%)  | 1355 (33.32%)  | 1355 (33.32%)  | 4067  |
| VGA-LCD      | 16314 (52.16%) | 16314 (52.16%) | 16314 (52.16%) | 18529 (59.24%) | 18529 (59.24%) | 31276 |

Table 4.27: Numbers of cells in most critical paths post-synthesis WC IHP

| Design       | 5%             | 10%            | 15%            | 20%            | 30%            | Total |
|--------------|----------------|----------------|----------------|----------------|----------------|-------|
| Aemb         | 325 (4.74%)    | 362 (5.27%)    | 391 (5.70%)    | 646 (9.41%)    | 1519 (22.13%)  | 6863  |
| DES3         | 15294 (23.39%) | 15294 (23.39%) | 15294 (23.39%) | 15294 (23.39%) | 15294 (23.39%) | 65374 |
| DLX          | 1282 (10.55%)  | 1386 (11.40%)  | 1421 (11.69%)  | 2330 (19.17%)  | 2755 (22.67%)  | 12153 |
| Huffman      | 534 (14.86%)   | 638 (17.75%)   | 676 (18.81%)   | 720 (20.03%)   | 724 (20.14%)   | 3594  |
| RISC         | 700 (3.60%)    | 860 (4.42%)    | 964 (4.95%)    | 1297 (6.66%)   | 1769 (9.09%)   | 19463 |
| Reed-Solomon | 1177 (31.88%)  | 1185 (32.10%)  | 1185 (32.10%)  | 1185 (32.10%)  | 1185 (32.10%)  | 3692  |
| VGA-LCD      | 11854 (40.63%) | 11854 (40.63%) | 11854 (40.63%) | 13821 (47.37%) | 13821 (47.37%) | 29174 |

Table 4.28: Numbers of cells in most critical paths post-synthesis WC UMC



Figure 4.19: Numbers of cells in most critical paths post-P&R TYP IHP

Figure 4.20: Numbers of cells in most critical paths post-P&R TYP UMC

| Design       | 5%             | 10%            | 15%            | 20%            | 30%            | Total |
|--------------|----------------|----------------|----------------|----------------|----------------|-------|
| Aemb         | 280 (3.89%)    | 328 (4.55%)    | 364 (5.05%)    | 408 (5.67%)    | 475 (6.60%)    | 7202  |
| DES3         | 13595 (19.96%) | 13595 (19.96%) | 13595 (19.96%) | 13595 (19.96%) | 13595 (19.96%) | 68117 |
| DLX          | 312 (2.36%)    | 831 (6.29%)    | 1218 (9.22%)   | 1523 (11.52%)  | 2242 (16.96%)  | 13217 |
| Huffman      | 43 (1.35%)     | 57 (1.79%)     | 119 (3.74%)    | 217 (6.83%)    | 370 (11.64%)   | 3178  |
| RISC         | 150 (0.83%)    | 532 (2.95%)    | 1302 (7.22%)   | 1835 (10.18%)  | 1992 (11.05%)  | 18028 |
| Reed-Solomon | 543 (11.43%)   | 1293 (27.22%)  | 1524 (32.08%)  | 1534 (32.29%)  | 1534 (32.29%)  | 4750  |
| VGA-LCD      | 18617 (54.72%) | 18617 (54.72%) | 18640 (54.79%) | 20439 (60.08%) | 20439 (60.08%) | 34020 |

Table 4.29: Numbers of cells in most critical paths post-P&R TYP IHP

| Design       | 5%             | 10%            | 15%            | 20%            | 30%            | Total |
|--------------|----------------|----------------|----------------|----------------|----------------|-------|
| Aemb         | 231 (3.08%)    | 343 (4.57%)    | 413 (5.51%)    | 457 (6.09%)    | 657 (8.76%)    | 7500  |
| DES3         | 14013 (19.43%) | 14013 (19.43%) | 14013 (19.43%) | 14013 (19.43%) | 14013 (19.43%) | 72131 |
| DLX          | 57 (0.48%)     | 260 (2.18%)    | 435 (3.65%)    | 990 (8.31%)    | 1312 (11.01%)  | 11919 |
| Huffman      | 145 (5.70%)    | 301 (11.82%)   | 364 (14.30%)   | 424 (16.65%)   | 479 (18.81%)   | 2546  |
| RISC         | 43 (0.25%)     | 200 (1.14%)    | 200 (1.14%)    | 372 (2.12%)    | 1401 (7.99%)   | 17533 |
| Reed-Solomon | 754 (19.82%)   | 1180 (31.01%)  | 1210 (31.80%)  | 1210 (31.80%)  | 1210 (31.80%)  | 3805  |
| VGA-LCD      | 18981 (61.04%) | 18981 (61.04%) | 21065 (67.74%) | 21065 (67.74%) | 21065 (67.74%) | 31097 |

Table 4.30: Numbers of cells in most critical paths post-P&R TYP UMC

### SYN-P&R flow WC

The worst-case corner results after place and route show the same behaviour as the typical corner results, *i.e.* the percentages are similar in the two libraries, as shown in Figures 4.21 and 4.22 and Tables 4.31 and 4.32.

A comparison between post-synthesis and post-P&R shows that after placement and routing the percentages are much lower (half in some cases), with DES3 and VGA-LCD being the only exceptions, as they show no difference in their percentages.

Figure 4.21: Numbers of cells in most critical paths post-P&R WC IHP

Figure 4.22: Numbers of cells in most critical paths post-P&R WC UMC

### **PKS-R** flow Results

The PKS-R flow results are presented in Figures 4.23 and 4.24 and Tables 4.33 and 4.34. There seems to be no significant difference between the two technologies, with the exception of DLX and Huffman, which show lower percentages in UMC-0.13.

| Design       | 5%             | 10%            | 15%            | 20%            | 30%            | Total |
|--------------|----------------|----------------|----------------|----------------|----------------|-------|
| Aemb         | 216 (3.05%)    | 324 (4.58%)    | 390 (5.51%)    | 454 (6.42%)    | 1168 (16.51%)  | 7076  |
| DES3         | 13893 (18.58%) | 13907 (18.60%) | 13907 (18.60%) | 13907 (18.60%) | 13907 (18.60%) | 74775 |
| DLX          | 240 (1.95%)    | 442 (3.58%)    | 1042 (8.45%)   | 1563 (12.67%)  | 2597 (21.06%)  | 12332 |
| Huffman      | 365 (11.66%)   | 448 (14.31%)   | 545 (17.41%)   | 616 (19.68%)   | 667 (21.31%)   | 3130  |
| RISC         | 106 (0.53%)    | 262 (1.32%)    | 567 (2.85%)    | 1358 (6.82%)   | 1970 (9.90%)   | 19905 |
| Reed-Solomon | 724 (15.43%)   | 1420 (30.26%)  | 1512 (32.23%)  | 1512 (32.23%)  | 1512 (32.23%)  | 4692  |
| VGA-LCD      | 19043 (55.03%) | 19043 (55.03%) | 19043 (55.03%) | 20512 (59.27%) | 21043 (60.81%) | 34605 |

Table 4.31: Numbers of cells in most critical paths post-P&R WC IHP

| Design       | 5%             | 10%            | 15%            | 20%            | 30%            | Total |
|--------------|----------------|----------------|----------------|----------------|----------------|-------|
| Aemb         | 246 (3.35%)    | 312 (4.25%)    | 423 (5.77%)    | 489 (6.67%)    | 804 (10.96%)   | 7336  |
| DES3         | 14650 (22.13%) | 14650 (22.13%) | 14650 (22.13%) | 14650 (22.13%) | 14650 (22.13%) | 66194 |
| DLX          | 95 (0.76%)     | 631 (5.04%)    | 1187 (9.47%)   | 1409 (11.24%)  | 1574 (12.56%)  | 12531 |
| Huffman      | 125 (3.45%)    | 142 (3.92%)    | 335 (9.24%)    | 447 (12.33%)   | 597 (16.47%)   | 3624  |
| RISC         | 145 (0.82%)    | 216 (1.21%)    | 268 (1.51%)    | 570 (3.21%)    | 1717 (9.66%)   | 17778 |
| Reed-Solomon | 893 (22.78%)   | 1209 (30.84%)  | 1218 (31.07%)  | 1218 (31.07%)  | 1218 (31.07%)  | 3920  |
| VGA-LCD      | 13198 (42.86%) | 13198 (42.86%) | 14390 (46.73%) | 14390 (46.73%) | 16219 (52.67%) | 30792 |

Table 4.32: Numbers of cells in most critical paths post-P&R WC UMC

The results after PKS-R flow and after the worst-case corner placement and routing of SYN-P&R flow show no notable difference either.



Figure 4.23: Numbers of cells in most critical paths PKS-R IHP


Figure 4.24: Numbers of cells in most critical paths PKS-R UMC

| Design       | 5%             | 10%            | 15%            | 20%            | 30%            | Total |
|--------------|----------------|----------------|----------------|----------------|----------------|-------|
| Aemb         | 314 (2.22%)    | 386 (2.72%)    | 434 (3.06%)    | 470 (3.32%)    | 719 (5.07%)    | 14168 |
| DES3         | 14030 (19.18%) | 14094 (19.27%) | 14094 (19.27%) | 14094 (19.27%) | 14094 (19.27%) | 73153 |
| DLX          | 244 (1.47%)    | 761 (4.57%)    | 983 (5.90%)    | 1059 (6.36%)   | 2721 (16.35%)  | 16647 |
| Huffman      | 213 (6.09%)    | 258 (7.38%)    | 307 (8.78%)    | 533 (15.24%)   | 632 (18.07%)   | 3498  |
| RISC         | 248 (1.11%)    | 427 (1.91%)    | 805 (3.61%)    | 1254 (5.62%)   | 1583 (7.09%)   | 22327 |
| Reed-Solomon | 409 (9.64%)    | 1119 (26.37%)  | 1163 (27.41%)  | 1163 (27.41%)  | 1163 (27.41%)  | 4243  |
| VGA-LCD      | 22584 (60.97%) | 22584 (60.97%) | 23453 (63.31%) | 23453 (63.31%) | 23489 (63.41%) | 37043 |

Table 4.33: Numbers of cells in most critical paths PKS-R IHP

| Design       | 5%             | 10%            | 15%            | 20%            | 30%            | Total |
|--------------|----------------|----------------|----------------|----------------|----------------|-------|
| Aemb         | 318 (3.31%)    | 366 (3.81%)    | 385 (4.01%)    | 396 (4.12%)    | 695 (7.24%)    | 9603  |
| DES3         | 13174 (22.14%) | 13351 (22.44%) | 13351 (22.44%) | 13351 (22.44%) | 13351 (22.44%) | 59497 |
| DLX          | 1163 (9.29%)   | 1420 (11.34%)  | 1591 (12.70%)  | 1636 (13.06%)  | 2468 (19.71%)  | 12524 |
| Huffman      | 437 (12.94%)   | 464 (13.74%)   | 495 (14.66%)   | 553 (16.38%)   | 635 (18.80%)   | 3377  |
| RISC         | 236 (1.06%)    | 351 (1.58%)    | 733 (3.30%)    | 1451 (6.53%)   | 1799 (8.10%)   | 22205 |
| Reed-Solomon | 836 (23.82%)   | 1076 (30.66%)  | 1156 (32.93%)  | 1156 (32.93%)  | 1156 (32.93%)  | 3510  |
| VGA-LCD      | 19329 (55.91%) | 19329 (55.91%) | 19329 (55.91%) | 22438 (64.90%) | 23947 (69.27%) | 34571 |

Table 4.34: Numbers of cells in most critical paths PKS-R UMC

# 4.4 Power Analysis Results

In this section, the results of power consumption for the two flows and for the two libraries are presented. In the case of SYN-P&R flow both the typical corner and the worst-case corner results are included. In the following figures, the annotations "SYN-P&R WC\_int", "SYN-P&R TYP\_int", "PKS-R\_int", "SYN-P&R WC\_net", "SYN-P&R TYP\_net" and "PKS-R\_net" correspond, respectively, to the worst-case corner internal power of SYN-P&R flow, the typical corner internal power of SYN-P&R flow, the internal power of PKS-R flow, the worst-case corner switching power of SYN-P&R flow, the typical corner switching power of SYN-P&R flow and the switching power of PKS-R flow. Internal power is the power consumed by the standard cells when they switch state. Switching power is the power consumed by the nets when they switch logical values.

Figure 4.25 shows the power-speed curve for Aemb in UMC-0.13 and Figure 4.26 shows the power-speed curve for Aemb in IHP-0.25. Tables 4.35 and 4.36 show the data for Aemb that have been plotted in Figures 4.25 and 4.26 respectively. For Aemb, the typical corner power consumption of SYN-P&R flow is higher than the worst-case corner power consumption. This behaviour is typical of most of the designs, mainly because the typical corner designs operate at higher frequencies than the worst-case corner ones.
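The dependence on operating frequency noted above is consistent with the usual first-order model of dynamic power in CMOS logic. The relation below is the textbook approximation and is not taken from the power analysis tool used for these measurements; the symbols $\alpha$ (switching activity), $C_L$ (switched load capacitance), $V_{DD}$ (supply voltage) and $f$ (clock frequency) are introduced here for illustration only.

$$P_{\mathrm{dyn}} \propto \alpha \, C_L \, V_{DD}^{2} \, f$$

Under this model, halving the clock frequency with everything else unchanged roughly halves the dynamic power, which matches the trend between the TYP\_100 and TYP\_50 rows of Table 4.35 (46.6% internal power and 50% net switching power at half the frequency).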



Figure 4.25: Power-speed results for Aemb in UMC-0.13



Figure 4.26: Power-speed results for Aemb in IHP-0.25

| Design        | Clock period (ns) | Internal power (mW) | Net switching power (mW) |
|---------------|-------------------|---------------------|--------------------------|
| Aemb(TYP_100) | 2.8               | 26.8 (100%)         | 7.2 (100%)               |
| Aemb(TYP_75)  | 3.7               | 19.9 (74.2537%)     | 5.5 (76.3889%)           |
| Aemb(TYP_50)  | 5.6               | 12.5 (46.6418%)     | 3.6 (50%)                |
| Aemb(TYP_25)  | 11.2              | 6 (22.3881%)        | 1.8 (25%)                |
| Aemb(WC_100)  | 4.6               | 11.2 (41.791%)      | 3.5 (48.6111%)           |
| Aemb(WC_75)   | 6.1               | 8 (29.8507%)        | 2.6 (36.1111%)           |
| Aemb(WC_50)   | 9.2               | 3.1 (11.5672%)      | 1.4 (19.4444%)           |
| Aemb(WC_25)   | 18.4              | 2.4 (8.95522%)      | 0.8 (11.1111%)           |
| Aemb(PKS-R)   | 3.8               | 13.1 (48.8806%)     | 4 (55.5556%)             |

Table 4.35: Power-speed results for Aemb UMC-0.13
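The percentages in the power tables are relative to the TYP\_100 row of the same table, and the first numeric column is a clock period in ns, so the operating frequency is its reciprocal. A minimal sketch of both conversions for a few rows of Table 4.35 follows; the function name `normalise` and the dictionary layout are assumptions chosen for illustration.

```python
# (label, clock period in ns, internal power in mW, net switching power in mW),
# copied from Table 4.35 (Aemb, UMC-0.13).
aemb_umc = [
    ("TYP_100", 2.8, 26.8, 7.2),
    ("TYP_75", 3.7, 19.9, 5.5),
    ("TYP_50", 5.6, 12.5, 3.6),
    ("WC_100", 4.6, 11.2, 3.5),
    ("PKS-R", 3.8, 13.1, 4.0),
]


def normalise(rows):
    """Add frequency (MHz) and power relative to the first row (percent)."""
    _, _, ref_int, ref_net = rows[0]
    out = {}
    for label, period_ns, p_int, p_net in rows:
        out[label] = {
            "freq_mhz": 1000.0 / period_ns,  # 1 / ns = GHz, times 1000 = MHz
            "internal_pct": 100.0 * p_int / ref_int,
            "net_pct": 100.0 * p_net / ref_net,
        }
    return out


table = normalise(aemb_umc)
for label, row in table.items():
    print(f"{label:8s} {row['freq_mhz']:7.1f} MHz  "
          f"int {row['internal_pct']:6.2f}%  net {row['net_pct']:6.2f}%")
```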

A comparison across the two flows shows that the power consumption results of PKS-R flow follow the curve of the power consumption of the worst-case corner of SYN-P&R flow, if the SYN-P&R flow curve is extended to the reference frequency of PKS-R flow. This is explained by the fact that in the case of PKS-R flow, the same worst-case corner library characteristics are used as in the worst-case corner of SYN-P&R flow.

| Design        | Clock period (ns) | Internal power (mW) | Net switching power (mW) |
|---------------|-------------------|---------------------|--------------------------|
| Aemb(TYP_100) | 7.2               | 89.8 (100%)         | 29.6 (100%)              |
| Aemb(TYP_75)  | 9.5               | 64.2 (71.4922%)     | 20.9 (70.6081%)          |
| Aemb(TYP_50)  | 14.4              | 46.8 (52.1158%)     | 15.6 (52.7027%)          |
| Aemb(TYP_25)  | 28.8              | 23.7 (26.392%)      | 8 (27.027%)              |
| Aemb(WC_100)  | 13                | 54.2 (60.3563%)     | 19.2 (64.8649%)          |
| Aemb(WC_75)   | 17.3              | 40.8 (45.4343%)     | 15.4 (52.027%)           |
| Aemb(WC_50)   | 26                | 26.6 (29.6214%)     | 7.1 (23.9865%)           |
| Aemb(WC_25)   | 52                | 12.2 (13.5857%)     | 4 (13.5135%)             |
| Aemb(PKS-R)   | 11.7              | 57.8 (64.3653%)     | 20.3 (68.5811%)          |

Table 4.36: Power-speed results for Aemb IHP-0.25
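Extending the worst-case corner curve of SYN-P&R flow to the PKS-R operating point can be illustrated with a simple numerical sketch. The version below assumes a straight-line extrapolation of power against frequency through the two fastest worst-case points of Table 4.35 (Aemb, UMC-0.13); this linear model and the function name are assumptions made for illustration, not the method used to draw the figures.

```python
def extrapolate_power(points, target_period_ns):
    """Linearly extrapolate power (mW) against frequency (1 / period).

    points: list of (period_ns, power_mw); the two shortest-period
    (highest-frequency) points define the straight line.
    """
    fastest, second = sorted(points)[:2]  # smallest periods first
    f1, p1 = 1.0 / fastest[0], fastest[1]
    f2, p2 = 1.0 / second[0], second[1]
    slope = (p1 - p2) / (f1 - f2)  # mW per GHz
    return p1 + slope * (1.0 / target_period_ns - f1)


# Worst-case corner internal power of Aemb in UMC-0.13 (Table 4.35).
wc_internal = [(4.6, 11.2), (6.1, 8.0), (9.2, 3.1), (18.4, 2.4)]
# PKS-R operating point from the same table.
pks_period_ns, pks_internal_mw = 3.8, 13.1

estimate = extrapolate_power(wc_internal, pks_period_ns)
print(f"extrapolated WC internal power at {pks_period_ns} ns: {estimate:.2f} mW")
print(f"measured PKS-R internal power: {pks_internal_mw} mW")
```

With these inputs the extrapolated value is about 13.9 mW against a measured 13.1 mW for PKS-R, which is in line with the observation that the PKS-R point lies close to the extended worst-case curve.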

Figure 4.27 shows the power-speed tradeoff for DES3 in UMC-0.13 and Figure 4.28 shows the power-speed tradeoff for DES3 in IHP-0.25. Tables 4.37 and 4.38 show the data for DES3 that has been plotted in Figures 4.27 and 4.28 respectively. The results of DES3 are similar to the results of Aemb in worst-case corner and typical corner comparisons. However, the PKS-R flow result does not seem to follow the curve of the worst corner of SYN-P&R flow.

| Design        | Clock period (ns) | Internal power (mW) | Net switching power (mW) |
|---------------|-------------------|---------------------|--------------------------|
| DES3(TYP_100) | 1.7               | 79.2 (100%)         | 44.6 (100%)              |
| DES3(TYP_75)  | 2.3               | 67.7 (85.4798%)     | 37.5 (84.0807%)          |
| DES3(TYP_50)  | 3.4               | 51.4 (64.899%)      | 28.5 (63.9013%)          |
| DES3(TYP_25)  | 6.8               | 25 (31.5657%)       | 14 (31.3901%)            |
| DES3(WC_100)  | 2.3               | 54.5 (68.8131%)     | 27.6 (61.8834%)          |
| DES3(WC_75)   | 3.1               | 31.5 (39.7727%)     | 21.9 (49.1031%)          |
| DES3(WC_50)   | 4.6               | 21.2 (26.7677%)     | 15.1 (33.8565%)          |
| DES3(WC_25)   | 9.2               | 11.3 (14.2677%)     | 7.9 (17.713%)            |
| DES3(PKS-R)   | 2.2               | 52.9 (66.7929%)     | 32.4 (72.6457%)          |

Table 4.37: Power-speed results for DES3 UMC-0.13

Figure 4.27: Power-speed results for DES3 in UMC-0.13

Figure 4.28: Power-speed results for DES3 in IHP-0.25

| Design        | Clock period (ns) | Internal power (mW) | Net switching power (mW) |
|---------------|-------------------|---------------------|--------------------------|
| DES3(TYP_100) | 4                 | 934.4 (100%)        | 448.5 (100%)             |
| DES3(TYP_75)  | 5.3               | 614.5 (65.7641%)    | 262 (58.4169%)           |
| DES3(TYP_50)  | 8                 | 380.2 (40.6892%)    | 163.4 (36.4326%)         |
| DES3(TYP_25)  | 16                | 180.5 (19.3172%)    | 76.2 (16.99%)            |
| DES3(WC_100)  | 7.1               | 555.5 (59.4499%)    | 236.9 (52.8205%)         |
| DES3(WC_75)   | 9.5               | 327.5 (35.0492%)    | 135.6 (30.2341%)         |
| DES3(WC_50)   | 14.2              | 205 (21.9392%)      | 85.8 (19.1304%)          |
| DES3(WC_25)   | 28.4              | 96.1 (10.2847%)     | 39.2 (8.74025%)          |
| DES3(PKS-R)   | 7.2               | 443.1 (47.4208%)    | 184 (41.0256%)           |

Table 4.38: Power-speed results for DES3 IHP-0.25

Figure 4.29 shows the power-speed tradeoff for DLX in UMC-0.13 and Figure 4.30 shows the power-speed tradeoff for DLX in IHP-0.25; the data are shown in Tables 4.39 and 4.40. DLX's results are similar to those of Aemb. The curves of Aemb and DLX seem identical for both libraries and operating conditions.



Figure 4.29: Power-speed results for DLX in UMC-0.13.

Figure 4.31 shows the power-speed tradeoff for Huffman in UMC-0.13 and Figure 4.32 shows the power-speed tradeoff for Huffman in IHP-0.25 and the actual data



Figure 4.30: Power-speed results for DLX in IHP-0.25.

| Design           | Frequency(ns) | Internal $power(mW)$ | Net switch $power(mW)$ |
|------------------|---------------|----------------------|------------------------|
| DLX(TYP_100)     | 2.5           | 25 (100%)            | 8.8 (100%)             |
| $DLX(TYP_75)$    | 3.3           | 18 (72%)             | 6 (68.1818%)           |
| $DLX(TYP_50)$    | 5             | 11.3 (45.2%)         | 4.3 (48.8636%)         |
| $DLX(TYP_25)$    | 10            | 5.7(22.8%)           | 2.2 (25%)              |
| $DLX(WC_{-100})$ | 3.6           | 13.5~(54%)           | 5.8~(65.9091%)         |
| $DLX(WC_{75})$   | 4.8           | 8.9 (35.6%)          | 3.4 (38.6364%)         |
| $DLX(WC_{50})$   | 7.2           | 5.9(23.6%)           | 2.5~(28.4091%)         |
| $DLX(WC_25)$     | 14.4          | 2.6 (10.4%)          | $1.1 \ (12.5\%)$       |
| DLX(PKS-R)       | 3.2           | 14.5~(58%)           | 6.2(70.4545%)          |

Table 4.39: Power-speed results for DLX UMC-0.13

| Design       | Clock period (ns) | Internal power (mW) | Net switch power (mW) |
|--------------|-------------------|---------------------|-----------------------|
| DLX(TYP_100) | 5.3               | 144.6 (100%)        | 87.7 (100%)           |
| DLX(TYP_75)  | 7.1               | 101.1 (69.917%)     | 58.5 (66.7047%)       |
| DLX(TYP_50)  | 10.6              | 68.3 (47.2337%)     | 36.1 (41.1631%)       |
| DLX(TYP_25)  | 21.2              | 33.8 (23.3748%)     | 17.5 (19.9544%)       |
| DLX(WC_100)  | 11.8              | 66.4 (45.9198%)     | 36.7 (41.8472%)       |
| DLX(WC_75)   | 15.7              | 46 (31.8119%)       | 25.4 (28.9624%)       |
| DLX(WC_50)   | 23.6              | 31.5 (21.7842%)     | 16.8 (19.1562%)       |
| DLX(WC_25)   | 47.2              | 15.4 (10.6501%)     | 8.4 (9.57811%)        |
| DLX(PKS-R)   | 9.2               | 93.2 (64.4537%)     | 51.8 (59.065%)        |

Table 4.40: Power-speed results for DLX IHP-0.25

are presented in Tables 4.41 and 4.42. For UMC-0.13, the results of Huffman are similar to the results of DLX. However, for IHP-0.25, the worst-case corner power consumption is higher than the typical corner power consumption, both in terms of internal power and of switching power.



Figure 4.31: Power-speed results for Huffman in UMC-0.13.

| Design           | Clock period (ns) | Internal power (mW) | Net switch power (mW) |
|------------------|-------------------|---------------------|-----------------------|
| Huffman(TYP_100) | 1.7               | 1.4 (100%)          | 0.5 (100%)            |
| Huffman(TYP_75)  | 2.3               | 1 (71.4286%)        | 0.4 (80%)             |
| Huffman(TYP_50)  | 3.4               | 0.7 (50%)           | 0.3 (60%)             |
| Huffman(TYP_25)  | 6.8               | 0.4 (28.5714%)      | 0.2 (40%)             |
| Huffman(WC_100)  | 2.3               | 0.7 (50%)           | 0.3 (60%)             |
| Huffman(WC_75)   | 3.1               | 0.5 (35.7143%)      | 0.2 (40%)             |
| Huffman(WC_50)   | 4.6               | 0.4 (28.5714%)      | 0.2 (40%)             |
| Huffman(WC_25)   | 9.2               | 0.2 (14.2857%)      | 0.1 (20%)             |
| Huffman(PKS-R)   | 2                 | 0.9 (64.2857%)      | 0.4 (80%)             |

Table 4.41: Power-speed results for Huffman UMC-0.13



Figure 4.32: Power-speed results for Huffman in IHP-0.25.

| Design           | Clock period (ns) | Internal power (mW) | Net switch power (mW) |
|------------------|-------------------|---------------------|-----------------------|
| Huffman(TYP_100) | 4.7               | 14.4 (100%)         | 8.8 (100%)            |
| Huffman(TYP_75)  | 6.2               | 8.7 (60.4167%)      | 4.4 (50%)             |
| Huffman(TYP_50)  | 9.4               | 5.8 (40.2778%)      | 3.2 (36.3636%)        |
| Huffman(TYP_25)  | 18.8              | 3.4 (23.6111%)      | 1.9 (21.5909%)        |
| Huffman(WC_100)  | 7.3               | 11.9 (82.6389%)     | 7.7 (87.5%)           |
| Huffman(WC_75)   | 9.7               | 8.4 (58.3333%)      | 5.2 (59.0909%)        |
| Huffman(WC_50)   | 14.6              | 5.6 (38.8889%)      | 3.7 (42.0455%)        |
| Huffman(WC_25)   | 29.2              | 2.7 (18.75%)        | 1.8 (20.4545%)        |
| Huffman(PKS-R)   | 6.7               | 12 (83.3333%)       | 7.6 (86.3636%)        |

Table 4.42: Power-speed results for Huffman IHP-0.25

Figure 4.33 shows the power-speed tradeoff for RISC in UMC-0.13 and Figure 4.34 shows the power-speed tradeoff for RISC in IHP-0.25. Tables 4.43 and 4.44 show the data for RISC that has been plotted in Figures 4.33 and 4.34 respectively. The results of RISC seem to have the same properties as those of the other two pipelined processors, Aemb and DLX.



Figure 4.33: Power-speed results for RISC in UMC-0.13.

Figure 4.35 shows the power-speed tradeoff for Reed-Solomon in UMC-0.13 and Figure 4.36 shows the power-speed tradeoff for Reed-Solomon in IHP-0.25. Tables 4.45 and 4.46 show the data for Reed-Solomon that has been plotted in Figures 4.35 and 4.36 respectively. For Reed-Solomon, the PKS-R flow produces a design which is not only faster than the design produced by the SYN-P&R flow, but also consumes less power. This is more evident in the IHP-0.25 implementation.



Figure 4.34: Power-speed results for RISC in IHP-0.25.

| Design        | Clock period (ns) | Internal power (mW) | Net switch power (mW) |
|---------------|-------------------|---------------------|-----------------------|
| Risc(TYP_100) | 2.8               | 7.8 (100%)          | 3.7 (100%)            |
| Risc(TYP_75)  | 3.7               | 5.9 (75.641%)       | 2.9 (78.3784%)        |
| Risc(TYP_50)  | 5.6               | 4.1 (52.5641%)      | 2 (54.0541%)          |
| Risc(TYP_25)  | 11.2              | 2.2 (28.2051%)      | 1.2 (32.4324%)        |
| Risc(WC_100)  | 4.3               | 5.2 (66.6667%)      | 2.8 (75.6757%)        |
| Risc(WC_75)   | 5.7               | 3.5 (44.8718%)      | 2 (54.0541%)          |
| Risc(WC_50)   | 8.6               | 2.8 (35.8974%)      | 1.6 (43.2432%)        |
| Risc(WC_25)   | 17.2              | 1.2 (15.3846%)      | 0.5 (13.5135%)        |
| Risc(PKS-R)   | 3.9               | 3.6 (46.1538%)      | 1.7 (45.9459%)        |

Table 4.43: Power-speed results for RISC UMC-0.13

| Design        | Clock period (ns) | Internal power (mW) | Net switch power (mW) |
|---------------|-------------------|---------------------|-----------------------|
| Risc(TYP_100) | 6.8               | 62.3 (100%)         | 35.7 (100%)           |
| Risc(TYP_75)  | 9                 | 45.2 (72.5522%)     | 26.6 (74.5098%)       |
| Risc(TYP_50)  | 13.6              | 29.6 (47.512%)      | 16.5 (46.2185%)       |
| Risc(TYP_25)  | 27.2              | 16.8 (26.9663%)     | 9.3 (26.0504%)        |
| Risc(WC_100)  | 13.1              | 37.2 (59.7111%)     | 21.1 (59.1036%)       |
| Risc(WC_75)   | 17.5              | 29.8 (47.8331%)     | 15.4 (43.1373%)       |
| Risc(WC_50)   | 26.2              | 15.2 (24.3981%)     | 8.1 (22.6891%)        |
| Risc(WC_25)   | 52.4              | 6.8 (10.9149%)      | 3.4 (9.52381%)        |
| Risc(PKS-R)   | 11.6              | 41.7 (66.9342%)     | 23.5 (65.8263%)       |

Table 4.44: Power-speed results for RISC IHP-0.25



Figure 4.35: Power-speed results for Reed-Solomon in UMC-0.13.



Figure 4.36: Power-speed results for Reed-Solomon in IHP-0.25.

| Design                | Clock period (ns) | Internal power (mW) | Net switch power (mW) |
|-----------------------|-------------------|---------------------|-----------------------|
| Reed-Solomon(TYP_100) | 1.4               | 30 (100%)           | 14.7 (100%)           |
| Reed-Solomon(TYP_75)  | 1.9               | 20.4 (68%)          | 10.9 (74.1497%)       |
| Reed-Solomon(TYP_50)  | 2.8               | 17.8 (59.3333%)     | 10.3 (70.068%)        |
| Reed-Solomon(TYP_25)  | 5.6               | 8.2 (27.3333%)      | 5 (34.0136%)          |
| Reed-Solomon(WC_100)  | 2.3               | 14.1 (47%)          | 7.4 (50.3401%)        |
| Reed-Solomon(WC_75)   | 3                 | 9.3 (31%)           | 5 (34.0136%)          |
| Reed-Solomon(WC_50)   | 4.6               | 7.2 (24%)           | 4.6 (31.2925%)        |
| Reed-Solomon(WC_25)   | 9.2               | 3.6 (12%)           | 2.4 (16.3265%)        |
| Reed-Solomon(PKS-R)   | 2.1               | 14.9 (49.6667%)     | 6.8 (46.2585%)        |

Table 4.45: Power-speed results for Reed-Solomon UMC-0.13

| Design                | Clock period (ns) | Internal power (mW) | Net switch power (mW) |
|-----------------------|-------------------|---------------------|-----------------------|
| Reed-Solomon(TYP_100) | 4.1               | 171.6 (100%)        | 114.7 (100%)          |
| Reed-Solomon(TYP_75)  | 5.4               | 152.4 (88.8112%)    | 101.9 (88.8405%)      |
| Reed-Solomon(TYP_50)  | 8.2               | 82.3 (47.9604%)     | 64.2 (55.9721%)       |
| Reed-Solomon(TYP_25)  | 16.4              | 52.7 (30.711%)      | 39.9 (34.7864%)       |
| Reed-Solomon(WC_100)  | 8.4               | 87.9 (51.2238%)     | 54.7 (47.6896%)       |
| Reed-Solomon(WC_75)   | 11.2              | 67 (39.0443%)       | 42.7 (37.2276%)       |
| Reed-Solomon(WC_50)   | 16.8              | 41.5 (24.1841%)     | 29.4 (25.6321%)       |
| Reed-Solomon(WC_25)   | 33.6              | 22.6 (13.1702%)     | 17.2 (14.9956%)       |
| Reed-Solomon(PKS-R)   | 6.2               | 78.5 (45.7459%)     | 52.3 (45.5972%)       |

Table 4.46: Power-speed results for Reed-Solomon IHP-0.25

Figure 4.37 shows the power-speed comparison for VGA-LCD in UMC-0.13 and Figure 4.38 shows the power-speed comparison for VGA-LCD in IHP-0.25. The actual values of the data are shown in Tables 4.47 and 4.48. For VGA-LCD in UMC-0.13, the internal power consumption in the worst corner of the SYN-P&R flow is larger than the internal power consumption in the typical corner of the SYN-P&R flow, which is not common for the other designs. In IHP-0.25, both the internal and the switching power consumption seem to be the same in both operating conditions at the same clock frequency.



Figure 4.37: Power-speed results for VGA-LCD in UMC-0.13.

As a general observation, for most designs and implementations the typical corner power consumption is higher than the worst corner one. This can be explained by the fact that the typical corner implementations operate at higher clock frequencies than the worst corner ones. Another observation is that it is possible that



Figure 4.38: Power-speed results for VGA-LCD in IHP-0.25.

| Design           | Clock period (ns) | Internal power (mW) | Net switch power (mW) |
|------------------|-------------------|---------------------|-----------------------|
| Vga-Lcd(TYP_100) | 2.3               | 45.4 (100%)         | 34.3 (100%)           |
| Vga-Lcd(TYP_75)  | 3.1               | 34.2 (75.3304%)     | 28.1 (81.9242%)       |
| Vga-Lcd(TYP_50)  | 4.6               | 21.4 (47.1366%)     | 19.4 (56.5598%)       |
| Vga-Lcd(TYP_25)  | 9.2               | 12.2 (26.8722%)     | 6.9 (20.1166%)        |
| Vga-Lcd(WC_100)  | 3.7               | 38.4 (84.5815%)     | 24.3 (70.8455%)       |
| Vga-Lcd(WC_75)   | 4.9               | 29.3 (64.5374%)     | 13.1 (38.1924%)       |
| Vga-Lcd(WC_50)   | 7.4               | 17.7 (38.9868%)     | 8.1 (23.6152%)        |
| Vga-Lcd(WC_25)   | 14.8              | 6.3 (13.8767%)      | 4.7 (13.7026%)        |
| Vga-Lcd(PKS-R)   | 3.5               | 30.2 (66.5198%)     | 20.4 (59.4752%)       |

Table 4.47: Power-speed results for VGA-LCD UMC-0.13

| Design           | Clock period (ns) | Internal power (mW) | Net switch power (mW) |
|------------------|-------------------|---------------------|-----------------------|
| Vga-Lcd(TYP_100) | 6.3               | 735.3 (100%)        | 377.9 (100%)          |
| Vga-Lcd(TYP_75)  | 8.4               | 512.4 (69.6858%)    | 112.9 (29.8756%)      |
| Vga-Lcd(TYP_50)  | 12.6              | 219.8 (29.8926%)    | 101.2 (26.7796%)      |
| Vga-Lcd(TYP_25)  | 25.2              | 134.6 (18.3055%)    | 48.3 (12.7812%)       |
| Vga-Lcd(WC_100)  | 12.8              | 318.5 (43.3157%)    | 107.1 (28.3408%)      |
| Vga-Lcd(WC_75)   | 17.1              | 201.2 (27.363%)     | 82.4 (21.8047%)       |
| Vga-Lcd(WC_50)   | 25.6              | 112.9 (15.3543%)    | 43.4 (11.4845%)       |
| Vga-Lcd(WC_25)   | 51.2              | 41.8 (5.68475%)     | 18.2 (4.81609%)       |
| Vga-Lcd(PKS-R)   | 12.2              | 251.7 (34.2309%)    | 94.2 (24.9272%)       |

Table 4.48: Power-speed results for VGA-LCD IHP-0.25

the PKS-R flow power consumption falls onto the expanded SYN-P&R flow worst corner curve. A final important fact is that for IHP-0.25, the difference in power consumption between the typical corner and the worst corner is smaller than for UMC-0.13.
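The link between clock frequency and power consumption noted above can be checked directly against the tabulated data. The following is a minimal sketch, not part of the original measurement flow: it takes the DLX typical-corner rows of Table 4.39 (UMC-0.13) and expresses both the clock frequency and the internal power relative to the fastest implementation. The function name `relative_to_fastest` is hypothetical and chosen only for illustration.

```python
# DLX, UMC-0.13, typical corner (Table 4.39):
# (clock period in ns, internal power in mW), fastest implementation first.
dlx_typ = [(2.5, 25.0), (3.3, 18.0), (5.0, 11.3), (10.0, 5.7)]


def relative_to_fastest(points):
    """Return (relative frequency %, relative power %) pairs.

    Both values are normalised to the first (fastest) row. Frequency is the
    reciprocal of the period, so the relative frequency is t0 / t.
    """
    t0, p0 = points[0]
    return [(round(100.0 * t0 / t, 1), round(100.0 * p / p0, 1)) for t, p in points]


for freq_pct, power_pct in relative_to_fastest(dlx_typ):
    print(f"frequency {freq_pct:5.1f}%   internal power {power_pct:5.1f}%")
```

For this design the relative internal power stays within a few percentage points of the relative clock frequency, which is consistent with the explanation that the faster typical-corner implementations consume more power mainly because they are clocked faster.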

## 4.5 Maximum Frequency Comparison

Table 4.49 shows the maximum frequency achieved in IHP-0.25 across the different flows and conditions. The columns "Post-Syn TYP" and "Post-P&R TYP" correspond to the maximum frequency achieved following SYN-P&R flow in the typical corner after synthesis and after placement and routing. Accordingly, the columns "Post-Syn WC" and "Post-P&R WC" correspond to the maximum frequency achieved following SYN-P&R flow in the worst-case corner after synthesis and after placement and routing. The column "PKS-R" corresponds to the maximum frequency results with PKS-R flow.

## 4.5.1 SYN-P&R flow Post-synthesis versus SYN-P&R flow Post-P&R

### IHP-0.25

As shown in Table 4.49, for the SYN-P&R flow the maximum frequency of the designs decreases after placement and routing, compared with the post-synthesis frequency. In the case of Aemb, the delay increases from 5.6ns to 7.2ns in the typical corner and from 11.0ns to 13.0ns in the worst-case corner. This translates to a 28% decrease in the typical corner and an 18% decrease in the worst-case corner. In the case of DES3, there is a decrease of about 32% both in the typical corner (from 3.0ns to 4.0ns) and in the worst-case corner (from 5.4ns to 7.1ns). In the case of DLX, the decrease in speed is smaller. In the typical corner, the design operates at 4.7ns after synthesis and at 5.3ns after placement and routing (13% decrease). In the worst-case corner, the design operates at 10.2ns after synthesis and at 11.8ns after placement and routing (16% decrease). For Huffman, there is a large decrease of 42% in the typical corner, but a much smaller decrease of 12% in the worst-case corner. RISC operates at 5.8ns after synthesis and at 6.8ns after placement and routing in the typical corner, a decrease of 17%. In the worst-case corner, this decrease is 19% (from 11.0ns to 13.1ns). In the case of Reed-Solomon, the cycle time increases considerably in the typical corner (37%), while the increase is 26% in the worst-case corner. For VGA-LCD, the decrease in speed is more than 50% in both operating conditions.
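The percentages quoted above appear to be the growth of the clock period relative to its post-synthesis value. This is an inference from the numbers in Table 4.49 rather than a definition stated in the text, and the function name below is hypothetical. A short sketch of that calculation for the typical corner of four designs:

```python
# Clock periods in ns from Table 4.49 (IHP-0.25, typical corner):
# design -> (post-synthesis, post-P&R).
typ = {
    "Aemb": (5.6, 7.2),
    "DLX": (4.7, 5.3),
    "Huffman": (3.3, 4.7),
    "Risc": (5.8, 6.8),
}


def period_increase_pct(post_syn_ns, post_pnr_ns):
    """Growth of the clock period from synthesis to P&R, in percent of the post-synthesis period."""
    return 100.0 * (post_pnr_ns - post_syn_ns) / post_syn_ns


for name, (syn, pnr) in typ.items():
    print(f"{name}: {period_increase_pct(syn, pnr):.1f}%")
```

Under this assumed definition the computed values agree with the quoted 28%, 13%, 42% and 17% to within one percentage point; the small differences may come from rounding of the tabulated periods.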

### UMC-0.13

Table 4.50 shows the maximum frequency achieved in UMC-0.13 across the different flows and conditions. In the case of Aemb, the circuit operates 17% slower when moving from synthesis to placement and routing in the typical corner. In the worst-case corner, it works 9% slower. In the case of DES3, after placement and routing, there is a decrease in speed of 31% and 15% in the typical corner and in the worst-case corner respectively. In DLX, there is a large decrease in speed after placement and routing in the typical corner (39%) and a smaller decrease in the worst-case corner (13%). The same behaviour as in DLX is observed in the case of Huffman: there is a large decrease of 42% in the typical corner and a decrease of 15% in the worst-case corner. In the case of RISC, the decrease while moving from synthesis to placement and routing is 17% in the typical corner and 13% in the worst-case corner. In the case of Reed-Solomon, the decrease is similar in the typical corner and in the worst-case corner, about 9% in both cases. For VGA-LCD, the placed and routed design is about 30% slower in both operating conditions.

## 4.5.2 SYN-P&R flow Post-P&R WC versus PKS-R flow

### **IHP-0.25**

A comparison of PKS-R flow with the worst-case corner of SYN-P&R flow, based on Table 4.49, shows that the maximum frequency achieved with PKS-R flow is higher than the frequency achieved with SYN-P&R flow in most cases. In Aemb PKS-R flow is 11% faster, in DLX PKS-R flow is 28% faster, in Huffman PKS-R flow is 9% faster, in RISC PKS-R flow is 13% faster and in Reed-Solomon PKS-R flow is 35% faster. Only in the case of DES3 is SYN-P&R flow faster, by 1.5%.

### UMC-0.13

As Table 4.50 shows, the maximum frequency achieved with PKS-R flow is higher than the frequency achieved with SYN-P&R flow in all of the cases. In Aemb, PKS-R flow is faster by 21%, in DES3 it is faster by 5%, in DLX it is faster by 13%, in Huffman it is faster by 15%, in RISC it is faster by 10%, in Reed-Solomon it is faster by 10% and in VGA-LCD it is faster by 5%.
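Since Tables 4.49 and 4.50 list clock periods in ns, it can help to convert them to frequencies before comparing the flows. The sketch below does this for three UMC-0.13 designs. It assumes that "faster by X%" means the relative gain in clock frequency of PKS-R over the SYN-P&R worst-case result, which matches the quoted figures but is not stated explicitly in the text. The function names are hypothetical.

```python
def mhz(period_ns):
    """Convert a clock period in ns to a clock frequency in MHz."""
    return 1000.0 / period_ns


def pks_gain_pct(syn_pnr_wc_ns, pks_r_ns):
    """Frequency gain of PKS-R over SYN-P&R worst-case, in percent."""
    return 100.0 * (mhz(pks_r_ns) / mhz(syn_pnr_wc_ns) - 1.0)


# Clock periods in ns from Table 4.50 (UMC-0.13): design -> (Post-P&R WC, PKS-R).
umc = {"Aemb": (4.6, 3.8), "Huffman": (2.3, 2.0), "Risc": (4.3, 3.9)}

for name, (wc, pks) in umc.items():
    print(f"{name}: {mhz(wc):.0f} MHz -> {mhz(pks):.0f} MHz "
          f"(+{pks_gain_pct(wc, pks):.1f}%)")
```

For these three designs the computed gains are close to the 21%, 15% and 10% reported above.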

| Design       | Post-Syn TYP | Post-P&R TYP | Post-Syn WC | Post-P&R WC | PKS-R  |
|--------------|--------------|--------------|-------------|-------------|--------|
| Aemb         | 5.6ns        | 7.2ns        | 11.0ns      | 13.0ns      | 11.7ns |
| DES3         | 3.0ns        | 4.0ns        | 5.4ns       | 7.1ns       | 7.2ns  |
| DLX          | 4.7ns        | 5.3ns        | 10.2ns      | 11.8ns      | 9.2ns  |
| Huffman      | 3.3ns        | 4.7ns        | 6.5ns       | 7.3ns       | 6.7ns  |
| Risc         | 5.8ns        | 6.8ns        | 11.0ns      | 13.1ns      | 11.6ns |
| Reed-Solomon | 3.0ns        | 4.1ns        | 6.7ns       | 8.4ns       | 6.2ns  |
| Vga-Lcd      | 3.6ns        | 6.3ns        | 8.0ns       | 12.8ns      | 12.2ns |

Table 4.49: Post-synthesis and post-P&R max frequency for IHP-0.25

| Design       | Post-Syn TYP | Post-P&R TYP | Post-Syn WC | Post-P&R WC | PKS-R |
|--------------|--------------|--------------|-------------|-------------|-------|
| Aemb         | 2.4ns        | 2.8ns        | 4.2ns       | 4.6ns       | 3.8ns |
| DES3         | 1.3ns        | 1.7ns        | 2.0ns       | 2.3ns       | 2.2ns |
| DLX          | 1.8ns        | 2.5ns        | 3.2ns       | 3.6ns       | 3.2ns |
| Huffman      | 1.2ns        | 1.7ns        | 2.0ns       | 2.3ns       | 2.0ns |
| Risc         | 2.4ns        | 2.8ns        | 3.8ns       | 4.3ns       | 3.9ns |
| Reed-Solomon | 1.3ns        | 1.4ns        | 2.1ns       | 2.3ns       | 2.1ns |
| Vga-Lcd      | 1.8ns        | 2.3ns        | 2.8ns       | 3.7ns       | 3.5ns |

Table 4.50: Post-synthesis and post-P&R max frequency for UMC-0.13

## 4.5.3 Typical Corner Versus Worst Corner

### **IHP-0.25**

As shown in Table 4.49, most of the designs operate in the worst-case corner at about half the speed of the typical corner. Aemb has a clock period of 5.6ns after synthesis in the typical corner, and a clock period of 11ns after synthesis in the worst-case corner. After placement and routing, the clock cycle is 13.0ns in the worst-case corner and 45% faster in the typical corner. In DES3, after synthesis the clock frequency is 45% greater in the typical corner than in the worst-case corner. After placement and routing the difference is 44%. In the case of DLX, the difference after synthesis is 54% and after placement and routing it is 55%. In Huffman, the decrease in speed while moving from the typical corner to the worst-case corner is 49% after synthesis and 36% after placement and routing. In the case of RISC, the difference in speed is 47% after synthesis and 48% after placement and routing. In the case of Reed-Solomon, the synthesized circuit in the worst-case corner runs 55% slower than the synthesized circuit in the typical corner, and the placed and routed circuit in the worst-case corner runs 56% slower than the placed and routed circuit in the typical corner. In the case of VGA-LCD, the circuit implemented at the worst corner is more than two times slower than the circuit implemented at the typical corner.

### UMC-0.13

As shown in Table 4.50, most of the designs operate much more slowly in the worst-case corner than in the typical corner. Aemb is slowed by 43% after synthesis and by 40% after placement and routing when moving from the typical corner to the worst-case corner. In the case of DES3, there is a decrease of the maximum frequency of 35% after synthesis and of 26% after place and route. DLX operates 44% slower after synthesis and 31% slower after place and route. In the case of Huffman, the difference in operating speed is 40% and 26% after synthesis and place and route respectively. RISC experiences a 37% slowdown after synthesis and a 35% slowdown after placement and routing. In the case of Reed-Solomon, the circuit after synthesis is slowed by 38% in the worst-case corner and the placed and routed circuit is slowed by 39% in the worst-case corner. For VGA-LCD, there is a decrease of speed of about 55% when moving from the typical corner to the worst-case corner.
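The corner-to-corner percentages in this subsection appear to use the worst-case clock period as the reference, unlike the synthesis versus P&R comparison, which is relative to the post-synthesis period. This reading is inferred from the UMC-0.13 periods in Table 4.50 and is an assumption, as is the function name. A minimal sketch:

```python
def corner_slowdown_pct(typ_ns, wc_ns):
    """Difference between worst-case and typical clock periods, in percent of the worst-case period."""
    return 100.0 * (wc_ns - typ_ns) / wc_ns


# Clock periods in ns from Table 4.50 (UMC-0.13):
# design -> ((Post-Syn TYP, Post-Syn WC), (Post-P&R TYP, Post-P&R WC)).
umc = {
    "Aemb": ((2.4, 4.2), (2.8, 4.6)),
    "DES3": ((1.3, 2.0), (1.7, 2.3)),
}

for name, (syn, pnr) in umc.items():
    print(f"{name}: post-syn {corner_slowdown_pct(*syn):.1f}%, "
          f"post-P&R {corner_slowdown_pct(*pnr):.1f}%")
```

With this definition Aemb gives roughly 43% and 39%, and DES3 gives 35% and 26%, in line with the figures quoted for these two designs.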

## 4.6 Critical Path Analysis

In this section the effect of the different flows and the different operating conditions on the critical paths is presented. The critical paths are examined in terms of their structure, *i.e.* their start point, their end point and the gates they include.

## 4.6.1 Synthesis versus Place and Route

### **Typical Corner**

| Design       | Post-Syn Start      | Post-P&R Start | Post-Syn End          | Post-P&R End          |
|--------------|---------------------|----------------|-----------------------|-----------------------|
| DLX          | counter_reg_0       | counter_reg_1  | branch_address_reg_30 | branch_address_reg_31 |
| Aemb         | rRADD_reg_0         | rRADD_reg_0    | rRES_reg_29           | rRES_reg_30           |
| DES3         | decrypt             | decrypt        | R10_reg_11            | R11_reg_31            |
| Huffman      | cnt_reg_3           | code_reg_14    | do_reg_4              | do_reg_2              |
| Reed-Solomon | out_reg_6           | out_reg_6      | out_reg_7             | out_reg_6             |
| RISC         | aluAmode2_reg_reg_1 | IRA_reg_18     | PC_reg_23             | PC_reg_31             |
| VGA-LCD      | ra_reg_1            | ra_reg_1       | dat_o_reg_3/D         | b_reg_1               |

Table 4.51 shows the startpoints and endpoints for UMC-0.13, post-synthesis and post-P&R.

Table 4.51: Startpoints and Endpoints for UMC-0.13 typical corner

Table 4.52 shows the startpoints and endpoints for IHP-0.25, post-synthesis and post-P&R.

| Design       | Post-Syn Start  | Post-P&R Start      | Post-Syn End      | Post-P&R End         |
|--------------|-----------------|---------------------|-------------------|----------------------|
| DLX          | reg_out_A_reg_9 | counter_reg_1       | ALU_result_reg_31 | branch_address_reg_0 |
| Aemb         | rRADD_reg_0     | rRADD_reg_1         | add_o_reg_31      | add_o_reg_29         |
| DES3         | decrypt         | decrypt             | R12_reg_11        | R3_reg_12            |
| Huffman      | cnt_reg_3       | code_reg_14         | code_reg_2        | do_reg_2             |
| Reed-Solomon | out_reg_5       | datain_6            | out_reg_2         | out_reg_1            |
| RISC         | IRA_reg_2       | aluBmode2_reg_reg_3 | PC_reg_28         | PC_reg_19            |
| VGA-LCD      | ra_reg_1        | ra_reg_1            | b_reg_1           | b_reg_2              |

Table 4.52: Startpoints and Endpoints for IHP-0.25 typical corner

As shown in Tables 4.51 and 4.52, for the typical corner, the startpoints and the endpoints of the most critical path are similar post-synthesis and post-P&R. For almost all of the designs, even if the startpoint or the endpoint is not exactly the same post-synthesis and post-P&R, both belong to the same bus, which means that the structure of the critical path is similar in both measurements.
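The "same bus" criterion can be made mechanical. The sketch below assumes that register names follow the pattern `<bus>_<bit index>`, optionally followed by a pin suffix such as `/D`. This naming convention is inferred from the entries in Table 4.51 and is not documented in the text, and the helper names are hypothetical.

```python
import re


def bus_name(signal):
    """Drop an optional pin suffix ('/D') and the trailing bit index ('_3')."""
    base = signal.split("/")[0]
    return re.sub(r"_\d+$", "", base)


def same_bus(a, b):
    """True if both signals map to the same bus name."""
    return bus_name(a) == bus_name(b)


# (post-synthesis, post-P&R) endpoints from Table 4.51 (UMC-0.13, typical corner).
endpoints = {
    "DLX": ("branch_address_reg_30", "branch_address_reg_31"),
    "Huffman": ("do_reg_4", "do_reg_2"),
    "VGA-LCD": ("dat_o_reg_3/D", "b_reg_1"),
}

for design, (post_syn, post_pnr) in endpoints.items():
    print(design, "same bus" if same_bus(post_syn, post_pnr) else "different bus")
```

Under this convention the DLX and Huffman endpoints fall on the same bus, while the VGA-LCD endpoints do not.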

### **Worst-Case Corner**

Table 4.53 shows the startpoints and endpoints for UMC-0.13, post-synthesis and post-P&R.

| Design       | Post-Syn Start   | Post-P&R Start      | Post-Syn End              | Post-P&R End          |
|--------------|------------------|---------------------|---------------------------|-----------------------|
| DLX          | counter_reg_0    | counter_reg_1       | slot_num_reg_0            | branch_address_reg_31 |
| Aemb         | reg_out_A_reg_11 | reg_out_A_reg_11    | add_o_reg_5               | add_o_reg_4           |
| DES3         | decrypt          | decrypt             | R8_reg_7                  | R8_reg_10             |
| Huffman      | cnt_reg_3        | code_reg_13         | code_reg_10               | do_reg_1              |
| Reed-Solomon | out_reg_7        | out_reg_2           | out_reg_1                 | out_reg_5             |
| RISC         | IRA_reg_20       | aluAmode2_reg_reg_0 | PC_reg_28                 | PC_reg_1              |
| VGA-LCD      | ra_reg_1         | ra_reg_0            | dat_o_reg_9               | dat_o_reg_11          |

Table 4.53: Startpoints and Endpoints for UMC-0.13 worst corner

Table 4.54 shows the startpoints and endpoints for IHP-0.25, post-synthesis and post-P&R.

| Design       | Post-Syn Start     | Post-P&R Start | Post-Syn End    | Post-P&R End         |
|--------------|--------------------|----------------|-----------------|----------------------|
| DLX          | counter_reg_0      | counter_reg_0  | reg_out_B_reg_2 | branch_address_reg_5 |
| Aemb         | rRADD_reg_0        | rRADD_reg_7    | add_o_reg_31    | add_o_reg_29         |
| DES3         | decrypt            | decrypt        | R12_reg_9       | R3_reg_12            |
| Huffman      | codelen_reg_4      | code_reg_9     | sreg_reg_12     | do_reg_5             |
| Reed-Solomon | out_reg_5          | out_reg_4      | b_3             | out_reg_7            |
| RISC         | aluAinB2_reg_reg_8 | IRA_reg_19     | PC_reg_6        | PC_reg_17            |
| VGA-LCD      | ra_reg_1           | ra_reg_1       | r_reg_0         | r_reg_1              |

Table 4.54: Startpoints and Endpoints for IHP-0.25 worst corner

For the worst-case corner, the same observation as in the typical corner can be made, that for most of the designs, the most critical path has the same structure post-synthesis and post-P&R. However, there are some exceptions in this case, which are the startpoints of Huffman and RISC for UMC-0.13 and IHP-0.25, the endpoints of Huffman and DLX for UMC-0.13 and the endpoints of DLX, Huffman and Reed-Solomon for IHP-0.25.

## 4.6.2 SYN-P&R flow WC versus PKS-R flow

Table 4.55 shows the startpoints and endpoints for IHP-0.25, for the worst-case corner of SYN-P&R flow and for PKS-R flow.

Table 4.56 shows the startpoints and endpoints for UMC-0.13, for the worst-case corner of SYN-P&R flow and for PKS-R flow.

A comparison between the two flows shows that about two thirds of the designs show no difference in the structure of the most critical path in the two implementations, while the other designs may have a different startpoint, a different endpoint or both. It seems that the different flows affect the structure of the most critical path more than the different libraries or the operating conditions.

| Design       | SYN-P&R WC Start | PKS-R Start          | SYN-P&R WC End | PKS-R End            |
|--------------|------------------|----------------------|----------------|----------------------|
| DLX          | counter_reg_3    | branch_address_reg_4 | counter_reg_2  | branch_address_reg_7 |
| Aemb         | rRADD_reg_9      | rRADD_reg_1          | add_o_reg_28   | rRES_reg_31          |
| DES3         | decrypt          | decrypt              | R8_reg_21      | R4_reg_15            |
| Huffman      | code_reg_9       | code_reg_9           | do_reg_5       | do_reg_5             |
| Reed-Solomon | b_3              | out_reg_3            | out_reg_5      | out_reg_4            |
| RISC         | IRA_reg_19       | PC2_reg_5            | PC_reg_30      | PC_reg_17            |
| VGA-LCD      | ra_reg_1         | ra_reg_0             | r_reg_1        | r_reg_0              |

Table 4.55: Startpoints and Endpoints for IHP-0.25 for the two flows.

| Design       | SYN-P&R WC Start    | PKS-R Start   | SYN-P&R WC End        | PKS-R End             |
|--------------|---------------------|---------------|-----------------------|-----------------------|
| DLX          | counter_reg_1       | counter_reg_5 | branch_address_reg_31 | branch_address_reg_23 |
| Aemb         | rRADD_reg_1         | sel_o_reg_2   | rRES_reg_31           | rMEM_reg_0            |
| DES3         | decrypt             | decrypt       | R4_reg_9              | R13_reg_6             |
| Huffman      | code_reg_0          | code_reg_1    | do_reg_4              | do_reg_2              |
| Reed-Solomon | out_reg_4           | out_reg_4     | out_reg_7             | out_reg_4             |
| RISC         | aluAmode2_reg_reg_0 | IRA_reg_31    | PC_reg_5              | aluAinB2_reg_reg_14   |
| VGA-LCD      | ra_reg_0            | ra_reg_0      | dat_o_reg_9           | dat_o_reg_11          |

Table 4.56: Startpoints and Endpoints for UMC-0.13 for the two flows.

## 4.7 Critical Path Topology

In this section, the topology of the most critical paths is presented. All of the results refer to placed and routed designs in both flows. In order to visualize the placement of the critical paths on the layout, screen dumps from the placement and routing tool have been taken. The same tool has been used in all of the experiments.

## 4.7.1 UMC-0.13 WC

All of the screen dumps refer to placed and routed designs with the UMC-0.13 worst-case corner process for both flows, SYN-P&R flow and PKS-R flow.

Figure 4.39 shows the topology of the 30% most critical paths for Aemb SYN-P&R flow and Figure 4.40 shows the topology of the 30% most critical paths for Aemb PKS-R flow. The total number of paths for this design is 4771, so in the figures below, the paths that are highlighted are 1431. It seems that with the PKS-R flow, the most critical paths are more clustered than with SYN-P&R flow for this design.
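The number of highlighted paths quoted for each design in this section is 30% of the total path count, rounded to the nearest integer. The sketch below reproduces those counts from the totals given in the text. Rounding half up is an assumption that happens to match all seven quoted values, and the function name is hypothetical. Integer arithmetic is used to avoid floating-point rounding surprises at exact halves (for example 2955 paths, where 30% is 886.5).

```python
def highlighted_paths(total_paths, percent=30):
    """Number of paths in the top `percent`, rounded half up with integer arithmetic."""
    return (total_paths * percent + 50) // 100


# Total path counts per design, as quoted in this section.
totals = {
    "Aemb": 4771,
    "DES3": 8808,
    "DLX": 2955,
    "Huffman": 180,
    "Reed-Solomon": 145,
    "RISC": 870,
    "VGA-LCD": 33640,
}

for design, total in totals.items():
    print(f"{design}: {highlighted_paths(total)} of {total} paths highlighted")
```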



Figure 4.39: Topology of critical paths for Aemb SYN-P&R flow.

Figure 4.40: Topology of critical paths for Aemb PKS-R flow.

Figure 4.41 shows the topology of the 30% most critical paths for DES3 SYN-P&R flow and Figure 4.42 shows the topology of the 30% most critical paths for DES3 PKS-R flow. The total number of paths for this design is 8808, so in the figures below, the paths that are highlighted are 2642. As is shown in both Figures 4.41 and 4.42 there is no clustering of the critical paths in this design. Both flows yield layouts where the 30% most critical paths include cells which are placed in virtually all of the layout area.

Figure 4.43 shows the topology of the 30% most critical paths for DLX SYN-P&R flow and Figure 4.44 shows the topology of the 30% most critical paths for DLX PKS-R flow. The total number of paths for this design is 2955, so in the figures below, the paths that are highlighted are 887. As in the case of DES3, there is no clustering of the most critical paths on the layout.

Figure 4.41: Topology of critical paths for DES3 SYN-P&R flow.

Figure 4.42: Topology of critical paths for DES3 PKS-R flow.

Figure 4.43: Topology of critical paths for DLX SYN-P&R flow.

Figure 4.44: Topology of critical paths for DLX PKS-R flow.

Figure 4.45 shows the topology of the 30% most critical paths for Huffman SYN-P&R flow and Figure 4.46 shows the topology of the 30% most critical paths for Huffman PKS-R flow. The total number of paths for this design is 180, so in the figures below, the paths that are highlighted are 54. With both flows, the 30% most critical paths are all in the area occupied by the module dec1. More significantly, with PKS-R flow, the most critical paths are clustered in an area of about one quarter of the total layout area.



Figure 4.45: Topology of critical paths for Huffman SYN-P&R flow.

Figure 4.46: Topology of critical paths for Huffman PKS-R flow.

Figure 4.47 shows the topology of the 30% most critical paths for Reed-Solomon SYN-P&R flow and Figure 4.48 shows the topology of the 30% most critical paths for Reed-Solomon PKS-R flow. The total number of paths for this design is 145, so in the figures below, the paths that are highlighted are 44. In this case, neither flow seems to result in a layout where the 30% most critical paths are clustered. Although the number of paths highlighted is small (44), they include cells which are practically everywhere in the layout.

Figure 4.49 shows the topology of the 30% most critical paths for RISC SYN-P&R flow and Figure 4.50 shows the topology of the 30% most critical paths for RISC PKS-R flow. The total number of paths for this design is 870, so in the figures below, the paths that are highlighted are 261. Although the SYN-P&R flow does not seem to cluster the most critical paths of RISC, with the PKS-R flow, as is shown in Figure 4.50, almost all of the 30% most critical paths are in the uppermost quarter of the layout.

Figure 4.47: Topology of critical paths for Reed-Solomon SYN-P&R flow.

Figure 4.48: Topology of critical paths for Reed-Solomon PKS-R flow.

Figure 4.49: Topology of critical paths for RISC SYN-P&R flow.

Figure 4.50: Topology of critical paths for RISC PKS-R flow.

Figure 4.51 shows the topology of the 30% most critical paths for VGA-LCD SYN-P&R flow and Figure 4.52 shows the topology of the 30% most critical paths for VGA-LCD PKS-R flow. The total number of paths for this design is 33640, so in the figures below, the paths that are highlighted are 10092. Like DES3, DLX and Reed-Solomon, the critical paths of VGA-LCD are not clustered with either flow.



Figure 4.51: Topology of critical paths for VGA-LCD SYN-P&R flow.

Figure 4.52: Topology of critical paths for VGA-LCD PKS-R flow.
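The highlighted-path counts quoted in this section follow from taking 30% of each design's total number of paths. The sketch below reproduces that arithmetic; the round-half-up rule is an assumption inferred from the quoted figures (for example 2955 paths giving 887), not something the text states, and the function name is purely illustrative.

```python
def highlighted_paths(total_paths: int, percent: int = 30) -> int:
    """Number of paths in the top `percent` of the critical-path ranking.

    Rounds half up with integer arithmetic (assumed rule); Python's built-in
    round() rounds halves to the nearest even integer instead.
    """
    return (total_paths * percent + 50) // 100


# Total path counts as quoted in the text of this section.
TOTAL_PATHS = {
    "DES3": 8808,
    "DLX": 2955,
    "Huffman": 180,
    "Reed-Solomon": 145,
    "RISC": 870,
    "VGA-LCD": 33640,
}

for design, total in TOTAL_PATHS.items():
    print(f"{design}: {highlighted_paths(total)} of {total} paths highlighted")
```

Integer arithmetic is used so that borderline cases such as 886.5 are not affected by floating-point rounding.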

### 4.7.2 IHP-0.25 WC

All of the screen dumps refer to placed and routed designs with the IHP-0.25 worst-case corner process in the case of both flows, SYN-P&R flow and PKS-R flow. The number of paths highlighted is the same as in the previous section for all designs.

Figure 4.53 shows the topology of the 30% most critical paths for Aemb SYN-P&R flow and Figure 4.54 shows the topology of the 30% most critical paths for Aemb PKS-R flow. Like the UMC-0.13 results, the PKS-R flow clusters the most critical paths slightly more than the SYN-P&R flow for Aemb.

Figure 4.55 shows the topology of the 30% most critical paths for DES3 SYN-P&R flow and Figure 4.56 shows the topology of the 30% most critical paths for DES3 PKS-R flow. As both Figures 4.55 and 4.56 show, there is no clustering of the 30% most critical paths in either flow for this design. The 2642 most critical paths cover all the area of the layout.

Figure 4.53: Topology of critical paths for Aemb SYN-P&R flow.

Figure 4.54: Topology of critical paths for Aemb PKS-R flow.

Figure 4.55: Topology of critical paths for DES3 SYN-P&R flow.

Figure 4.56: Topology of critical paths for DES3 PKS-R flow.

Figure 4.57 shows the topology of the 30% most critical paths for DLX SYN-P&R flow and Figure 4.58 shows the topology of the 30% most critical paths for DLX PKS-R flow. Figure 4.57 shows that there is no significant clustering of the 30% most critical paths for this design in SYN-P&R flow. After PKS-R flow, as shown in Figure 4.58, there are some areas of the layout where the critical paths are densely concentrated, and other parts of the layout where no critical paths appear.





Figure 4.57: Topology of critical paths for DLX SYN-P&R flow.

Figure 4.58: Topology of critical paths for DLX PKS-R flow.

Figure 4.59 shows the topology of the 30% most critical paths for Huffman SYN-P&R flow and Figure 4.60 shows the topology of the 30% most critical paths for Huffman PKS-R flow. Unlike UMC-0.13, in IHP-0.25, the most critical paths are not clustered in a specific area of the layout. They also seem to be included in both modules of the design, dec1 and enc1.

Figure 4.61 shows the topology of the 30% most critical paths for Reed-Solomon SYN-P&R flow and Figure 4.62 shows the topology of the 30% most critical paths for Reed-Solomon PKS-R flow. Reed-Solomon shows the same behaviour as DES3 in both flows, as shown in Figures 4.61 and 4.62. The 30% most critical paths cover almost all of the layout in both flows, SYN-P&R flow and PKS-R flow.

Figure 4.59: Topology of critical paths for Huffman SYN-P&R flow.

Figure 4.60: Topology of critical paths for Huffman PKS-R flow.

Figure 4.61: Topology of critical paths for Reed-Solomon SYN-P&R flow.

Figure 4.62: Topology of critical paths for Reed-Solomon PKS-R flow.

Figure 4.63 shows the topology of the 30% most critical paths for RISC SYN-P&R flow and Figure 4.64 shows the topology of the 30% most critical paths for RISC PKS-R flow. Although with the SYN-P&R flow there is practically no clustering of the most critical paths, with the PKS-R flow the 30% most critical paths are significantly clustered in a relatively small area of the layout. The same happens for the UMC-0.13 implementation, but is even more evident for the IHP-0.25 implementation.



Figure 4.63: Topology of critical paths for RISC SYN-P&R flow.

Figure 4.64: Topology of critical paths for RISC PKS-R flow.

Figure 4.65 shows the topology of the 30% most critical paths for VGA-LCD SYN-P&R flow and Figure 4.66 shows the topology of the 30% most critical paths for VGA-LCD PKS-R flow. As shown in Figures 4.65 and 4.66, both flows produce a layout where the 30% most critical paths cover almost all of the layout area.



Figure 4.65: Topology of critical paths for VGA-LCD SYN-P&R flow.

Figure 4.66: Topology of critical paths for VGA-LCD PKS-R flow.

### 4.7.3 Typical Corner

All of the screen dumps refer to placed and routed designs with the SYN-P&R TYP flow and the number of paths highlighted is the same as in the previous sections for all designs.

Figures 4.67 and 4.68 show the topology of the 30% most critical paths for Aemb in UMC-0.13 and IHP-0.25, respectively.



Figure 4.67: Topology of critical paths for Aemb in UMC-0.13

Figure 4.68: Topology of critical paths for

Figures 4.69 and 4.70 show the topology of the 30% most critical paths for DES3 in UMC-0.13 and IHP-0.25, respectively.

Figures 4.71 and 4.72 show the topology of the 30% most critical paths for DLX in the two libraries.

Figures 4.73 and 4.74 show the topology of the 30% most critical paths for Huffman in UMC-0.13 and IHP-0.25, respectively.

Figures 4.75 and 4.76 show the topology of the 30% most critical paths for Reed-Solomon in UMC-0.13 and IHP-0.25, respectively.



Figure 4.69: Topology of critical paths for DES3 in UMC-0.13

Figure 4.70: Topology of critical paths for DES3 in IHP-0.25







Figure 4.73: Topology of critical paths for Huffman in UMC-0.13

Figure 4.74: Topology of critical paths for



Figure 4.75: Topology of critical paths for Reed-Solomon in UMC-0.13

Figure 4.76: Topology of critical paths for Reed-Solomon in IHP-0.25

Figures 4.77 and 4.78 show the topology of the 30% most critical paths for RISC in UMC-0.13 and IHP-0.25, respectively.



Figure 4.77: Topology of critical paths for RISC in UMC-0.13

Figure 4.78: Topology of critical paths for RISC in IHP-0.25

Figures 4.79 and 4.80 show the topology of the 30% most critical paths for VGA-LCD in UMC-0.13 and IHP-0.25, respectively.



Figure 4.79: Topology of critical paths for VGA-LCD in UMC-0.13

Figure 4.80: Topology of critical paths for VGA-LCD in IHP-0.25

For all of the designs, for both libraries, there is no clustering of the 30% most critical paths.

## 4.8 Switching Activity Analysis

As shown in Tables 4.57, 4.58, 4.62 and 4.63, the clock tree accounts for roughly half or more of the total switching power of Aemb, DES3, Reed-Solomon and VGA-LCD. For Aemb and VGA-LCD, the rest of the switching activity is divided among the logic blocks of the designs according to their activity, as shown in Tables 4.57 and 4.63. For DLX, the clock tree consumes about 30% of the total switching power, as shown in Table 4.59. For Huffman in IHP-0.25 and for RISC, the power consumed by the clock tree is not as large as in the other designs, as Tables 4.60 and 4.61 show. However, for Huffman in UMC-0.13, the clock tree consumes as much as 60% of the total switching power.

| AeMB     | 013 TYP | 013 WC   | 013 PKS-R | 025 TYP | 025 WC   | 025 PKS-R |
|----------|---------|----------|-----------|---------|----------|-----------|
| clk      | 5.4 mW  | 2.5 mW   | 2.9 mW    | 20.6 mW | 11.3 mW  | 14.5 mW   |
| mFETCH   | 0.02 mW | 0.002 mW | 0.003 mW  | 0.3 mW  | 0.002 mW | 0.1 mW    |
| mDECODE  | 0.2 mW  | 0.2 mW   | 0.1 mW    | 1.4 mW  | 0.1 mW   | 0.7 mW    |
| mEXECUTE | 0.1 mW  | 0.01 mW  | 0.1 mW    | 1.8 mW  | 0.01 mW  | 1.2 mW    |
| mREGFILE | 1.2 mW  | 0.6 mW   | 0.8 mW    | 4.95 mW | 2.0 mW   | 3.4 mW    |
| Total    | 7.2 mW  | 3.5 mW   | 4.0 mW    | 29.6 mW | 19.2 mW  | 20.3 mW   |

Table 4.57: Switching activity for AeMB

| DES3  | 013 TYP | 013 WC  | 013 PKS-R | 025 TYP  | 025 WC   | 025 PKS-R |
|-------|---------|---------|-----------|----------|----------|-----------|
| clk   | 24.9 mW | 16.2 mW | 15.8 mW   | 230.1 mW | 114.7 mW | 101.2 mW  |
| Total | 44.6 mW | 27.6 mW | 32.4 mW   | 448.5 mW | 236.9 mW | 184 mW    |

Table 4.58: Switching activity for DES3

| DLX     | 013 TYP | 013 WC  | 013 PKS-R | 025 TYP  | 025 WC  | 025 PKS-R |
|---------|---------|---------|-----------|----------|---------|-----------|
| clk     | 2.5 mW  | 1.5 mW  | 1.5 mW    | 29 mW    | 12.0 mW | 16.7 mW   |
| IFinst  | 0.3 mW  | 0.1 mW  | 0.2 mW    | 4.2 mW   | 1.5 mW  | 1.2 mW    |
| IDinst  | 2.3 mW  | 1.8 mW  | 1.6 mW    | 29.47 mW | 12.9 mW | 17.4 mW   |
| EXinst  | 2.4 mW  | 1.7 mW  | 1.7 mW    | 23.9 mW  | 9.7 mW  | 16.0 mW   |
| MEMinst | 0.07 mW | 0.05 mW | 0.05 mW   | 1.0 mW   | 0.4 mW  | 0.3 mW    |
| Total   | 8.8 mW  | 5.8 mW  | 6.2 mW    | 87.7 mW  | 36.7 mW | 51.8 mW   |

Table 4.59: Switching activity for DLX

| Huffman | 013 TYP | 013 WC   | 013 PKS-R | 025 TYP | 025 WC | 025 PKS-R |
|---------|---------|----------|-----------|---------|--------|-----------|
| clk     | 0.3 mW  | 0.2 mW   | 0.2 mW    | 1.8 mW  | 0.9 mW | 1.2 mW    |
| en1     | 0.1 mW  | 0.008 mW | 0.1 mW    | 1.6 mW  | 1.0 mW | 1.3 mW    |
| dec1    | 0.2 mW  | 0.01 mW  | 0.1 mW    | 5.4 mW  | 5.7 mW | 5.0 mW    |
| Total   | 0.5 mW  | 0.3 mW   | 0.4 mW    | 8.8 mW  | 7.7 mW | 7.6 mW    |

Table 4.60: Switching activity for Huffman

| RISC  | 013 TYP | 013 WC | 013 PKS-R | 025 TYP | 025 WC  | 025 PKS-R |
|-------|---------|--------|-----------|---------|---------|-----------|
| clk   | 1.0 mW  | 0.1 mW | 0.4 mW    | 5.3 mW  | 3.7 mW  | 2.8 mW    |
| st1   | 0.1 mW  | 0.2 mW | 0.05 mW   | 0.75 mW | 2.8 mW  | 0.2 mW    |
| st2   | 2.1 mW  | 1.9 mW | 1.0 mW    | 24.5 mW | 12.8 mW | 16.8 mW   |
| st3   | 0.4 mW  | 0.4 mW | 0.1 mW    | 4.5 mW  | 4.3 mW  | 3.4 mW    |
| st4   | 0.09 mW | 0.1 mW | 0.03 mW   | 0.5 mW  | 1.1 mW  | 0.05 mW   |
| Total | 3.7 mW  | 2.8 mW | 1.7 mW    | 35.7 mW | 21.1 mW | 23.5 mW   |

Table 4.61: Switching activity for RISC

| Reed-Solomon | 013 TYP | 013 WC | 013 PKS-R | 025 TYP  | 025 WC  | 025 PKS-R |
|--------------|---------|--------|-----------|----------|---------|-----------|
| clk          | 6.5 mW  | 3.1 mW | 3.8 mW    | 59.1 mW  | 23.4 mW | 24.9 mW   |
| Total        | 14.7 mW | 7.4 mW | 6.8 mW    | 114.7 mW | 54.7 mW | 52.3 mW   |

Table 4.62: Switching activity for Reed-Solomon

| VGA-LCD         | 013 TYP | 013 WC  | 013 PKS-R | 025 TYP  | 025 WC   | 025 PKS-R |
|-----------------|---------|---------|-----------|----------|----------|-----------|
| clk             | 15.3 mW | 11.8 mW | 9.1 mW    | 164.7 mW | 81.1 mW  | 72.4 mW   |
| line_fifo       | 0.2 mW  | 0.1 mW  | 0.1 mW    | 1.4 mW   | 0.3 mW   | 0.2 mW    |
| pixel_generator | 0.7 mW  | 0.08 mW | 0.08 mW   | 4.7 mW   | 0.1 mW   | 0.1 mW    |
| clut_mem        | 15.7 mW | 12.2 mW | 10.4 mW   | 198.5 mW | 21.3 mW  | 17.3 mW   |
| wbm             | 0.2 mW  | 0.1 mW  | 0.1 mW    | 1.7 mW   | 0.9 mW   | 0.8 mW    |
| wbs             | 0.5 mW  | 0.3 mW  | 0.2 mW    | 6.7 mW   | 3.0 mW   | 2.8 mW    |
| Total           | 34.3 mW | 24.3 mW | 20.4 mW   | 377.9 mW | 107.1 mW | 94.1 mW   |

Table 4.63: Switching activity for VGA-LCD
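The clock-tree shares discussed in this section can be recomputed directly from the `clk` and `Total` rows of Tables 4.57 to 4.63. The sketch below does so for the 013 TYP column only; the dictionary layout and function name are illustrative choices, not part of the original analysis flow.

```python
# (clk, Total) switching power in mW, 013 TYP column of Tables 4.57-4.63.
POWER_013_TYP = {
    "AeMB": (5.4, 7.2),
    "DES3": (24.9, 44.6),
    "DLX": (2.5, 8.8),
    "Huffman": (0.3, 0.5),
    "RISC": (1.0, 3.7),
    "Reed-Solomon": (6.5, 14.7),
    "VGA-LCD": (15.3, 34.3),
}


def clock_share(clk_mw: float, total_mw: float) -> float:
    """Clock-tree power as a percentage of the total switching power."""
    return 100.0 * clk_mw / total_mw


for design, (clk, total) in POWER_013_TYP.items():
    print(f"{design:13s} clock share: {clock_share(clk, total):5.1f}%")
```

The same function can be applied to the other five columns by substituting the corresponding table values.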

# Chapter 5

# The "Mythical" IP Block

In this chapter, the characteristics of the "mythical" IP block are presented. The "mythical" IP is an average-case IP, encompassing the characteristics of as many IP designs as possible. Even though the "mythical" IP does not exist, it can help EDA researchers and developers improve existing flows and tools, and improve designers' understanding of the properties of their design as it passes through the front-end and back-end parts of the SYN-P&R and PKS-R EDA flows. The characteristics of the "mythical" IP block are the averages of the characteristics of the benchmarks studied in the previous chapters.

## 5.1 Post-Synthesis vs. Post-P&R Analysis

Table 5.1 shows the average maximum speed for the different technologies, flows and operating conditions. The difference between post-synthesis and post-P&R is just below 30% in all cases, except the worst-case corner of UMC-0.13, where it is about 20%. The designs implemented at the worst-case corner are about 50% slower than the designs implemented at the typical corner for IHP-0.25 and about 40% slower for UMC-0.13. PKS-R flow produces designs which are about 11% faster than those of the SYN-P&R flow at the worst-case corner.

| Design          | Post-Syn TYP | Post-P&R TYP | Post-Syn WC | Post-P&R WC | PKS-R  |
|-----------------|--------------|--------------|-------------|-------------|--------|
| Mythical IP IHP | 4.1 ns       | 5.5 ns       | 8.4 ns      | 10.5 ns     | 9.3 ns |
| Mythical IP UMC | 1.7 ns       | 2.2 ns       | 2.9 ns      | 3.3 ns      | 2.9 ns |

Table 5.1: Average Maximum Speed.
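As a rough cross-check of the corner and flow comparisons, the averaged cycle times of Table 5.1 can be compared pairwise. The sketch below is only illustrative: the choice of columns (post-synthesis for the TYP versus WC comparison, post-P&R WC as the baseline for PKS-R) and the measure itself (relative reduction in cycle time) are assumptions. Ratios of averaged values also need not coincide with percentages that were averaged per design.

```python
# Average cycle times in ns, copied from Table 5.1.
TABLE_5_1 = {
    "IHP": {"syn_typ": 4.1, "pnr_typ": 5.5, "syn_wc": 8.4, "pnr_wc": 10.5, "pks_r": 9.3},
    "UMC": {"syn_typ": 1.7, "pnr_typ": 2.2, "syn_wc": 2.9, "pnr_wc": 3.3, "pks_r": 2.9},
}


def period_reduction(fast_ns: float, slow_ns: float) -> float:
    """Percentage by which the faster cycle time is shorter than the slower one."""
    return 100.0 * (slow_ns - fast_ns) / slow_ns


for lib, t in TABLE_5_1.items():
    # Assumed comparison 1: typical vs. worst-case corner, post-synthesis.
    corner = period_reduction(t["syn_typ"], t["syn_wc"])
    # Assumed comparison 2: PKS-R vs. SYN-P&R, both at the worst-case corner.
    flow = period_reduction(t["pks_r"], t["pnr_wc"])
    print(f"{lib}: TYP vs WC {corner:.1f}%, PKS-R vs P&R WC {flow:.1f}%")
```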

## 5.2 Area vs. Speed Analysis

Figures 5.1 and 5.2 show the averaged area-speed results, *i.e.* over all 7 designs, for the $0.25\mu$m and $0.13\mu$m technology libraries respectively. Area and speed are shown as averaged percentages, with the fastest TYP design being the reference design, *i.e.* 100% Frequency, 100% Area.



Figure 5.1: Average Area-Speed Results -  $0.25\mu m$  Process

For the $0.25\mu$m library it is clear from looking across the Y-axis of the Area-Speed curve that, although the average area of the TYP and WC designs is approximately the same, WC speed is only slightly over 50% of TYP speed. Therefore, a design implemented in this process using WC conditions has an approximate 100% timing penalty, *i.e.* the cycle time approximately doubles. On the other hand, looking across the X-axis, the area required in TYP is approximately 80% of the area required in WC.
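The relation between a frequency ratio and the corresponding timing penalty can be made explicit with a short calculation. The sketch below is a minimal illustration; the function names and the 500 MHz / 250 MHz sample values are hypothetical and are not measurements from this work.

```python
def cycle_time_penalty(freq_ratio: float) -> float:
    """Cycle-time increase in percent when the clock frequency falls to
    `freq_ratio` (0 < freq_ratio <= 1) of the reference frequency."""
    if not 0.0 < freq_ratio <= 1.0:
        raise ValueError("freq_ratio must be in (0, 1]")
    return 100.0 * (1.0 / freq_ratio - 1.0)


def normalise(value: float, reference: float) -> float:
    """Express a measurement as a percentage of the reference design."""
    return 100.0 * value / reference


# Hypothetical example: reference (fastest TYP) design at 500 MHz,
# WC design at 250 MHz, i.e. 50% of the reference frequency.
wc_speed = normalise(250.0, 500.0)
print(f"WC speed: {wc_speed:.0f}% of reference")
print(f"Timing penalty: {cycle_time_penalty(wc_speed / 100.0):.0f}%")
```

A frequency ratio of one half corresponds to a doubled cycle time, which is the 100% penalty mentioned above.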

The point of the WC PKS-R flow seems to lie exactly on the area-speed  $0.25\mu$ m WC curve, however, it is below the  $0.13\mu$ m WC curve.

As the target clock frequency is lowered by 25%, the area saving will most likely be no more than 10%, although there are some exceptions where the area may shrink by as much as 25%.

Figure 5.2: Average Area-Speed Results - $0.13\mu$m Process

## 5.3 Power vs. Speed Analysis

Figures 5.3 and 5.4 show the averaged power-speed results, in a similar way to the area-speed results presented above.

For both libraries, the TYP power seems to scale linearly with the frequency, which is also the case for the WC curve. The different WC conditions (voltage and frequency) of the two libraries affect the power estimation: for the $0.25\mu$m library the WC curve is always above the TYP curve, while for the $0.13\mu$m library the opposite happens (the TYP curve is above the WC curve). The WC PKS-R flow point is in both cases below the WC curve.

## 5.4 Power Distribution Analysis

Figure 5.3: Average Power-Speed Results - $0.25\mu$m Process

Figure 5.4: Average Power-Speed Results - $0.13\mu$m Process

Table 5.2 shows the average clock network power consumption. For all of the experiments, the clock network power consumption is about half of the total power consumption. The remaining power consumption is divided among the functional units of the design and is proportional to their activity, as shown by the results presented in the previous chapter.

| Mythical IP | 013 TYP | 013 WC   | 013 PKS-R | 025 TYP   | 025 WC  | 025 PKS-R |
|-------------|---------|----------|-----------|-----------|---------|-----------|
| clock       | 8.0 mW  | 5.1 mW   | 4.8 mW    | 72.9 mW   | 35.3 mW | 33.4 mW   |
| Total       | 16.3 mW | 10.26 mW | 10.27 mW  | 157.56 mW | 69.1 mW | 61.94 mW  |

Table 5.2: Average Clock Network Power Consumption.
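The "mythical" IP figures are averages over the seven benchmarks, so the 013 TYP entries of Table 5.2 can be reproduced from the `clk` and `Total` rows of Tables 4.57 to 4.63. The sketch below does this and then computes the clock share for each column of Table 5.2; the variable names are illustrative and a plain arithmetic mean is assumed.

```python
from statistics import mean

# (clk, Total) in mW for the 013 TYP column of Tables 4.57-4.63, in the order
# AeMB, DES3, DLX, Huffman, RISC, Reed-Solomon, VGA-LCD.
PER_DESIGN_013_TYP = [
    (5.4, 7.2), (24.9, 44.6), (2.5, 8.8), (0.3, 0.5),
    (1.0, 3.7), (6.5, 14.7), (15.3, 34.3),
]

avg_clk = mean(clk for clk, _ in PER_DESIGN_013_TYP)
avg_total = mean(total for _, total in PER_DESIGN_013_TYP)
print(f"013 TYP average: clock {avg_clk:.1f} mW, total {avg_total:.1f} mW")

# (clock, Total) in mW, copied from Table 5.2.
TABLE_5_2 = {
    "013 TYP": (8.0, 16.3),
    "013 WC": (5.1, 10.26),
    "013 PKS-R": (4.8, 10.27),
    "025 TYP": (72.9, 157.56),
    "025 WC": (35.3, 69.1),
    "025 PKS-R": (33.4, 61.94),
}

clock_shares = {cfg: 100.0 * clk / total for cfg, (clk, total) in TABLE_5_2.items()}
for cfg, share in clock_shares.items():
    print(f"{cfg:10s} clock share: {share:.1f}%")
```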

## 5.5 Critical Path Timing Distribution Analysis

Table 5.3 shows the average percentage of cells in the path margins.

For both libraries, the percentage of cells that are in a path margin is always larger according to the estimation after synthesis than according to the estimation after placement and routing. The difference between PKS-R flow and SYN-P&R flow is about 3% of the total number of cells for both technologies.

| Design      | 5%    | 10%   | 15%   | 20%   | 30%   | Total    |
|-------------|-------|-------|-------|-------|-------|----------|
| SYN TYP IHP | 23.50 | 27.30 | 28.90 | 32.56 | 33.82 | 18959.71 |
| SYN TYP UMC | 19.76 | 24.47 | 26.85 | 28.98 | 30.74 | 20883.86 |
| SYN WC IHP  | 22.14 | 29.75 | 30.91 | 33.09 | 33.66 | 19970.29 |
| SYN WC UMC  | 18.52 | 19.28 | 19.61 | 22.59 | 25.27 | 20044.71 |
| P&R TYP IHP | 13.51 | 16.78 | 18.87 | 20.93 | 22.65 | 21216    |
| P&R TYP UMC | 15.68 | 18.74 | 20.51 | 21.73 | 23.65 | 20933    |
| P&R WC IHP  | 15.18 | 18.24 | 20.01 | 22.24 | 25.77 | 22359.29 |
| P&R WC UMC  | 13.74 | 15.75 | 17.99 | 19.06 | 22.22 | 20310.71 |
| PKS-R IHP   | 14.38 | 17.60 | 18.76 | 20.07 | 22.38 | 24439.86 |
| PKS-R UMC   | 18.35 | 19.93 | 20.85 | 22.91 | 25.50 | 20755.29 |

Table 5.3: Average percentage of cells in path margins.
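The two observations about Table 5.3 can be checked mechanically against its rows. The sketch below compares each synthesis row with the matching P&R row, and then lists the PKS-R minus P&R difference per margin. Pairing PKS-R with the P&R WC rows is an assumption made here for illustration; the function names are likewise hypothetical.

```python
MARGINS = (5, 10, 15, 20, 30)  # path margins in percent

# Average percentage of cells per path margin, copied from Table 5.3.
CELLS_IN_MARGIN = {
    "SYN TYP IHP": (23.50, 27.30, 28.90, 32.56, 33.82),
    "SYN TYP UMC": (19.76, 24.47, 26.85, 28.98, 30.74),
    "SYN WC IHP": (22.14, 29.75, 30.91, 33.09, 33.66),
    "SYN WC UMC": (18.52, 19.28, 19.61, 22.59, 25.27),
    "P&R TYP IHP": (13.51, 16.78, 18.87, 20.93, 22.65),
    "P&R TYP UMC": (15.68, 18.74, 20.51, 21.73, 23.65),
    "P&R WC IHP": (15.18, 18.24, 20.01, 22.24, 25.77),
    "P&R WC UMC": (13.74, 15.75, 17.99, 19.06, 22.22),
    "PKS-R IHP": (14.38, 17.60, 18.76, 20.07, 22.38),
    "PKS-R UMC": (18.35, 19.93, 20.85, 22.91, 25.50),
}


def synthesis_always_larger() -> bool:
    """True if every SYN entry exceeds the matching P&R entry."""
    for corner in ("TYP", "WC"):
        for lib in ("IHP", "UMC"):
            syn = CELLS_IN_MARGIN[f"SYN {corner} {lib}"]
            pnr = CELLS_IN_MARGIN[f"P&R {corner} {lib}"]
            if any(s <= p for s, p in zip(syn, pnr)):
                return False
    return True


def pks_minus_pnr(lib: str) -> list:
    """PKS-R minus P&R WC (assumed pairing), in percentage points per margin."""
    pks = CELLS_IN_MARGIN[f"PKS-R {lib}"]
    pnr = CELLS_IN_MARGIN[f"P&R WC {lib}"]
    return [round(k - p, 2) for k, p in zip(pks, pnr)]


print("Synthesis estimate always larger:", synthesis_always_larger())
for lib in ("IHP", "UMC"):
    print(lib, dict(zip(MARGINS, pks_minus_pnr(lib))))
```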

## 5.6 Critical Path Physical Distribution Analysis

The analysis of the physical distribution of the most critical paths on the layout shows that there is no particular clustering of the paths. In three out of seven designs, the critical paths cover almost all of the layout area for both technologies, flows and operating conditions. The three processor benchmarks show some limited clustering, but only with the PKS-R flow.

## 5.7 Design Balance Analysis

The study of the pipelined benchmarks shows that their pipeline is unbalanced. Typically, there are one or two pipeline stages with a delay which is close to the maximum period (95% or more of the maximum period), while the remaining pipeline stages have a delay which is about half the delay of the slowest stages.
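The balance criterion described above can be expressed as a small check on per-stage delays. In the sketch below the stage names and delays are hypothetical, chosen only to mimic the described pattern, and the maximum period is assumed to equal the delay of the slowest stage (clock skew and setup margins are ignored).

```python
def stage_utilisation(stage_delays_ns: dict) -> dict:
    """Delay of each stage as a percentage of the slowest stage's delay."""
    period = max(stage_delays_ns.values())  # assumed maximum period
    return {stage: 100.0 * d / period for stage, d in stage_delays_ns.items()}


def critical_stages(stage_delays_ns: dict, threshold: float = 95.0) -> list:
    """Stages whose delay is at least `threshold` percent of the maximum period."""
    utilisation = stage_utilisation(stage_delays_ns)
    return [stage for stage, u in utilisation.items() if u >= threshold]


# Hypothetical four-stage pipeline (delays in ns), not measured data.
example = {"IF": 2.4, "ID": 4.9, "EX": 5.0, "MEM": 2.6}

for stage, u in stage_utilisation(example).items():
    print(f"{stage}: {u:.0f}% of the maximum period")
print("Critical stages:", critical_stages(example))
```

In this made-up example two stages sit at or above the 95% threshold while the other two use roughly half of the period, which is the kind of imbalance reported for the pipelined benchmarks.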

# Chapter 6

# **Conclusion and Future Work**

This work has attempted to characterize the IP blocks that designers use in their systems. Based on a selection of benchmarks from an open source hardware IP block library [ope], the behaviour of the typical IP block has been derived. This characterization aims to help designers predict the behaviour of the IP block that they are going to use in their system.

The main contribution of this work is the characterization of the typical IP block in terms of area-speed tradeoff, pipeline balancing, critical path clustering, power-speed tradeoff, synthesis-place & route maximum frequency, critical path variability, critical path topology and switching activity.

## 6.1 Future Work

There are several issues that need to be examined in the future. The benchmark suite can be expanded to represent a wider range of IP blocks and the results can be used in order to explore new techniques in the direction of optimizing the design.

### 6.1.1 Expansion of the Benchmark Suite

In order to broaden the range over which the results of this work can be applied, more IP blocks can be used as additional benchmarks. The same experiments can be carried out on the new benchmarks in order to derive the results from a wider range of designs. Moreover, since the experiments have been carried out in two libraries (a $0.13\mu$m and a $0.25\mu$m technology), the same experiments can be applied to the same benchmarks in more libraries, such as a $0.18\mu$m or a $0.065\mu$m technology.

New flows can be added to the experimental process. Currently, two commercial flows have been used, one of which includes physically knowledgeable synthesis. The addition of more commercial or non-commercial flows can give a clearer picture of how the IP blocks interact with a wider range of flows.

### 6.1.2 Optimizing the IP Blocks

The critical path topology results can give an insight into how much can be gained by optimizing the topology of the critical paths. Fine-tuning the topology of the critical paths may yield better critical path delays, and thus better overall performance of the design.

Further exploration of the source of the most switching activity can provide guidelines for power minimization of the designs. The identification of the blocks that are most likely to consume more power through their switching activity can efficiently direct the power optimization process.

The identification of the pipeline stage that is the most critical provides a way of optimizing the delay of the whole design. Optimization techniques (such as Dual Rail with completion detection circuitry) can be applied to this specific pipeline stage, so that the performance of the design is improved. Additionally, the identification of the logic block which is responsible for most of the delay of the pipeline stage can give some room for even more efficient optimization processes.

### 6.1.3 Optimizing the EDA Flows

The results of this work show how the IP blocks are affected by two commercial EDA flows. The experiments have shown that some aspects, such as pipeline balancing, critical path topology and the area-speed and power-speed tradeoffs, are affected by the different flows. In some cases one flow or the other produces more efficient designs, whereas in other cases the EDA flows fail to produce an efficient result. Using the observations from this work, the EDA flows can be optimized to suit contemporary IP blocks and to produce efficient designs.

# Bibliography

- [aem] Aemb core. http://www.opencores.org/projects.cgi/web/aemb/overview.
- [BC02] R. A. Bergamaschi and J. Cohn. The a to z of socs. Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design, pages 790–798, 2002.
- [BCK<sup>+</sup>04] I. Blunno, J. Cortadella, A. Kondratyev, L. Lavagno, K. Lwin, and C. Sotiriou. Handshake protocols for de-synchronization. In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 149–158. IEEE Computer Society Press, April 2004.
- [BL00] R. A. Bergamaschi and W. R. Lee. Designing systems-on-chip using cores. Proceedings of the 37th conference on Design automation, pages 420–425, 2000.
- [BMP97] L. Benini, E. Macii, and M. Poncino. Telescopic units: increasing the average throughput of pipelined designs by adaptive latency control. Proceedings of the 34th annual conference on Design automation, 00:22–27, 1997.
- [cel04] User guide. In Celtic User Manual, Cadence Design Systems, Inc., 2004.
- [CK02] D. Chinnery and K. Keutzer. Reducing the timing overhead. In Closing the Gap between ASIC and Custom: Tools and Techniques for High-Performance ASIC design, chapter 3. Kluwer Academic Publishers, 2002.
- [des] Des3 core. http://www.opencores.org/projects.cgi/web/des/overview.

- [dlx] Dlx core. http://www.opencores.org/projects.cgi/web/aspida/overview.
- [DWD91] M. Dean, T. Williams, and D. Dill. Efficient self-timing with level-encoded 2-phase dual-rail (LEDR). In Carlo H. Séquin, editor, Advanced Research in VLSI, pages 55–70. MIT Press, 1991.
- [DYG89] D. H. Du, S. H. Yen, and S. Ghanta. On the general false path problem in timing analysis. In DAC '89: Proceedings of the 26th ACM/IEEE conference on Design automation, pages 555–560, New York, NY, USA, 1989. ACM Press.
- [GK98] R. Ginosar and R. Kol. Adaptive synchronization. In Proc. International Conf. Computer Design (ICCD), pages 188–189, October 1998.
- [GK00] R. Ginosar and R. Kol. Adaptive synchronization. In Alex Yakovlev and Reinder Nouta, editors, Asynchronous Interfaces: Tools, Techniques, and Implementations, pages 93–101, July 2000.
- [HE96] S. Hassoun and C. Ebeling. Architectural retiming: pipelining latency-constrained circuits. Proceedings of the 33rd annual conference on Design automation, pages 708–713, 1996.
- [HP90] J. L. Hennessy and D. Patterson. Computer Architecture: a Quantitative Approach. Morgan Kaufmann Publisher Inc., 1990.
- [huf] Huffman core. http://www.opencores.org/projects.cgi/web/video\_systems/overview.
- [ihp] Ihp-microelectronics. http://www.ihp-microelectronics.com.
- [Mar98] G. Martin. Design methodologies for system-level ip. Proceedings of the conference on Design, Automation and Test in Europe, pages 286–289, 1998.
- [Nas01] S. R. Nassif. Modeling and analysis of manufacturing variations. In Proc. of Asia and South Pacific Design Automation Conference, May 2001.

- [ope] The opencores.org. http://www.opencores.org.
- [Phy03] Datasheet. In Physical Studio Datasheet, Sequence Design, Inc., 2003.
- [rs] Reed-solomon core. http://www.opencores.org/projects.cgi/web/rsencoder/overview.
- [SBY03] N. Starodoubtsev, S. Bystrov, and A. Yakovlev. Monotonic circuits with complete acknowledgement. In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 98–108. IEEE Computer Society Press, May 2003.
- [umc] Umc. http://www.umc.com.
- [vga] Vga-lcd core. http://www.opencores.org/projects.cgi/web/vga_lcd/overview.
- [wis] Wishbone interface. http://www.opencores.org/projects.cgi/web/wishbone/wishbone.
- [xil] Xilinx, inc. http://www.xilinx.com.