Identifier 000391349
Title An instruction level energy characterization of ARM processors
Alternative Title Ένας ενεργειακός χαρακτηρισμός επεξεργασιών ARM σε επίπεδο εντολέα
Author Βασιλάκης, Ευάγγελος Γ.
Thesis advisor Κατεβαίνης, Μανόλης
Reviewer Μπίλας, Άγγελος
Πρατικάκης, Πολύβιος
Αργυρός, Αντώνιος
Abstract As mobile devices and data-centers expand to cover global needs for services and personal computing, power consumption of systems and devices has become the most prevalent concern for hardware designers and software developers. ARM processors already dominate the mobile world and are taking leaps into the server market due to their inherent energy efficiency. In this work we study the energy characteristics of modern ARM processors at the instruction level. To characterize the energy consumption of ARM processors, we measure the energy consumption of special-purpose benchmarks. Our measurements are made using actual voltage/current sensors provided by the Odroid-XU+E development board, which contains an ARM big.LITTLE processor consisting of two clusters: one of four Cortex-A7 cores and one of four Cortex-A15 cores. Our characterization benchmarks are designed to stress specific units of the datapath. With two different benchmarks for each instruction type, we study both the latency and the energy of instructions as well as the maximum throughput of the processor for that instruction. Our findings for Cortex-A7 cores show that integer instructions cost from 50 pJ to 80 pJ each, float/double instructions from 80 pJ to 350 pJ each, and more complex instructions like divisions from 150 pJ to 1200 pJ each. Load and store instructions cost 150 pJ to 200 pJ each when hitting in the L1 cache, whereas the cost increases up to 270 pJ when accessing the L2 cache. On the Cortex-A15, instructions cost three to five times more than on Cortex-A7 for the same clock frequency, even when the two cores show the same throughput for an instruction. For benchmarks that fit mostly in the L1 cache, we observed that at the same clock frequency they run 20% to four times faster on Cortex-A15, while energy to completion increases by 2 to 4 times, relative to Cortex-A7.
When comparing Cortex-A7 at the lowest frequency of 500 MHz to Cortex-A15 at the highest frequency of 1.5 GHz, we see that execution is 4 to 10 times faster on Cortex-A15, while energy to completion increases by 5 to 9 times relative to Cortex-A7. Through these measurements, we developed a thorough characterization of the ARM instruction set with energy and latency metrics for every instruction type. We validated the correctness of our characterization by developing an instruction level energy model and testing it on a variety of real programs. Our evaluation shows average mispredictions of 8.5% for Cortex-A7 and 14% for Cortex-A15. Furthermore, we utilize our characterization and energy model to quantify the energy characteristics of heterogeneous multiprocessing, like ARM big.LITTLE, and show how this can inform optimal workload placement in such systems. We highlight the different factors that contribute to the energy expenditure of such systems and show how these differ from one processor to the other.
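The abstract does not spell out the form of the instruction level energy model. One common way to build such a model, shown here only as a minimal sketch, is to multiply the number of executed instructions of each type by a per-instruction energy cost and sum the results. The single values in `ENERGY_PJ` are illustrative midpoints of the Cortex-A7 ranges quoted above, not figures from the thesis, and the names `ENERGY_PJ`, `estimate_energy_pj` and `misprediction_pct` are hypothetical.

```python
# Hypothetical per-instruction energy costs in picojoules for a Cortex-A7 core.
# The ranges are quoted in the abstract; the single values are illustrative
# midpoints chosen for this sketch, not measurements from the thesis.
ENERGY_PJ = {
    "integer": 65.0,   # abstract: 50 to 80 pJ
    "float": 215.0,    # abstract: 80 to 350 pJ
    "divide": 675.0,   # abstract: 150 to 1200 pJ
    "mem_l1": 175.0,   # abstract: 150 to 200 pJ when hitting in L1
    "mem_l2": 270.0,   # abstract: up to 270 pJ when accessing L2
}


def estimate_energy_pj(counts, table=ENERGY_PJ):
    """Return the sum of count * per-instruction energy over all types."""
    unknown = set(counts) - set(table)
    if unknown:
        raise KeyError(f"no energy cost for instruction types: {sorted(unknown)}")
    return sum(n * table[kind] for kind, n in counts.items())


def misprediction_pct(predicted, measured):
    """Return the absolute prediction error as a percentage of the measurement."""
    return abs(predicted - measured) / measured * 100.0


if __name__ == "__main__":
    # Assumed instruction mix of a small program (made-up numbers).
    counts = {"integer": 1_000_000, "float": 200_000, "mem_l1": 300_000}
    predicted = estimate_energy_pj(counts)
    print(f"predicted energy: {predicted / 1e6:.1f} uJ")
```

In a validation step like the one the abstract describes, `misprediction_pct` would compare such a prediction against energy measured on the board's sensors; the measured value would have to come from real hardware and is not reproduced here.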
Language English, Greek
Subject Model
Energy
Processors
Instruction level
Issue date 2015-03-20
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses
Views 600