## Lecture 6: Energy Reduction

## CSCE 6730 Advanced VLSI Systems

Instructor: Saraju P. Mohanty, Ph. D.

NOTE: The figures, text etc included in slides are borrowed from various books, websites, authors pages, and other sources for academic purpose only. The instructor does not claim any originality.

## Outline of the Talk

- Why frequency variations ?
- What is dynamic frequency clocking (DFC) ?
- TC-DFC scheduling scheme
- RC-DFC scheduling scheme
- Experimental results
- Related Research
- Conclusions

Source: S. P. Mohanty and N. Ranganathan, "Energy Efficient Datapath Scheduling using Multiple Voltages and Dynamic Clocking", ACM Transactions on Design Automation of Electronic Systems (TODAES), Vol. 10, No. 2, April 2005, pp. 330-353.

## Why Frequency Variations?

1. Energy dissipation per operation, $\boldsymbol{E}=\boldsymbol{C}_{\text {eff }} * \boldsymbol{V}_{d d}{ }^{2}$
2. Power dissipation per operation, $\boldsymbol{P}=\boldsymbol{C}_{\text {eff }} * \boldsymbol{V}_{\boldsymbol{d} d}{ }^{2} * \boldsymbol{f}$
3. Delay that determines maximum frequency,

$$
t_{d}=k * V_{d d} /\left(V_{d d}-V_{T}\right)^{\alpha}
$$

where, $\alpha$ is a technology dependent factor, $k$ is a constant and $\mathbf{1 /} \boldsymbol{t}_{\boldsymbol{d}}=\boldsymbol{f}_{\max }$

## What we deduce from the equations ?

1. If we reduce supply voltage $\left(V_{d d}\right)$, delay increases and performance degrades.
2. If we reduce clock frequency ( $f$ ), we save only power, but do not save energy.
3. If we reduce switching activity, we reduce the effect of ( $C_{\text {eff }}$ ) as well as correlations to an extent. - this needs to be done irrespective of 1 and 2 above.

## Our Approach

Adjust the processor's frequency and reduce the supply voltage together during scheduling

By doing that we reduce energy while maintaining performance or even improve performance if possible!

## Dynamic Frequency Clocking (DFC)



## Dynamic Clocking Unit (DCU)



DCU uses clock divider strategy.

More details : Ranganathan [12]

## DFC for energy savings and performance

Cycle1

## TC-DFC/RC-DFC Schedulers

What they do ?
Attempts to operate high energy units (multipliers) at lower frequencies (voltages) and low energy units (ALUs) at higher frequencies (voltages) to save energy without loosing performance as much as possible

## TC-DFC Algorithm

Input: Unscheduled DFG, time constraint (execution time for critical path), operating frequencies and voltages

Output: Scheduled DFG with frequency assignment for each control step (cycle) and voltage assignment for each vertex (operation)

## TC-DFC Algorithm

Vertex priority list: the vertices from source to sink in DFG are reordered such that the multipliers are grouped with higher priority than adders and among the multipliers (and adders) the precedence in DFG is maintained. (used to ensure precedence by time-stamping)

Cycle priority list: the control steps or the cycles are reordered in this list such that the cycles with more simultaneous operations (consuming more energy) get higher priority. (used for low frequency assignment in the algorithm)

## TC-DFC: Scheduling Algorithm

Step 1: ASAP schedule of the sequencing UDFG
Step 2: create vertex priority list of vertices in DFG
Step 3: assign control steps to the operators such that precedence is satisfied and that multiply and add operators are not scheduled in the same cycle - yields an intermediate schedule

Step 4: create a cycle priority list using the intermediate schedule
Step 5: assign frequency to each control step using cycle priority list assign higher frequency to lower priority steps

Step 6: If execution time not close to time constraint, then eliminate the control step that has minimal number of ALU operations also adjusting all its predecessors to get a new intermediate schedule. Go to Step 3.

Step 7: To the schedule with frequency assignment for each control step, perform voltage assignment for each operation

## Example to illustrate TC-DFC


(HAL Differential Equation Solver)


## Example TC-DFC : ASAP Time stamp



## TC-DFC : Freq. Selection \& Vertex Priority

|  | MULT $_{\text {Low }}$ | MULT $_{\text {Med }}$ | MULT $_{\text {Med }}$ | ALU $_{\text {High }}$ |
| :---: | :---: | :---: | :---: | :---: |
| Frequency | 4.5 MHz | 9 MHz | 18 MHz | 36 MHz |
| cfi $_{\mathrm{c}}$ | 8 | 4 | 2 | 1 |

Table : Frequency selection : Left to right

| v0 | v1 | v2 | v6 | v8 | v3 | v7 | v10 | v9 | v11 | v4 | v5 | v12 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |

Table : Vertex Priority List

## TC-DFC : Intermediate Schedule



| Cycles | c5 | c4 | c3 | c2 | c1 | c6 | c0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Priorities | 0 | 1 | 2 | 3 | 4 | 5 | 6 |

## TC-DFC : Schedule : $\mathrm{T}_{\mathrm{c}}=2.0 \mathrm{~T}_{\mathrm{cp}}$




TC-DFC : Schedule : $\mathrm{T}_{\mathrm{c}}=$ 1.75 $_{\mathrm{cp}}$



## TC-DFC : Intermediate Schedule



| Cycles | c4 | c3 | c2 | c1 | c5 | c0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Priorities | 0 | 1 | 2 | 3 | 4 | 5 |

Table : $\mathrm{T}_{\mathrm{c}}=1.5 \mathrm{~T}_{\mathrm{cp}}$
Advanced VLSI Systems

## TC-DFC Schedule : $\mathrm{T}_{\mathrm{c}}=1.5 \mathrm{~T}_{\mathrm{cp}}$



## TC-DFC : Savings for different benchmarks



## TC-DFC : Average savings for 3 time constraints



## TC-DFC : Energy savings using different algorithms

|  | TC-DFC | Chang[3] | Shiue[14] | Manzak[9] |
| :---: | :---: | :---: | :---: | :---: |
| ARF | $41-58$ | $40-63$ | $38-76$ | $25-61$ |
| BPF | $45-70$ | - | - | - |
| EWF | $36-73$ | $44-69$ | $13-76$ | $10-55$ |
| FDCT | $52-75$ | $43-69$ | - | - |
| FIR | $74-74$ | - | - | - |
| HAL | $43-67$ | $41-61$ | $22-77$ | $19-72$ |

## RC-DFC Algorithm

Input: Unscheduled DFG, Resource constraints (Number of resources at different operating voltages), Maximum number of allowable operating frequencies

Output: Scheduled DFG with frequency assignment for each control step (cycle)

## RC-DFC : Frequency and Resource LUTs

| FU in a cycle | Frequency priority order |
| :---: | :---: |
| Multipliers only | MULT $_{\text {Low }}, \mathrm{MULT}_{\text {Med }}, \mathrm{MULT}_{\text {High }}$ |
| Both Multipliers and ALUs | $\mathrm{MULT}_{\text {Low }}, \mathrm{ALU}_{\text {Low }}, \mathrm{MULT}_{\text {High }}$ |
| ALU only | $\mathrm{ALU}_{\text {High, }}, \mathrm{ALU}_{\text {Med }}, \mathrm{ALU}_{\text {Low }}$ |

Frequency selection look up from left to right

|  | Multipliers |  |  | ALUs |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  | 2.4 V | 3.3 V | 5.0 V | 2.4 V | 3.3 V | 5.0 V |
| For each <br> clock cycle | 1 | 2 | 1 | 1 | 1 | 0 |

Resource look up from left to right

## RC-DFC Scheduling Algorithm

Step 1 : Get the ASAP and ALAP schedule
Step 2 : Modify the ASAP and ALAP schedules using the number of resources without operating voltage constraint
Step 3 : Total No. of control steps = Max (ASAP steps, ALAP steps)
Step 4 : Construct the Freq. Selection and Resource Assignments LUTs

Step 5 : Find the vertices having zero and non-zero mobility and assume ASAP schedule as the current schedule

Step 6 : Do voltage and freq. assignment using current schedule
Step 7 : Taking non-zero mobility vertex find proper time stamp for it using LUTs such that number of level conversions needed is minimum

Step 8 : Adjust current schedule, predecessor / successor time U Stamp, resource assignment LUT and repeat Step 6 and Ste Advanced VLSI Systems
Unve. Step 9 Find cycle frequency index for each cycle

## RC-DFC : A Scheduled DFG (FIR benchmark)



FIR ; Resource constraint : MULT(1 at $2.4 \mathrm{~V}, 1$ at 3.3 V ) and ( 1 ALU at 5.0 V )

## RC-DFC : Resource constraints used in our experiment

| Multipliers |  | ALUs |  | Serial |
| :---: | :---: | :---: | :---: | :---: |
| 3.3 V | 5.0 V | 3.3 V | 5.0 V | No |
| 2 | 1 | 1 | 1 | 1 |
| 3 | 0 | 1 | 1 | 2 |
| 2 | 0 | 0 | 2 | 3 |
| 1 | 1 | 0 | 2 | 4 |
| 1 | 1 | 0 | 1 | 5 |
| 2 | 0 | 0 | 1 | 6 |

RC-DFC : Savings for different benchmarks

|  | Resource <br> Cons. | Energy <br> (SF) (pJ) | Energy <br> (DFC) (pJ) | \% <br> Savings |
| :---: | :---: | :---: | :---: | :---: |
| A | 1 | 36168 | 21768 | 40 |
| R | 2 | 36168 | 18205 | 50 |
| F | 3 | 36168 | 19065 | 47 |
| B | 1 | 27654 | 16491 | 40 |
| P | 2 | 27654 | 14175 | 49 |
|  | 3 | 27654 | 14827 | 46 |
| E | 1 | 19404 | 10802 | 44 |
| W | 2 | 19404 | 10802 | 44 |
| F | 3 | 19404 | 10853 | 39 |
| F | 1 | 18678 | 9979 | 47 |
| I | 2 | 18678 | 9979 | 47 |
| R | 3 | 18678 | 10126 | 45 |

## RC－DFC ：Average Savings for benchmarks



RC-DFC : Energy savings using different algorithms

|  | RC-DFC | Shiue[14] | Sarrafzadeh[13] | Johnson[6] |
| :---: | :---: | :---: | :---: | :---: |
| ARF | $24-58$ | $11-14$ | $16-20$ | $16-59$ |
| BPF | $27-56$ | - | - | - |
| EWF | $38-61$ | $14-14$ | $13-32$ | $11-50$ |
| FDCT | $41-63$ | - | - | - |
| FIR | $20-67$ | - | $16-29$ | - |
| HAL | $29-62$ | $19-28$ | - |  |

## Related Research <br> Low Power Scheduling using Voltage Reduction

- Johnson \& Roy [6], 1997 - MOVER algorithm, multiple voltage
- Chang \& Pedram [3], 1997 - dynamic programming
- Kumar \& Bayoumi [7], 1999 - variable voltage
- Sarrafzadeh and Raje [13], 1999-dynamic prog., geometric
- Schiue \& Chakrabarti [14], 2000 - list based, multiple voltage
- Manzak \& Chakrabarti [9], 2002 - energy minimization using Lagrange multiplier method
- And many other works


## Related Research : MOVER [6]

-ILP formulations of multiple supply voltage scheduling (MOVER)

- Minimize energy consumption
- The DFG is partitioned into groups :

Higher voltage operation group
Lower voltage operation group
MOVER $1^{\text {st }}$ fixes the minimum voltage of the lower group and then fixes the minimum voltage for the upper group
-Handle timing and resource constraints
-Exponential worst-case complexity
-Energy Savings : 0-50\%, Area Penalty: 0-170\%

## Related Research : Dynamic Programming[3]

-Dynamic Programming for multiple voltage scheduling

- Time-constraints scheduling
- Average energy savings : 40 \%
- Non-exponential complexity
- Can handle large circuits, pipeline datapath


## Related Research : List-Based [14]

- Multiple voltage scheme
-List based scheduling algorithms: resource constraints based and time constraints based
-Polynomial time complexity algorithms
- Multiple voltage scheduling and reduce switching activity
-Savings : Latency constrained case : 13-77\%


## Conclusions

- A time-constrained scheduling algorithm discussed
-In DFC, frequency can be switched dynamically
- Lowering frequency with voltage can save energy
- For TC-DFC average energy savings is $46 \%$ to $68 \%$
- As the time-constraint is relaxed the energy savings increases
- For RC-DFC average energy savings is $20 \%$ to $67 \%$
- For the circuits having almost equal number of addition and multiplier operations savings is maximum

