# Lecture 8: Peak Power Reduction

# CSCE 6730 Advanced VLSI Systems

Instructor: Saraju P. Mohanty, Ph. D.

NOTE: The figures, text, etc. included in these slides are borrowed from various books, websites, authors' pages, and other sources for academic purposes only. The instructor does not claim any originality.





# **Outline of the talk**

- Introduction
- Related work
- Target architecture
- Peak power model
- ILP Formulations
- Scheduling algorithm
- Experimental results

Source: S. P. Mohanty and N. Ranganathan, "Simultaneous Peak and Average Power Minimization during Datapath Scheduling", *IEEE Transactions on Circuits and Systems Part I (TCAS-I)*, Vol. 52, No. 6, June 2005, pp. 1157-1165.







# The peak power is the maximum power consumption of the circuit at any instant during its execution.






# Why peak power reduction?

Reduction of peak power consumption is essential:

- (i) to maintain supply voltage levels
- (ii) to increase reliability
- (iii) to use smaller heat sinks
- (iv) to make packaging cheaper





#### **Energy Vs Peak power efficient scheduling**



Fig. (a) is an energy-efficient schedule, whereas Fig. (b) is a peak-power-efficient schedule for the same resource constraint.





#### **Related work**

(Energy efficient scheduling using voltage reduction)

- Chang and Pedram [3], 1997: dynamic programming.
- Johnson and Roy [4], 1997: ILP-based MOVER algorithm using multiple supply voltages.
- Lin, Hwang and Wu [5], 1997: ILP and heuristic for variable voltages (VV) and multicycling (MC).
- Mohanty and Ranganathan [7], 2003: heuristic using multiple supply voltages and dynamic clocking.





# **Related work**

#### (Peak Power efficient scheduling)

- Martin and Knight [6], 1996: simultaneous assignment and scheduling.
- Raghunathan, Ravi and Raghunathan [10], 2001: data monitor operations in VHDL.
- Shiue [12], 2000: ILP-based and modified force-directed scheduling for peak power minimization.
- Shiue and Chakrabarti [13], 2000: ILP model to minimize peak power and area for a single voltage.





# **Voltage, Frequency and Power Trade-offs**

(i) voltage reduction → increase in delay

(ii) frequency reduction → reduction in power, not energy (and increase in delay)

Beyond (i) and (ii), reduction of switching capacitance can also be considered.





# What is our approach?

Adjust the frequency and reduce the supply voltage for peak power reduction during datapath scheduling.
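The effect of these two knobs can be sketched numerically. The snippet below is only an illustration: it assumes the dynamic power relation $P = \alpha C V^2 f$ used in the peak power model, and the function names and operating points are hypothetical, not taken from the paper. It does not model the delay penalty of a lower supply voltage.

```python
def dynamic_power(alpha, cap, vdd, freq):
    """Dynamic power P = alpha * C * V^2 * f, in watts."""
    return alpha * cap * vdd ** 2 * freq


def energy_per_cycle(alpha, cap, vdd):
    """Energy drawn in one clock cycle, E = P / f = alpha * C * V^2, in joules."""
    return alpha * cap * vdd ** 2


# Hypothetical operating points, chosen only for illustration.
ALPHA, CAP = 0.5, 1e-12  # switching activity, load capacitance (farads)

p_base = dynamic_power(ALPHA, CAP, 5.0, 100e6)    # 5.0 V at 100 MHz
p_half_f = dynamic_power(ALPHA, CAP, 5.0, 50e6)   # same voltage, half frequency
p_low_v = dynamic_power(ALPHA, CAP, 3.3, 100e6)   # lower voltage, same frequency

print(p_half_f / p_base)   # power scales linearly with frequency
print(p_low_v / p_base)    # power scales with the square of the voltage
```

Halving the frequency halves the power but leaves the energy per cycle unchanged, whereas lowering the voltage reduces both.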





#### **Target architecture**



- Each functional unit has one register and one multiplexer.
- Each functional unit feeds one register only.
- The register and the multiplexer operate at the same voltage level as the functional unit.
- Level converters are used when a low-voltage functional unit drives a high-voltage functional unit.
- Operational delay of an FU: $(d_{FU} + d_{Mux} + d_{Reg} + d_{Conv})$.
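As a rough sketch of how the delay model could be turned into an operating frequency, the snippet below sums the four delay components and takes the reciprocal. The reciprocal relation, the function names, and the delay values are assumptions made for illustration only.

```python
def operational_delay(d_fu, d_mux, d_reg, d_conv=0.0):
    """Total delay of one functional unit stage, in seconds.

    d_conv is zero when no level converter is on the path.
    """
    return d_fu + d_mux + d_reg + d_conv


def max_frequency(delay_s):
    """Highest clock frequency (Hz) whose period covers the delay (assumed f = 1/d)."""
    return 1.0 / delay_s


# Hypothetical delays for a low-voltage multiplier driving a high-voltage unit.
delay = operational_delay(d_fu=10e-9, d_mux=1e-9, d_reg=1e-9, d_conv=0.5e-9)
print(delay, max_frequency(delay))
```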







# Peak power model

For a data flow graph (DFG), let us assume:

- $c$ = any control step or clock cycle in the DFG
- $N$ = total number of control steps in the DFG
- $R_c$ = number of resources active in step c (same as the number of operations in step c)
- $f_c$ = cycle frequency for control step c
- $\alpha_{i,c}$ = switching activity of resource i active in step c
- $C_{i,c}$ = load capacitance of resource i active in step c
- $V_{i,c}$ = operating voltage of resource i active in step c





# Peak power model ....

The power consumption for any control step c is given by

$$P_c = \sum_{i=1}^{R_c} \alpha_{i,c} \, C_{i,c} \, V_{i,c}^{2} \, f_c$$

The peak power consumption of the DFG is the maximum power consumption over all the control steps:

$$P_{peak} = \max_{1 \le c \le N} P_c$$





# Peak power model ....

Combining the two equations above, the peak power consumption of the DFG is described as

$$P_{peak} = \max_{1 \le c \le N} \left( \sum_{i=1}^{R_c} \alpha_{i,c} \, C_{i,c} \, V_{i,c}^{2} \, f_c \right)$$

This would serve as an objective function for the scheduling algorithm.
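The model can be evaluated directly for a given schedule. The following is a minimal sketch, assuming each control step is described by its frequency and a list of (switching activity, capacitance, voltage) tuples. The data layout, function names, and numbers are hypothetical.

```python
def step_power(freq, resources):
    """P_c: power of one control step.

    resources is a list of (alpha, capacitance, voltage) tuples,
    one per resource active in the step.
    """
    return freq * sum(a * c * v ** 2 for a, c, v in resources)


def peak_power(schedule):
    """P_peak: maximum step power over a list of (f_c, resources) pairs."""
    return max(step_power(f, res) for f, res in schedule)


# Hypothetical two-step schedule.
schedule = [
    (50e6, [(0.5, 1e-12, 5.0), (0.5, 1e-12, 5.0)]),   # two 5.0 V units at 50 MHz
    (100e6, [(0.5, 1e-12, 3.3)]),                     # one 3.3 V unit at 100 MHz
]
print(peak_power(schedule))
```

A scheduler minimizing this quantity tries to flatten the per-step power profile rather than only lowering its average.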





# **ILP formulations for MVDFC : notations**

- $O$ : total number of operations in the DFG
- $o_i$ : any operation i, $1 \le i \le O$
- $F_{k,v}$ : functional unit of type k operating at voltage level v
- $M_{k,v}$ : maximum number of functional units of type k operating at voltage level v
- $S_i$ : as-soon-as-possible (ASAP) time stamp for the operation $o_i$
- $E_i$ : as-late-as-possible (ALAP) time stamp for the operation $o_i$
- $P(i,v,f)$ : power consumption of operation $o_i$ at voltage level v and operating frequency f
- $x_{i,c,v,f}$ : decision variable which takes the value 1 if operation $o_i$ is scheduled in control step c using the functional unit $F_{k,v}$ and c has frequency f




- (i) Objective Function
- (ii) Uniqueness Constraints
- (iii) Precedence Constraints
- (iv) Resource Constraints
- (v) Frequency Constraints
- (vi) Peak Power Constraints









- Resource Constraints: ensure that no control step contains more than $M_{k,v}$ operations of type k operating at voltage v. They are enforced as, $\forall c$, $1 \le c \le N$ and $\forall v$, $\sum_{i \in F_{k,v}} \sum_{f} x_{i,c,v,f} \le M_{k,v}$.
- Frequency Constraints: a functional unit operating at a lower voltage cannot be scheduled in a higher-frequency control step. These constraints are expressed as, $\forall i$, $1 \le i \le O$, $\forall c$, $1 \le c \le N$, if $f < v$, then $x_{i,c,v,f} = 0$.
- Peak Power Constraints: ensure that the maximum power consumption of the DFG does not exceed $P_{peak}$ for any control step. They are enforced as, $\forall c$, $1 \le c \le N$ and $\forall v$, $\sum_{i \in F_{k,v}} \sum_{f} x_{i,c,v,f} \, P(i,v,f) \le P_{peak}$.
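To make the three constraints concrete, here is a small feasibility checker for a candidate MVDFC schedule. It is a sketch under stated assumptions, not the ILP itself: the dictionary layout and names are hypothetical, voltage and frequency are treated as level indices so that `f < v` can be compared as written above, and the peak power check sums over all operations in a step, which is the reading suggested by the prose.

```python
from collections import Counter, defaultdict


def check_schedule(assign, max_units, power, p_peak):
    """Return the names of violated constraints (empty list if feasible).

    assign:    {op: (step, fu_type, v_level, f_level)}
    max_units: {(fu_type, v_level): M_kv}
    power:     {(op, v_level, f_level): P(i, v, f)}
    """
    violations = []
    # Resource constraints: at most M_kv operations per (step, type, voltage).
    usage = Counter((c, k, v) for c, k, v, _ in assign.values())
    if any(n > max_units.get((k, v), 0) for (c, k, v), n in usage.items()):
        violations.append("resource")
    # Frequency constraints: x = 0 whenever f < v (level indices, assumed).
    if any(f < v for _, _, v, f in assign.values()):
        violations.append("frequency")
    # Peak power constraints: power of each step must not exceed p_peak.
    per_step = defaultdict(float)
    for op, (c, _, v, f) in assign.items():
        per_step[c] += power[(op, v, f)]
    if any(p > p_peak for p in per_step.values()):
        violations.append("peak_power")
    return violations


# Hypothetical three-operation schedule.
assign = {"o1": (1, "mul", 1, 1), "o2": (1, "mul", 1, 1), "o3": (2, "alu", 2, 2)}
max_units = {("mul", 1): 2, ("alu", 2): 1}
power = {("o1", 1, 1): 4.0, ("o2", 1, 1): 4.0, ("o3", 2, 2): 3.0}
print(check_schedule(assign, max_units, power, p_peak=10.0))
```

An ILP solver searches over all assignments satisfying these checks (plus uniqueness and precedence) for the smallest feasible `p_peak`.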





# **ILP formulations for MVMC : notations**

- $O$ : total number of operations in the DFG
- $o_i$ : any operation i, $1 \le i \le O$
- $F_{k,v}$ : functional unit of type k operating at voltage level v
- $M_{k,v}$ : maximum number of functional units of type k operating at voltage level v
- $S_i$ : as-soon-as-possible (ASAP) time stamp for the operation $o_i$
- $E_i$ : as-late-as-possible (ALAP) time stamp for the operation $o_i$
- $P(i,v,f_{clk})$ : power consumption of operation $o_i$ at voltage level v and operating frequency $f_{clk}$
- $y_{i,v,l,m}$ : decision variable which takes the value 1 if operation $o_i$ uses the functional unit $F_{k,v}$ and is scheduled in control steps $l \rightarrow m$
- $L_{i,v}$ : latency of operation $o_i$ using a resource operating at voltage v (in number of clock cycles)





- (i) Objective Function
- (ii) Uniqueness Constraints
- (iii) Precedence Constraints
- (iv) Resource Constraints
- (v) Peak Power Constraints





Objective Function : Minimize (P<sub>peak</sub>)

Uniqueness Constraints : ensure that every operation $o_i$ is scheduled to one appropriate control step within the range $(S_i, E_i)$ and are represented as, $\forall i$, $1 \le i \le O$, $\sum_{v} \sum_{l=S_i}^{S_i+E_i+1-L_{i,v}} y_{i,v,l,(l+L_{i,v}-1)} = 1$









Resource Constraints : ensure that no control step contains more than $M_{k,v}$ operations of type k operating at voltage v and are enforced as, $\sum_{i \in F_{k,v}} \sum_{l} y_{i,v,l,(l+L_{i,v}-1)} \le M_{k,v}$

Peak Power Constraints : ensure that the maximum power consumption of the DFG does not exceed $P_{peak}$ for any control step and are enforced as, for all c, $1 \le c \le N$ and for all v, $\sum_{i \in F_{k,v}} \sum_{v} y_{i,v,l,(l+L_{i,v}-1)} \, P(i,v,f_{clk}) \le P_{peak}$
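In the multicycle case an operation occupies every control step from $l$ to $l + L_{i,v} - 1$, so it contributes to the power of each of those steps. The sketch below illustrates that bookkeeping. It assumes an operation draws the same power in every cycle it occupies, and the function name and numbers are hypothetical.

```python
def mvmc_step_power(ops, n_steps):
    """Per-step power for multicycle operations.

    ops is a list of (start_step, latency, power) tuples with 1-based steps.
    Returns a list whose entry c-1 is the power drawn in control step c.
    """
    per_step = [0.0] * (n_steps + 1)      # index 0 is unused
    for start, latency, p in ops:
        for c in range(start, start + latency):   # steps l .. l+L-1
            per_step[c] += p
    return per_step[1:]


# Hypothetical operations: one two-cycle and two single-cycle operations.
ops = [(1, 2, 3.0), (2, 1, 4.0), (3, 1, 5.0)]
profile = mvmc_step_power(ops, n_steps=3)
print(profile, max(profile))
```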





# Scheduling algorithm







# Scheduling algorithm ....

- Step 1: Find the ASAP schedule of the UDFG.
- Step 2: Find the ALAP schedule of the UDFG.
- Step 3: Determine the mobility graphs for each node.
- Step 4: Modify the mobility graph for the MVMC scheme.
- Step 5: Calculate the operating frequency of an FU using the delay model.
- Step 6: Construct the ILP formulations of the DFG.
- Step 7: Solve the ILP formulations using LP-Solve.
- Step 8: Obtain the scheduled DFG.
- Step 9: Determine $f_c$, $f_{base}$ and $cfi_c$ for the MVDFC scheme.
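Steps 1 to 3 can be sketched as follows. This is a minimal illustration that assumes unit-latency operations and a DFG given as a predecessor map in topological order. The graph, the function names, and the definition of mobility as ALAP minus ASAP are assumptions made for the example, not details taken from the paper.

```python
def asap(preds, order):
    """Earliest control step of each node (unit-latency operations assumed)."""
    s = {}
    for n in order:                       # order must be topological
        s[n] = 1 + max((s[p] for p in preds[n]), default=0)
    return s


def alap(preds, order, n_steps):
    """Latest control step of each node for a schedule of n_steps steps."""
    succs = {n: [] for n in order}
    for n in order:
        for p in preds[n]:
            succs[p].append(n)
    e = {}
    for n in reversed(order):
        e[n] = min((e[q] for q in succs[n]), default=n_steps + 1) - 1
    return e


def mobility(s, e):
    """Slack of each node: how many steps it can move without breaking precedence."""
    return {n: e[n] - s[n] for n in s}


# Hypothetical five-node DFG: c depends on a and b, d depends on c, e is free.
preds = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": []}
order = ["a", "b", "c", "d", "e"]
s = asap(preds, order)
e = alap(preds, order, n_steps=3)
print(mobility(s, e))
```

The ASAP and ALAP stamps give the $S_i$ and $E_i$ bounds that restrict which decision variables the ILP needs to consider.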



**Scheduling for MVDFC** 








**Scheduling for MVMC** 



Example DFG (for RC1)






#### **Experimental results : benchmarks**

1. Example circuit (EXP) (8 nodes, 3\*, 3+, 9 edges)
2. FIR filter (11 nodes, 5\*, 4+, 19 edges)
3. IIR filter (11 nodes, 5\*, 4+, 19 edges)
4. HAL differential equation solver (13 nodes, 6\*, 2+, 2-, 1 <, 16 edges)
5. Auto-Regressive filter (ARF) (15 nodes, 5\*, 8+, 19 edges)





#### **Experimental results : resource constraints**

| Multipliers (3.3V) | Multipliers (5.0V) | ALUs (3.3V) | ALUs (5.0V) | Serial No |
|--------------------|--------------------|-------------|-------------|-----------|
| 2                  | 1                  | 1           | 1           | RC1       |
| 3                  | 0                  | 1           | 1           | RC2       |
| 2                  | 0                  | 0           | 2           | RC3       |
| 1                  | 1                  | 0           | 1           | RC4       |
| 2                  | 0                  | 0           | 1           | RC5       |





### **Experimental results : notations**

- $P_S$ : the peak power consumption (in mW) for single supply voltage and single frequency (SVSF) operation
- $P_{DFC}$ : the peak power consumption (in mW) for MVDFC operation
- $P_{MC}$ : the peak power consumption (in mW) for multiple supply voltages and multicycle (MVMC) operation
- $PDP_S$ : the power delay product (in nJ) for SVSF operation
- $PDP_{DFC}$ : the power delay product (in nJ) for MVDFC operation
- $PDP_{MC}$ : the power delay product (in nJ) for MVMC operation
- $\Delta P_{DFC} = (P_S - P_{DFC}) / P_S \times 100$ : % peak power reduction for MVDFC
- $\Delta P_{MC} = (P_S - P_{MC}) / P_S \times 100$ : % peak power reduction for MVMC
- $\Delta PDP_{DFC} = (PDP_S - PDP_{DFC}) / PDP_S \times 100$ : % PDP reduction for MVDFC
- $\Delta PDP_{MC} = (PDP_S - PDP_{MC}) / PDP_S \times 100$ : % PDP reduction for MVMC
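All four reduction metrics share one form, which can be written as a single helper. The function name and the sample values below are hypothetical and are not measurements from the paper.

```python
def percent_reduction(baseline, value):
    """Percentage reduction of value relative to the SVSF baseline.

    A negative result means the quantity increased.
    """
    return (baseline - value) / baseline * 100.0


# Hypothetical numbers, for illustration only.
print(percent_reduction(100.0, 22.0))   # peak power dropped relative to baseline
print(percent_reduction(10.0, 13.5))    # PDP rose, so the reduction is negative
```

A negative $\Delta PDP$ therefore indicates that the power delay product grew compared with SVSF operation.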




#### **Experimental results : (% reduction)**

| Benchmark | RC | $\Delta P_{DFC}$ | $\Delta P_{MC}$ | $\Delta PDP_{DFC}$ | $\Delta PDP_{MC}$ |
|-----------|----|------------------|-----------------|--------------------|-------------------|
| EXP       | 1  | 78               | 55              | 62                 | 16                |
| EXP       | 2  | 78               | 35              | 62                 | 41                |
| EXP       | 3  | 78               | 56              | 63                 | 25                |
| FIR       | 1  | 78               | 49              | 61                 | 8                 |
| FIR       | 2  | 78               | 35              | 61                 | 12                |
| FIR       | 3  | 78               | 55              | 61                 | 39                |
| IIR       | 1  | 69               | 38              | 61                 | 4                 |
| IIR       | 2  | 78               | 56              | 68                 | 34                |
| IIR       | 3  | 78               | 35              | 66                 | 28                |
| HAL       | 1  | 78               | 29              | 61                 | -35               |
| HAL       | 2  | 78               | 36              | 61                 | 34                |
| HAL       | 3  | 79               | 56              | 62                 | 22                |






#### **Percentage average reduction**








#### **Reductions using different schedulers**

| Benchmark circuits | MVDFC | MVMC | Shiue [12] | Martin [6] | Raghunathan [10] |
|--------------------|-------|------|------------|------------|------------------|
| (2) FIR            | 72    | 49   | 63         | 40         | 23               |
| (4) HAL            | 75    | 41   | 28         | -          | -                |
| (5) ARF            | 78    | 50   | -          | -          | -                |





#### Conclusions

- Reduction of peak power is essential.
- This paper describes peak power reduction schemes at the behavioral level through datapath scheduling.
- The scheduling schemes use ILP-based minimization for the MVDFC and MVMC modes of circuit design.
- For both modes, the scheduler achieved significant peak power reduction.
- For some resource constraints, there is an increase in PDP for MVMC mode design.
- The scheduling schemes are useful for data-intensive applications.
- The applicability of the scheduling schemes to pipelining is to be investigated.
- The effect of switching activity is to be taken into account.
- The detailed design of the controller is to be done.
- The effect on the clock network is to be studied.



