

# Journal of Integrated SCIENCE & TECHNOLOGY

### Early Register Transfer Level (RTL) power estimation in real-time System-on-Chips (SoCs)

A. Swetha Priya,<sup>1\*</sup> Kamatchi S,<sup>1</sup> E. Lakshmi Prasad<sup>2</sup>

<sup>1</sup>Department of Electronics & Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India. <sup>2</sup>Tessolve Semiconductors, Bengaluru, India.

Received on: 23-Sept-2022, Accepted and Published on: 31-Oct-2022

#### ABSTRACT

The demand for low-power, compact smart appliances requires a System-on-Chip (SoC) to embed more functionality on a single chip. As feature size shrinks from 65nm down to 3nm, test power becomes a dominant parameter. It affects many levels of the design hierarchy and, because of the Scan-DFT (Design-For-Testability) bottleneck, power issues are typically analyzed only very late, at the netlist, after several design cycles. Excess power can force a re-spin or re-design of the SoC and can leave parts of the circuitry with irreparable hotspots. To bridge this estimation gap, design-power trade-offs should be estimated early, at Register Transfer Level (RTL), rather than at gate-level implementation. This paper identifies the design-to-test power estimation gap by performing early power analysis with a power estimator tool on 14nm and 10nm real-time SoC designs. Clock gating and power gating optimization techniques, together with the power intent profiles of the design, are used for RTL-to-netlist estimation and correlation. RTL and netlist power are found to correlate within 5% at partition level. The estimation error is -11.50% to -7.56% for the 14nm and 10nm SoCs, compared with an estimation error of -0.7% to -4.4% at 65nm to 45nm.

Keywords: Early Power analysis; Test power; Spyglass; RTL

#### **INTRODUCTION**

\*Corresponding Author: Swetha Priya Email: a\_swetha@blr.amrita.edu

Cite as: J. Integr. Sci. Technol., 2023, 11(1), 454. URN:NBN:sciencein.jist.2023.v11.454

©Authors CC4-NC-ND, ScienceIN ISSN: 2321-4635 http://pubs.thesciencein.org/jist

Advances in wireless mobile technology have driven the evolution of process nodes to bring efficient low-power devices to market for a smarter lifestyle. The trend in smart technology has increased on-chip power density, a consequence of technology scaling and of the higher transistor density needed to meet the demands of battery life and low active and stand-by power. A growing SoC includes processors, various IPs, cores, memories, system interfaces, mixed-signal blocks and complex packaging, which makes SoC test a major challenge for both design and test engineers under the Power-Coverage-Performance (PCP)<sup>1</sup> paradigm. Designers face manufacturing-related issues such as DFT (controllability and observability), Design for Debug (DFD) and Design for Yield (DFY) while targeting an SoC with lower Defective Parts Per Million (DPPM) at reduced test time and test cost. The push for ultimate SoC performance with ever-higher degrees of integration and functionality introduces variation in power estimates across all design phases. It has become critical to predict real power consumption accurately at the design and verification stage of a chip. Increased power at SoC level raises several problems: (1) providing an adequate power supply for chip operation, (2) the number of switching transistors placed in a dense IC, and (3) unrepairable on-chip hotspots or hot regions due to power or voltage surges. These cause power-supply integrity issues that affect many levels of the hierarchy, from logic synthesis to structural DFT, Automatic Test Pattern Generation (ATPG), physical design, packaging, power grids and tester power supply requirements. Power issues can manifest during test as well as during functional operation of the chip. Structural (test-mode) switching power is consistently found to exceed functional power, for the reasons stated below:

(i) Test patterns switch a very high percentage of the logic at a given time, which stresses the chip during test mode.<sup>2</sup> This causes chip melt or chip stress.

(ii) Test efficiency correlates with toggle rate. Node switching activity in test mode is therefore a few orders higher than in normal operational mode.

(iii) Testing SoCs in parallel to reduce test time dissipates more power.

(iv) The DFT logic is idle in functional mode but active in structural mode, which adds switching power.

(v) Low correlation between consecutive test patterns increases the toggle rate and the power density.

(vi) Dynamic at-speed testing causes IR drop, whereas static stuck-at testing can be carried out with the least effect. Path delay fault testing is more difficult.

(vii) Circuit density and wire density change electromigration behavior, which requires new defect and fault models.

(viii) On-chip BIST support consumes high power for testing.

(ix) Elevated temperature and current density decrease circuit reliability and increase power dissipation, which calls for expensive heat-resistant packages.

Traditionally, power estimation is restricted to Gate Level (GL), where it is measured from the switching activity of the Design Under Test (DUT). This approach has several problems:

(i) Most Electronic Design Automation (EDA) tools focus on the structural changes required for DFT and on reaching the target coverage via ATPG.

(ii) Clock gating is switched off to increase internal node observability during testing, and DVFS (Dynamic Voltage-Frequency Scaling) is physically disabled by bypassing the PLL or system clock.

(iii) Testing happens very late in the design cycle, which leaves no flexibility to correct design problems, even though the discrepancy from silicon is within 5%. A better approach is to evaluate dynamic power at RTL, which deviates from silicon by 15% but leaves the flexibility to change the design.<sup>3</sup>

(iv) Testbench vectors cannot give a good real-time representation of workloads, performance and power benchmarks for accurate power estimation.

Accurate numbers are obtained in the last estimation step, since the synthesis tool accounts for every gate and wire. However, an increase in dynamic power is then noticed very late in the design and has to be routed back to RTL. Netlist simulations are time consuming, which lengthens the design cycles before issues are routed back to RTL to check whether specifications are met. Functional power optimization techniques are therefore widely encouraged at all stages to reduce total chip power.

This paper identifies the gap between design and test and aims to narrow the gap between RTL and netlist SoC power estimation. Power optimization techniques are used to obtain good correlation between RTL and Gate Level Simulation (GLS). Section II describes the power components and power challenges in 10/14nm SoCs. Section III presents techniques for functional power reduction, covering both dynamic and static power. Section IV presents the RTL power management techniques used for the 10nm and 14nm SoCs. Experimental results are presented in Section V, followed by the conclusion in Section VI. A T-test analysis on the use of the Spyglass Design Constraints (SGDC) tool is given in the Appendix.

#### **Power Analysis**

Power estimation can be performed at different design phases, as shown in Figure 1(a). As silicon complexity changes, the number of design parameters increases, as shown in Figure 1(b).



Figure 1: (a) Design Phases & (b) Silicon complexity at various nodes

#### **POWER COMPONENTS:**

The total power of a CMOS circuit is the sum of two components, static and dynamic. Current flowing through the four main leakage sources shown in Figure 2 contributes to the static component.

*Reverse-biased junction leakage:* Leakage current from direct Band-to-Band Tunneling (BTBT) and from diffusion or drift of electron-hole pairs in the reverse-biased depletion region.

*Sub-threshold leakage:* Leakage current of a cut-off transistor in the weak inversion region, with majority carriers drifting from drain to source.

*Gate Induced Drain Leakage:* Leakage current caused by strong field emission from drain to substrate, a result of surface field crowding that narrows the depletion layer of the MOS transistor.

*Gate leakage:* Tunneling of electrons between gate and substrate through a thin gate oxide.

Leakage current is described by $I_{leakage} = I_s \left(e^{qV/kT} - 1\right)$, where $I_s$ = reverse saturation current, $q$ = electron charge, $V$ = voltage, $k$ = Boltzmann's constant and $T$ = temperature. Static power dissipation is given by $P_{stat} = \sum_{i=1}^{n} I_{stat}(i) \cdot V_{DD}$, where $I_{stat}(i)$ = static current of the $i$-th source and $V_{DD}$ = supply voltage.
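The two expressions above can be evaluated numerically. The following is a minimal sketch, assuming a temperature of 300 K and hypothetical current values that are not taken from the designs in this paper; the function names are illustrative only.

```python
import math

BOLTZMANN_K = 1.380649e-23    # Boltzmann's constant, J/K
ELECTRON_Q = 1.602176634e-19  # electron charge, C

def leakage_current(i_s, v, temp_k=300.0):
    """I_leakage = I_s * (exp(qV/kT) - 1)."""
    return i_s * (math.exp(ELECTRON_Q * v / (BOLTZMANN_K * temp_k)) - 1.0)

def static_power(static_currents, vdd):
    """P_stat = sum over i of I_stat(i) * VDD."""
    return sum(static_currents) * vdd

# Hypothetical values, for illustration only.
i_rev = leakage_current(i_s=1e-12, v=-0.8)  # reverse bias: approaches -I_s
p_stat = static_power([2e-9, 5e-9, 1e-9, 0.5e-9], vdd=0.8)
print(f"reverse-bias leakage = {i_rev:.3e} A, P_stat = {p_stat:.3e} W")
```

Under reverse bias the exponential term vanishes, so the current saturates at the reverse saturation current regardless of the applied voltage.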



Figure 2: Different leakage currents of deep-submicron transistor<sup>2</sup>



**Figure 3:** CMOS charging and discharging of CL (a) rise transition (b) fall transition (c) output voltage (d) supply current

Dynamic dissipation is the result of switching activity:

**Load capacitor charging and discharging:** When the CMOS inverter input switches to logic 0, the output makes a low-to-high transition (Figure 3(a)): the PMOS turns ON and the NMOS turns OFF, establishing a DC resistive path from the supply to the inverter output, and CL charges from 0 to VDD. During the high-to-low transition (Figure 3(b)), the NMOS turns ON and the PMOS turns OFF, establishing a DC resistive path from the inverter output to the ground rail; CL discharges and the NMOS dissipates the energy. Switching power is given by

$P_{switching} = C_L \cdot V_{DD}^2 \cdot f_{clk} \cdot \alpha$, where $P_{switching}$ = capacitive-load power consumption, $V_{DD}$ = supply voltage, $f_{clk}$ = output clock frequency, $\alpha$ = activity factor and $C_L$ = external load capacitance.

**Short-circuit current:** This is the current that flows directly from the supply voltage to ground while the output switches from one logic state to the other, during the interval in which the p-channel and n-channel transistors are turned on at the same time. Short-circuit power is given by

$P_{short\text{-}circuit} = V_{DD} \cdot t_{sc} \cdot I_{peak} \cdot f_{clk}$,

where $P_{short\text{-}circuit}$ = short-circuit power consumption, $V_{DD}$ = supply voltage, $t_{sc}$ = time interval of the short-circuit current, $I_{peak}$ = total internal switching current and $f_{clk}$ = input clock frequency. Total power consumption is given by

$P_{avg} = P_{switching} + P_{short\text{-}circuit} + P_{leakage} = (C_L \cdot V_{DD}^2 \cdot f_{clk} \cdot \alpha) + (V_{DD} \cdot t_{sc} \cdot I_{peak} \cdot f_{clk}) + (I_{leakage} \cdot V_{DD})$,

where $V_{DD}$ = supply voltage, $f_{clk}$ = clock frequency, $\alpha$ = activity factor, $C_L$ = external load capacitance, $t_{sc}$ = duration of the short-circuit current, $I_{peak}$ = total internal switching current and $I_{leakage}$ = leakage current.
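The total-power expression can be checked with a short calculation. The sketch below assumes illustrative parameter values (not measured on the 14nm or 10nm designs) and a hypothetical helper name `average_power`.

```python
def average_power(c_load, vdd, f_clk, alpha, t_sc, i_peak, i_leak):
    """Return (P_switching, P_short_circuit, P_leakage, P_avg) in watts."""
    p_switching = c_load * vdd**2 * f_clk * alpha
    p_short_circuit = vdd * t_sc * i_peak * f_clk
    p_leakage = i_leak * vdd
    return (p_switching, p_short_circuit, p_leakage,
            p_switching + p_short_circuit + p_leakage)

# Illustrative values only: 10 fF load, 0.8 V supply, 1 GHz clock,
# activity factor 0.2, 20 ps short-circuit interval, 50 uA peak, 1 uA leakage.
p_sw, p_sc, p_lk, p_avg = average_power(
    c_load=10e-15, vdd=0.8, f_clk=1e9, alpha=0.2,
    t_sc=20e-12, i_peak=50e-6, i_leak=1e-6)
print(f"P_sw={p_sw:.2e} W, P_sc={p_sc:.2e} W, P_lk={p_lk:.2e} W, P_avg={p_avg:.2e} W")
```

Because the switching term grows with the square of the supply voltage, lowering the supply is the most effective lever on the dynamic component.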

#### POWER CHALLENGES:

According to the ITRS,<sup>4</sup> as shown in Figure 4(a), gate length has been reduced in line with Moore's law by a scaling factor of 0.7, from 90nm to 65nm to 45nm to 32nm or 22nm. Although each process node scales down transistor size and operating voltage, power has not scaled with transistor size: overall chip power consumption keeps increasing, as shown in Figure 4(b).
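The 0.7 scaling factor can be illustrated with a few lines of arithmetic. This is a minimal sketch; the function name `scale_nodes` is hypothetical, and the computed values are ideal figures that only approximate the commercial node names.

```python
def scale_nodes(start_nm, factor=0.7, steps=4):
    """Apply a constant linear scaling factor to a starting gate length."""
    nodes = [float(start_nm)]
    for _ in range(steps):
        nodes.append(nodes[-1] * factor)
    return nodes

nodes = scale_nodes(90)
print([round(n, 1) for n in nodes])
```

Starting from 90nm, four successive steps land near 63, 44, 31 and 22nm, close to the 65, 45, 32 and 22nm nodes named above.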



**Figure 4:** (a) Technology vs Gate Length<sup>5</sup> and (b) Power consumption trends of an 80 mm<sup>2</sup> SoC from IRDS 2020<sup>4</sup>

The power profile of the chip shifts from leakage-dominant at 22nm to dynamic-dominant at 12/14nm, and to a combination of both below 10nm. From 90nm to 22nm the focus was on leakage power; with the arrival of 16/14nm, dynamic power became a significant part of total power because of the increase in gate capacitance. Dynamic power is the bottleneck for many new designs. The trend shows that actual dynamic power has exceeded the estimated power several fold for the past several years.<sup>4,5</sup> The problem is most pronounced between the power estimated at the pre-silicon stage of an SoC design and the actual power dissipated by the manufactured SoC.

In particular, 14nm and 10nm SoCs are referred to as dark silicon because billions of devices are crammed into a very small area for functionality. Dynamic current density rises as transistors are physically packed together, and leakage current increases. Both effects give dark silicon hot spots that cause thermal and metal migration. Moreover, if all the devices are active at once, the thermal and power budget is exceeded, causing device runaway and leading to shut-down. Current density increases and calls for on-chip power management capability, which adds to design cost and time-to-market. It forces teams to adopt dynamic voltage and frequency scaling, multiple power domains, clock gating, multiple threshold voltages, the Common or Unified Power Format (CPF or UPF), low-power (LP) scan design, Built-In Self-Test (BIST) based structures, and power-aware clock tree synthesis with smart clock scheduling, planning and power optimization. Even then, power issues are exacerbated by the number of corners, modes and power scenarios, which can create conflicts among power, timing, system integration, manufacturability and area closure. The rate of hard-to-detect defects and of new defect classes, such as those related to double patterning, voltage scaling and random dopant fluctuation, increases at these geometries and process variations, which demands newer fault models, methodologies and techniques.

#### **TECHNIQUES ON FUNCTIONAL POWER REDUCTION**

Power reduction techniques are classified into (1) Dynamic Power Reduction (DPR) and (2) Leakage Power Reduction (LPR).

*DPR TECHNIQUES:*

Dynamic power raises chip power density and average power because of faster clocks and increased device integration. Some common techniques<sup>6</sup> from the literature are:

**1) Circuit optimization:** The latency of the logic circuits is constrained, and various supply or threshold levels are assigned statically.

**2) Operand isolation:** Designs are mostly kept in idle modes, yet modules keep switching and redundantly compute on their inputs, which consumes power. Isolation circuitry holds the inputs of inactive modules steady, preventing this redundant computation with the least amount of leakage current.

**3) Dynamic Voltage-Frequency Scaling (DVFS):** At the structural level, power under lower load conditions is reduced effectively by scaling down the voltage and the operating frequency.

*LPR TECHNIQUES:*

Sub-100nm designs with logic and memory circuits suffer from leakage power. Some common techniques from the literature are:

**1) Input Vector Control (IVC):** Applying the best possible input vector reduces the leakage current. The combination of primary input values applied to the design determines how much total leakage current is generated. A rise in gate-to-source, drain or body potential increases the gate tunneling current, and the stacking effect varies with the input state. Advantage: proper selection of the input vector saves 30-50% of standby leakage power<sup>7</sup> and is effective for sub-threshold leakage current reduction.

**2) Dual-Vth design:** Higher-threshold transistors are used on non-critical paths, while lower-threshold transistors are used on critical paths, for maximum performance and minimal energy consumption (refer to Figure 5). In dual-Vth CMOS, leakage power can be reduced in both standby and active modes by assigning high Vth without delay or area overhead. The reduction in background leakage is advantageous for IDDQ testing.

**3) Supply gating:** A stacked transistor gates the supply (VSS), as shown in Figure 6(a), to save leakage power during inactive modes of the device. A different method, Multi-Threshold CMOS (MTCMOS), shown in Figure 6(b), maximizes leakage savings by combining a high-Vt gating transistor with a low-Vt core. However, the gating transistor introduces coupling noise<sup>8</sup> and worst-case latency overhead.



**Figure 5:** (a) Dual-threshold voltage CMOS circuit (b) path distribution for dual- and single-Vth CMOS<sup>2</sup>

**4) Memory leakage control:** Leakage from embedded memory cells contributes to static power. Common techniques such as source biasing and supply gating can be used while ensuring data retention.<sup>9</sup>



**Figure 6:** (a) Supply gating for leakage reduction (b) Multi-threshold-CMOS (MT-CMOS) design
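As an illustration of Input Vector Control, the sketch below searches all primary-input vectors of a tiny two-gate circuit for the one with the lowest standby leakage. The per-state leakage values, the circuit and the function names are hypothetical, and exhaustive search is only practical for small input counts; it is an assumption of this example, not the method of the cited work.

```python
from itertools import product

# Hypothetical standby leakage (nA) of a 2-input NAND for each input state.
# The (0, 0) state is lowest because both stacked NMOS devices are off.
NAND2_LEAKAGE_NA = {(0, 0): 1.0, (0, 1): 3.5, (1, 0): 3.0, (1, 1): 8.0}

def nand(a, b):
    """Logic value of a 2-input NAND gate."""
    return 0 if (a and b) else 1

def circuit_leakage(a, b, c):
    """Total leakage of y = NAND(NAND(a, b), c) for one input vector."""
    n1 = nand(a, b)
    return NAND2_LEAKAGE_NA[(a, b)] + NAND2_LEAKAGE_NA[(n1, c)]

def best_input_vector():
    """Exhaustively pick the primary-input vector with minimum leakage."""
    return min(product((0, 1), repeat=3), key=lambda v: circuit_leakage(*v))

best = best_input_vector()
print(best, circuit_leakage(*best))
```

The chosen vector depends on the internal node values it produces, not only on the primary inputs themselves, which is why the whole circuit has to be evaluated for each candidate.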

Some common techniques are summarized in Table 1.

**Table 1:** Commonly Used Low Power Techniques<sup>2,6-10</sup>

| DPR techniques | LPR techniques |
|---|---|
| Clock gating | Power gating with multi-threshold CMOS (MTCMOS) |
| Operand isolation | Dual-threshold CMOS (DTMOS) |
| Gate sizing | Variable Threshold CMOS (VTMOS) |
| Razor approach | Super cut-off CMOS (SCCMOS) |
| Dynamic Voltage-Frequency Scaling (DVFS) | Transistor stacking |
| Dynamic Voltage Supply (DVS) | Sleepy stack, Sleepy keeper |
| CRISTA approach (critical path) | Input Vector Control (IVC) |
| Computational kernels | LECTOR (leakage control transistor) |

#### **RTL POWER MANAGEMENT**

In this paper, the clock gating dynamic power reduction technique and power gating and partitioning schemes with CPF power intent are explored for active power management.

**Power Domain Partitioning:** Power gating is a low-power technique that switches off power to portions of the circuitry that are not in use, either a clock line or a data path block. This technique reduces leakage power by 96%.<sup>11</sup> To implement the various power domains of an SoC, three circuit structures are typically combined: (a) isolation cells, (b) retention logic and (c) power switches. Certain design guidelines must be followed to use a multi-power-domain structure functionally without exceeding the circuit's operating parameters. The functional mode known as the "power mode"<sup>6</sup> sets every power domain to either a powered-up or a powered-down state. A device is divided into power domains<sup>7</sup>: functional sections whose power supply can be gated individually by operating power switches. Depending on operating conditions, the domains can run at different supply voltage levels, linked through level shifters.

High-voltage PMOS/NMOS transistors known as power switches<sup>8</sup> are typically used to gate the power distribution link to VDD/GND. The header in Figure 7(a) uses a high-Vt PMOS transistor to control VDD, making the supply available to the active core or shutting it off, while the footer in Figure 7(b) uses an NMOS transistor to control VSS. Headers are currently more common than footers in power-gating designs because of switch area, body bias and system-level design efficiency. In Figure 7(c), multiple power switches are arranged in a daisy chain, which prevents power surges caused by many switches turning on at once. The staggering can be obtained by explicitly placing buffers or through interconnect delays.

Isolation logic<sup>9</sup> connected at the boundary between two power domains isolates the powered-down domain from the powered-up domain and also prevents floating signals from appearing as X in simulation. As shown in Figure 7(d), an isolation cell uses an isolation enable signal (Enable) to safely clamp the output (OUT) of the ON/OFF power domain to a known value. Level shifters<sup>10</sup> are classically modelled as buffers for test generation, so that tests can be produced between two power domains when both domains are powered up.

The state retention cell<sup>13</sup> allows quick power-down and wake-up by maintaining the state of some or all memory elements in a power domain. A data retention flip-flop that supports two additional operations (sleep and restore) alongside the standard functional and scan operations is shown in Figure 7(e). During sleep, the power switches turn off the master latch while the slave latch keeps the data safe on an un-switched power source. Restore ensures that the master latch receives the saved data again. As a result, a power-down cycle requires a sequence of isolation, state retention and power shut-off, whereas a power-up cycle uses the reverse sequence.
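The ordering of the power-down and power-up cycles described above can be sketched as a small sequence model. The step names are hypothetical labels chosen for this illustration and do not correspond to signals of any particular tool or design.

```python
# Power-down order: isolate outputs, save state, then shut off power.
POWER_DOWN = ("enable_isolation", "save_state", "switch_off_power")

# Each power-down step mapped to the step that undoes it.
INVERSE = {
    "enable_isolation": "disable_isolation",
    "save_state": "restore_state",
    "switch_off_power": "switch_on_power",
}

def power_up(power_down=POWER_DOWN):
    """Build the power-up sequence by undoing power-down steps in reverse."""
    return tuple(INVERSE[step] for step in reversed(power_down))

print(power_up())
```

Deriving the power-up order from the power-down order keeps the two sequences consistent: power is restored first, the saved state is reloaded next, and isolation is released last so that no floating value leaves the domain.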

**Power Control Logic:** The Power Management Unit (PMU) controls transitions between the power modes (Figure 8). Any defect in this circuitry affects power consumption but does not affect the functional operation of the design.<sup>14</sup> DFT adjustments are required to facilitate testing of the PMU and to allow scan testing while power gating is active. The PMU drives the controls of the various power domains: power enable, isolation enable and retention enable. According to the power intent specification, the power control settings used in the various test modes should correspond to a legitimate functional power mode.



**Figure 7:** Power gating using (a) header switch, (b) footer switch, and (c) daisy chain of header switches (d) isolation cell in multiple power domain (e) retention flip-flop and (f) clock gating structure



Figure 8: Functional Power management circuitry<sup>2</sup>

**Power specification formats:** CPF and UPF are IEEE P1801<sup>11</sup> standardized formats that enable functional verification of the power-aware behavior of the chip, bridging the gap between simulation of the power control signals and the power structures in the design. The formats capture the low-power techniques as Tool Command Language (Tcl) commands that specify:

(1) Power modes: (i) the list of control methods for the power switching logic and power domains, and (ii) the list of power modes, with the definition of operations, supply voltages and mode transition expressions for every power domain in every power mode.

(2) Power logic: (i) the isolation logic and/or level shifters used to connect power domains, (ii) the number of memory cells within every power domain, and (iii) the procedure to save (restore) the state of the storage cells when the power domain is turned off (on).

(3) Power domains: the timing libraries needed for timing analysis of the different power domains in the logical view (domain hierarchical modules), the physical view (power pins and connectivity) and the analysis view (timing library sets for power domains).

A set of standard commands can be used for verification, analysis and implementation of the design, as shown in Table 2.

**Table 2:** UPF and CPF commands

| UPF | CPF |
|---|---|
| `create_supply_port VDD_1V8 -direction in` | `set_design <module>` |
| `create_supply_net VSS_NET_1V8 -domain OPCG_DOM1 -resolve parallel` | `create_power_domain -name <power_domain> -instances <instance_list> -default` |
| `connect_supply_net VSS_CON_1V8 -domain OPCG_DOM1` | `create_power_domain -name <power_domain> -instances <list_instances> -boundary_ports <pin_list> -shutoff_condition <expression>` |
| `get_supply_net VSS_NET_1V8` | `create_power_nets -nets <list_nets>` |
| `add_domain_element OPCG_DOM1_E1` | `create_ground_nets -nets <list_nets>` |
| `create_power_domain OPCG_DOM2 -include_scope` | `create_isolation_rule -name <string> -isolation_condition <expression> -no_condition -isolation_target {from to} -isolation_output {high low hold tristate}` |
| `create_power_switch PSW_RAM -domain OPCG_DOM1 -output_supply_port {Vout VSS_1V8} -input_supply_port {Vin_port Vdd_OPCGPWR_1V8} -control_port {CTRL POR} -on_state {ON Vin {CTRL}}` | `create_state_retention_rule -name <string> -domain <power_domain> -instances <list_instances> -restore_edge <expression> -save_edge <expression> -target_type {flop latch both}` |
| `merge_power_domains -domain PD1_CPU -domain PD2_CPU` | `create_level_shifter_rule -name <string> -to {list_domains}` |
| `set_domain_supply_net VDD_OPCG_DOMAIN1 -primary_power_net VDD_OPCG_1V8 -primary_ground_net VSS_OPCGD1` | `create_nominal_condition -name <string> -voltage <integer_value>` |
| | `create_power_mode -name <string> -domain_conditions {domain_conditions}` |
| | `end_design <module>` |

**Clock gating (CG):** The clock tree may consume 45% of system power.<sup>11</sup> Clock gating is a popular technique to reduce power in the clock line of inactive blocks. It prevents capacitive charging and discharging and clock buffer switching in the gated logic cone, and thus saves power. It is used in synchronous circuits to disable clocks in functional mode, so fault coverage is not affected, and test power reduction is made possible by test clock planning, i.e., locating where clocks are gated, determining and enabling the clock gating logic, identifying default values and dynamically augmenting a test. Several types of CG techniques exist. They are compared on glitch impact, delay, toggling activity, change in sleep period, performance, area and power.<sup>12</sup> Flip-flop-based CG (Figure 9(c)) exhibits high switching activity, low performance and high power consumption. Gate-based CG (Figure 9(a)) shows glitches, area overhead and high switching activity, which increases dynamic power consumption. Latch-based CG (Figure 9(b)) is glitch-free and exhibits low switching activity and good performance with less power consumption than the other techniques, although it has a long sleep period and delay mismatches. Latch-based CG is therefore preferred where glitch-free operation, high performance and low area are required. Data-driven CG (Figure 9(d)) reduces redundant clock pulses and clock switching power but is harder to implement. Synthesis-based CG relaxes timing constraints and is easy to implement, but redundancy, high power consumption and high switching activity remain problems. Auto-gated CG (Figure 9(e)) reduces the redundancy problem and is easy to implement, but high switching activity and timing constraints remain problems. Look-ahead-based CG (Figure 9(f)) is easy to implement, with fewer timing constraints and no redundancy in the design. Experimental results of this work are presented in the next section.



**Figure 9:** (a) Gate-based CG (b) Latch-type CG (c) Flip-flop-type CG (d) Data-driven type CG (e) Auto-gated CG (f) Look-ahead type CG
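The difference between gate-based and latch-based clock gating can be seen in a small behavioral model. The sketch below assumes an AND-type gate and a latch that is transparent while the clock is low; the sample waveform and function names are hypothetical and only illustrate why the latch-based scheme is glitch-free.

```python
def gate_based_cg(samples):
    """Gated clock = clk AND enable, with no latch on the enable."""
    return [clk & en for clk, en in samples]

def latch_based_cg(samples):
    """Gated clock = clk AND latched enable (latch transparent when clk is 0)."""
    latched = 0
    gated = []
    for clk, en in samples:
        if clk == 0:          # enable is captured only while the clock is low
            latched = en
        gated.append(clk & latched)
    return gated

# (clk, enable) samples; enable changes while clk is high at indices 2 and 5.
samples = [(0, 0), (1, 0), (1, 1), (0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]
print("gate :", gate_based_cg(samples))
print("latch:", latch_based_cg(samples))
```

In the gate-based output the enable change in the middle of a high phase produces shortened pulses, whereas the latch-based output contains only a single full-width pulse.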

#### **RTL EXPERIMENTAL RESULTS**

In this study, we present the power matrix based on RTL-to-GLS correlation and the RTL power estimation flow on two real industrial designs at 14nm and 10nm. The design is synthesized in Synopsys Design Compiler (DC) and laid out in IC Compiler; Spyglass Power is used for RTL power estimation and PrimeTime PX is used for netlist analysis. Power estimation with and without the tool, per partition, is given in Table III. The design flow used ensures proper working of the design, as summarized in Table IV and Figure 10.
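The per-partition figures can be compared with a simple relative-difference calculation. The sketch below uses the first three 14nm rows of Table III; the metric, defined here as the difference between the two estimates relative to the estimate without the SGDC tool, is an assumption made for illustration and is not a formula stated in this paper.

```python
# First three 14nm rows of Table III: (without SGDC tool, with SGDC tool), in uW.
ROWS_14NM = {
    "par1": (1984.02, 1257.49),
    "par2": (1705.98, 1298.25),
    "par3": (1943.48, 1225.67),
}

def relative_difference_percent(without_tool, with_tool):
    """Difference between the two estimates as a percentage of the first."""
    return 100.0 * (without_tool - with_tool) / without_tool

differences = {name: relative_difference_percent(*row)
               for name, row in ROWS_14NM.items()}
for name, diff in differences.items():
    print(f"{name}: {diff:.2f}%")
```

The same function can be applied to the 10nm columns to compare how the two estimates diverge at each node.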



Figure 10: RTL Design flow



Figure 11: (a) Spyglass RTL signoff flow and (b) power estimation flow

| Partition name | 14 nm without SGDC tool (uW) | 14 nm with SGDC tool (uW) | 10 nm without SGDC tool (uW) | 10 nm with SGDC tool (uW) |
|-------|------------|-----------|------------|-----------|
| par1  | 1984.02    | 1257.49   | 2384.13    | 1345.3    |
| par2  | 1705.98    | 1298.25   | 2101.67    | 1265.4    |
| par3  | 1943.48    | 1225.67   | 2293.13    | 1474.5    |
| par4  | 1984.87    | 1320.23   | 2493.61    | 1598.7    |
| par5  | 2119.38    | 1468.34   | 2617.6     | 1399.9    |
| par6  | 1917.48    | 1377.11   | 2417.35    | 1283.1    |
| par7  | 1925.03    | 1160.12   | 2125.03    | 1205.1    |
| par8  | 1931.28    | 1328.28   | 2131.93    | 1299.1    |
| par9  | 2103.43    | 1445.18   | 2503.18    | 1351.4    |
| par10 | 1984.12    | 1256.43   | 2304.13    | 1275.2    |
| par11 | 1784.29    | 1357.12   | 2002.03    | 1176.     |
| par12 | 2004.78    | 1504.17   | 2387.12    | 1243.2    |
| par13 | 1969.67    | 1357.14   | 2398.19    | 1218.3    |
| par14 | 1997.98    | 1446.73   | 2276.18    | 1382.9    |
| par15 | 1533.45    | 951.09    | 1996.36    | 1233.0    |
| par16 | 1996.34    | 1357.22   | 2010.04    | 1092.1    |
| par17 | 2132.76    | 1435.35   | 2333.45    | 1230.1    |
| par18 | 1923.87    | 1301.72   | 2437.89    | 1338.8    |
| par19 | 1954.65    | 1257.23   | 2334.12    | 1343.1    |
| par20 | 2132.56    | 1304.34   | 2400.56    | 1433.8    |
| par21 | 1990.03    | 1457.87   | 2341.99    | 1364.9    |
| par22 | 1986.97    | 1367.19   | 2314.34    | 1455.8    |
| par23 | 1704.59    | 1357.52   | 2004.18    | 1234.1    |
| par24 | 2000.09    | 1477.51   | 2398.33    | 1305.0    |
| par25 | 2124.97    | 1320.13   | 2234.34    | 1344.1    |
| par26 | 2016.74    | 1357.11   | 2230.96    | 1234.1    |
| par27 | 2002.51    | 1493.65   | 2124.45    | 1233.3    |
| par28 | 2038.63    | 1357.46   | 2313.51    | 1334.6    |
| par29 | 1962.67    | 1299.768  | 2313.07    | 1525.6    |
| par30 | 2079.53    | 1290.457  | 2323.14    | 1349.3    |
| par31 | 2054.69    | 1237.26   | 2414.38    | 1547.3    |
| par32 | 2240.43    | 1294.527  | 2541.33    | 1573.9    |
| par33 | 2397.39    | 1457.8    | 2556.78    | 1443.3    |

Table 3: Power Matrix Per Partition in 14nm and 10nm SoC

The Spyglass suite<sup>15</sup> (Figure 11(a)) has been used at RTL to carry out the Power-Performance-Area (PPA) analysis. The inputs of the power estimator tool are given in Figure 11(b). For power analysis, a nominal PVT corner was chosen to study the estimation and correlation of dark silicon chips via an event-based switching activity methodology.<sup>16,17,19</sup> T-test statistical analysis is used to study the correlation of RTL and netlist for different SoC nodes. Salient observations:

1) From Table III, the power components are higher at 10nm than at 14nm. Correlation of the power components shows a significant 33%-42% reduction when the SGDC tool is used. The power components are shown in Figure 12, and Table V lists the various power components in the 14nm and 10nm SoCs. Figure 13 shows that dynamic and leakage power increase proportionally as we move towards lower nodes.

2) RTL to GLS correlation varies with the power management strategies used for the design. Table VI reveals that the estimated power dissipation varies by ~5% under different PVT conditions. It is higher for 10nm GLS than for 10nm RTL, and comparatively lower at 14nm. Figure 14 shows the correlation of power estimates at various PVT corners of the SoC.
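The reduction quoted in observation 1 can be cross-checked against the mean per-partition power values reported in the Appendix t-test tables. The short sketch below does this arithmetic; the dictionary layout and the function name are illustrative assumptions, and only the four mean values come from the paper.

```python
# Mean per-partition power in uW, taken from the Appendix t-test tables.
mean_power_uw = {
    "14nm": {"without": 1995.826, "with": 1338.663},
    "10nm": {"without": 2299.212, "with": 1345.988},
}


def reduction_percent(without_tool, with_tool):
    """Relative reduction of the with-tool estimate versus the without-tool one."""
    return 100.0 * (without_tool - with_tool) / without_tool


reductions = {
    node: reduction_percent(values["without"], values["with"])
    for node, values in mean_power_uw.items()
}

for node, value in reductions.items():
    print(f"{node}: {value:.1f}% reduction with the SGDC tool")
```

On the mean values this gives roughly 33% at 14nm and 41% at 10nm, broadly in line with the range stated above; per-partition values will spread around these means.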

Table 3 (continued): Power Matrix Per Partition in 14nm and 10nm SoC

| Partition name | 14 nm without SGDC tool (uW) | 14 nm with SGDC tool (uW) | 10 nm without SGDC tool (uW) | 10 nm with SGDC tool (uW) |
|-------|------------|-----------|------------|-----------|
| par34 | 2234.57    | 1367.52   | 2487.09    | 1526.78   |
| par35 | 2102.16    | 1301.09   | 2213.45    | 1456.98   |
| par36 | 1884.34    | 1345.81   | 2012.58    | 1335.56   |

Table 4: Design Steps For Power Management

| Step | Requirement | Implementation |
|------|-------------|----------------|
| Power management architecture | Power reduction and planning:<br>- Provides intelligent power reduction and domain planning at RTL<br>- It recommends new clock enables which helps in power reduction of the circuitry.<br>- Provides constraint generation for power-aware synthesis. | Power gating & clock gating along with power domain partitioning has been planned to control various control & logic for power domains and their interactions. |
| Power intent of design | This involves design operation with power views and its dynamic interactions on other domains to achieve system functionality. | CPF used with advanced power intent to bind it with targeted process technology. |
| Power-domain sequencing verification | - Does formal verification on power domain sequencing.<br>- Formally prove power up/down sequencing. | Domain partitioning for each domain and to ensure it with proper domain isolation. |
| Power verification and implementation | Voltage and power domain verification:<br>- Performs voltage and power domain verification<br>- Auto-fix & Verify: Performs auto insertion of level shifters, retention cells, isolation logic, power switches and multimode analysis which can be done on RTL, gates, layout. | Its power control logic verification that drives control inputs to power management architecture |
| Power estimation | - Timing-aware power estimation at RTL, gates, layout<br>- At RTL, it provides 20% accuracy to silicon and at gate, accuracy is 10%.<br>- Domain-aware power estimation: Layout checks for connectivity and domains. | It is performed on set of power scenarios and corners to obtain a power profile of actual design. |



Figure 12: Test Power for 10nm and 14nm

Table 5: Power components in 14nm and 10nm SoC

| SGDC tool | Node | Logic Power (uW) | Glitch Power (uW) | Dynamic Power (uW) | Short Circuit Power (uW) | Leakage Power (uW) |
|-----------|------|------------------|-------------------|--------------------|--------------------------|--------------------|
| Without   | 14nm | 533.8            | 659.6             | 1193.44            | 230.81                   | 0.15               |
| Without   | 10nm | 749.53           | 879.4             | 1398.87            | 482.57                   | 0.31               |
| With      | 14nm | 284.17           | 330.5             | 698.65             | 129.92                   | 0.08               |
| With      | 10nm | 427.11           | 591.9             | 869.97             | 273.26                   | 0.15               |

From Table VII, the estimation error is (-11.50% to -7.56%), which is considerably larger than the estimation error from 65nm to 45nm (-0.7% to -4.4%) reported in the literature.<sup>18</sup> T-test analysis has been conducted on the partitions of the chip for these scenarios: (i) 14nm without SGDC tool vs. with SGDC tool, (ii) 10nm without SGDC tool vs. with SGDC tool, and (iii) 14nm vs. 10nm, both with and without the SGDC tool. The estimated power with the tool is lower than without it at both nodes, and the T-test shows that this reduction is significant (refer to Table VIII and Table IX of the Appendix). Between nodes, the difference is significant without the tool (Table XI) but not with it (Table X).
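For readers who want to re-derive the headline statistic, the sketch below recomputes the equal-variance (pooled) two-sample t statistic from the summary values listed in Table 8 of the Appendix, using only the Python standard library. The function and variable names are illustrative assumptions; the statistical tool actually used to generate the Appendix tables is not named in the text.

```python
import math


def pooled_t_test(mean1, var1, n1, mean2, var2, n2):
    """Student's two-sample t statistic with a pooled (common) variance."""
    df = n1 + n2 - 2
    common_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = math.sqrt(common_var * (1.0 / n1 + 1.0 / n2))
    t_stat = (mean1 - mean2) / std_err
    return t_stat, df, common_var


# Summary statistics from Table 8 (14nm, without vs. with the SGDC tool).
t_cal, df, common_var = pooled_t_test(
    mean1=1995.826, var1=24293.36, n1=36,
    mean2=1338.663389, var2=11166.3644, n2=36,
)

# Variance ratio used by the F-test for equal variances in the same table.
f_cal = 24293.36 / 11166.3644

print(f"t(cal) = {t_cal:.3f}, Df = {df}")
print(f"Common variance = {common_var:.2f}, F(cal) = {f_cal:.4f}")
```

The recomputed values can be compared directly with the t(cal), Df, Common Variance and F(cal) rows of Table 8.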



Figure 13: Power components for 10nm and 14nm (legend: without SGDC tool 10nm, with SGDC tool 10nm, without SGDC tool 14nm, with SGDC tool 14nm)

Table 6: RTL Vs GLS % of 14nm & 10nm Under Various PVT

| SoC      | PVT1 | PVT2 | PVT3 | PVT4 |
|----------|------|------|------|------|
| RTL 14nm | 2.45 | 4.13 | 1.98 | 3.25 |
| GL 14nm  | 2.77 | 4.37 | 2.09 | 3.56 |
| RTL 10nm | 2.69 | 4.48 | 2.12 | 2.99 |
| GL 10nm  | 2.91 | 4.57 | 2.23 | 3.3  |



Figure 14: Correlation of Power estimates for various PVT corners

Table 7: GL & RTL Estimated Test Power Correlation

| Design        | RTL est. (uW) | GL est. (uW) | Est. diff. (%) |
|---------------|---------------|--------------|----------------|
| SoC (14nm)    | 2.45          | 2.77         | 11.50%         |
| SoC (10nm)    | 2.69          | 2.91         | 7.56%          |
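The text does not state how the estimation difference in Table 7 is normalized. The sketch below assumes it is the RTL estimate minus the gate-level estimate, expressed as a percentage of the gate-level value; that formula, like the names used here, is an assumption made for illustration.

```python
def relative_diff_percent(rtl, gl):
    """RTL minus gate-level estimate, as a percentage of the gate-level value.

    The normalization by the gate-level estimate is an assumption; the paper
    does not state which value is used as the reference.
    """
    return 100.0 * (rtl - gl) / gl


# (RTL estimate, GL estimate) pairs from Table 7.
table7 = {"14nm": (2.45, 2.77), "10nm": (2.69, 2.91)}

diffs = {node: relative_diff_percent(rtl, gl) for node, (rtl, gl) in table7.items()}

for node, value in diffs.items():
    print(f"{node}: {value:.2f}%")
```

Under this assumption the 10nm value agrees with the table, and the 14nm value comes out within about 0.05 percentage points of the tabulated figure; both are negative because the RTL estimate is below the gate-level one.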

#### **CONCLUSION**

Power is the limiting factor for any given design. In this paper, power modelling, analysis, estimation and correlation of RTL to GLS have been explored on real industrial designs. Effective functional power optimization techniques, the design flow and RTL to GLS correlation are summarized to give a roadmap to designers and researchers for their new designs and to orient them to explore these techniques further. Clock gating conditions for which power is maximum are specified to ease users' efforts. From the T-test conducted, we found that Spyglass power estimation gives promising results and is helpful in bridging the gap between RTL and GLS. Early RTL power estimation should become a mandatory step, as it helps design and test engineers root out issues more easily and quickly at an early stage. RTL-based power analysis is helpful for measuring dynamic power; it does not require a netlist and enables effective design-power tradeoffs early in the design cycle. Therefore, we recommend RTL power analysis as a mandatory early step to reduce power by a correct-by-construction method. The estimation error is (-11.50% to -7.56%) for 14nm and 10nm, which is within reliable limits when compared with the estimation error from 65nm to 45nm (-0.7% to -4.4%) reported in the literature.<sup>18</sup> This helps a design of similar technology to calibrate data for high-fidelity and accurate power estimation at RTL before a netlist is ready.

#### REFERENCES

1. S. Priya and S. Kamatchi. Power Optimization of VLSI Scan under Test using X-Filling Technique. 2021 Emerging Trends in Industry 4.0 (IEEE-ETI 4.0) 2021, 1-9.
2. D. Gizopoulos, K. Roy, P. Girard, N. Nicolici and X. Wen. Power-Aware Testing and Test Strategies for Low Power Devices. *Design, Automation and Test in Europe* 2008.
3. Lauro Rizzatti. Emulation-Centric Power Analysis of SoC Designs. Semiconductor Engineering, 2022.
4. The International Technology Roadmap for Semiconductors (ITRS) Roadmap, 2014/2015, http://public.itrs.net/ Power consumption trend of 80mm2 SoC, IRDS 2020, 16.
5. S. Bhatia, N.K. Jha. Integration of hierarchical test generation with behavioral synthesis of controller & data path circuits. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, **1998**, 6(4), 608-619.
6. C. Kumar, F. Maamari, K. Vittal, W. Pradeep, R. Tiwari, S. Ravi. Methodology for early RTL testability and coverage analysis and its application to industrial designs. In 2014 IEEE 23rd Asian Test Symposium, 2014, pp. 125-130.
7. Ghosh et al. A design for testability technique for RTL circuits using control/data flow extraction. In *Proceedings of International Conference on Computer Aided Design*, pp. 329-336, **1996**.
8. S. Priya. Defect-aware methodology for low-power scan-based VLSI testing. IEEE Conference on Power, Control, Communication & Computational Technologies for Sustainable Growth 2015, 234-238.
9. A. Kumari, V. Pandey. Non-Dynamic Power Reduction Techniques for Digital VLSI Circuits: Classification and Review. 2020 International Conference on ComPE 2020, 579-583.
10. M. S. Lakshmi, P. Vaya and S. Venkataramanan. Power management in SoC using CPF. 2011 3rd International Conference on Electronics Computer Technology 2011, 325-329.
11. T. Chindhu S. and N. Shanmugasundaram. Clock Gating Techniques: An Overview. 2018 Conference on ICEDSS 2018, 217-221.
12. M. Maryan et al. A New Circuit-Level Technique for Leakage and Short-Circuit Power Reduction of Static Logic Gates in 22-nm CMOS Technology. Circuits Syst Signal Process 2021, 40.
13. Maryan et al. A circuit-level methodology for leakage power reduction of high-efficient compressors in 22-nm CMOS technology. Analog Integr Circ Sig Process 2022, 110.
14. SpyGlass Power Methodology GuideWare 2.0 User Guide, Atrenta.
15. A. Yadav. Study and Analysis of RTL verification tool. 2020 IEEE Students Conference on Engineering & Systems (SCES) 2020.
16. H. V. Ranjitha, S. Hiremath and S. G. Langadi. RTL Power Estimation: Early Design Cycle Approach for SoC Power Sign-Off. 2018 3rd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT) 2018, 480-484.
17. Abhay Singh et al. Methodology for early and accurate test power estimation at RTL. IEEE International Test Conference **2010**.
18. N. Vyagrheswarudu, S. Das and A. Ranjan. PowerAdviser: An RTL power platform for interactive sequential optimizations. Design, Automation & Test in Europe Conference 2012, 550-553.

#### **APPENDIX**

Table 8: T-test for 14nm Without SGDC tool Vs With SGDC tool

| t-Test (Assuming Equal Variances)          |          |                          |  |
|--------------------------------------------|----------|--------------------------|--|
| Unpaired Comparison for Means              |          |                          |  |
|                                            | Group 1  | Group 2                  |  |
| Mean                                       | 1995.826 | 1338.663389              |  |
| S.E.M.                                     | 25.97721 | 17.61183661              |  |
| S.D.                                       | 155.8633 | 105.6710197              |  |
| Variance                                   | 24293.36 | 11166.3644               |  |
| Sum                                        | 71849.73 | 48191.882                |  |
| N                                          | 36       | 36                       |  |
| Sum(x^2)                                   | 1.44E+08 | 64903530.83              |  |
| Sum(x)^2/N                                 | 1.43E+08 | 64512708.08              |  |
| Correction Factor                          | 2E+08    |                          |  |
| Df                                         | 70       |                          |  |
| Expected Difference                        | 0        |                          |  |
| Common Variance                            | 17729.86 |                          |  |
| t(cal)                                     | 20.93902 | *** (P<=0.001) Two-sided |  |
| P(t<=t(cal)) Two-sided                     | 7.40E-32 |                          |  |
| t(0.05) Two-sided                          | 1.994437 |                          |  |
| Lower Conf. Limit of<br>Difference         | 594.5679 |                          |  |
| Upper Conf. Limit of<br>Difference         | 719.757  |                          |  |
| F-Test for Equal<br>Variances              |          |                          |  |
| F(cal)                                     | 2.175583 | * (P<=0.05)              |  |
| P(F<=F(cal))                               | 0.012057 |                          |  |
| F(0.15)                                    | 1.424469 |                          |  |
| t-Test for Unequal Variances (Aspin-Welch) |          |                          |  |
| Var1/N1+Var2/N2                            | 984.9923 |                          |  |
| C                                          | 0.685097 |                          |  |
| Df                                         | 61.56314 |                          |  |
| t(cal)                                     | 20.93902 | *** (P<=0.001) Two-sided |  |
| P(t<=t(cal)) Two-sided                     | 1.08E-29 |                          |  |
| t(0.05) Two-sided                          | 1.999254 |                          |  |
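The Aspin-Welch degrees of freedom and the confidence limits of the difference in Table 8 can be re-derived from the same summary statistics. The sketch below is a minimal illustration with assumed function names; the critical value t(0.05) is copied from the table rather than computed, since the Python standard library has no t-distribution quantile function.

```python
import math


def welch_df(var1, n1, var2, n2):
    """Welch-Satterthwaite approximate degrees of freedom."""
    a, b = var1 / n1, var2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


def confidence_limits(mean1, mean2, common_var, n1, n2, t_crit):
    """Two-sided confidence limits of the mean difference (pooled variance)."""
    diff = mean1 - mean2
    half_width = t_crit * math.sqrt(common_var * (1.0 / n1 + 1.0 / n2))
    return diff - half_width, diff + half_width


# Values taken from Table 8; t_crit is the tabulated t(0.05) two-sided value.
df_welch = welch_df(24293.36, 36, 11166.3644, 36)
lower, upper = confidence_limits(
    1995.826, 1338.663389, common_var=17729.86, n1=36, n2=36, t_crit=1.994437
)

print(f"Welch Df = {df_welch:.3f}")
print(f"Confidence limits of difference = ({lower:.3f}, {upper:.3f})")
```

Because both groups have the same size, the Welch t statistic equals the pooled one, which is why Table 8 lists the same t(cal) in both sections and only the degrees of freedom differ.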

Table 9: T-test for 10nm Without SGDC tool Vs With SGDC tool

| t-Test (Assuming Equal Variances)          |          |                          |
|--------------------------------------------|----------|--------------------------|
| Unpaired Comparison for Means              |          |                          |
|                                            | Group 1  | Group 2                  |
| Mean                                       | 2299.212 | 1345.988                 |
| S.E.M.                                     | 28.38631 | 20.07058                 |
| S.D.                                       | 170.3179 | 120.4235                 |
| Variance                                   | 29008.18 | 14501.81                 |
| Sum                                        | 82771.62 | 48455.58                 |
| N                                          | 36       | 36                       |
| Sum(x^2)                                   | 1.91E+08 | 65728209                 |
| Sum(x)^2/N                                 | 1.90E+08 | 65220645                 |
| Correction Factor                          | 2.39E+08 |                          |
| Df                                         | 70       |                          |
| Expected Difference                        | 0        |                          |
| Common Variance                            | 21755    |                          |
| t(cal)                                     | 27.41899 | *** (P<=0.001) Two-sided |
| P(t<=t(cal)) Two-sided                     | 3.62E-39 |                          |
| t(0.05) Two-sided                          | 1.994437 |                          |
| Lower Conf. Limit of Difference            | 883.8866 |                          |
| Upper Conf. Limit of Difference            | 1022.56  |                          |
| F-Test for Equal Variances                 |          |                          |
| F(cal)                                     | 2.000314 | * (P<=0.05)              |
| P(F<=F(cal))                               | 0.021873 |                          |
| F(0.15)                                    | 1.424469 |                          |
| t-Test for Unequal Variances (Aspin-Welch) |          |                          |
| Var1/N1+Var2/N2                            | 1208.611 |                          |
| C                                          | 0.666702 |                          |
| Df                                         | 62.99737 |                          |
| t(cal)                                     | 27.41899 | *** (P<=0.001) Two-sided |
| P(t<=t(cal)) Two-sided                     | 9.99E-37 |                          |
| t(0.05) Two-sided                          | 1.998342 |                          |

Table 10: T-test for With SGDC tool - 14nm Vs 10nm

| t-Test (Assuming Equal Variances) |          |          |  |
|-----------------------------------|----------|----------|--|
| Unpaired Comparison for Means     |          |          |  |
|                                   | Group 1  | Group 2  |  |
| Mean                              | 1338.663 | 1345.988 |  |
| S.E.M.                            | 17.61184 | 20.07058 |  |
| S.D.                              | 105.671  | 120.4235 |  |
| Variance                          | 11166.36 | 14501.81 |  |
| Sum                               | 48191.88 | 48455.58 |  |
| N                                 | 36       | 36       |  |
| Sum(x^2)                          | 64903531 | 65728209 |  |
| Sum(x)^2/N                        | 64512708 | 65220645 |  |
| Correction Factor                 | 1.3E+08  |          |  |
| Df                                | 70       |          |  |
| Expected Difference               | 0        |          |  |
| Common Variance                   | 12834.09 |          |  |
| t(cal)                            | -0.27432 | N.S. (P>0.05) Two-sided |  |
| P(t<=t(cal)) Two-sided            | 0.784646 |          |  |
| t(0.05) Two-sided                  | 1.994437 |                             |  |
| Lower Conf. Limit of<br>Difference | -45.9308 |                             |  |
| Upper Conf. Limit of<br>Difference | 60.58071 |                             |  |
| F-Test for Equal Variances         |          |                             |  |
| F(cal)                             | 1.298705 | N.S. (P>0.05)               |  |
| P(F<=F(cal))                       | 0.221673 |                             |  |
| F(0.15)                            | 1.424469 |                             |  |

Table 11: T-test for Without SGDC tool - 14nm Vs 10nm

| t-Test (Assuming Equal Variances)  |          |                              |  |
|------------------------------------|----------|------------------------------|--|
| Unpaired Comparison for Means      |          |                              |  |
|                                    | Group 1  | Group 2                      |  |
| Mean                               | 1995.826 | 2299.211667                  |  |
| S.E.M.                             | 25.97721 | 28.38631219                  |  |
| S.D.                               | 155.8633 | 170.3178732                  |  |
| Variance                           | 24293.36 | 29008.17792                  |  |
| Sum                                | 71849.73 | 82771.62                     |  |
| N                                  | 36       | 36                           |  |
| Sum(x^2)                           | 1.44E+08 | 191324760.6                  |  |
| Sum(x)^2/N                         | 1.43E+08 | 190309474.4                  |  |
| Correction Factor                  | 3.32E+08 |                              |  |
| Df                                 | 70       |                              |  |
| Expected Difference                | 0        |                              |  |
| Common Variance                    | 26650.77 |                              |  |
| t(cal)                             | -7.88455 | *** (P<=0.001) Two-sided     |  |
| P(t<=t(cal)) Two-sided             | 2.98E-11 |                              |  |
| t(0.05) Two-sided                  | 1.994437 |                              |  |
| Lower Conf. Limit of<br>Difference | 226.6428 |                              |  |
| Upper Conf. Limit of<br>Difference | 380.1289 |                              |  |
| F-Test for Equal Variances         |          |                              |  |
| F(cal)                             | 1.194079 | N.S. (P>0.05)                |  |
| P(F<=F(cal))                       | 0.301319 |                              |  |
| F(0.15)                            | 1.424469 |                              |  |