



# Amir Zjajo, PhD

Research focus:

- Biomedical Interfaces/SoC
  - From sensors to sense-making: Signal acquisition, conditioning, quantization, detection, classification
- Neuromorphic Cognitive Systems (Hardware)
  - Brain-like systems with adaptation, self-organization, and learning
- Currently (co-)supervising 12 MSc students



# Amir Zjajo, PhD

Contact:

- http://cas.et.tudelft.nl/~zjajo
- Office: Circuits and Systems Group EWI 17.270
- E-mail: a.zjajo@tudelft.nl



### Acknowledgement

• Michel Berkelaar



# **Overview**

- Design Constraints
  - Power, Area, Frequency, CMOS Scaling
- Timing
  - Timing Metrics, Paths, Variability and Delay
- Deterministic Timing Analysis (Static Timing Analysis)
  - Models, Interconnect, Networks-on-Chip, Clock Distribution
- Statistical Timing Analysis
  - Probability, Spatial Correlations, MAX function
- Design Flow
  - Synthesis, Transformation, Definitions, Constraints



#### **Design Constraints**



# After all of the high level synthesis, what do we have?

- a set of operation units to implement (\*, +, <, etc.)
- some memory to implement (registers, ...)
- a controller to implement
- connections between all of these, with multiplexers, branch logic, etc.
- a vague idea of time (cycles)



# Questions

- What objectives does a designer have?
- What causes delay on an IC?



# Where do we need to go?

- Design of a real chip / block of a chip, ready for fabrication
- ... optimized as best as you can
  - Power (= battery life and heat)
  - Area (= cost and yield)
  - Clock frequency (constraint? higher is better?)
- ... functioning with the rest of the IC / PCB
- ... with all the nasty details sorted out (reset, test, power distribution, clock distribution, EMC, IO standards, etc)



# How do we get there?

- There is good software to help us
- But we always need to help it by specifying what we want, especially by providing timing constraints!
- Unless we know what we are doing, the design
  - may (will) not work
  - may (will) use more power / area than necessary





# Power

• Probably your most important design parameter!



- CMOS has relatively low static dissipation
- Power dissipation was the reason that CMOS technology won over bipolar and NMOS technology for digital ICs
- (Extremely) high clock frequencies increase dynamic dissipation
- Low  $V_T$  increases leakage
- Advanced IC design is a continuous struggle to contain the power requirements!





#### **Estimate**

- Furnace: 2000 W, r = 10 cm $\rightarrow$ P $\approx$ 6 W/cm<sup>2</sup>
- Processor chip: 100 W, 3 cm<sup>2</sup> $\rightarrow$ P $\approx$ 33 W/cm<sup>2</sup>
- Human brain: 20 W, ~1.3 dm<sup>3</sup> $\rightarrow$ P ~ 0.015 W/cm<sup>3</sup>
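The estimates above are simple ratios and can be reproduced in a few lines. This is a minimal sketch: treating the furnace as a flat disc of radius r is an assumption made here to arrive at the areal figure, and the variable names are illustrative only.

```python
import math

# Numbers taken from the slide. Modelling the furnace as a flat disc of
# radius r is an assumption made for this sketch.
furnace_w, furnace_r_cm = 2000.0, 10.0
chip_w, chip_area_cm2 = 100.0, 3.0
brain_w, brain_volume_cm3 = 20.0, 1300.0  # 1.3 dm^3 = 1300 cm^3

furnace_density = furnace_w / (math.pi * furnace_r_cm ** 2)  # W/cm^2
chip_density = chip_w / chip_area_cm2                        # W/cm^2
brain_density = brain_w / brain_volume_cm3                   # W/cm^3 (volumetric)

print(f"furnace: {furnace_density:.1f} W/cm^2")
print(f"chip:    {chip_density:.1f} W/cm^2")
print(f"brain:   {brain_density:.4f} W/cm^3")
```

Note that the brain figure is per unit volume, so it is not directly comparable with the two areal densities.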



- Dynamic Power Consumption
  - Charging and discharging capacitors
- Short-Circuit Currents
  - Short-circuit path between supply rails during switching (NMOS and PMOS on together)
- Leakage
  - Leaking diodes and transistors
  - Important for battery-operated equipment



$$P \sim \alpha \cdot (C_L + C_{CS}) \cdot V_{swing} \cdot V_{DD} \cdot f + (I_{DC} + I_{Leak}) \cdot V_{DD}$$

- $\alpha$ – switching activity
- $C_L$ – load capacitance
- $C_{CS}$ – short-circuit capacitance
- $V_{swing}$ – voltage swing
- $f$ – frequency
- $I_{DC}$ – static current
- $I_{Leak}$ – leakage current

$$P = \frac{energy}{operation} \times rate + static \ power$$
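The power expression can be evaluated directly to see how its two terms compare. The sketch below is only an illustration: the function name `cmos_power` and all numeric values (activity, capacitances, supply, clock rate, leakage) are hypothetical and not taken from the slides.

```python
def cmos_power(alpha, c_load, c_sc, v_swing, v_dd, freq, i_dc=0.0, i_leak=0.0):
    """Return (dynamic, static) power in watts for the model
    P ~ alpha*(C_L + C_CS)*V_swing*V_DD*f + (I_DC + I_Leak)*V_DD."""
    dynamic = alpha * (c_load + c_sc) * v_swing * v_dd * freq
    static = (i_dc + i_leak) * v_dd
    return dynamic, static


# Hypothetical block: 1 nF total load, 0.1 nF short-circuit equivalent,
# 10% switching activity, rail-to-rail swing, 1 GHz clock, 50 mA leakage.
dyn_1v0, stat_1v0 = cmos_power(0.1, 1e-9, 0.1e-9, 1.0, 1.0, 1e9, i_leak=0.05)
# Same block at 0.8 V with everything else kept fixed (a simplification,
# since leakage and achievable frequency also depend on the supply).
dyn_0v8, stat_0v8 = cmos_power(0.1, 1e-9, 0.1e-9, 0.8, 0.8, 1e9, i_leak=0.05)

print(f"1.0 V: dynamic {dyn_1v0:.3f} W, static {stat_1v0:.3f} W")
print(f"0.8 V: dynamic {dyn_0v8:.3f} W, static {stat_0v8:.3f} W")
```

With a rail-to-rail swing the dynamic term scales with the square of the supply voltage, while the static term scales only linearly in this simplified model.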



# **CMOS Leakage Power**

Sub-threshold current of MOS devices

 $I_{D(sub-threshold)} \approx K_1 W e^{-V_T / nV_{thermal}} \left(1 - e^{-V_{DD} / V_{thermal}}\right)$ 

No channel → parasitic bipolar device: n+ (source) - p (bulk) - n+ (drain)
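The exponential dependence on $V_T$ in the sub-threshold expression is easier to appreciate with numbers. The following sketch evaluates the formula as written above; the slope factor n = 1.5, the thermal voltage of 26 mV (roughly room temperature) and the $K_1$, W and voltage values are assumptions chosen for illustration.

```python
import math


def subthreshold_current(k1, width, v_t, v_dd, n=1.5, v_thermal=0.026):
    """Sub-threshold drain current:
    I = K1 * W * exp(-V_T / (n * V_thermal)) * (1 - exp(-V_DD / V_thermal)).
    n and v_thermal defaults are illustrative assumptions."""
    return (k1 * width
            * math.exp(-v_t / (n * v_thermal))
            * (1.0 - math.exp(-v_dd / v_thermal)))


# Compare two hypothetical devices that differ only in threshold voltage.
i_high_vt = subthreshold_current(k1=1.0, width=1.0, v_t=0.4, v_dd=1.0)
i_low_vt = subthreshold_current(k1=1.0, width=1.0, v_t=0.3, v_dd=1.0)
leakage_ratio = i_low_vt / i_high_vt

print(f"lowering V_T by 100 mV multiplies leakage by about {leakage_ratio:.1f}x")
```

Under these assumptions the ratio depends only on the threshold difference, n and the thermal voltage, which is why low-$V_T$ devices are fast but leaky.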





# **CMOS Short-Circuit Power**



Best to maintain approximately equal input/output slopes


#### Area

- Bigger means more expensive, more chance of defects during production
- \$\$\$



# **Clock frequency**

- Delay in a switch/wire dictated by physics on the chip
- Big transistors: small R, high power
- Big transistors: large C
- More modern technology -> better transistors, more resistive wires (but shorter distances)!





#### Moore's Law

The number of transistors that can be integrated on a single chip will double every 12 months (later adjusted: 18 months is the commonly quoted figure, and Moore's own 1975 revision was 24 months)

Gordon Moore, co-founder of Intel [Electronics, vol. 38, no. 8, 1965]





- Reduce price per function:
  - Want to sell more functions (transistors) per chip for the same money → better products
  - Build same products cheaper, sell the same part for less money → larger market
  - Price of a transistor has to be reduced
- But also want to be faster, smaller, lower power



- Fixed Voltage Scaling
  - most common model until the 1990s
  - only dimensions scale, voltages remain constant
- Full Scaling (Constant Electrical Field)
  - ideal model: dimensions and voltage scale together by the same factor S
- General Scaling
  - most realistic for today's situation: voltages and dimensions scale with different factors



Constant Field Scaling: S = U

| Parameter                        | Relation                         | General Scaling                |
|----------------------------------|----------------------------------|--------------------------------|
| W, L, t<sub>ox</sub>             |                                  | 1/S                            |
| V<sub>DD</sub>, V<sub>T</sub>    |                                  | 1/U                            |
| Area / Device                    | W L                              | 1/S<sup>2</sup>                |
| C<sub>ox</sub>                   | 1/t<sub>ox</sub>                 | S                              |
| C<sub>gate</sub>                 | C<sub>ox</sub> W L               | 1/S                            |
| I<sub>sat</sub>                  | C<sub>ox</sub> W V               | 1/U                            |
| Current Density                  | I<sub>sat</sub> / Area           | S<sup>2</sup>/U                |
| R<sub>on</sub>                   | V / I<sub>sat</sub>              | 1                              |
| Intrinsic Delay                  | R<sub>on</sub> C<sub>gate</sub>  | 1/S                            |
| Power / Device                   | I<sub>sat</sub> V                | 1/U<sup>2</sup>                |
| Power Density                    | P / Area                         | S<sup>2</sup>/U<sup>2</sup>    |
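The general-scaling column can be turned into a small calculator to compare the three scaling models. This is a sketch that just encodes the table rows; the function name `general_scaling` and the dictionary keys are made up for this example.

```python
def general_scaling(S, U):
    """Scaling factors from the table for dimension factor S and voltage factor U."""
    return {
        "dimensions": 1 / S,
        "voltage": 1 / U,
        "area": 1 / S**2,
        "c_ox": S,
        "c_gate": 1 / S,
        "i_sat": 1 / U,
        "current_density": S**2 / U,
        "r_on": 1.0,
        "intrinsic_delay": 1 / S,
        "power": 1 / U**2,
        "power_density": S**2 / U**2,
    }


full = general_scaling(S=2.0, U=2.0)           # constant field: S = U
fixed_voltage = general_scaling(S=2.0, U=1.0)  # only dimensions scale

print("full scaling power density:  ", full["power_density"])
print("fixed-voltage power density: ", fixed_voltage["power_density"])
```

With S = U the power density stays constant, whereas with fixed voltages it grows as S<sup>2</sup>, which is one way to see why fixed-voltage scaling could not continue indefinitely.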





# **ASIC Design Advantages**

- Cost: lower unit costs
- Speed: ASICs are faster than FPGAs
- Power: ASICs consume less power
- Complexity: bigger designs can fit
- Can add analog / mixed circuit blocks



# **ASIC Disadvantages**

- Time-to-market: some large ASICs can take a year or more to design
- Design Issues: all the dirty details (Floorplan, Signal Integrity, power/clock distribution, EMC, DFT, etc)
- Expensive Tools: ASIC design tools are very expensive
- A design bug means re-fabrication...



# **FPGA Design Advantages**

- Faster time-to-market: No layout, masks or other manufacturing steps are needed for FPGA design.
- No NRE (non-recurring engineering) costs
- Simpler design cycle
- Field Reprogrammable (bug fixes...)
- Reusable for other designs
- FPGAs are good for prototyping and limited production
- Generally FPGAs are used for lower speed, lower complexity and lower volume designs



### **FPGA Disadvantages**

- Lower performance (10x)
- Higher power
- High cost / unit



# **Xilinx FPGA Tool Flow**





# **Tool flow details**

- Much of the tool flow is automatic
- But: timing constraints need to be specified!
- To be able to do this, you need to understand timing!
  - sources of delay
  - how it is measured / estimated
  - how your constraints impact the tool outcome





• How do you know your RTL design is correct?



# Questions

- RTL design is used in the **logic** design phase
- The RTL description is (usually) converted to a gate-level description by a logic synthesis tool
- The synthesis results are then used by placement and routing tools to create a physical layout
- Logic simulation tools may use a design's RTL description to verify its correctness



# **Estimating timing: simulation**

- Before synthesis, VHDL simulation is free from timing issues
- After synthesis, delay can be added to the simulation
- Simulation is never a proof: it only shows what happens for the few vectors you are simulating
- Even if simulation shows no problems, the chip may not work for other inputs!
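- Simulating all transitions with N input bits requires on the order of $2^{2N}$ vectors (each of the $2^N$ input vectors followed by each of the $2^N$ input vectors)!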
- Internal memory (state) makes this (much) worse.
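To get a feeling for why exhaustive simulation is hopeless, one can count the ordered pairs of input vectors needed to exercise every transition of a purely combinational block. The sketch below assumes a simulator that processes 10<sup>9</sup> vector pairs per second, which is a deliberately optimistic, hypothetical rate.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def transition_vector_pairs(n_inputs):
    """Ordered pairs (previous vector, next vector) for n_inputs input bits."""
    return (2 ** n_inputs) ** 2  # 2^N * 2^N = 2^(2N)


def simulation_years(n_inputs, pairs_per_second=1e9):
    """Wall-clock years to simulate every pair at an assumed simulator rate."""
    return transition_vector_pairs(n_inputs) / pairs_per_second / SECONDS_PER_YEAR


for n in (8, 16, 32):
    print(f"N={n:2d}: {transition_vector_pairs(n):.3e} pairs, "
          f"{simulation_years(n):.3e} years")
```

Even for a 32-bit input the run time is measured in centuries at this rate, and that is before internal state is taken into account.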



# **Static Timing Analysis**

- We need a method that guarantees that the chip will always work.
- We may need to allow some level of inaccuracy (pessimistic!) to make it computationally efficient.



# **Overview**

- Design Constraints
  - Power, Area, Frequency, CMOS Scaling
- Timing
  - Timing Metrics, Paths, Variability and Delay
- Deterministic Timing Analysis (Static Timing Analysis)
  - Models, Interconnect, Networks, Clock Distribution
- Statistical Timing Analysis
  - Probability, Spatial Correlations, MAX function
- Design Flow
  - Synthesis, Transformation, Definitions, Constraints







[Taskin, Kourtev & Friedman, The VLSI Handbook]



#### D-Register (more commonly known as D-Flip-Flop)



[Taskin, Kourtev & Friedman, The VLSI Handbook]






#### Example: How to reduce clock period










| Clock Period | Adder       | Absolute Value            | Logarithm                       |
|--------------|-------------|---------------------------|---------------------------------|
| 1            | $a_1 + b_1$ |                           |                                 |
| 2            | $a_2 + b_2$ | $\lvert a_1 + b_1 \rvert$ |                                 |
| 3            | $a_3 + b_3$ | $\lvert a_2 + b_2 \rvert$ | $\log(\lvert a_1 + b_1 \rvert)$ |
| 4            | $a_4 + b_4$ | $\lvert a_3 + b_3 \rvert$ | $\log(\lvert a_2 + b_2 \rvert)$ |
| 5            | $a_5 + b_5$ | $\lvert a_4 + b_4 \rvert$ | $\log(\lvert a_3 + b_3 \rvert)$ |

$T_{clk} > t_{c-q} + \max(t_{p,add}, t_{p,abs}, t_{p,log}) + t_{su}$

Increase functional throughput

#### Example: How to reduce clock period

- Pipelining: very popular/effective measure to increase functional throughput and resource utilization
- At the cost of increased *latency*
- All high-performance microprocessors use pipelining extensively in the instruction fetch-decode-execute sequence

Bottom line: more flip-flops, greater timing design problems
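The throughput/latency trade-off of pipelining can be sketched numerically. The example assumes that the unpipelined datapath chains the adder, absolute-value and logarithm blocks between a single pair of registers, so that their delays add up; all delay values (in ns) are hypothetical.

```python
def min_clock_period(stage_delays, t_cq, t_su, pipelined):
    """Minimum clock period: register overhead plus either the slowest stage
    (pipelined) or the sum of all stages (single combinational block)."""
    logic = max(stage_delays) if pipelined else sum(stage_delays)
    return t_cq + logic + t_su


# Hypothetical propagation delays in ns for adder, absolute value, logarithm.
delays = {"add": 1.0, "abs": 0.5, "log": 2.0}
t_cq, t_su = 0.1, 0.1

t_flat = min_clock_period(delays.values(), t_cq, t_su, pipelined=False)
t_pipe = min_clock_period(delays.values(), t_cq, t_su, pipelined=True)
latency_pipe = len(delays) * t_pipe  # one result needs three clock periods

print(f"unpipelined: T_clk > {t_flat:.1f} ns, latency {t_flat:.1f} ns")
print(f"pipelined:   T_clk > {t_pipe:.1f} ns, latency {latency_pipe:.1f} ns")
```

In this sketch the pipelined version delivers a result every shorter clock period, but each individual result takes longer to emerge, because every stage is padded to the delay of the slowest one and the register overhead is paid three times.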



#### Slow Path Skew Constraint



#### Fast Path Skew Constraint



#### Setup and Hold

- If your design does not meet the setup timing constraints, it will work at a lower clock frequency
- If your design does not meet hold timing constraints, it will not work at any clock frequency!
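The difference between the two constraints shows up when they are written as slack checks. The inequalities below follow the usual textbook formulation and are an assumption here, since the slide figures are not available; `skew` is taken as capture-clock arrival minus launch-clock arrival, and all numbers are hypothetical (ns).

```python
def setup_slack(t_clk, t_cq_max, t_logic_max, t_su, skew=0.0):
    """Slow-path check: data must arrive t_su before the next capture edge."""
    return (t_clk + skew) - (t_cq_max + t_logic_max + t_su)


def hold_slack(t_cq_min, t_logic_min, t_hold, skew=0.0):
    """Fast-path check: data must stay stable t_hold after the same edge.
    Note that the clock period does not appear at all."""
    return (t_cq_min + t_logic_min) - (t_hold + skew)


# A setup violation disappears when the clock is slowed down ...
setup_fast_clock = setup_slack(t_clk=2.0, t_cq_max=0.2, t_logic_max=2.0, t_su=0.1)
setup_slow_clock = setup_slack(t_clk=3.0, t_cq_max=0.2, t_logic_max=2.0, t_su=0.1)
# ... but a hold violation caused by clock skew is there at any frequency.
hold_no_skew = hold_slack(t_cq_min=0.1, t_logic_min=0.05, t_hold=0.1)
hold_with_skew = hold_slack(t_cq_min=0.1, t_logic_min=0.05, t_hold=0.1, skew=0.2)

print(f"setup slack: {setup_fast_clock:+.2f} ns at 2 ns, {setup_slow_clock:+.2f} ns at 3 ns")
print(f"hold slack:  {hold_no_skew:+.2f} ns without skew, {hold_with_skew:+.2f} ns with skew")
```

Negative slack means a violation. Because `hold_slack` has no `t_clk` argument, lowering the clock frequency cannot repair a hold problem.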



### **Timing Paths**

• Four types of paths:





#### Digital Circuit – Sequential Path



#### Static Timing Analysis (STA)



Limited signal information is stored after each stage



#### From Signal to



Limited signal information is stored after each stage






#### Digital Circuit – Delay Variation







#### Digital Circuit – Delay PDF




#### Statistical Moments

• Measure the appearance (shape) of a distribution

1. Mean ($\mu$): location
2. Variance ($\sigma^2$): spread
3. Skewness ($\gamma$): symmetry
4. Kurtosis ($\kappa$): flatness
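The moments can be computed from a set of delay samples, for instance from Monte Carlo simulation. The sketch below uses population (biased) estimators and non-excess kurtosis (a normal distribution gives 3); the sample values are invented for illustration.

```python
def moments(samples):
    """Return (mean, variance, skewness, kurtosis) using population estimators.
    Kurtosis is the non-excess form, i.e. 3 for a normal distribution."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    sigma = variance ** 0.5
    skewness = sum((x - mean) ** 3 for x in samples) / (n * sigma ** 3)
    kurtosis = sum((x - mean) ** 4 for x in samples) / (n * sigma ** 4)
    return mean, variance, skewness, kurtosis


symmetric = moments([1.0, 2.0, 3.0, 4.0, 5.0])
# Hypothetical path delays in ps with one slow outlier (long right tail).
delays_ps = moments([98.0, 100.0, 101.0, 103.0, 120.0])

print("symmetric samples:", symmetric)
print("delay samples:    ", delays_ps)
```

A positive skewness, as for the delay samples here, indicates a tail towards long delays, which is the tail that matters for setup timing.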





#### Statistical Delay





#### Statistical Slew



![](_page_54_Picture_2.jpeg)

#### Variability Sources and their Time Scales

[Figure: variability sources (manufacturing, wear-out, modal operation, temperature, supply/package, signal coupling noise) and their time scales; insets show a wafer map (Wafer X [mm] vs. Wafer Y [mm]) and a plot over Time (ps).]

![](_page_55_Picture_1.jpeg)


# Temperature Sensitivity

*I<sub>ON</sub>* / *I<sub>OFF</sub>*

- Increasing temperature
  - Reduces mobility
  - Reduces  $V_{TH}$
- *I<sub>ON</sub>* decreases with temperature
- *I<sub>OFF</sub>* increases with temperature

![](_page_56_Figure_6.jpeg)

![](_page_56_Picture_7.jpeg)

#### Thomas J. Watson Research Center

![](_page_57_Picture_1.jpeg)

#### Increasing and inevitable parametric variability

![](_page_57_Figure_3.jpeg)

![](_page_57_Picture_4.jpeg)

#### **Process Variations**

![](_page_58_Figure_1.jpeg)


#### Threshold Variations Most Important for Power

![](_page_59_Figure_1.jpeg)

As the number of dopant atoms in the channel decreases, random dopant fluctuations have a larger impact on the threshold voltage


### Device and Technology Innovations

- Power challenges introduced by nanometer MOS transistors can be partially addressed by new device structures and better materials
  - Higher mobility
  - Reduced leakage
  - Better control
- However ...
  - Most of these techniques provide only a one (two) technology generation boost
  - Need to be accompanied by circuit and system level methodologies

![](_page_60_Picture_8.jpeg)

#### Stochastic Process Variation in Deep-Submicron CMOS: Circuits and Algorithms

![](_page_61_Picture_1.jpeg)

![](_page_61_Picture_2.jpeg)

Recommended reading (free download via the SpringerLink book site, through University subscription)

![](_page_61_Picture_4.jpeg)

# Delay impact of variations

| Parameter                                                                                                       | Delay Impact              |
|-----------------------------------------------------------------------------------------------------------------|---------------------------|
| BEOL metal (metal mistrack, thin/thick wires)                                                                   | $-10\% \rightarrow +25\%$ |
| Environmental (voltage islands, IR drop, temperature)                                                           | ±15%                      |
| Device fatigue (NBTI, hot electron effects)                                                                     | ±10%                      |
| $V_{\rm t}$ and $T_{\rm ox}$ device family tracking (can have multiple $V_{\rm t}$ and $T_{\rm ox}$ device families) | ±5%                       |
| Model/hardware uncertainty (per cell type)                                                                      | ±5%                       |
| N/P mistrack (fast rise/slow fall, fast fall/slow rise)                                                         | ±10%                      |
| PLL (jitter, duty cycle, phase error)                                                                           | ±10%                      |

[Courtesy Kerim Kalafala]

• Requires 2<sup>20</sup> timing runs or [-65%,+80%] guard band!
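The guard band quoted above is what results from simply adding up the worst-case entries of the table, as the short check below shows. The dictionary is a transcription of the table rows; stacking the percentages linearly is the pessimistic assumption being illustrated, not a recommended method.

```python
# (lower, upper) delay impact in percent, transcribed from the table.
delay_impact = {
    "BEOL metal": (-10, 25),
    "Environmental": (-15, 15),
    "Device fatigue": (-10, 10),
    "Vt and Tox device family tracking": (-5, 5),
    "Model/hardware uncertainty": (-5, 5),
    "N/P mistrack": (-10, 10),
    "PLL": (-10, 10),
}

guard_low = sum(low for low, _ in delay_impact.values())
guard_high = sum(high for _, high in delay_impact.values())
exhaustive_runs = 2 ** 20  # number of timing runs quoted on the slide

print(f"stacked guard band: [{guard_low}%, +{guard_high}%]")
print(f"exhaustive alternative: {exhaustive_runs} timing runs")
```

Adding all extremes assumes every parameter sits at its worst value simultaneously, which is exactly the pessimism that statistical timing analysis tries to remove.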

© Chandu Visweswariah, 2004

Statistical Timing of Digital Integrated Circuits


![](_page_62_Picture_6.jpeg)

#### Handling Variations

- Variability is huge! [-65%, +80%] guard band.
- Corners: provide best ( $\mu$ -3 $\sigma$ ), typical, worst ( $\mu$ +3 $\sigma$ ) case values
  - With *n* varying parameters: 3<sup>*n*</sup> corners (!)
  - Simple calculations
  - Pessimistic
- Statistical Analysis:
  - Complex calculations (correlations!)
  - Result hard to interpret
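The corner explosion is easy to demonstrate by enumerating the combinations. In this sketch each parameter is described by a hypothetical (mean, sigma) pair and takes the three values $\mu-3\sigma$, $\mu$ and $\mu+3\sigma$; the parameter names and numbers are invented for illustration.

```python
from itertools import product


def corners(params):
    """Enumerate all corner combinations for params = {name: (mu, sigma)}."""
    names = list(params)
    levels = [(mu - 3 * sigma, mu, mu + 3 * sigma) for mu, sigma in params.values()]
    return [dict(zip(names, combo)) for combo in product(*levels)]


# Hypothetical parameters: supply voltage [V], temperature [C], threshold [V].
params = {"vdd": (1.0, 0.03), "temp": (60.0, 15.0), "vt": (0.35, 0.01)}
all_corners = corners(params)

print(f"{len(params)} parameters -> {len(all_corners)} corners")
print("first corner:", all_corners[0])
print("corners for 10 parameters:", 3 ** 10)
```

Each corner needs its own timing run, so the cost grows as 3<sup>*n*</sup>, and a real flow therefore analyses only a hand-picked subset of corners.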

![](_page_63_Picture_9.jpeg)

# **Overview**

- Design Constraints
  - Power, Area, Frequency, CMOS Scaling
- Timing
  - Timing Metrics, Paths, Variability and Delay
- Deterministic Timing Analysis (Static Timing Analysis)
  - Models, Interconnect, Networks, Clock Distribution
- Statistical Timing Analysis
  - Probability, Spatial Correlations, MAX function
- Design Flow
  - Synthesis, Transformation, Definitions, Constraints

![](_page_64_Picture_11.jpeg)