

#### **Overview**

- Design Constraints
  - Power, Area, Frequency, CMOS Scaling
- Timing
  - Timing Metrics, Paths, Variability and Delay
- Deterministic Timing Analysis (Static Timing Analysis)
  - Models, Interconnect, Networks, Clock Distribution
- Statistical Timing Analysis
  - Probability, Spatial Correlations, MAX function
- Design Flow
  - Synthesis, Transformation, Definitions, Constraints





# Static Timing Analysis

- Problem
  - Given a transistor-level description of a combinational circuit, find the arrival times at all gate outputs (pattern independent)
- Solution
  - Clump transistors together into fundamental gates ("channel-connected components")
  - Propagate timing information (low **and** high polarities) from primary inputs (PIs) to primary outputs (POs): the "critical path method (CPM)"
  - Overcome miscellaneous hurdles along the way
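The propagation step above can be sketched in a few lines. This is a minimal illustration, assuming one fixed delay per gate and a single arrival time per node (no separate rise/fall polarities); the circuit, the node names and the helper names `propagate_arrival_times` and `critical_path` are hypothetical.

```python
from graphlib import TopologicalSorter


def propagate_arrival_times(fanin, gate_delay, pi_arrival):
    """Forward-propagate worst-case arrival times from PIs to all nodes.

    fanin:      dict mapping each gate output to the nodes that drive it
    gate_delay: dict mapping each gate output to the delay of its gate
    pi_arrival: dict mapping each primary input to its arrival time
    """
    arrival = dict(pi_arrival)
    critical_pred = {}
    # static_order() yields a node only after all of its fanin nodes.
    for node in TopologicalSorter(fanin).static_order():
        if node in arrival:  # primary input: arrival time is given
            continue
        worst = max(fanin[node], key=lambda n: arrival[n])
        arrival[node] = arrival[worst] + gate_delay[node]
        critical_pred[node] = worst
    return arrival, critical_pred


def critical_path(critical_pred, output):
    """Trace the latest-arriving fanin back from an output to a PI."""
    path = [output]
    while path[-1] in critical_pred:
        path.append(critical_pred[path[-1]])
    return path[::-1]


# Hypothetical 3-gate circuit: g1 = f(a, b), g2 = f(b), out = f(g1, g2)
fanin = {"g1": ["a", "b"], "g2": ["b"], "out": ["g1", "g2"]}
gate_delay = {"g1": 2, "g2": 1, "out": 3}
arrival, pred = propagate_arrival_times(fanin, gate_delay, {"a": 0, "b": 0})
path = critical_path(pred, "out")
print(arrival["out"], path)
```

Each node is visited once, which is where the linear run-time of this kind of analysis comes from.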



### **Channel-Connected Components**

• A set of transistors interconnected by drain/source nodes



# The Critical Path Method (CPM)

Respond to minor changes in event-driven manner



Propagate changes by propagating events

*incremental\_analyze(g)* GATE g;

{

if (arr\_time@output(g) changes) {
 arr\_time = newvalue;
 for (*i* ∈ fanout(*g*))
 *incremental\_analyze(i)*;
 }

}



- What simplifications are made here?
- What are the implications?
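A runnable sketch of the event-driven update, under the same simplifications as the pseudocode (one fixed delay per gate, a single arrival time per node). The circuit, the `visited` bookkeeping list and all names are hypothetical and only serve to show where propagation stops.

```python
def incremental_analyze(g, fanin, fanout, delay, arrival, visited):
    """Re-evaluate gate g; propagate to its fanout only if its arrival time changed."""
    visited.append(g)
    new_value = max(arrival[i] for i in fanin[g]) + delay[g]
    if new_value != arrival[g]:
        arrival[g] = new_value
        for i in fanout.get(g, []):
            incremental_analyze(i, fanin, fanout, delay, arrival, visited)


# Hypothetical circuit: g1 = f(a, b), g2 = f(b), out = f(g1, g2), z = f(out)
fanin = {"g1": ["a", "b"], "g2": ["b"], "out": ["g1", "g2"], "z": ["out"]}
fanout = {"g1": ["out"], "g2": ["out"], "out": ["z"]}
delay = {"g1": 2.0, "g2": 1.0, "out": 3.0, "z": 1.0}
arrival = {"a": 0.0, "b": 0.0, "g1": 2.0, "g2": 1.0, "out": 5.0, "z": 6.0}

# Small change on a non-critical gate: "out" is re-evaluated but does not
# change, so the event stops there and "z" is never visited.
delay["g2"] = 1.5
visited_small = []
incremental_analyze("g2", fanin, fanout, delay, arrival, visited_small)
out_after_small = arrival["out"]

# Larger change: g2 becomes critical and the event reaches "z".
delay["g2"] = 4.0
visited_large = []
incremental_analyze("g2", fanin, fanout, delay, arrival, visited_large)
```

Only the gates in the affected fanout cone are re-evaluated, which is the point of the event-driven formulation.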



Classic example



# False Paths II

- No efficient algorithms exist to automatically flag false paths
- Knowledge like "this block will not run in two modes at the same time" is often crucial to determine false paths
- So: you need to specify them by hand...

### Gate Delay Models

- Build delay models for individual gates (current source/voltage source models)
  - In reality, Delay = f(widths, transition times, loads,...)
  - Similar idea used in standard cell characterization:

Delay = f (transition times, load)

- Table lookup models: storage/accuracy tradeoff (e.g. .lib format)
- Fast circuit simulation used in many delay calculators
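A table lookup model of the form Delay = f(transition time, load) can be sketched as a bilinear interpolation in a small 2D table. This is only an illustration of the idea: the table values, the axis points and the function name `lookup_delay` are made up and are not taken from any real library.

```python
from bisect import bisect_right


def lookup_delay(slews, loads, table, slew, load):
    """Bilinear interpolation in a delay table indexed as table[slew][load].

    Points outside the table range are linearly extrapolated from the
    nearest table cell.
    """
    def bracket(axis, x):
        # Index i of the table interval [axis[i], axis[i+1]] used for x.
        i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
        t = (x - axis[i]) / (axis[i + 1] - axis[i])
        return i, t

    i, ts = bracket(slews, slew)
    j, tl = bracket(loads, load)
    d00, d01 = table[i][j], table[i][j + 1]
    d10, d11 = table[i + 1][j], table[i + 1][j + 1]
    return ((1 - ts) * (1 - tl) * d00 + (1 - ts) * tl * d01
            + ts * (1 - tl) * d10 + ts * tl * d11)


# Hypothetical 2x2 characterization table (ns), slews in ns, loads in pF.
slews = [0.1, 0.5]
loads = [0.01, 0.05]
table = [[0.10, 0.30],
         [0.20, 0.40]]
d_mid = lookup_delay(slews, loads, table, slew=0.3, load=0.03)
d_corner = lookup_delay(slews, loads, table, slew=0.5, load=0.05)
```

A denser table improves accuracy at the cost of storage, which is the trade-off named above.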



- Example: 5% delay accuracy requires 4  $\pi$ -segments or 64 L-segments
- $\pi$ -model has fewer circuit nodes than T-model
- Other issues (crosstalk etc.) modeled using coupling caps

## Interconnect Modeling

Assume: Wire modeled by N equal-length segments

$$\tau_{DN} = \left(\frac{L}{N}\right)^2 (rc + 2rc + \dots + Nrc) = (rcL^2) \frac{N(N+1)}{2N^2} = RC \frac{N+1}{2N}$$

For large values of N:

$$\tau_{DN} = \frac{RC}{2} = \frac{rcL^2}{2}$$

- Quadratic function of length
- The distributed model predicts 1/2 of the delay predicted by the lumped RC model
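The convergence from the lumped value RC (N = 1) to the distributed value RC/2 can be checked numerically. The sketch below evaluates the segment sum and the closed form from the equations above; the function names and the normalized values r = c = L = 1 are chosen only for illustration.

```python
def segmented_wire_delay(r, c, length, n):
    """Time constant of a wire modeled by n equal RC segments (explicit sum)."""
    seg = length / n
    return seg * seg * sum(k * r * c for k in range(1, n + 1))


def closed_form(r, c, length, n):
    """Same quantity from the closed form RC (N + 1) / (2N)."""
    return r * c * length ** 2 * (n + 1) / (2 * n)


# Normalized wire: r = c = L = 1, so RC = 1 and the limit is 0.5.
for n in (1, 2, 4, 10, 1000):
    print(n, segmented_wire_delay(1.0, 1.0, 1.0, n))
```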

#### Elmore Delay Computations

• For an RC tree:

$$t_d = \Sigma_{i \text{ on path}} R_i C_{downstream, i}$$



 $t_{d,4} = R_1 (C_1 + C_2 + C_3 + C_4 + C_5) + R_2 (C_2 + C_4 + C_5) + R_4 C_4$ 



- Elmore Delay inaccurate for the very resistive wires of modern technologies!
- Crosstalk has significant impact as well in modern technologies, as the wires are so close together.
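The Elmore sum for an RC tree can be sketched as follows. The tree topology is an assumption inferred from the expression for $t_{d,4}$ above (node 1 at the root, nodes 2 and 3 below node 1, nodes 4 and 5 below node 2); the component values and the function name `elmore_delay` are hypothetical.

```python
def elmore_delay(parent, R, C, target):
    """Elmore delay from the source to `target` in an RC tree.

    parent[i] is the node upstream of i (None for the node next to the source),
    R[i] is the resistance between parent[i] and i, C[i] the capacitance at i.
    """
    def depth(n):
        d = 0
        while parent[n] is not None:
            n = parent[n]
            d += 1
        return d

    # Downstream capacitance: accumulate from the leaves towards the root.
    downstream = dict(C)
    for n in sorted(parent, key=depth, reverse=True):
        if parent[n] is not None:
            downstream[parent[n]] += downstream[n]

    # Sum R_i * C_downstream,i over the resistors on the path to the target.
    delay, n = 0.0, target
    while n is not None:
        delay += R[n] * downstream[n]
        n = parent[n]
    return delay


parent = {1: None, 2: 1, 3: 1, 4: 2, 5: 2}
R = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: 5.0}
C = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: 5.0}
t_d4 = elmore_delay(parent, R, C, target=4)
```

With these values the result equals $R_1(C_1+\dots+C_5) + R_2(C_2+C_4+C_5) + R_4C_4 = 15 + 22 + 16$.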





Constant Field Scaling: S = U

| Parameter                         | Relation            | General Scaling  |
|-----------------------------------|---------------------|------------------|
| W, L, t<sub>ox</sub>              |                     | 1/S              |
| V<sub>DD</sub>, V<sub>T</sub>     |                     | 1/U              |
| C<sub>gate</sub>                  | C<sub>ox</sub> W L  | 1/S              |
| I<sub>sat</sub>                   | C<sub>ox</sub> W V  | 1/U              |
| R<sub>on</sub>                    | V/I<sub>sat</sub>   | 1                |
| Power / Device                    | I<sub>sat</sub> V   | 1/U<sup>2</sup>  |



# Idealized Wire Scaling Model

| Parameter           | Relation          | Local Wire            | Constant Length  | Global Wire                                 |
|---------------------|-------------------|-----------------------|------------------|---------------------------------------------|
| W, H, T             |                   | 1/S                   | 1/S              | 1/S                                         |
| L                   |                   | 1/S                   | 1                | 1/S<sub>C</sub>                             |
| C                   | LW/T              | 1/S                   | 1                | 1/S<sub>C</sub>                             |
| R                   | L/WH              | S                     | S<sup>2</sup>    | S<sup>2</sup>/S<sub>C</sub>                 |
| t<sub>p</sub> ~ CR  | L<sup>2</sup>/HT  | 1                     | S<sup>2</sup>    | S<sup>2</sup>/S<sub>C</sub><sup>2</sup>     |
| E                   | CV<sup>2</sup>    | 1/(SU<sup>2</sup>)    | 1/U<sup>2</sup>  | 1/(S<sub>C</sub>U<sup>2</sup>)              |


Interconnect Structure and RC model

# The Impact of Crosstalk on Delay









Input skew distribution

## Dealing with Capacitive Crosstalk

- Avoid floating nodes
- Protect sensitive nodes
- Differential signaling
- Do not run wires together for a long distance
- Use shielding wires
- Use shielding layers





#### Reducing Interconnect Power/Energy

- Same philosophy as with logic: reduce capacitance, voltage (or voltage swing) and/or activity
- A major difference: sending bits from one point to another is fundamentally a communications/networking problem, and it helps to consider it as such.
- Abstraction layers are different:
  - For computation: device, gate, logic, micro-architecture
  - For communication: wire, link, network, transport
- Helps to organize along abstraction layers, well understood in the networking world: the OSI (open system interconnection) protocol stack
- Some exciting possibilities for the future: 3D-integration, novel interconnect materials, optical or wireless I/O



- Point-to-point or time-multiplexed bus network
- Dedicated networks with reserved links preferable for high traffic channels – but: limited connectivity, area overhead
- Flexibility an increasing requirement in multi (many) core chip implementations

#### The Network Trade-offs

Trade off flexibility, latency, energy and area efficiency through:

- Locality: eliminate global structures
- Hierarchy: expose locality in communication requirements
- Concurrency/multiplexing (function/area): optimal reuse of resources



#### Networking Topology

- Homogeneous
  - Bus, Star, Tree, Ring, Crossbar, Mesh (granularity), ...
- Heterogeneous
  - Hierarchy





Mesh (FPGA)









#### FPGAs

Programmable Switch Matrix



- The switches in the switch matrix are small (pass-) transistors.
- In FPGA's, the switch matrices in the connections add considerable resistance and hence delay!

#### (Pass-) Transistor Logic Delay



- Propagation delay is proportional to n<sup>2</sup>!
- Insert buffers

$$m_{opt} = 1.7 \sqrt{\frac{t_{pbuf}}{CR_{eq}}}$$

In current technologies, m<sub>opt</sub> is typically 3 or 4
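The quadratic dependence and the effect of buffer insertion can be checked with a small calculation. The delay model used here is an assumption (the common textbook approximation $t_p = 0.69\,CR_{eq}\,n(n+1)/2$ for a chain of $n$ switches, plus one buffer delay per inserted buffer); it is consistent with the factor 1.7 ≈ $\sqrt{2/0.69}$ in $m_{opt}$. The numbers and function names are hypothetical.

```python
import math


def chain_delay(n, rc, m=None, t_buf=0.0):
    """Delay of n series pass transistors, optionally buffered every m switches.

    rc is the product C * R_eq of one switch; t_buf is the buffer delay.
    """
    if m is None:
        return 0.69 * rc * n * (n + 1) / 2
    return 0.69 * (n / m) * rc * m * (m + 1) / 2 + (n / m - 1) * t_buf


def m_opt(t_buf, rc):
    """Optimal number of switches between buffers (formula above)."""
    return 1.7 * math.sqrt(t_buf / rc)


rc, t_buf, n = 10.0, 50.0, 12            # ps, ps, number of switches
unbuffered = chain_delay(n, rc)
buffered = {m: chain_delay(n, rc, m, t_buf) for m in (1, 2, 3, 4, 6, 12)}
best_m = min(buffered, key=buffered.get)
print(unbuffered, best_m, m_opt(t_buf, rc))
```

For these values the best integer segment length agrees with the rounded value of $m_{opt}$.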





Globally Asynchronous

self-timed handshaking protocol

Locally Synchronous

Allows individual modules to dynamically trade-off performance for energy-efficiency

- 1 cm chip: a signal needs 66 ps to travel from one side to the other (ideal transmission line)
- In practice *rc* effects dominate: 500 ps $\rightarrow$ 2 GHz clock
- Pipelining by inserting clocked buffer elements
  - Complicates timing, links timing of glob. interconnect and loc. computation
  - Hampers introduction of power reduction techniques





#### **Globally Asynchronous**



#### Locally synchronous

#### Future: Exploring the Unknown – Alternative Computational Models

#### Humans



Brain performs amazingly well

- Under very low SNR conditions
- Adapts effectively to failure and changing conditions

#### Concurrency

#### **Brain-inspired computing**



Instead of a system based on a small number of very reliable and complex components

A complex reliable system can emerge from the communication between huge numbers of simple nodes

#### **Future: Collaborative Networks**



Metcalfe's Law to the rescue of Moore's Law!



- Networks are intrinsically robust  $\rightarrow$  exploit it!
- Massive ensemble of cheap, unreliable components
- Network Properties:
  - Local information exchange  $\rightarrow$  global resiliency
  - Randomized topology & functionality → fits nano properties
  - Distributed nature → lacks an "Achilles heel"

**Bio-inspired** 

## What about Clock Distribution ?

- The clock is easily the most energy-consuming signal of a chip
  - Largest length
  - Largest fan-out
  - Most activity ( $\alpha = 1$ )
- Skew control adding major overhead
  - Intermediate clock repeaters
  - De-skewing elements

## Opportunities

- Reduced swing
- Alternative clock distribution schemes
- Avoiding a global clock altogether



## Arguments for Sleep Mode Management

- Many computational applications operate in burst mode, alternating between active and non-active modes
  - General-purpose computers, cell phones, interfaces, embedded processors, consumer applications, ...
- Prime concept: power dissipation in standby should be an absolute minimum, if not zero
- Sleep mode management has gained importance with increasing leakage

#### Standby Power - Was Not A Concern In Earlier Days



Floating Point Unit and Cache powered down when not in use

[Source: Intel]



- Turn off clocks to idle modules
  - Ensure that spurious activity is set to zero
- Must ensure that data inputs to module are in stable mode
- Can be done at different levels of system hierarchy



Turning off the clock for non-active components





Gated clock signal suffers from additional gate delay!

#### **Clock-gating Efficiently Reduces Power**



70% power reduction by clock-gating alone.

**Clock gating has reduced significance in leakage-dominated technology generations!**

## **Clock Gating**

- Challenges to skew management and clock distribution (load on clock network varies dynamically)
- State-of-the-art design tools are starting to do a better job
  - For example, physically aware clock-gating inserts gaters in clock-tree based on timing constraints and physical layout



#### Trade-Off between Sleep-Modes and Sleep-Time

## Typical operation modes



- **Active mode:** normal processing
- **Standby mode:** fast resume, high passive power
- **Sleep mode:** slower resume, low passive power

Resume time from clock gating determined by the time it takes to turn on the clock distribution network

#### **Standby Options:**

- Just gate the clock to the module in question
- Turn off phase-locked loop(s)
- Turn off clock completely

## The Standby Design Exploration Space



Trade-off between different operational modes. Should blend smoothly with run-time optimizations.




## **Static Timing Analysis**

Pro's of non-statistical STA

- Run-time linear in circuit size
- Conservative result
- Typically uses some fairly simple libraries (e.g. delay and slew)
- Easy to extend for use in optimization

#### Con's of non-statistical STA

- Cannot easily handle within-die correlation, especially if spatial correlation is included
- Needs many corners to handle all possible cases
- With significant random variations, an analysis that is conservative at all times is too pessimistic to yield competitive products
- Slower than linear time

## **Traditional Corner-Based Analysis**

- Given a set of parameters  $p_1, p_2, \dots, p_k$ 
  - Each parameter varies between  $[p_{i,min}, p_{i,max}]$
  - The variational region forms a multidimensional box



- Corner-based analysis performs simulations at each corner
- Typically parameters correspond to process parameters, temperature and voltage

## Corner Checks

Corner settings per column (F = fast, T = typical, S = slow):

| nMOS | pMOS | Wire | $V_{DD}$ | Temp | Purpose                                                                                    |
|------|------|------|----------|------|--------------------------------------------------------------------------------------------|
| T    | T    | T    | S        | S    | Timing specifications (binned parts)                                                       |
| S    | S    | S    | S        | S    | Timing specifications (conservative)                                                       |
| F    | F    | F    | F        | F    | Race conditions, hold time constraints, pulse collapse, noise                              |
| S    | S    | ?    | F        | S    | Dynamic power                                                                              |
| F    | F    | F    | F        | S    | Subthreshold leakage noise and power, overall noise analysis                               |
| S    | S    | F    | S        | S    | Races of gates against wires                                                               |
| F    | F    | S    | F        | F    | Races of wires against gates                                                               |
| S    | F    | T    | F        | F    | Pseudo-nMOS and ratioed circuits noise margins, memory read/write, race of pMOS against nMOS |
| F    | S    | T    | F        | F    | Ratioed circuits, memory read/write, race of nMOS against pMOS                             |

## **Corner-Based Analysis: Problems**

- The number of corners we need to examine grows exponentially with the number of parameters
- It is only conservative if there is a monotone relationship between the parameter and the delay; otherwise the worst behavior may be somewhere in between!
- For very advanced technologies, monotonicity is no longer true for all parameters!
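Both problems can be made concrete with a toy example. The sketch below enumerates all corners of the parameter box and compares the result with a dense sweep for a deliberately non-monotone delay function; the delay functions and the name `worst_corner` are invented for illustration only.

```python
from itertools import product


def worst_corner(ranges, delay_fn):
    """Evaluate delay_fn at every corner of the box; return (worst delay, #corners)."""
    corners = list(product(*ranges))
    return max(delay_fn(*c) for c in corners), len(corners)


# Exponential growth: k parameters give 2**k corners.
_, n_corners_5 = worst_corner([(0.0, 1.0)] * 5, lambda *p: sum(p))

# Monotone delay: the worst case really sits at a corner.
monotone_worst, _ = worst_corner([(0.0, 1.0), (0.0, 1.0)],
                                 lambda p1, p2: 1.0 + 0.2 * p1 + 0.1 * p2)


# Non-monotone delay (hypothetical): the maximum lies inside the range.
def bumpy(p):
    return 1.0 - (p - 0.5) ** 2


corner_worst, _ = worst_corner([(0.0, 1.0)], bumpy)
sweep_worst = max(bumpy(i / 100) for i in range(101))
print(n_corners_5, corner_worst, sweep_worst)
```

In the non-monotone case the corner analysis reports a smaller delay than the sweep, so it is no longer conservative.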



## Statistical Static Timing Analysis

- Path-based analysis: find variability along a single path (sums gate and wire delays on specific paths)
  - Path selection important!



- Block-based analysis: calculate arrival times for each node (forward and backward from the clocked elements):
  - Statistical max (or min) operation that also considers correlation!

## **Difficulties in Statistical Timing Analysis**

Path correlation due to reconvergent fan-outs



Spatial correlations between nearby gates

## **Modeling Spatial Correlations**

- Chip area divided into rectangles
  - Nearby squares are correlated
- Alternative hierarchical model
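One possible reading of the hierarchical model is a quad-tree: level 0 covers the whole chip, level 1 splits it into 2×2 regions, level 2 into 4×4, and so on, with one independent random variable per region. The covariance of two grid cells is then the sum of the variances of the levels whose region they share. The sketch below assumes exactly this construction; the per-level variances and the name `hierarchical_cov` are hypothetical.

```python
def hierarchical_cov(cell_a, cell_b, level_var):
    """Covariance of a parameter at two cells of the finest grid.

    cell_a, cell_b: (x, y) integer coordinates on the finest level
    level_var:      variance contributed by each level, level 0 = whole chip
    """
    levels = len(level_var)
    cov = 0.0
    for level, var in enumerate(level_var):
        shift = levels - 1 - level          # coarser levels drop more low bits
        region_a = (cell_a[0] >> shift, cell_a[1] >> shift)
        region_b = (cell_b[0] >> shift, cell_b[1] >> shift)
        if region_a == region_b:            # shared region => shared variable
            cov += var
    return cov


level_var = [1.0, 0.5, 0.25]                # 3 levels => 4x4 finest grid
total_var = hierarchical_cov((0, 0), (0, 0), level_var)
rho_near = hierarchical_cov((0, 0), (1, 1), level_var) / total_var
rho_far = hierarchical_cov((0, 0), (3, 3), level_var) / total_var
print(rho_near, rho_far)
```

Nearby cells share more levels and are therefore more strongly correlated. A known artifact of this construction is that two adjacent cells on opposite sides of a top-level boundary share only the chip-level variable.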






## **Modeling Spatial Correlations**

Table 4.1: MOST key parameters in 0.18 μm CMOS technology at $V_{BS} = 0$ V. (a) $I_{DS,lin}$ at $V_{GS} = 1.8$ V and $V_{DS} = 0.1$ V; (b) $I_{DS,sat}$ at $V_{GS} = 1.8$ V and $V_{DS} = 1.8$ V; (c) $I_{DS,lin}$ at $V_{GS} = -1.8$ V and $V_{DS} = -0.1$ V; (d) $I_{DS,sat}$ at $V_{GS} = -1.8$ V and $V_{DS} = -1.8$ V.

| p (W/L = 10/0.18)  | μ       | σ     | p (W/L = 10/0.18)  | μ       | σ      | Unit              |
|--------------------|---------|-------|--------------------|---------|--------|-------------------|
| $V_{T0,N}$         | 516.92  | 10.44 | $V_{T0,P}$         | 481.148 | 10.103 | mV                |
| $K_{0,N}$          | 422.53  | 10.34 | $K_{0,P}$          | 518.538 | 13.109 | mV<sup>1/2</sup>  |
| $K_{N}$            | 446.967 | 8.461 | $K_{P}$            | 451.971 | 17.434 | mV<sup>1/2</sup>  |
| $\beta_{N}$        | 26.334  | 1.290 | $\beta_{P}$        | 6.775   | 0.261  | mA/V<sup>2</sup>  |
| $W_{eff,N}$        | 10.034  | 0.010 | $W_{eff,P}$        | 10.034  | 0.010  | μm                |
| $L_{eff,N}$        | 0.108   | 0.005 | $L_{eff,P}$        | 0.143   | 0.005  | μm                |
| $I_{DS,lin}$ (a)   | 1.354   | 0.018 | $I_{DS,lin}$ (c)   | 0.402   | 0.018  | mA                |
| $I_{DS,sat}$ (b)   | 6.035   | 0.226 | $I_{DS,sat}$ (d)   | 2.914   | 0.226  | mA                |





[Ref: A. Zjajo, TVLSI'09]

## **Problem Statement**

• Find the PDF/CDF for circuit delay distribution:

$$D_{max} = \max(D_1, D_2, \dots, D_{npaths})$$

where  $D_i$ : delay distribution of *i*<sup>th</sup> path in the circuit

- Assume normal distributions on process parameter values
  - Why?
  - Is this reasonable? If not, what is?
- Parameter correlations
  - L<sub>eff</sub> shows high spatial correlations
  - T<sub>ox</sub>, N<sub>d</sub> are largely uncorrelated
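Before any analytic treatment, the distribution of $D_{max}$ can be estimated by Monte Carlo sampling. The sketch below assumes, for simplicity, independent Gaussian path delays, which ignores the correlations discussed above; the path statistics and function names are hypothetical.

```python
import random
from bisect import bisect_right


def sample_dmax(paths, n_samples, seed=0):
    """Draw sorted samples of D_max = max(D_1, ..., D_npaths).

    paths: list of (mean, sigma) of independent Gaussian path delays.
    """
    rng = random.Random(seed)
    return sorted(max(rng.gauss(mu, sigma) for mu, sigma in paths)
                  for _ in range(n_samples))


def empirical_cdf(sorted_samples, t):
    """Fraction of samples with D_max <= t."""
    return bisect_right(sorted_samples, t) / len(sorted_samples)


# Three hypothetical paths (ns): the nominally fastest one has the largest spread.
paths = [(1.00, 0.05), (0.95, 0.10), (0.90, 0.15)]
samples = sample_dmax(paths, n_samples=20000)
mean_dmax = sum(samples) / len(samples)
print(mean_dmax, empirical_cdf(samples, 1.1))
```

The mean of $D_{max}$ is larger than the mean of every individual path, which a purely nominal analysis would not show.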

## Some Basics of Probability

Mean, variance

$$\mu = E[X] = \int_{-\infty}^{\infty} x f(x) dx,$$
  
$$\sigma^2 = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x) dx$$

Covariance

$$\operatorname{cov}(X, Y) = \mathbf{E}\left[(X - \mathbf{E}[X])(Y - \mathbf{E}[Y])\right],$$

- Mean: expected value of the variable X.
- Variance: expected value of the squared deviation of X from its mean.
- Covariance: a measure of how much two random variables change together.
- Correlation coefficient: a number that quantifies the correlation.

$$\rho_{X,Y} = \operatorname{corr}(X,Y) = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$$

Independence

 $f(x,y) = f(x)f(y) \Rightarrow E(XY) = E(X)E(Y)$ 

 Independent if the realization of one does not affect the probability distribution of the other.
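The definitions above translate directly into sample estimates. The sketch below uses the population form (division by N) to mirror the expectation operators; the data values are made up.

```python
from math import sqrt
from statistics import fmean


def covariance(xs, ys):
    """cov(X, Y) = E[(X - E[X]) (Y - E[Y])], estimated from paired samples."""
    mx, my = fmean(xs), fmean(ys)
    return fmean((x - mx) * (y - my) for x, y in zip(xs, ys))


def correlation(xs, ys):
    """rho = cov(X, Y) / (sigma_X sigma_Y); the variance is cov(X, X)."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]      # ys = 2 * xs: fully correlated
zs = [4.0, 3.0, 2.0, 1.0]      # zs = 5 - xs: fully anti-correlated
print(covariance(xs, xs), correlation(xs, ys), correlation(xs, zs))
```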

## **Commonly-Encountered Distributions**

• Gaussian or normal distribution  $N(\mu, \sigma^2)$ 



Figure 1.1: Gaussian or Normal pdf, N(2, 1.5<sup>2</sup>) Gaussian pdf with different variances ( $\sigma_1^2 = 3^2, \sigma_2^2 = 2^2, \sigma_3^2 = 1$ )

$$\text{CDF:} \quad \Pr\{X \le x_a\} = \begin{cases} 0.5 - \operatorname{erf}\left(\frac{\mu - x_a}{\sigma}\right) & \text{for } x_a \le \mu \\ 0.5 + \operatorname{erf}\left(\frac{x_a - \mu}{\sigma}\right) & \text{for } x_a \ge \mu \end{cases} \qquad \operatorname{erf}(x) = \frac{1}{\sqrt{2\pi}} \int_0^x e^{-y^2/2}\, dy$$

CDF of X evaluated at x - probability that X will take a value  $\leq x$ .

For jointly Gaussian variables, independence is equivalent to uncorrelatedness

 $f(x,y) = f(x)f(y) \iff E(XY) = E(X)E(Y) \iff \rho = 0.$ 

[http://users.isr.ist.utl.pt/~mir/pub/probability.pdf]

#### Commonly-Encountered Distributions (contd.)

• Multivariate Gaussian  $N(\mu, \Sigma)$ 

• PDF 
$$f_X(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(x-m_X)^T \Sigma^{-1}(x-m_X)\right\}$$

$$m_X = E \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix} = \begin{bmatrix} m_{X_1} \\ m_{X_2} \\ \vdots \\ m_{X_n} \end{bmatrix}.$$

 A random vector is said to be *k*-variate normally distributed if every linear combination of its *k* components has a univariate normal distribution.

$$\begin{split} \Sigma_X &= \Sigma_X^T = \\ &= \begin{bmatrix} E(X_1 - m_{X_1})^2 & E(X_1 - m_{X_1})(X_2 - m_{X_2}) & \dots & E(X_1 - m_{X_1})(X_n - m_{X_n}) \\ E(X_2 - m_{X_2})(X_1 - m_{X_1}) & E(X_2 - m_{X_2})^2 & \dots & E(X_2 - m_{X_2})(X_n - m_{X_n}) \\ &\vdots & & \vdots \\ E(X_n - m_{X_n})(X_1 - m_{X_1}) & \dots & \dots & E(X_n - m_{X_n})^2 \end{bmatrix} \end{split}$$

#### Commonly-Encountered Distributions (contd.)

- Bivariate Gaussian
  - PDF

$$\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left[-\frac{1}{2(1-\rho^2)}\left(\frac{(x-m_X)^2}{\sigma_X^2}-\frac{2\rho(x-m_X)(y-m_Y)}{\sigma_X\sigma_Y}+\frac{(y-m_Y)^2}{\sigma_Y^2}\right)\right]$$

- The sum of two jointly Gaussian variables is a Gaussian
- Z = X + Y, where X and Y are Gaussians with correlation coefficient $\rho$ ($\rho = 0$ in the uncorrelated case)

$$\sigma_{X+Y} = \sqrt{\sigma_X^2 + \sigma_Y^2 + 2\rho\sigma_X\sigma_Y}$$



## The Ellipsoid

- Locus of equiprobable points for most distributions is not a box
  - Gaussian: locus of equiprobable points forms an ellipsoid



- Ellipse centered at  $(m_X, m_Y)$, axes along the eigenvectors of the covariance matrix
- Uncorrelated case

Correlated case





## Idea of Orthogonal Transformations

Orthogonal transformation (OT) - a linear transformation on an inner product space (preserve lengths of vectors and angles between them)

**PCA** – converting correlated into linearly uncorrelated variables using OT

-  $1^{st}$  principal component (PC) has the largest  $\sigma$ ; each succeeding PC has the highest  $\sigma$  possible under the constraint that it is orthogonal to the preceding components.

- the resulting vectors are an uncorrelated orthogonal set.
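For two variables the principal components can be written in closed form, which makes the idea easy to check by hand. The sketch below diagonalizes a 2×2 covariance matrix; the function name `pca_2x2` and the example covariance values are hypothetical.

```python
import math


def pca_2x2(var_x, var_y, cov_xy):
    """Eigen-decomposition of the covariance matrix [[var_x, cov_xy], [cov_xy, var_y]].

    Returns [(variance, direction), ...] with the largest variance first;
    the two directions are orthonormal.
    """
    mean = (var_x + var_y) / 2
    radius = math.hypot((var_x - var_y) / 2, cov_xy)
    angle = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)   # major-axis angle
    pc1 = (math.cos(angle), math.sin(angle))
    pc2 = (-math.sin(angle), math.cos(angle))
    return [(mean + radius, pc1), (mean - radius, pc2)]


# Two unit-variance parameters with correlation 0.8.
components = pca_2x2(1.0, 1.0, 0.8)
(var1, pc1), (var2, pc2) = components
print(var1, pc1, var2, pc2)
```

The directions are also the axes of the equiprobability ellipse, and the variances along them (here 1.8 and 0.2) sum to the total variance of the original variables.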



## Berkelaar's Method

[Ber97a,Ber97b,JB00]

- Types of operations in STA
  - SUM:  $T_{a \rightarrow out} = T_a + d_{a \rightarrow out}$ ;  $T_{b \rightarrow out} = T_b + d_{b \rightarrow out}$
  - MAX:  $T_{out} = max(T_{a \rightarrow out}, T_{b \rightarrow out})$



- Gate delay modeled as a Gaussian
  - SUM is easy: sum of Gaussians = Gaussian

• S = A+B (A, B independent):  $\mu_{S} = \mu_{A} + \mu_{B}$,  $\sigma_{S}^{2} = \sigma_{A}^{2} + \sigma_{B}^{2}$ 

- MAX of Gaussians is not a Gaussian
- Approach: approximate max by a Gaussian
  - Analytic expressions for mean, variance in [JB00]

Note: Calculating "MIN" for early mode analysis is analogous to calculating "MAX" since MIN(f) = MAX(-f)

## Approach: Approximate MAX by a Gaussian

$$\begin{aligned}
\mu_{\rm C} = E x_{\rm C} &= \frac{\sqrt{\sigma_{\rm A}^2 + \sigma_{\rm B}^2}}{\sqrt{2\pi}} e^{-\frac{1}{2} \left( \frac{\mu_{\rm A} - \mu_{\rm B}}{\sqrt{\sigma_{\rm A}^2 + \sigma_{\rm B}^2}} \right)^2} + \mu_{\rm A}\, \varphi \left( \frac{\mu_{\rm A} - \mu_{\rm B}}{\sqrt{\sigma_{\rm A}^2 + \sigma_{\rm B}^2}} \right) + \mu_{\rm B}\, \varphi \left( \frac{\mu_{\rm B} - \mu_{\rm A}}{\sqrt{\sigma_{\rm A}^2 + \sigma_{\rm B}^2}} \right) \\
\varphi(x) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2}u^2}\, du \\
E x_{\rm C}^2 &= (\mu_{\rm A} + \mu_{\rm B}) \frac{\sqrt{\sigma_{\rm A}^2 + \sigma_{\rm B}^2}}{\sqrt{2\pi}} e^{-\frac{1}{2} \left( \frac{\mu_{\rm A} - \mu_{\rm B}}{\sqrt{\sigma_{\rm A}^2 + \sigma_{\rm B}^2}} \right)^2} \\
&\quad + (\sigma_{\rm A}^2 + \mu_{\rm A}^2)\, \varphi \left( \frac{\mu_{\rm A} - \mu_{\rm B}}{\sqrt{\sigma_{\rm A}^2 + \sigma_{\rm B}^2}} \right) + (\sigma_{\rm B}^2 + \mu_{\rm B}^2)\, \varphi \left( \frac{\mu_{\rm B} - \mu_{\rm A}}{\sqrt{\sigma_{\rm A}^2 + \sigma_{\rm B}^2}} \right)
\end{aligned}$$

$$\sigma_{\rm C}^2 = E x_{\rm C}^2 - \mu_{\rm C}^2$$
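The SUM and MAX operations can be sketched numerically as follows. The code assumes independent Gaussian inputs with at least one non-zero standard deviation, and uses the normalized standard normal CDF for $\varphi$; the function names are hypothetical.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def norm_cdf(x):
    """Standard normal CDF, i.e. phi(x) normalized so that phi(inf) = 1."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))


def gaussian_sum(mu_a, sig_a, mu_b, sig_b):
    """Mean and sigma of A + B for independent Gaussians."""
    return mu_a + mu_b, math.hypot(sig_a, sig_b)


def gaussian_max(mu_a, sig_a, mu_b, sig_b):
    """Mean and sigma of the Gaussian that matches the first two moments of max(A, B)."""
    theta = math.hypot(sig_a, sig_b)            # sqrt(sig_a^2 + sig_b^2)
    alpha = (mu_a - mu_b) / theta
    mean = (theta * norm_pdf(alpha)
            + mu_a * norm_cdf(alpha) + mu_b * norm_cdf(-alpha))
    second = ((mu_a + mu_b) * theta * norm_pdf(alpha)
              + (sig_a ** 2 + mu_a ** 2) * norm_cdf(alpha)
              + (sig_b ** 2 + mu_b ** 2) * norm_cdf(-alpha))
    return mean, math.sqrt(second - mean ** 2)


def gaussian_min(mu_a, sig_a, mu_b, sig_b):
    """MIN(A, B) = -MAX(-A, -B), for early-mode analysis."""
    mean, sigma = gaussian_max(-mu_a, sig_a, -mu_b, sig_b)
    return -mean, sigma


mu_c, sig_c = gaussian_max(0.0, 1.0, 0.0, 1.0)
print(mu_c, sig_c)
```

For two identical standard normal inputs the result is $\mu_C = 1/\sqrt{\pi}$ and $\sigma_C^2 = 1 - 1/\pi$: the maximum has a higher mean and a smaller spread than either input.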

# Why I didn't write the precise expression in the last slide... (proof)

#### Appendix A

We will now derive the mean and standard deviation of a stochastic variable C which is the maximum of two normal distributed statistically independent stochastic variables A and B. In order to derive this mean and standard deviation we will change the bases of the double integration:

$$\int_{-\infty}^{\infty} x f_{A}(x) \int_{-\infty}^{x} f_{B}(y)\, dy\, dx \tag{19}$$

which is part of the calculation of $\mu_{C} = Ex_{C}$, as follows:

$$\frac{x - \mu_{A}}{\sigma_{A}} = \frac{u \sigma_{B}}{\sqrt{\sigma_{A}^{2} + \sigma_{B}^{2}}} - \frac{v \sigma_{A}}{\sqrt{\sigma_{A}^{2} + \sigma_{B}^{2}}} \tag{20}$$

which gives:

$$x = \frac{u \sigma_{A} \sigma_{B}}{\sqrt{\sigma_{A}^{2} + \sigma_{B}^{2}}} - \frac{v \sigma_{A}^{2}}{\sqrt{\sigma_{A}^{2} + \sigma_{B}^{2}}} + \mu_{A} \tag{21}$$

and:

$$\frac{y - \mu_{B}}{\sigma_{B}} = \frac{u \sigma_{A}}{\sqrt{\sigma_{A}^{2} + \sigma_{B}^{2}}} + \frac{v \sigma_{B}}{\sqrt{\sigma_{A}^{2} + \sigma_{B}^{2}}} \tag{22}$$

which gives:

$$y = \frac{u \sigma_{A} \sigma_{B}}{\sqrt{\sigma_{A}^{2} + \sigma_{B}^{2}}} + \frac{v \sigma_{B}^{2}}{\sqrt{\sigma_{A}^{2} + \sigma_{B}^{2}}} + \mu_{B} \tag{23}$$

For this change of base we calculate:

$$\left| \frac{\partial(x, y)}{\partial(v, u)} \right| = \left| \det \begin{pmatrix} -\dfrac{\sigma_{A}^{2}}{\sqrt{\sigma_{A}^{2} + \sigma_{B}^{2}}} & \dfrac{\sigma_{A} \sigma_{B}}{\sqrt{\sigma_{A}^{2} + \sigma_{B}^{2}}} \\ \dfrac{\sigma_{B}^{2}}{\sqrt{\sigma_{A}^{2} + \sigma_{B}^{2}}} & \dfrac{\sigma_{A} \sigma_{B}}{\sqrt{\sigma_{A}^{2} + \sigma_{B}^{2}}} \end{pmatrix} \right| = \frac{\sigma_A \sigma_B^3 + \sigma_A^3 \sigma_B}{\sigma_A^2 + \sigma_B^2} = \sigma_A \sigma_B \tag{24}$$

The mean of the stochastic variable C then becomes:

$$\begin{aligned}
\mu_{C} &= \int_{-\infty}^{\infty} x f_{C}(x)\, dx \\
&= \int_{-\infty}^{\infty} x f_{A}(x) F_{B}(x)\, dx + \int_{-\infty}^{\infty} x F_{A}(x) f_{B}(x)\, dx \\
&= \frac{1}{\sigma_{A} \sigma_{B} \sqrt{2\pi} \sqrt{2\pi}} \int_{-\infty}^{\infty} \int_{-\infty}^{\frac{\mu_{A} - \mu_{B}}{\sqrt{\sigma_{A}^{2} + \sigma_{B}^{2}}}} \left( \frac{u \sigma_{A} \sigma_{B}}{\sqrt{\sigma_{A}^{2} + \sigma_{B}^{2}}} - \frac{v \sigma_{A}^{2}}{\sqrt{\sigma_{A}^{2} + \sigma_{B}^{2}}} + \mu_{A} \right) e^{-\frac{1}{2} (u^{2} + v^{2})} \sigma_{A} \sigma_{B}\, dv\, du + \dots
\end{aligned} \tag{25}$$



Note that in some lines of equation 25 we have only given one half of the equation explicitly. The other half is depicted by triple dots, and is similar to the first half of the equation. We will now calculate the standard deviation of stochastic variable C in two steps. The first step is the calculation of  $Ex_{C}^{2}$:

$$\begin{aligned}
Ex_{C}^{2} &= \int_{-\infty}^{\infty} x^{2} f_{C}(x)\, dx \\
&= \int_{-\infty}^{\infty} x^{2} f_{A}(x) F_{B}(x)\, dx + \int_{-\infty}^{\infty} x^{2} F_{A}(x) f_{B}(x)\, dx
\end{aligned} \tag{26}$$

$$\begin{aligned}
Ex_{C}^{2} &= \frac{1}{\sqrt{2\pi}} \left[ \left( \frac{2\mu_{A}\sigma_{A}^{2}}{\sqrt{\sigma_{A}^{2}+\sigma_{B}^{2}}} - \frac{v\,\sigma_{A}^{4}}{\sigma_{A}^{2}+\sigma_{B}^{2}} \right) e^{-\frac{1}{2}v^{2}} \right]_{-\infty}^{\frac{\mu_{A}-\mu_{B}}{\sqrt{\sigma_{A}^{2}+\sigma_{B}^{2}}}} \\
&\quad + \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\frac{\mu_{A}-\mu_{B}}{\sqrt{\sigma_{A}^{2}+\sigma_{B}^{2}}}} \left( \frac{\sigma_{A}^{4}+\sigma_{A}^{2}\sigma_{B}^{2}}{\sigma_{A}^{2}+\sigma_{B}^{2}} + \mu_{A}^{2} \right) e^{-\frac{1}{2}v^{2}}\, dv + \dots \\
&= (\sigma_{A}^{2}+\mu_{A}^{2})\, \varphi\left( \frac{\mu_{A}-\mu_{B}}{\sqrt{\sigma_{A}^{2}+\sigma_{B}^{2}}} \right) + \frac{e^{-\frac{1}{2}\left( \frac{\mu_{A}-\mu_{B}}{\sqrt{\sigma_{A}^{2}+\sigma_{B}^{2}}} \right)^{2}}}{\sqrt{2\pi}\sqrt{\sigma_{A}^{2}+\sigma_{B}^{2}}} \left( 2\mu_{A}\sigma_{A}^{2} - \frac{\sigma_{A}^{4}(\mu_{A}-\mu_{B})}{\sigma_{A}^{2}+\sigma_{B}^{2}} \right) + \dots \\
&= (\sigma_{A}^{2}+\mu_{A}^{2})\, \varphi\left( \frac{\mu_{A}-\mu_{B}}{\sqrt{\sigma_{A}^{2}+\sigma_{B}^{2}}} \right) + \frac{e^{-\frac{1}{2}\left( \frac{\mu_{A}-\mu_{B}}{\sqrt{\sigma_{A}^{2}+\sigma_{B}^{2}}} \right)^{2}}}{\sqrt{2\pi}\sqrt{\sigma_{A}^{2}+\sigma_{B}^{2}}} \left( \frac{\mu_{A}\sigma_{A}^{4} + 2\mu_{A}\sigma_{A}^{2}\sigma_{B}^{2} + \mu_{B}\sigma_{A}^{4}}{\sigma_{A}^{2}+\sigma_{B}^{2}} \right) + \dots \\
&= (\sigma_{A}^{2}+\mu_{A}^{2})\, \varphi\left( \frac{\mu_{A}-\mu_{B}}{\sqrt{\sigma_{A}^{2}+\sigma_{B}^{2}}} \right) + (\sigma_{B}^{2}+\mu_{B}^{2})\, \varphi\left( \frac{\mu_{B}-\mu_{A}}{\sqrt{\sigma_{A}^{2}+\sigma_{B}^{2}}} \right) + (\mu_{A}+\mu_{B}) \frac{\sqrt{\sigma_{A}^{2}+\sigma_{B}^{2}}}{\sqrt{2\pi}} e^{-\frac{1}{2}\left( \frac{\mu_{A}-\mu_{B}}{\sqrt{\sigma_{A}^{2}+\sigma_{B}^{2}}} \right)^{2}}
\end{aligned} \tag{27}$$

Note that in equation 27 we also have given only one half of the equation explicitly, with the other half which is similar to the first, depicted by triple dots. We can now calculate the standard deviation of stochastic variable C with the following equation:

$$\sigma_{C}^{2} = Ex_{C}^{2} - \mu_{C}^{2} \tag{28}$$

We have now expressed  $\mu_C$  and  $\sigma_C$  as functions of just  $\mu_A$,  $\mu_B$,  $\sigma_A$  and  $\sigma_B$.

## Statistical Static Timing Analysis

Not considering statistics can result in > 30% delay errors

SSTA Con's

- Complex, especially with realistic (non-Gaussian) distributions
- Difficult to extend to an optimization flow
- Required data likely to be time-varying and hence unreliable
- If the fab changes the statistical properties of the process, the design has to be re-evaluated

#### Active research field

- Efficient SSTA solvers (e.g., PWL-RDE solver, see Q.Tang (TCAD,'14))
- An enhanced deterministic STA that also takes into account sensitivities and correlation




## Design Flow



Tools

- Synthesis: Synopsys Design Compiler
- Place & Route: Cadence SOC Encounter 8.1
- Process UMC L90 SP
- Standard cell library
  - Faraday:
    - fsd0a\_a\_generic\_core
    - fod0a\_b25\_t25\_generic\_io





**T**UDelft


#### Synthesis Transformations





#### Default Clock Behavior

- Defining the clock in a single-clock design constrains all timing paths between registers for single-cycle, setup time
- By default the clock rises at 0 ns and has a 50% duty cycle
- By default Design Compiler (DC) will not "buffer up" the clock network, even when it is connected to many clock/enable pins of flip-flops/latches
  - The clock network is treated as "ideal" infinite drive capability
    - Zero rise/fall transition times
    - Zero skew
    - Zero insertion delay or latency
  - Estimated skew, latency and transition times can, and should be modeled for a more accurate representation of clock behavior




### Defining a Clock






## Specifying Setup-Timing Constraints



- Objective: Define setup timing constraints for all paths within a sequential design
  - All input logic paths (starting at input ports)
  - The internal (register to register) paths
  - All output paths (ending at output ports)




## **Constraining Input Paths**

The user must specify the latest arrival time of the data at input A

What is  $T_{max}$  for N ?



create\_clock -period 2 [get\_ports Clk]

set\_input\_delay -max 0.6 -clock Clk [get\_ports A]



### **Constraining Output Paths**

The user must specify the latest arrival time of the data at output B

What is  $T_{max}$  through S?



create\_clock -period 2 [get\_ports Clk]

set\_output\_delay -max 0.8 -clock Clk [get\_ports B]



The maximum delay to port B = 0.7 ns

create\_clock -period 2 [get\_ports Clk]

set\_output\_delay -max 1.3 -clock Clk [get\_ports B]
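The arithmetic behind these constraints can be sketched as a simple budget calculation for a single-cycle path. This is an idealized view (ideal clock, no skew or uncertainty); the setup time and clock-to-Q delay are left as parameters because their values are not given here, and the function names are hypothetical.

```python
def max_input_logic_delay(period, input_delay, t_setup=0.0):
    """Time left for input logic (N) between the input port and the capturing register."""
    return period - input_delay - t_setup


def max_output_logic_delay(period, output_delay, t_clk_to_q=0.0):
    """Time left for output logic (S) between the launching register and the output port."""
    return period - output_delay - t_clk_to_q


def output_delay_for_budget(period, max_delay_to_port):
    """Value for set_output_delay -max when the path to the port may take max_delay_to_port."""
    return period - max_delay_to_port


period = 2.0                                                # create_clock -period 2
budget_n = max_input_logic_delay(period, input_delay=0.6)   # set_input_delay -max 0.6
budget_s = max_output_logic_delay(period, output_delay=0.8) # set_output_delay -max 0.8
out_delay = output_delay_for_budget(period, max_delay_to_port=0.7)
print(budget_n, budget_s, out_delay)
```

With a 2 ns clock, a maximum delay of 0.7 ns to port B corresponds to the output delay of 1.3 ns used in the last constraint above.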


## Environmental attributes



- Input drivers and transition times
- Capacitive output loads
- Process/Voltage/Temperature (PVT) operating conditions
- Interconnect parasitic RCs



## Input drivers and transition times

 Rise and fall transition times on an input port affect the cell delay of the input gate



set\_input\_transition 0.6 [get\_ports A]
set\_driving\_cell -lib\_cell OR3B [get\_ports A]
set\_driving\_cell -lib\_cell FD1 -pin Qn [get\_ports A]


## Capacitive output loads

Capacitive loading on an output port affects the transition time and thereby the cell delay of the output driver



set\_load [expr 30.0/1000] [get\_ports B]



## Wrap-Up

- Design Constraints
  - Power, Area, Frequency, CMOS Scaling
- Timing
  - Timing Metrics, Paths, Variability and Delay
- Deterministic Timing Analysis (Static Timing Analysis)
  - Models, Interconnect, Networks, Clock Distribution
- Statistical Timing Analysis
  - Probability, Spatial Correlations, MAX function
- Design Flow
  - Synthesis, Transformation, Definitions, Constraints

