

## **Timing Analysis**

## **Prof. Dejan Marković** ee216a@gmail.com

#### **Two Types of Machines with State**

#### And two quite different abstract models:

- Data storage used for computation (Data Flows)
- States for sequencing information (Finite State Machines)

#### **Data Flows (Storage for Computation)**

- The storage holds data that is being manipulated.
  The (enormous) number of bits does not matter.
  It is simply the data-set that is being manipulated.
- State is not that important; the flow of data is what is critical

#### FSMs (States for Sequencing of Information)

- The storage is used to hold your place in some decision-making process. It indicates where you are, and using this information you decide what to do next.
- The amount of state (number of unique decision points) is finite, and usually limited
  - One could draw out the "decision graph" showing the possible transitions between states

## **Timing Analysis**

#### **Timing constraints:**

- Long path: Setup time
  - Leads to cycle time violation
  - Fix: increase cycle time (can be done during chip operation)
- Short path: Hold time
  - Leads to functional violation
  - Fix: insert buffers (can only be done at design time)
- Clock nonidealities (skew and jitter) directly impact timing constraints

#### Timing: Cycle Time & Race Margin



#### The Setup / Cycle Time Constraint



#### The Hold / Race Margin Constraint



### **Clock Nonidealities**

- Clock skew: spatial variation in temporally equivalent clock edges; deterministic + random, t<sub>Skew</sub>



- Clock jitter: temporal variations in consecutive edges of the clock signal; modulation + random noise, t<sub>jitter</sub>
- Variation of the pulse width
  - For level-sensitive clocking

#### **Clock Skew**

#### **Distribution of clock tree insertion delay**



#### **Sources of Skew and Jitter**



#### **Positive and Negative Skew**



Negative: late – early case



### **Signal Routing for Positive and Negative Skew**

#### **Positive skew:** *Clk* and *Data* routed in the same direction



#### **Negative skew:** *Clk* and *Data* routed in opposite directions



#### **Skew + Jitter = Clock Uncertainty**

#### Single parameter used in CAD tools

## **Clock uncertainty, t**<sub>CU</sub>

## Impact of t<sub>CU</sub> on Timing: Cycle Time

#### Cycle time (long path): late – early analysis



## Impact of t<sub>CU</sub> on Timing: Race Margin

Race immunity (short path): early – late analysis
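
A minimal sketch of both checks with clock uncertainty folded in. It assumes the common textbook form in which $t_{CU}$ is taken out of the available time on the long path and added to the hold requirement on the short path. The function names and the numbers are illustrative assumptions, not values from these slides.

```python
def cycle_time_ok(t_clk, t_clkq, t_logic_max, t_setup, t_cu=0.0):
    """Long path (late - early): data launched late must still meet setup
    at a capture edge that may arrive t_cu early."""
    return t_clk >= t_clkq + t_logic_max + t_setup + t_cu


def race_margin(t_clkq_cd, t_logic_cd, t_hold, t_cu=0.0):
    """Short path (early - late): slack left after the hold time when the
    capture edge may arrive t_cu late. Negative means a race."""
    return t_clkq_cd + t_logic_cd - t_hold - t_cu


# Illustrative numbers in ns (assumed).
print(cycle_time_ok(t_clk=5.0, t_clkq=0.25, t_logic_max=4.0, t_setup=0.25, t_cu=0.5))
print(race_margin(t_clkq_cd=0.25, t_logic_cd=0.5, t_hold=0.25, t_cu=0.5))
```

Setting `t_cu=0` recovers the ideal-clock constraints, which makes it easy to see how much cycle time and race margin the uncertainty costs.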



# Time Borrowing

#### **Time Borrowing: Classification**

- **Dynamic:** scheduling data to arrive at a transparent CSE
  - No "hard" boundaries between stages
  - Used in latch-based level-sensitive or soft-edge clocking
- **Static:** controlling the delay between clock inputs
  - Clocks are scheduled to arrive so that slower paths get more time to evaluate, taking time away from faster paths
  - Works with conventional hard-edge CSEs
  - Also called opportunistic skew scheduling
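
Static borrowing can be sketched for the simplest case: two stages in series with a deliberately delayed clock at the register between them. The model below is an assumption for illustration (stage delays include storage-element overhead, and the first and last clocks are fixed); `min_cycle_with_skew` and `best_static_skew` are hypothetical helper names.

```python
def min_cycle_with_skew(d1, d2, skew=0.0):
    """Minimum cycle time when the middle clock arrives `skew` late.

    d1, d2: worst-case delays of stage 1 and stage 2 (ns).
    A positive skew gives stage 1 extra time and takes it from stage 2.
    """
    return max(d1 - skew, d2 + skew)


def best_static_skew(d1, d2):
    """Skew that balances the two stages."""
    return (d1 - d2) / 2.0


d1, d2 = 8.0, 6.0  # illustrative delays in ns (assumed)
s = best_static_skew(d1, d2)
print(min_cycle_with_skew(d1, d2))     # cycle time without skew
print(s, min_cycle_with_skew(d1, d2, s))  # chosen skew and resulting cycle time
```

The time is not free: the stage that gains evaluation time also loses the same amount of race margin, so the short-path check has to be repeated with the scheduled skew.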

## **Dynamic Time Borrowing**

- Latch-based designs: the clock pulse width can be borrowed if the next stage can pay it back (with faster logic)



#### **Cycle Time Analysis**

#### Logic delay can be extended by W



#### **Race Margin Analysis: No Time Borrowed**



#### **Race Margin Analysis: Borrowed Time**



#### **Race Margin Analysis: Summary**

No time borrowed (more likely critical case)

$$t_{Hold} + W < t_{Clk-Q,cd} + t_{Logic,cd}$$

Borrowed time,  $t_{\rm B}$ 

$$t_{\text{Hold}} + W < t_{\text{B}} + t_{\text{D-Q,cd}} + t_{\text{Logic,cd}}$$

The critical case is $\min\{t_{B} + t_{D-Q,cd},\ t_{Clk-Q,cd}\}$.
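
The two inequalities can be turned into upper bounds on $W$. This is a sketch under the assumption that each case is evaluated separately; the function names are hypothetical, and the numbers are the ones used in Example 10.2b (0.5 ns latch delay, 0.2 ns hold, 1 ns minimum logic delay, 1 ns borrowed).

```python
def max_w_no_borrow(t_hold, t_clkq_cd, t_logic_cd):
    """Upper bound on W when no time is borrowed:
    t_hold + W < t_clkq_cd + t_logic_cd."""
    return t_clkq_cd + t_logic_cd - t_hold


def max_w_borrowed(t_hold, t_borrow, t_dq_cd, t_logic_cd):
    """Upper bound on W when the launching latch borrows t_borrow:
    t_hold + W < t_borrow + t_dq_cd + t_logic_cd."""
    return t_borrow + t_dq_cd + t_logic_cd - t_hold


# Values in ns, taken from Example 10.2b.
print(max_w_no_borrow(t_hold=0.2, t_clkq_cd=0.5, t_logic_cd=1.0))
print(max_w_borrowed(t_hold=0.2, t_borrow=1.0, t_dq_cd=0.5, t_logic_cd=1.0))
```

The no-borrow bound is the tighter one here (about 1.3 ns versus about 2.3 ns), which is why it is usually the critical case.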

# Two-Phase Clocking

#### **Two-Phase Clocking**

Freedom to control two phases and pulse-width



#### **Two-Phase Clocking: Cycle Time**



#### Two-Phase Clocking: Min W



#### Two-Phase Clocking: Max W



#### Two-Phase Clocking: Max t<sub>CU</sub>

#### W guards against t<sub>CU</sub>

$$t_{\rm CU} < W_{\rm max} - W_{\rm min}$$

- Assuming $t_{CU} = 2|t_{Skew}|$ (both polarities of $t_{Skew}$):

$$|t_{\text{Skew}}| < (W_{\text{max}} - W_{\text{min}})/2$$
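
As a quick numeric check of the window between $W_{min}$ and $W_{max}$, the sketch below returns both the uncertainty budget and the per-polarity skew budget. The function name and the example widths are assumptions chosen for illustration.

```python
def uncertainty_budget(w_min, w_max):
    """Return (max t_CU, max |t_Skew|) for a pulse-width window.

    Uses t_CU < W_max - W_min and the assumption t_CU = 2 * |t_Skew|.
    """
    t_cu_max = w_max - w_min
    return t_cu_max, t_cu_max / 2.0


# Illustrative pulse-width limits in ns (assumed).
t_cu_max, t_skew_max = uncertainty_budget(w_min=0.75, w_max=1.25)
print(t_cu_max, t_skew_max)
```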

#### Summary: 2-Φ Clocking



- $\max(t_{CL1/2}) < t_1 + t_{2/4} + t_3 - t_{Setup} - t_{Clk-Q} - t_{CU}$
- Strictly, $\max(t_{CL1} + t_{CL2}) < T_{Clk} - 2t_{D-Q}$
- $\min(t_{CL1/2}) > t_{1/3} + t_{Hold} - t_{Clk-Q} + t_{CU}$
- More clocking overhead (2 clocks) and low  $f_{max}$

#### **Clocking Methodology: Pulse-Mode**



Examples





## Do We Need Clocks?

### **Minimum Clock Cycle Time Revisited**

- Cycle time is determined by the delay through the logic
  - Data must arrive before the latching edge
  - If it arrives too late, it is not captured until the next cycle
    - Synchronization and sequential order are then off

## **Timing requirement**

T<sub>Cycle</sub> > t<sub>Logic</sub> + t<sub>Overhead</sub>

## Do we really need clocks?

## **Constant Propagation Delay?**

- If the propagation delay of CL is constant (regardless of data input) <u>and</u> is known, we don't really need clocks
  - It eliminates the t<sub>Overhead</sub>
  - The inherent  $T_{Cycle}$  of a state-machine will be the delay
  - It can actually be even faster for data flow



#### **Wave Pipelining**

- As the data is propagating down the logic chain ("wave"), a new "wave" can enter
  - Provided the delay is constant
  - As we know, the delay is not constant



## Variable Propagation Delay (1/2)

- Delay through combinational logic depends on
  - Transition (rise/fall), type of logic, the input position...



#### **Variable Propagation Delay (2/2)**



- For wave pipelining, the spread limits the "cycle time"
- Asynchronous uses the signals to indicate "completion," so the cycle time varies
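
To first order, the effect of the delay spread can be compared with a conventional clocked design. The sketch assumes a logic block with a known (min, max) delay and a single lumped overhead term; `t_overhead` and the example delays are assumptions for illustration.

```python
def conventional_cycle(t_max, t_overhead):
    """Clocked design with one wave in the logic at a time:
    the full maximum delay sets the cycle."""
    return t_max + t_overhead


def wave_cycle(t_min, t_max, t_overhead):
    """Wave pipelining: consecutive waves must not overlap, so the
    (max - min) delay spread sets the cycle, to first order."""
    return (t_max - t_min) + t_overhead


# Illustrative delays in ns (assumed).
print(conventional_cycle(t_max=8.0, t_overhead=0.5))
print(wave_cycle(t_min=6.0, t_max=8.0, t_overhead=0.5))
```

If the spread shrinks to zero, the wave cycle collapses to the overhead alone, which is the constant-delay ideal; as the spread approaches the maximum delay, wave pipelining offers no advantage.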

## Improving Throughput: Pipelining

- A clocked system works like wave pipelining, but uses clocks to remove the delay uncertainty
  - This allows 2 "waves" of data to be present inside CL





- Extend this ad infinitum?
  - Overhead eventually limits pipelining
  - Logic granularity limits the resolution
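
The overhead limit follows directly from $T_{Cycle} > t_{Logic} + t_{Overhead}$ applied per stage. The sketch assumes the logic can be split into perfectly balanced stages, which real logic granularity does not allow; the delay values are illustrative.

```python
def pipelined_cycle_time(t_logic, n_stages, t_overhead):
    """Cycle time when t_logic is split evenly across n_stages, each
    paying the full storage-element overhead."""
    return t_logic / n_stages + t_overhead


T_LOGIC, T_OVERHEAD = 8.0, 0.5  # ns, assumed for illustration
base = pipelined_cycle_time(T_LOGIC, 1, T_OVERHEAD)
for n in (1, 2, 4, 8, 16):
    t = pipelined_cycle_time(T_LOGIC, n, T_OVERHEAD)
    # Speedup approaches (t_logic + t_overhead) / t_overhead as n grows.
    print(n, t, round(base / t, 2))
```

Each doubling of the stage count buys less than the one before, because the fixed overhead becomes a larger share of every cycle.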

## Examples

## Example 10.1: 1-$\Phi$ Clock, Min $T_{Cycle}$ = ?

- Assume:  $t_{D-Q} = t_{Setup} + t_{Clk-Q} = 0.5ns$ ,  $t_{Hold} = 0.2ns$
- The delays indicate (min, max) for different paths



Over 2 cycles: $2\,T_{Cycle} \ge 8\,\text{ns} + 6\,\text{ns} + 2 \times 0.5\,\text{ns} = 15\,\text{ns}$, so $T_{Cycle} \ge 7.5\,\text{ns}$

#### Example 10.2a: 1-$\Phi$ Clock, Min W = ?



Max delay

$$T_{\text{Clk}} + W > t_{\text{Logic,max}} + t_{\text{D-Q}}$$

(from 10.20)

W<sub>min</sub> must handle time borrowing:

- W > 8.0 ns (max delay) + 0.5 ns – 7.5 ns = 1 ns
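
The max-delay inequality can be rearranged into a lower bound on $W$. A minimal sketch, with a hypothetical function name; the bound is clamped at zero because no transparency is needed when the logic already fits in the clock period.

```python
def min_pulse_width(t_logic_max, t_dq, t_clk):
    """Smallest W satisfying T_clk + W > t_logic_max + t_dq.

    Returns 0.0 when the path fits without borrowing.
    """
    return max(0.0, t_logic_max + t_dq - t_clk)


# Example 10.2a values in ns: 8.0 max delay, 0.5 latch delay, 7.5 cycle.
print(min_pulse_width(t_logic_max=8.0, t_dq=0.5, t_clk=7.5))
```

The same helper covers Example 10.4 if the available time is taken as the half cycle: `min_pulse_width(3.2, 0.2, 2.6)` gives approximately 0.8 ns.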

#### **Example 10.2b:** 1-Φ Clock, Max *W* = ?



Min delay

$$t_{\text{Hold}} + W < t_{\text{B}} + t_{\text{Clk-Q,cd}} + t_{\text{Logic,cd}}$$

(from 10.22)

W<sub>max</sub> must satisfy hold time:

- W < 1 ns (min delay) + 0.5 ns – 0.2 ns = 1.3 ns

Note: Latch<sub>A</sub> is always borrowing t<sub>B</sub> = 1 ns!

- W < 1 ns + 1 ns (min delay) + 0.5 ns – 0.2 ns = 2.3 ns

#### Example 10.3: 2-$\Phi$ Clock, Min $T_{Cycle}$ = ?

- Assume: $t_{D-Q} = t_{Setup} + t_{Clk-Q} = 0.2$ ns

- Top loop = 2.2 + 2.4 + 0.2\*2 = 5 ns
- Bottom loop = 1.8 + 2.2 + 0.2\*2 = 4.4 ns
- Cross loop (2 cycles) = (1.8 + 2.4 + 0.2\*2) + (2.2 + 3.2 + 0.2\*2) = 10.4 ns
- $T_{Cycle} \geq 5.2$ ns

(1<sup>st</sup> cycle done early to give time to 2<sup>nd</sup> cycle)
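
The loop bookkeeping in this example can be sketched as a small helper: each loop contributes its total delay divided by the number of cycles it spans, and the largest bound wins. The data layout (one latch per logic block, delays listed per loop) is an assumption read off the arithmetic above, and `min_cycle_from_loops` is a hypothetical name.

```python
def min_cycle_from_loops(loops, t_dq):
    """Return the tightest lower bound on the cycle time.

    loops: list of (logic_delays_ns, cycles_spanned) tuples.
    Each logic block in a loop is assumed to be followed by one latch
    that adds t_dq.
    """
    bounds = []
    for delays, n_cycles in loops:
        total = sum(delays) + len(delays) * t_dq
        bounds.append(total / n_cycles)
    return max(bounds)


loops = [
    ([2.2, 2.4], 1),            # top loop
    ([1.8, 2.2], 1),            # bottom loop
    ([1.8, 2.4, 2.2, 3.2], 2),  # cross loop, spans 2 cycles
]
print(round(min_cycle_from_loops(loops, t_dq=0.2), 3))
```

The cross loop sets the bound even though one of its two cycles (5.8 ns) is longer than the result, which is the time borrowing noted above.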

#### Example 10.4: 2-$\Phi$ Clock, Min W = ?

#### **Assume:** $t_{D-Q} = 0.2$ ns, $T_{Cycle} = 5.2$ ns, Pos Clk edge by $T_{Cycle}/2$



Latch $f_{1b}$ must latch in the data from $f_{2a}$

- W > 3.2 ns + 0.2 ns – 2.6 ns = 0.8 ns

#### **Example 10.5:** 2-$\Phi$ Clock, Max $t_{CU}$ = ?

#### **Assume:** $t_{D-Q} = 0.2$ ns, $T_{Cycle} = 5.2$ ns, Pos Clk edge by $T_{Cycle}/2$



- W > 0.8 ns
- If $W = T_{Cycle}/4 = 1.3$ ns, what is the max $t_{CU}$?
  - $t_{CU} < 1.3$ ns – 0.8 ns = 0.5 ns

#### **Timing: Summary**

- Most systems today are synchronous
  - Many systems now have some degree of asynchrony by using multiple phases or self-timing
- Clocking is critical to guaranteeing that a synchronous system functions correctly and meets its performance target
  - Delay can be so long that data is not latched
  - Delay can be so short that data races through
  - The clock has skew, which can cause timing errors

#### **Clocking Methodologies: Summary**

- Three different clocking methodologies
  - 2-phase, edge-triggered, pulse-mode
- Each has its own criteria for pulse width (duty cycle) and cycle time
  - By using skew or pulse width appropriately, we can allow delays to exceed the cycle time through time borrowing