

University of California, Los Angeles Henry Samueli School of Engineering and Applied Science Department of Electrical Engineering

ECE 216B | D. Marković

The exam is closed-book. You are allowed one page of notes.

# ECE 216B: SPRING 2024—MIDTERM

## Tuesday, May 7, 10am-11:50am

MSOL: times/dates vary

| NAME | SOLUTION |       |  |
|------|----------|-------|--|
|      | Last     | First |  |
| SID  |          |       |  |

Please write answers in the box provided; answers elsewhere will not be graded. If you need extra space, draw a box around your "final" answer for each question.



| Problem    | Score |
|------------|-------|
| Problem 1  | /6    |
| Problem 2  | /16   |
| Problem 3  | /18   |
| Problem 4  | /12   |
| Total (52) |       |

#### **PROBLEM 1: Dark Silicon (6 pts)**

A 7nm processor chip operates at 0.8V. Under power-limited scaling, maximum chip utilization in the 7nm technology is 20%. The processor is scaled to a 5nm technology and it operates at 0.75V, under the same power budget as the 7nm counterpart. Additionally, scaling allows new features to be added, while keeping the chip area the same. In other words, assume that the chip area and the total power budget are the same across the two technologies. Estimate maximum chip utilization in the 5nm technology.

Power is limited and general scaling model applies:

Geometry scaling: S = 7/5 = 1.4

Voltage scaling: U = 0.8/0.75 = 1.067

For a fixed chip area (scaling is an economic trend and more quantity is expected),

Chip utilization drops by: $S^2/U^2 \approx 1.72$

Chip utilization in 5nm $= 20\% \cdot U^2/S^2 \approx 11.6\%$
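The arithmetic above can be checked with a short script. This is only a sketch of the general scaling model as used in this solution; the function name `scaled_utilization` and its parameters are illustrative and not part of the exam.

```python
def scaled_utilization(util_old, node_old_nm, node_new_nm, vdd_old, vdd_new):
    """Maximum utilization after scaling, at fixed chip area and fixed power budget."""
    s = node_old_nm / node_new_nm  # geometry scaling factor S
    u = vdd_old / vdd_new          # voltage scaling factor U
    # Utilization drops by S^2 / U^2, i.e. it is multiplied by U^2 / S^2.
    return util_old * u**2 / s**2


util_5nm = scaled_utilization(0.20, 7, 5, 0.8, 0.75)
print(f"5nm utilization: {util_5nm:.2%}")
```

Carrying full precision gives about 11.6%; rounding the intermediate ratio to 1.72 before dividing shifts the last digit slightly.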

#### **PROBLEM 2: Multiplier (16 pts)**

An array multiplier consists of rows of adders, each producing partial sums that are subsequently fed to the next adder row. We consider the effect of pipelining such a multiplier by inserting registers.



a) Indicate the critical path of the non-pipelined multiplier on the diagram above. Redraw the standard array multiplier (on the next page) by inserting word-level pipeline registers so as to maximize throughput for a 4x4 multiplier. Assume that $X_{0-3}$ and $Y_{0-3}$ come from registers. (8 pts)

The critical path is marked on the schematic above.



b) What are the critical path and latency for the non-pipelined and pipelined implementations? (8 pts)

Non-pipelined: $t_{crit-path} = [(M-1)+(N-2)]\,t_{carry} + (N-1)\,t_{sum} + t_{and}$

For $M = N = 4$: $t_{crit-path} = 5\,t_{carry} + 3\,t_{sum} + t_{and}$. Simplified ($t_{carry} = t_{sum} = t_{add}$): $t_{crit-path} = 8\,t_{add} + t_{and}$
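As a quick check of the non-pipelined expression, the sketch below evaluates it for arbitrary operand widths. The function name is hypothetical, and the delays are normalized units chosen only to count adder stages, not values from the exam.

```python
def array_multiplier_critical_path(m, n, t_carry, t_sum, t_and):
    """Critical path of a non-pipelined M x N array multiplier.

    Implements [(M-1) + (N-2)] * t_carry + (N-1) * t_sum + t_and.
    """
    return ((m - 1) + (n - 2)) * t_carry + (n - 1) * t_sum + t_and


# Normalized delays (assumption): t_carry = t_sum = 1 adder delay, t_and = 0,
# so the result counts adder delays on the critical path.
adder_stages = array_multiplier_critical_path(4, 4, t_carry=1, t_sum=1, t_and=0)
print(adder_stages)
```

With $M = N = 4$ this counts 8 adder delays, matching the simplified form $8\,t_{add} + t_{and}$.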

Pipelined (see the schematic above and the table below): Horizontal (H), Vertical (V), Horizontal + Vertical (H + V)

| Design        | Non-pipelined | Pipelined |
|---------------|---------------|-----------|
| Critical path | $5\,t_{carry} + 3\,t_{sum} + t_{and}$<br>(Simplified: $8\,t_{add} + t_{and}$) | H: $3\,t_{carry} + t_{sum} + t_{and}$<br>(Simplified: $4\,t_{add} + t_{and}$)<br>V: $3\,t_{sum} + t_{and}$<br>(Simplified: $3\,t_{add} + t_{and}$)<br>H+V: $\max\{t_{carry}, t_{sum}\} + t_{and}$<br>(Simplified: $t_{add} + t_{and}$) |
| Latency       | 0             | H: 2<br>V: 5<br>H+V: 7 |

#### **PROBLEM 3: Number Representation (18 pts)**

Add  $v_1$ ,  $v_2$ , and  $v_3$  in two steps: 1)  $v = v_1 + v_2$ , 2)  $s = v + v_3$ . Assume  $v_1 = 0.6875$ ,  $v_2 = 0.8125$ , and  $v_3 = -0.5625$ . Use **5-bit** 2's complement representation of  $v_1$ ,  $v_2$ ,  $v_3$  and final sum *s* (intermediate result *v* could have more bits, if needed). You may need to sign-extend  $v_3$  in Step 2, if *v* dictates so.

a) **Step 1** (3 pts)

$v_1$ in 2's complement $\rightarrow$ **0.1011** $(= 2^{-1} + 2^{-3} + 2^{-4})$

$v_2$ in 2's complement $\rightarrow$ **0.1101** $(= 2^{-1} + 2^{-2} + 2^{-4})$

$v = v_1 + v_2 \rightarrow$ **01.1000** (an extra sign bit is needed, since $C_{in}(MSB) = 1 \neq C_{out}(MSB) = 0$ signals overflow in 5 bits)

b) **Step 2** (3 pts)

$v_3$ in 2's complement $\rightarrow$ **11.0111** (magnitude $= 2^{-1} + 2^{-4}$ $\rightarrow$ 2's comp. = 1.0111, then sign-extend)

Copy *v* from **Step 1** here $\rightarrow$ 01.1000

$s = v + v_3 \rightarrow$ 100.1111 (discard the leading bit; $C_{in}(MSB) = C_{out}(MSB) = 1$, so there is no overflow and the result fits the 5-bit format as 0.1111)
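The two addition steps can be reproduced with integer bit patterns. The sketch below is one possible way to model fixed-point 2's complement arithmetic; all helper names (`to_twos`, `fmt`, `sign_extend`, `add`) are hypothetical and introduced only for illustration.

```python
def to_twos(value, int_bits, frac_bits):
    """Encode value as a 2's complement pattern; int_bits includes the sign bit."""
    total = int_bits + frac_bits
    scaled = round(value * (1 << frac_bits))
    limit = 1 << (total - 1)
    if not -limit <= scaled < limit:
        raise OverflowError("value does not fit the requested format")
    return scaled & ((1 << total) - 1)


def fmt(pattern, int_bits, frac_bits):
    """Render a bit pattern as a binary string with a binary point."""
    bits = format(pattern, f"0{int_bits + frac_bits}b")
    return bits[:int_bits] + "." + bits[int_bits:]


def sign_extend(pattern, total_bits, new_total_bits):
    """Replicate the sign bit into the newly added upper bits."""
    if pattern >> (total_bits - 1):
        pattern |= ((1 << new_total_bits) - 1) & ~((1 << total_bits) - 1)
    return pattern


def add(a, b, total_bits):
    """Add two patterns and discard any carry out of the top bit."""
    return (a + b) & ((1 << total_bits) - 1)


v1 = to_twos(0.6875, 1, 4)
v2 = to_twos(0.8125, 1, 4)
v3 = to_twos(-0.5625, 1, 4)

# Step 1: the 5-bit sum overflows, so add in 6 bits (2 integer bits).
v = add(sign_extend(v1, 5, 6), sign_extend(v2, 5, 6), 6)

# Step 2: sign-extend v3 to match v, add, then drop the redundant top bit.
v3_ext = sign_extend(v3, 5, 6)
s6 = add(v, v3_ext, 6)
s = s6 & 0b11111

print(fmt(v1, 1, 4), fmt(v2, 1, 4), fmt(v, 2, 4))
print(fmt(v3_ext, 2, 4), fmt(s6, 2, 4), fmt(s, 1, 4))
```

Dropping the top bit of the 6-bit result is safe here only because its two leading bits are equal, which is the same condition as the final value fitting the 5-bit format.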

c) Express the following 2's complement numbers in decimal (base-10) notation. (2 pts)



0110.1101 $\rightarrow$ **6.8125** $(= 2^{2} + 2^{1} + 2^{-1} + 2^{-2} + 2^{-4})$

1011.0011 $\rightarrow$ **-4.8125** (sign bit = 1, magnitude = 4.8125)
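The conversion can be sketched in a few lines. The helper name `twos_to_decimal` is hypothetical, and the first operand, 0110.1101, is inferred from its sign-extended form in part (f).

```python
def twos_to_decimal(bits):
    """Convert a 2's complement string such as '1011.0011' to a float."""
    int_part, _, frac_part = bits.partition(".")
    total = len(int_part) + len(frac_part)
    raw = int(int_part + frac_part, 2)
    if int_part[0] == "1":
        raw -= 1 << total  # the sign bit carries negative weight
    return raw / (1 << len(frac_part))


print(twos_to_decimal("0110.1101"))
print(twos_to_decimal("1011.0011"))
```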

d) Truncate decimal format in (c) to base-10 integers, and binary format to base-2 integers. (4 pts)

|           | 0110.1101                       | 1011.0011                        |
|-----------|---------------------------------|----------------------------------|
| Decimal → | **6**                           | **-4**                           |
| Binary →  | **0110**<br>=(6)<sub>10</sub>   | **1011**<br>=(-5)<sub>10</sub>   |
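The two truncations disagree for the negative operand: dropping decimal digits moves toward zero, while dropping fractional bits of a 2's complement number moves toward negative infinity. A minimal sketch, with hypothetical helper names:

```python
import math


def truncate_binary(bits):
    """Drop the fractional bits of a 2's complement string."""
    return bits.partition(".")[0]


def twos_int(bits):
    """Interpret an integer 2's complement string as a signed value."""
    raw = int(bits, 2)
    return raw - (1 << len(bits)) if bits[0] == "1" else raw


for decimal, binary in [(6.8125, "0110.1101"), (-4.8125, "1011.0011")]:
    kept = truncate_binary(binary)
    print(math.trunc(decimal), kept, twos_int(kept))
```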

e) Round decimal format in (c) to base-10 integers, and binary format to base-2 integers. (4 pts)

|           | 0110.1101                       | 1011.0011                        |
|-----------|---------------------------------|----------------------------------|
| Decimal → | **7**                           | **-5**                           |
| Binary →  | **0111**<br>=(7)<sub>10</sub>   | **1011**<br>=(-5)<sub>10</sub>   |
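One common way to round a 2's complement number to an integer is to add half an LSB of the integer part (binary 0.1) and then truncate. The sketch below assumes that method, which the solution does not state explicitly. The helper name `round_binary` is hypothetical, and overflow of the integer field is ignored.

```python
def round_binary(bits):
    """Round a 2's complement string to an integer: add 0.1 (binary), then truncate."""
    int_part, _, frac_part = bits.partition(".")
    width = len(int_part)
    raw = int(int_part + frac_part, 2)
    half = 1 << (len(frac_part) - 1)         # binary 0.1 in the same scaling
    rounded = (raw + half) >> len(frac_part)  # drop the fractional bits
    # Keep only the original integer width (wraps on overflow).
    return format(rounded & ((1 << width) - 1), f"0{width}b")


print(round_binary("0110.1101"))
print(round_binary("1011.0011"))
```

For these operands the results are 0111 and 1011, consistent with the binary row of the answer.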

f) Apply sign extension to the binary format in (c) in base-2. (2 pts)

Answer box $\rightarrow$

**0**0110.1101 and **1**1011.0011 (extended sign bit shown in bold)
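Sign extension replicates the leftmost bit and leaves the value unchanged. The following sketch checks both properties on string representations; the helper names are hypothetical.

```python
def sign_extend(bits, extra=1):
    """Prepend `extra` copies of the sign bit (the leftmost bit)."""
    return bits[0] * extra + bits


def value(bits):
    """Decode a 2's complement string with a binary point."""
    int_part, _, frac_part = bits.partition(".")
    raw = int(int_part + frac_part, 2)
    if int_part[0] == "1":
        raw -= 1 << (len(int_part) + len(frac_part))
    return raw / (1 << len(frac_part))


for original in ("0110.1101", "1011.0011"):
    extended = sign_extend(original)
    print(original, "->", extended, value(original) == value(extended))
```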

#### **PROBLEM 4: Filter Architectures (12 pts)**

Consider a 4-tap FIR filter shown below.



a) (3 pts) What is the maximum throughput  $(1/t_{critical-path})$  of this architecture? Assume  $t_{adder} = 200$ ps,  $t_{multiplier} = 800$ ps, and registered x[n] and y[n].

Critical path $= t_{multiplier} + 3\,t_{adder} = 1.4$ ns

Throughput  $\leq$  714.3 MS/s

Max Throughput = 714.3 MS/s
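The throughput figure follows directly from the critical path. The sketch below assumes, as the expression above implies, one multiplier followed by a chain of (taps - 1) adders between registers. The function name and the generalization to other tap counts are illustrative assumptions.

```python
def max_throughput_msps(t_multiplier_ps, t_adder_ps, n_taps):
    """Throughput bound in MS/s for one multiplier plus a chain of (n_taps - 1) adders."""
    t_crit_ps = t_multiplier_ps + (n_taps - 1) * t_adder_ps
    return 1e6 / t_crit_ps  # 1 / ps = 1e12 Hz, i.e. 1e6 MS/s


throughput = max_throughput_msps(t_multiplier_ps=800, t_adder_ps=200, n_taps=4)
print(f"{throughput:.1f} MS/s")
```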

b) (9 pts) Suggest an architecture that maximizes throughput. You can use any architectural technique that you see fit (direct, transposed, pipelined), but you can add no more than 4 pipeline registers in total. Slicing through combinational logic inside the adders and multipliers is allowed (i.e. low-level pipelining). What is the maximum achievable throughput? Sketch the modified architecture.



- Three pipeline registers are added to balance logic delay.
- Low-level pipelining within the multipliers is applied to match the $t_1$ and $t_2$ logic delays.

