





# Testing Circuit-Partitioned 3D IC Designs

Dean L. Lewis, Hsien-Hsin S. Lee

IEEE Computer Society Annual Symposium on VLSI, Tampa, FL, 2009

http://arch.ece.gatech.edu/mars.html










- Multiple layers of silicon
- Interconnected with TSVs
  - Etched through thinned wafers
- Several integration options
  - Technology
    - Different processing technologies are tightly integrated
  - Architecture
    - Blocks split across layers
  - Circuit
    - Transistors split across layers





#### Motivation










#### Previous Work



\* Dean L. Lewis and Hsien-Hsin S. Lee. A Scan-Island Based Design Enabling Pre-bond Testability in Die-Stacked Microprocessors. In Proceedings of the IEEE International Test Conference 2007 (ITC), Santa Clara, CA, October, 2007.





#### Previous Work – Results










# Kogge-Stone Adder

- Binary summation tree
  - P and G signals represented by arrows
- Trades off hardware for reduced fan-out
- In second stage, there are two disjoint sets of logic
  - These sets do not interact, yet they compete for wiring tracks
- Four sets at third level
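
As a rough illustration of the structure described above, the following Python sketch models a Kogge-Stone prefix network at the bit level. The function names (`kogge_stone_add`, `disjoint_sets`) and the list-based representation are assumptions made for illustration, not taken from the original design. `disjoint_sets` groups the bit positions that exchange P and G signals in a given stage, which reproduces the two sets at the second stage and the four sets at the third.

```python
def kogge_stone_add(a, b, width=8):
    """Add two unsigned integers using a Kogge-Stone prefix network."""
    # Bitwise generate (g) and propagate (p); index 0 is the least significant bit.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    half_sum = p[:]  # the original propagate bits are needed for the final sum
    span = 1
    while span < width:
        new_g, new_p = g[:], p[:]
        for i in range(span, width):
            # Merge the group ending at bit i with the group ending at bit i - span.
            new_g[i] = g[i] | (p[i] & g[i - span])
            new_p[i] = p[i] & p[i - span]
        g, p = new_g, new_p
        span *= 2
    # The carry into bit i is the group generate of bits 0 .. i-1.
    carries = [0] + g[:-1]
    total = 0
    for i in range(width):
        total |= (half_sum[i] ^ carries[i]) << i
    total |= g[width - 1] << width  # carry out
    return total


def disjoint_sets(width, stage):
    """Group bit positions that are wired together in the given stage (1-based)."""
    span = 2 ** (stage - 1)
    parent = list(range(width))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    # Bit i combines with bit i - span, so the two belong to the same set.
    for i in range(span, width):
        parent[find(i)] = find(i - span)
    groups = {}
    for i in range(width):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

For an 8-bit adder, `disjoint_sets(8, 2)` separates the even bit positions from the odd ones, and `disjoint_sets(8, 3)` yields four groups.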










- Bit-splitting separates disjoint logic sets across layers
  - For two layers, we get an even layer and an odd layer
- TSVs shuffle P&G signals in first level of logic
- Now the second and third stages are much less congested










- But now first stage is much more complex
- Bit-splitting can be repeated
  - TSVs in first two levels
  - One-fourth the complexity in all other levels
- Trading off TSVs for reduced complexity
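
The trade-off above can be sketched with a small Python helper that reports which prefix stages need TSVs. It assumes bit `i` is placed on layer `i % layers` (the even/odd split for two layers); the function name `tsv_stages` is hypothetical and the model ignores the input and sum logic.

```python
def tsv_stages(width, layers):
    """Return the 1-based prefix stages whose P/G wires cross between layers.

    Assumption: bit i is placed on layer i % layers.
    """
    crossing = []
    stage, span = 1, 1
    while span < width:
        # In this stage, bit i receives P and G from bit i - span.
        if any(i % layers != (i - span) % layers for i in range(span, width)):
            crossing.append(stage)
        stage += 1
        span *= 2
    return crossing
```

Under this assumption a 64-bit adder split across two layers needs TSVs only in stage 1, and a four-layer split needs them in stages 1 and 2; every later stage stays within a single layer.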










# Testing the Adder

- Few TSVs located near the edge of the circuit
  - Scan-based test acceptable
- Add a scan-cell to each TSV
  - Two per adder column
- No observation cells required
  - Values generated in level one logic observable at adder POs










# Many-Port Register File

- Many ports to allow parallel access to many entries
  - 20 or more in recent out-of-order processors
- Wiring in cell grows quadratically with port count
  - Required to make room for extra word- and bit-lines
- This increases
  - Cell size
  - Word-line length
  - Bit-line length
- All of these slow the circuit down







# Port-Split Register File

- To fight quadratic growth, we split ports across layers
- This reduces
  - Cell size
  - Word-line length
  - Bit-line length
- A very big win for 3D
- But how do we test the top layer pre-bond?
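
To make the quadratic argument concrete, here is a deliberately idealized Python sketch. It assumes each port adds one wiring track in each dimension of the cell, so the cell side grows linearly with port count and the area quadratically. The names `cell_area` and `port_split`, and the parameters `track_pitch` and `core_side`, are assumptions for illustration, not values from the layouts.

```python
def cell_area(ports, track_pitch=1.0, core_side=0.0):
    """Idealized cell area: one word-line and one bit-line track per port.

    core_side models the storage element; it defaults to 0 (wire-dominated cell).
    """
    side = core_side + track_pitch * ports
    return side * side


def port_split(ports, layers=2):
    """Compare a planar cell with the same ports divided evenly across layers."""
    planar = cell_area(ports)
    per_layer = cell_area(ports / layers)
    return {
        "footprint_ratio": per_layer / planar,
        "total_area_ratio": layers * per_layer / planar,
    }
```

For six ports on two layers, this model gives a footprint ratio of 0.25 and a total-area ratio of 0.5. The layout results reported in the comparison table (31% footprint, 61% area) are less favorable, which is plausible for a model that ignores the storage core and TSVs.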














- Suk and Reddy's Test B
  - Write '0' to all cells
  - Write '1' to a particular cell
  - Read written cell and neighbors
- This algorithm tests not only each cell's functionality but also bridging between neighboring cells
- Standard neighborhood
  - Four adjacent cells
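
The three steps listed above can be sketched in Python as follows. This follows only the steps on the slide, not the full published algorithm, and the fault model (`bridges`, `stuck`) and all class and function names are hypothetical.

```python
class CellArray:
    """Toy memory array with optional injected faults (illustrative only)."""

    def __init__(self, rows, cols, bridges=None, stuck=None):
        self.rows, self.cols = rows, cols
        self.bits = [[0] * cols for _ in range(rows)]
        self.bridges = bridges or {}  # aggressor (r, c) -> victim (r, c)
        self.stuck = stuck or {}      # (r, c) -> value the cell is stuck at

    def write(self, r, c, value):
        self.bits[r][c] = value
        victim = self.bridges.get((r, c))
        if victim is not None:
            vr, vc = victim
            self.bits[vr][vc] = value  # a bridge copies the written value

    def read(self, r, c):
        return self.stuck.get((r, c), self.bits[r][c])


def neighbors(array, r, c):
    """Yield the up-to-four adjacent cells (the standard neighborhood)."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < array.rows and 0 <= nc < array.cols:
            yield nr, nc


def run_test(array):
    """Return a list of (written cell, failing cell) pairs."""
    failures = []
    for r in range(array.rows):
        for c in range(array.cols):
            # Step 1: write '0' to all cells.
            for rr in range(array.rows):
                for cc in range(array.cols):
                    array.write(rr, cc, 0)
            # Step 2: write '1' to the cell under test.
            array.write(r, c, 1)
            # Step 3: read the written cell and its neighbors.
            if array.read(r, c) != 1:
                failures.append(((r, c), (r, c)))
            for nr, nc in neighbors(array, r, c):
                if array.read(nr, nc) != 0:
                    failures.append(((r, c), (nr, nc)))
    return failures
```

A fault-free array returns an empty list, while an injected bridge such as `CellArray(3, 3, bridges={(1, 1): (1, 2)})` is reported as a failure at the neighboring cell.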











- We can't write to cells
- But we can write through them
- Transmit test
  - Write and read simultaneously
- Requires at least one write and one read port per layer
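
A minimal behavioral sketch of the transmit test, in Python, is given below. It assumes the top-layer cell cannot hold a value before bonding, so its internal node carries data only while a write port is driving it. The class `TopLayerCell`, its fault flags, and `transmit_test` are hypothetical names, not the authors' implementation.

```python
class TopLayerCell:
    """Pre-bond view of one cell on the top layer (hypothetical model).

    Assumption: the storage element is unusable before bonding, so the
    internal node holds a value only while a write port is driving it.
    """

    def __init__(self, write_open=False, read_open=False, node_stuck_at=None):
        self.write_open = write_open        # broken write access path
        self.read_open = read_open          # broken read access path
        self.node_stuck_at = node_stuck_at  # node shorted to 0 or 1, else None

    def transmit(self, value):
        """Drive `value` on a write port while reading; return the read result."""
        node = None if self.write_open else value  # None models a floating node
        if self.node_stuck_at is not None:
            node = self.node_stuck_at
        return None if self.read_open else node


def transmit_test(cell):
    """Pass only if both '0' and '1' travel from the write port to the read port."""
    return all(cell.transmit(v) == v for v in (0, 1))
```

In this model a defect in either access path, or a node stuck at a fixed value, causes `transmit_test` to fail, which is why at least one write port and one read port must be present on the layer under test.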





# Experimental Setup

- Design
  - 3D layouts using the 3D Magic tool
  - DRC rules from MITLL 180nm process
  - Simulation with HSPICE
  - Level 49 transistor model

- Adder Test
  - Verilog models
    - Separate bottom, top, and integrated models for 3D adder
  - FlexTest for test modeling
- RF Test
  - Algorithm





#### 64-Bit Planar KS Adder






#### 64-Bit 3D KS Adder







# Kogge-Stone Comparison

|                 | 2D Adder | 3D Adder | 3D / 2D |
|-----------------|----------|----------|---------|
| Area (µm²)      | 35.4k    | 23.5k    | 66%     |
| Footprint (µm²) | 35.4k    | 11.8k    | 33%     |
| Delay (ns)      | 7.46     | 6.08     | 82%     |
| Power (mW)      | 26.1     | 22.6     | 87%     |

| Design   | Component | Pattern Count |
|----------|-----------|---------------|
| 2D Adder |           | 313           |
| 3D Adder | Top       | 146           |
|          | Bottom    | 145           |
|          | Vias      | 10            |
|          | Total     | 301           |





#### 128-bit, 6-port Planar RF







#### 128-bit, 6-port 3D RF





# Register File Comparison

|                 | Operation | 2D RF | 3D RF | 3D / 2D |
|-----------------|-----------|-------|-------|---------|
| Area (µm²)      |           | 20.3k | 12.5k | 61%     |
| Footprint (µm²) |           | 20.3k | 6.24k | 31%     |
| Delay (ps)      | Read '0'  | 1401  | 1043  | 74%     |
|                 | Read '1'  | 1407  | 1050  | 75%     |
|                 | Write '0' | 520   | 308   | 59%     |
|                 | Write '1' | 1381  | 735   | 53%     |
| Energy (pJ)     | Read '0'  | 0.149 | 0.126 | 85%     |
|                 | Read '1'  | 0.149 | 0.127 | 85%     |
|                 | Write '0' | 2.342 | 1.704 | 73%     |
|                 | Write '1' | 2.342 | 1.710 | 73%     |





# Register File Comparison

| Design | Component | Pattern Count |
|--------|-----------|---------------|
| 2D RF  |           | 8192          |
| 3D RF  | Top       | 256           |
|        | Bottom    | 4096          |
|        | Vias      | 512           |
|        | Total     | 4864          |

| Test Access | Operation    | Value |
|-------------|--------------|-------|
| Delay (ps)  | Transmit '0' | 1346  |
|             | Transmit '1' | 1744  |
| Energy (pJ) | Transmit '0' | 0.189 |
|             | Transmit '1' | 0.139 |





#### Conclusion

- 3D pre-bond circuit test can be done
- It can be done using straightforward extensions to planar scan-based test
- Even circuit-partitioned designs can, in most cases, be tested with scan-chain test
- Some designs will require new test algorithms
- Complete 3D test is similar in cost to, and sometimes significantly cheaper than, planar test
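
The cost claim can be checked against the pattern counts reported in the comparison tables. The short Python sketch below only re-adds those published counts; the dictionary layout and the function name `pattern_ratio` are illustrative choices, and pattern count is used here as a stand-in for test cost.

```python
# Pattern counts copied from the comparison tables in this deck.
PATTERNS = {
    "Kogge-Stone adder": {"2D": 313, "3D": {"Top": 146, "Bottom": 145, "Vias": 10}},
    "Register file": {"2D": 8192, "3D": {"Top": 256, "Bottom": 4096, "Vias": 512}},
}


def pattern_ratio(design):
    """Return (3D total pattern count, 3D total divided by 2D count)."""
    entry = PATTERNS[design]
    total_3d = sum(entry["3D"].values())
    return total_3d, total_3d / entry["2D"]


for name in PATTERNS:
    total, ratio = pattern_ratio(name)
    print(f"{name}: {total} patterns in 3D, {ratio:.1%} of the planar count")
```

By this measure the 3D adder needs about 96% of the planar pattern count and the 3D register file about 59%.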





# Thank you!

#### http://arch.ece.gatech.edu/mars.html

