# Designing 3D Test Wrappers for Pre-bond and Post-bond Test of 3D Embedded Cores

Dean L. Lewis, Shreepad Panth, Xin Zhao, Sung Kyu Lim, Hsien-Hsin S. Lee

Georgia Institute of Technology

October 10, 2011

### Outline

- Introduction
- Problem Description
- Design Algorithm
- Experiments
- Conclusion

## Outline

- Introduction
  - 3D Integration
  - Modular Test
  - 3D Test
- Problem Description
- Design Algorithm
- Experiments
- Conclusion



## 3D Integration

- New integration technology
- Multiple active silicon tiers stacked vertically
- Short, fast vertical interconnects
  - Microbumps
  - Through-silicon vias (TSVs)







#### **Benefits**

- More silicon
- Heterogeneous integration
- Reduced interconnection length
- Increased bandwidth
- Increased routing freedom

## 3D ALU [Puttaswamy, ISCAS'06]





- Mod-2 bit partitioning of a Kogge-Stone adder
- Significantly reduced wire length
- 3.7% (18%) reduction in latency
- 2.6% (13%) reduction in power consumption

# 3D Register File [Puttaswamy, ISVLSI'06]

- Port-split register file
- Shortens all wires in the design
- 36% reduction in latency
- 58% reduction in power consumption





## Modular Test

- Modular test structures
- Isolate an embedded core during test
- Manage IP exchanged between companies
- IEEE Standards 1149.1 and 1500











# Test Architecture with Wrappers

## Black Box

## 3D Black Box













### Outline

- Introduction
- Problem Description
  - Formal Description
  - Motivating Example
  - Complexity
- Design Algorithm
- Experiments
- Conclusion

#### Problem Statement

- Given
  - A test description of a 3D embedded core
    - number of I/Os
    - number of scan chains
    - the lengths of the scan chains
    - a 3D partition of these elements
  - A set of prebond TAM bus widths
  - A postbond TAM bus width
- Find
  - An assignment of all scan chains and I/Os to both prebond and postbond wrapper chains
- Optimizing for
  - Minimum total test time
  - Minimum total wire length, subject to test time
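The problem inputs and the two-level objective above can be captured in a small data model. This is a minimal sketch under stated assumptions: the class and field names (`Tier`, `Core3D`, `scan_elements`, `objective`) are hypothetical, each tier is assumed to receive one pre-bond TAM width, and each I/O is assumed to contribute a wrapper cell of length 1. The objective is written as a tuple so that comparison is lexicographic: test time first, then wire length among solutions with equal test time.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Tier:
    # Hypothetical per-tier slice of the 3D core's test description.
    scan_chain_lengths: List[int]
    num_ios: int        # assumption: each I/O becomes a length-1 wrapper cell
    prebond_width: int  # assumption: one pre-bond TAM width per tier

@dataclass
class Core3D:
    tiers: List[Tier]
    postbond_width: int

    def scan_elements(self) -> List[Tuple[int, int]]:
        """Return (tier index, length) for every scan chain and I/O cell."""
        elems: List[Tuple[int, int]] = []
        for t, tier in enumerate(self.tiers):
            elems += [(t, n) for n in tier.scan_chain_lengths]
            elems += [(t, 1)] * tier.num_ios
        return elems

def objective(test_time: int, wirelength: int) -> Tuple[int, int]:
    # Tuples compare lexicographically: minimize test time, then wire length.
    return (test_time, wirelength)

core = Core3D([Tier([5, 3], 2, 2), Tier([4], 1, 1)], postbond_width=3)
print(len(core.scan_elements()))
```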



#### **Total Test Time**

$$T = (p+1) \times (s+1)$$

T: total test time for the embedded core

p: number of test patterns to apply

s: length of the longest wrapper chain




Minimizing total test time is then equivalent to minimizing the length of the longest wrapper chain
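As a quick check of the formula, the sketch below evaluates $T = (p+1) \times (s+1)$ for a list of wrapper chain lengths; the function name is hypothetical. Because only the longest chain enters the formula, two wrappers with the same longest chain give the same test time.

```python
from typing import List

def total_test_time(patterns: int, wrapper_chain_lengths: List[int]) -> int:
    """T = (p + 1) * (s + 1), where s is the longest wrapper chain."""
    s = max(wrapper_chain_lengths)
    return (patterns + 1) * (s + 1)

# Same pattern count, same longest chain (5): same test time.
print(total_test_time(10, [5, 4, 2]))  # 66
print(total_test_time(10, [5, 5, 5]))  # 66
```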

# Motivating Codesign of 3D Wrappers

Postbond TAM width = 2

Postbond TAM width = 3

# Complexity: $\mathcal{NP}$-Hard

The wrapper design problem was shown to be $\mathcal{NP}$-hard in [Iyengar, JETTA'02].



#### Outline

- Introduction
- Problem Description
- Design Algorithm
  - Overview
  - BFD
  - KL
  - Pairing
- Experiments
- Conclusion



### Heuristic Algorithm

- Three-step heuristic algorithm
  - Best fit decreasing (BFD)
  - Kernighan-Lin partitioning (KL)
  - Pairing

**BFD** 

- Fast, high-quality $\mathcal{O}(n)$ packing algorithm
- Packs scan elements into wrapper chains
- Can avoid using wrapper chains that are not needed
- Scan elements are taken in decreasing length order, and each is assigned to the wrapper chain in which it fits best
- Input: set of scan elements and TAM width
- Output: set of wrapper chain assignments
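The BFD step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes one common reading of "best fit", namely that an element goes to the wrapper chain that ends up closest to, without exceeding, the current longest chain, and otherwise to the shortest chain. `bfd_wrapper` is a hypothetical name, and beyond the initial sort each element simply scans all TAM bits, so the sketch is not tuned for speed.

```python
from typing import List, Tuple

def bfd_wrapper(lengths: List[int], width: int) -> Tuple[List[List[int]], List[int]]:
    """Best fit decreasing: pack scan elements (by index) into `width` wrapper chains."""
    chains: List[List[int]] = [[] for _ in range(width)]
    loads = [0] * width
    # Decreasing length order (stable sort keeps index order on ties).
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    for i in order:
        smax = max(loads)  # current longest wrapper chain
        fits = [w for w in range(width) if loads[w] + lengths[i] <= smax]
        if fits:
            # Best fit: the chain left closest to smax without exceeding it.
            w = max(fits, key=lambda w: loads[w] + lengths[i])
        else:
            # Nothing fits under smax: extend the shortest chain.
            w = min(range(width), key=lambda w: loads[w])
        chains[w].append(i)
        loads[w] += lengths[i]
    return chains, loads

print(bfd_wrapper([8, 6, 5, 4, 3, 2], 2))
print(bfd_wrapper([10, 3, 3], 3))
```

In the second call, the two short elements share one wrapper chain beneath the length-10 element, so the third wrapper chain stays empty without increasing the longest chain. This is the sense in which BFD can avoid using wrapper chains that are not needed.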















**KL**



- $\mathcal{O}(Kn^3)$ runtime
- Divide the scan elements and TAM bits into two partitions
- Move scan elements between partitions, evaluating test time and stitch reuse for each move
- Accepts moves that temporarily worsen the cost
- Recurse until each partition has only a single TAM bit
- Input: wrapper chain assignment from BFD
- Output: complementary wrapper chain assignment
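The recursive partitioning idea can be sketched as below. This is a simplified, move-based (Fiduccia-Mattheyses-style) variant rather than classic pair-swapping KL, and the cost model is an assumption: each side's longest chain is estimated as its total scan length spread evenly over its TAM bits, and ties are broken by the number of BFD stitches the split cuts. Each pass commits the best available move even when it worsens the cost, then keeps the best prefix of moves; with this naive cost evaluation one pass is cubic in the number of elements. All names are hypothetical.

```python
from fractions import Fraction

def cost(lengths, side, k0, k1, stitches):
    """Lexicographic cost: (estimated longest chain, BFD stitches split)."""
    load = [0, 0]
    for i, s in side.items():
        load[s] += lengths[i]
    # Assumption: each side's load is spread evenly over its TAM bits.
    est = max(Fraction(load[0], k0), Fraction(load[1], k1))
    cut = sum(1 for a, b in stitches
              if a in side and b in side and side[a] != side[b])
    return (est, cut)

def kl_pass(lengths, side, k0, k1, stitches):
    """Move every element once; return the best intermediate partition."""
    cur = dict(side)
    best_cost, best_side = cost(lengths, cur, k0, k1, stitches), dict(cur)
    locked = set()
    for _ in range(len(cur)):
        choice = None
        for i in cur:
            if i in locked:
                continue
            cur[i] ^= 1  # tentatively move i to the other side
            c = cost(lengths, cur, k0, k1, stitches)
            cur[i] ^= 1  # undo the tentative move
            if choice is None or c < choice[0]:
                choice = (c, i)
        c, i = choice
        cur[i] ^= 1      # commit the best move, even if it is a bad move
        locked.add(i)
        if c < best_cost:
            best_cost, best_side = c, dict(cur)
    return best_side, best_cost

def kl_wrapper(lengths, items, k, stitches, max_passes=4):
    """Recursively bipartition `items` until each part has one TAM bit."""
    if k == 1:
        return [list(items)]
    k0, k1 = k // 2, k - k // 2
    half = len(items) // 2  # initial split follows BFD chain order
    side = {i: (0 if pos < half else 1) for pos, i in enumerate(items)}
    best = cost(lengths, side, k0, k1, stitches)
    for _ in range(max_passes):
        new_side, c = kl_pass(lengths, side, k0, k1, stitches)
        if c >= best:
            break
        side, best = new_side, c
    left = [i for i in items if side[i] == 0]
    right = [i for i in items if side[i] == 1]
    return (kl_wrapper(lengths, left, k0, stitches, max_passes)
            + kl_wrapper(lengths, right, k1, stitches, max_passes))

# Hypothetical BFD output: chains [0, 3, 4] and [1, 2, 5], re-partitioned for 3 bits.
lengths = [10, 8, 6, 4, 3, 1]
print(kl_wrapper(lengths, [0, 3, 4, 1, 2, 5], 3, [(0, 3), (3, 4), (1, 2), (2, 5)]))
```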

#### **INPUT**

#### **GRAPH REPRESENTATION**

#### **KL ITERATION 1**

#### **KL ITERATION 2**

#### **ASSIGNMENT**

#### **OUTPUT**

#### Scan Element Pairing

- Simple $\mathcal{O}(n)$ algorithm for reusing stitches
- Scan each wrapper chain, checking whether each element's stitch neighbor is also in the chain
- If found, place the element next to its neighbor so the stitch is reused
- Input: wrapper assignment from KL
- Output: compressed wrapper assignment
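A linear-time pairing pass might look like the sketch below. It assumes that an element's "neighbor" is its successor in the BFD wrapper chain (a stitch wire that already exists) and that BFD chains are simple paths; `pair_chain`, `reused_stitches`, and the `bfd_next` successor map are hypothetical names. Elements whose BFD predecessor shares the KL chain are emitted directly after that predecessor, so each run of BFD neighbors stays contiguous.

```python
def pair_chain(chain, bfd_next):
    """Reorder one KL wrapper chain so BFD neighbors in it sit back to back."""
    members = set(chain)
    # Elements whose BFD predecessor is also in this chain.
    has_pred = {bfd_next[e] for e in chain if bfd_next.get(e) in members}
    paired = []
    for e in chain:
        if e in has_pred:
            continue  # emitted when its predecessor's run is walked
        cur = e
        while True:  # walk the BFD run that starts at e
            paired.append(cur)
            nxt = bfd_next.get(cur)
            if nxt not in members:
                break
            cur = nxt
    return paired

def reused_stitches(chain, bfd_next):
    """Count consecutive pairs in `chain` that reuse a BFD stitch."""
    return sum(1 for a, b in zip(chain, chain[1:]) if bfd_next.get(a) == b)

# Hypothetical BFD chains [0, 1, 2] and [3, 4]; KL placed all five in one chain.
bfd_next = {0: 1, 1: 2, 3: 4}
kl = [2, 4, 0, 3, 1]
print(pair_chain(kl, bfd_next))  # [0, 1, 2, 3, 4]
print(reused_stitches(kl, bfd_next), reused_stitches(pair_chain(kl, bfd_next), bfd_next))  # 0 3
```

Set membership keeps each lookup constant time, so the pass touches each element a bounded number of times.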

#### Outline

- Experiments
  - Implementation
  - Setup
  - Metrics
  - Results



### **Evaluation Methodology**

- Implemented algorithm in C++
- Benchmarks taken from the OpenCores database
- Two- and four-tier partitions of each benchmark
- Three experiments
- Three configurations
- Sweep across a range of TAM widths

#### **Benchmarks**

| Circuit | Tiers | Cells per Tier                 | Chains per Tier |
|---------|-------|--------------------------------|-----------------|
| ckt1    | 2     | 3016, 3021                     | 6, 6            |
| ckt2    | 2     | 5329, 3479                     | 11, 7           |
| ckt3    | 2     | 19,890, 19,228                 | 40, 39          |
| ckt4    | 2     | 37,359, 40,751                 | 75, 82          |
| ckt1    | 4     | 1507, 1512, 1510, 1508         | 3, 3, 3, 3      |
| ckt2    | 4     | 2543, 1980, 2767, 1518         | 5, 4, 6, 3      |
| ckt3    | 4     | 9826, 9172, 10,757, 9363       | 20, 18, 22, 19  |
| ckt4    | 4     | 20,723, 18,135, 17,011, 22,241 | 41, 36, 34, 44  |

- Experiments
  - BFD: all BFD
    - BFD: pre-bond and post-bond (baseline)
  - PRE: pre-bond first
    - BFD: pre-bond → KL: post-bond
  - POST: post-bond first
    - BFD: post-bond → KL: pre-bond

- Configurations
  - 05: half width
    - post-bond TAM width is twice the pre-bond width
  - 10: even width
    - pre-bond and post-bond widths are equal
  - 20: double width
    - pre-bond TAM width is twice the post-bond width



#### Metrics

- Critical test length (CTL)
  - the sum, over all test wrappers, of the length of the longest wrapper chain in each wrapper
  - correlates to the total test time
  - lower CTL is better
- Cut
  - the number of stitching wires from the BFD solution that are not reused in the KL solution
  - correlates to the wirelength
  - lower cut is better
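Both metrics are simple to compute from chain assignments. In this sketch, wrappers are lists of chains of element ids, and a stitch is assumed to be a directed wire between consecutive elements of a chain, so a stitch counts as reused only if the same ordered pair is adjacent in the KL solution. The function names and the small example are hypothetical.

```python
def critical_test_length(wrappers, length):
    """Sum over test wrappers of the longest wrapper chain in each.

    wrappers: list of wrappers; each wrapper is a list of chains of element ids.
    length:   dict mapping element id -> scan length.
    """
    return sum(max(sum(length[e] for e in chain) for chain in wrapper)
               for wrapper in wrappers)

def stitches(chains):
    """Directed stitch wires (a, b) between consecutive elements of each chain."""
    return {(a, b) for chain in chains for a, b in zip(chain, chain[1:])}

def cut(bfd_chains, kl_chains):
    """Number of BFD stitches that the KL solution does not reuse."""
    return len(stitches(bfd_chains) - stitches(kl_chains))

# Two pre-bond wrappers (one per tier) plus one post-bond wrapper.
length = {0: 3, 1: 2, 2: 4, 3: 1}
print(critical_test_length([[[0], [1]], [[2], [3]], [[0, 1], [2, 3]]], length))  # 12
print(cut([[0, 1], [2, 3]], [[0, 1, 3], [2]]))  # 1
```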

## CTL Results

|      | Average | Max  |
|------|---------|------|
| PRE  | 0.06%   | 4.2% |
| POST | 0.32%   | 3.0% |

## Cut Results

|      | Tiers | ckt1 | ckt2 | ckt3 | ckt4 | ALL  |
|------|-------|------|------|------|------|------|
| BFD  | 2     | 52%  | 15%  | 23%  | 16%  | 27%  |
|      | 4     | 63%  | 53%  | 35%  | 31%  | 21/0 |
| PRE  | 2     | 12%  | 5.8% | 5.0% | 6.7% | 6.6% |
|      | 4     | 15%  | 7.6% | 5.0% | 7.4% |      |
| POST | 2     | 13%  | 4.0% | 7.6% | 8.8% | 8.4% |
|      | 4     | 16%  | 6.1% | 7.3% | 11%  |      |











## Conclusion

- 3D cores will be needed to take full advantage of 3D integration technology
- 3D test wrappers are needed to fully test 3D cores both pre-bond and post-bond
- There is an opportunity to optimize by sharing routing resources between pre-bond and post-bond wrappers
- Our heuristic designs near-optimal wrappers
- In general, the PRE configuration performs best because it gives KL a larger search space

#### References

PUTTASWAMY, K. and LOH, G. H. "The Impact of 3-Dimensional Integration on the Design of Arithmetic Units." In Proceedings of the International Symposium on Circuits and Systems, 2006.

PUTTASWAMY, K. and LOH, G. H. "Implementing register files for high-performance microprocessors in a die-stacked (3D) technology." In Proceedings of the IEEE Computer Society Annual Symposium on Emerging VLSI Technologies and Architectures, pp. 384, 2006.

IYENGAR, V., CHAKRABARTY, K. and MARINISSEN, E. J. "Test wrapper and test access mechanism co-optimization for system-on-chip." Journal of Electronic Testing: Theory and Applications, vol. 18, no. 2, pp. 213–230, 2002.
