Main contents
Chapter 1 NANO-CMOS SCALING PROBLEMS AND IMPLICATIONS
Chapter 2 CMOS DEVICE AND PROCESS TECHNOLOGY
Chapter 3 THEORY AND PRACTICALITIES OF SUBWAVELENGTH OPTICAL LITHOGRAPHY
Chapter 4 MIXED-SIGNAL CIRCUIT DESIGN
Chapter 5 ELECTROSTATIC DISCHARGE PROTECTION DESIGN
Chapter 6 INPUT/OUTPUT DESIGN
Chapter 7 DRAM
Chapter 8 SIGNAL INTEGRITY PROBLEMS IN ON-CHIP INTERCONNECTS
Chapter 9 ULTRALOW POWER CIRCUIT DESIGN
Chapter 10 DESIGN FOR MANUFACTURABILITY
Chapter 11 DESIGN FOR VARIABILITY
Excerpt from Nano CMOS Circuit and Physical Design (409 pages).
Figure 11.21 Temperature effect on Ids of sub-100-nm transistors.
Figure 11.22 Drain-induced threshold voltage shift. (Axes: Vth,lin − Vth,sat (V) versus gate length (µm).)
Package and system modeling of the supply impedance is now very important,
especially for high-performance chip designs. L(di/dt) is the most significant
voltage drop in such designs, and the supply impedance must be designed to
keep the L(di/dt) plus IR drop to within the design budget, usually 10% of
the supply voltage for high-performance processors. This would require on-chip
decoupling capacitors, package capacitors, and capacitors on the system board.
At a minimum, the decoupling capacitors on the chip should be 10 times the
STRATEGIES TO MITIGATE IMPACT DUE TO VARIATIONS 369
equivalent switching capacitance (Ceqv) of the chip, where
$$C_{eqv} = \frac{\text{chip power}}{V_{dd}^{2} \times \text{frequency}}$$
Figure 11.23 (a) Well proximity effects. (b) Poly proximity effects. (Labels: ion implant, scattered ions, photoresist, STI, n-well, p-well, poly gates.)
The chip power is determined using the worst-case vector, which means that
the Ceqv value will be larger and can better maintain the power to the chip
during the peak power demand period. Hence, performance degradation during
the period of peak power demand due to supply voltage droop will be kept to a
minimum due to the well-decoupled supply. Package and system board modeling
is a very important part of the design in order to meet the supply impedance goal
of a high-performance system and is beyond the scope of this book.
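The decoupling rule of thumb and the supply-droop budget described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example numbers (100 W, 1.0 V, 2 GHz, and the inductance, current, and resistance values) are assumptions, not values from the text.

```python
def equivalent_switching_capacitance(chip_power_w, vdd_v, freq_hz):
    """Ceqv = chip power / (Vdd^2 * frequency), in farads."""
    return chip_power_w / (vdd_v ** 2 * freq_hz)


def min_on_chip_decap(chip_power_w, vdd_v, freq_hz, ratio=10.0):
    """Minimum on-chip decoupling capacitance, taken as ratio * Ceqv.

    The text suggests a ratio of at least 10.
    """
    return ratio * equivalent_switching_capacitance(chip_power_w, vdd_v, freq_hz)


def droop_within_budget(l_h, di_dt_a_per_s, i_a, r_ohm, vdd_v, budget=0.10):
    """Return True if L(di/dt) + IR stays within budget * Vdd."""
    droop_v = l_h * di_dt_a_per_s + i_a * r_ohm
    return droop_v <= budget * vdd_v


# Hypothetical chip: 100 W worst-case power, 1.0 V supply, 2 GHz clock.
ceqv = equivalent_switching_capacitance(100.0, 1.0, 2e9)
decap = min_on_chip_decap(100.0, 1.0, 2e9)
print(f"Ceqv = {ceqv * 1e9:.1f} nF, minimum on-chip decap = {decap * 1e9:.1f} nF")
```

Because the worst-case vector sets the chip power, the resulting Ceqv, and hence the decoupling target, errs on the large side.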
In mixed-signal designs in an epitaxial process, sharing Vss but separating the
Vdd supply will result in the lowest noise coupling from the digital domain [12].
This also provides an opportunity for using a higher voltage power supply than
in digital circuits, to provide the necessary headroom for some designs. Sharing
Vss simplifies the electrostatic discharge (ESD) protection as well (see Chapter 5
for a full discussion on ESD protection in the nano-CMOS regime).
11.2.4 Digital Circuit Strategies to Deal with Variations
Digital circuits are usually more tolerant to process variation; however, some
digital circuits, including self-timed circuits and matched delay circuits, can be
extremely sensitive to process variation. Self-timing is used primarily in embed-
ded memories such as cache memories. It was used most commonly during the
period when clock frequency was low. To reduce the access time of memories,
self-timing techniques were used to generate edges to clock the sense amplifiers
(SAs), so that memory data were available earlier in the clock cycle. This enabled
one-cycle access, including logical operation on the memory data, for better per-
formance. As clock frequency scales, the access time of the embedded SRAM
has come within the clock cycle time, so a lot more edges have become avail-
able to clock the SAs. Therefore, there is now less compulsion for self-timing
to generate edges. The only other need for self-timing is to save power in cases
where the SAs are not clocked until an address changes, while the clocked design
requires clock gating to reduce clock power. It is still a lot easier and more robust
to gate the SA clock than to self-time it. Many schemes are designed to mitigate
the impact of variation on design robustness if one must self-time. We discuss
next a self-timed scheme used in SRAM.
Self-Timing Strategies Traditional self-timed memory relied on a single
SRAM cell to drive the dummy bitline, whose signal is then converted to a full CMOS
level and fanned out to drive the SA clock line [28]. As can be seen in Figure 11.24,
a single-cell self-timing scheme is very sensitive to process variation: cell
drive variation translates directly into higher self-timing delay variation. To avoid
failures due to the higher self-timing path delay variation, more margin is
needed, at the expense of performance.
Figure 11.24 Effect of number of cells on self-timed delay variation. (Axes: delay variation versus number of cells.)
As shown in Figure 11.24, lower self-timed path delay variation can be
achieved if more than one cell is used [13]. The use of several cells averages
out cell current variations. A multicell self-timed scheme is illustrated in
Figure 11.25. To minimize cell drive variation it is important to have another
column of dummy cells at the edge of the array next to the self-timed dummy
column. In the memory array, metal density is very consistent, making it easier to
control interlayer dielectric (ILD) variation. Due to the regularity of the metal lines, resistance variation
due to chemical–mechanical planarization (CMP) is kept to a minimum,
provided that the fabricator optimizes CMP for the metal density as found in
the memory array. Most fabricators understand the need to reduce resistance
variation in a memory array and will optimize the process around the memory
array. Even so, there will still be some resistance variation due to barrier metal
thickness and wire width variation.
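The averaging effect of a multicell scheme can be illustrated with a simple statistical sketch. It assumes that the cell current variations are independent and identically distributed, so that the spread of the average falls as 1/sqrt(N); this is an idealization, not a reproduction of the data in Figure 11.24, and systematic variation shared by all cells would not average out. The function name and the example numbers are hypothetical.

```python
import math


def averaged_drive_sigma(sigma_cell, n_cells):
    """Standard deviation of the mean drive of n independent cells.

    Assumes independent, identically distributed cell current variation.
    """
    if n_cells < 1:
        raise ValueError("n_cells must be at least 1")
    return sigma_cell / math.sqrt(n_cells)


# Hypothetical single-cell spread of 16% (relative).
for n in (1, 4, 16):
    print(n, averaged_drive_sigma(0.16, n))
```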
Self-Timed Margins Figure 11.26 illustrates a typical race condition in self-
timed designs. In this illustration, delay in Out1 must be less than the delay
in Out2; otherwise, functional failure would result.
Figure 11.25 Multicell self-timed scheme. (Labels: dummy WL, regular WL, dummy BL, bit, bit bar, dummy cells, array cells.)
Figure 11.26 Margining self-timed paths. (Labels: common signal point, Delay1 to Out1, Delay2 to Out2.)
Due to process, voltage, and
temperature (PVT) variations and layout differences, Delay2 may become shorter
than Delay1 on silicon, due to the fact that some local effects are not fully or
correctly modeled or anticipated during the design phase. When this happens in
a self-timed design, the result is a functional failure: the circuit will not work at
any frequency, even a very low one, and a redesign is required to
restore even basic functionality. This is a very serious and costly design failure.
To safeguard against such a situation, we add margins to the simulation model
to cover for the unanticipated effects, so as to reduce the probability of such a
functional failure.
As mentioned earlier, the speed of Delay2 may not match that of Delay1 due
either to some unanticipated effect or if the circuit is not fully optimized. The
following analysis translates the margin into a physically meaningful parameter
that can be used to verify the margin of the self-timed circuit. The self-timed
circuit in Figure 11.26 at the verge of failure can be represented as
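$$\text{Delay2} \times (1 - M) = \text{Delay1} \times (1 + M)$$
where M is the self-timed margin. Simplifying, we obtain
$$M \times (\text{Delay1} + \text{Delay2}) = \text{Delay2} - \text{Delay1}$$
Hence,
$$M = \frac{\text{Delay2} - \text{Delay1}}{\text{Delay1} + \text{Delay2}}$$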
Typically, M is set to 0.25 for prelayout and 0.15 for postlayout extracted simula-
tions over all practical corners. The use of statistical models is highly encouraged
for more realistic corner coverage. Further details on statistical modeling are
given in Section 11.3. Regardless of the self-timing margin, every self-timed path
must have metal programmable options to increase the margin to at least 30% in
all practical corners. As mentioned earlier, self-timed race failure is catastrophic
for a chip; the addition of metal programming options can lead to a quick loop
fix. The metal options must be designed to effect a self-timing margin change
in as little as one layer and no more than two layers. This is important, since
mask cost is on the rise, especially for nano-CMOS process nodes. If possible,
design the programming change at as high a metal level as possible to allow
for a quick fabrication turnaround time for the fix in the event that a self-timing
margin change is necessary.
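As a rough sketch, the margin check described above could be scripted as follows. The function names are hypothetical; the thresholds of 0.25 (prelayout) and 0.15 (postlayout) are the typical values quoted in the text, and the delays in the example are made-up numbers in picoseconds.

```python
def self_timed_margin(delay1, delay2):
    """M = (Delay2 - Delay1) / (Delay1 + Delay2)."""
    return (delay2 - delay1) / (delay1 + delay2)


def meets_margin(delay1, delay2, post_layout):
    """Compare the margin with the typical targets quoted in the text."""
    required = 0.15 if post_layout else 0.25
    return self_timed_margin(delay1, delay2) >= required


# Hypothetical delays in picoseconds.
m = self_timed_margin(300.0, 500.0)
print(f"M = {m:.2f}")
# At the verge of failure both sides of the race are equal:
print(500.0 * (1 - m), 300.0 * (1 + m))
```

In practice the check would be repeated for every practical corner, since the margin must hold at all of them.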
Delay Variation Due to Slow Nodes Slow nodes manifest themselves as high-
fanout nodes, long unrepeated lines, and signals through pass gates and cascading
pass gates. Pass gates present themselves as large resistors to the signal, just like
long unrepeated lines. When more than two unbuffered pass gates are in a
signal’s path, the result is a really slow node that must be dealt with. Slow nodes
could also be weakly driven nodes, as in the case of signals through cascading
pass gates and long, unrepeated signal lines.
Figure 11.27 Trip point versus delay variation. (Labels: trip point variation, large delay variation, small delay variation.)
The weakly driven nodes are more
susceptible to noise coupling into the far-end node where the receiver resides.
There is another hazard that affects all slow nodes, including high-fanout nodes.
As shown in Figure 11.27, variation in the input trip point of the receiver will
translate into a larger input delay variation due to the gentle slope of the input
signal on the slow nodes. Maintaining a fast input slew rate enables the design to
better tolerate P-to-N process skew that affects gate input threshold or trip point.
In some circuits, such as an arithmetic block, there will be pass gates in the data
path if pass gate adders are used. In some cases there could be several pass gates
in series in the data path unless the designers add buffers between the cascading
full adders. This adds delay in the critical path. There are ways to mitigate this
by using differential cascode voltage switch (DCVS) logic instead [31][32].
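The relationship between trip point variation and delay variation on a slow node can be approximated to first order by dividing the trip point spread by the input slew rate. The sketch below assumes a linear ramp near the trip point; the function name and the example values (50 mV of trip point variation, 1 V edges of 50 ps and 500 ps) are illustrative assumptions.

```python
def delay_variation(trip_point_variation_v, slew_v_per_s):
    """First-order delay spread: delta_t = delta_Vtrip / (dV/dt).

    Assumes the input is a linear ramp in the vicinity of the trip point.
    """
    return trip_point_variation_v / slew_v_per_s


fast_edge = 1.0 / 50e-12   # 1 V in 50 ps
slow_edge = 1.0 / 500e-12  # 1 V in 500 ps
print(delay_variation(0.05, fast_edge))  # small delay variation
print(delay_variation(0.05, slow_edge))  # ten times larger on the slow node
```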
Pulse Flop Clock Generator Design Strategies
Match Trip Points Pulse flop operation and design are not covered in this book;
refer to other circuit design texts for a detailed discussion. Full understanding of
the pulse flop operation is needed to appreciate the following discussion on pro-
cess variation issues that affect the pulse generator and operation of the pulse flop.
Figure 11.28 shows a typical pulse generator for pulse flops.
Figure 11.28 Typical pulse flop pulse generator. (Labels: global clock, Inv1, Inv2, Inv3, Inv4, Nand1, pulse output.)
Inv1 through Inv3
form a delay chain that defines the pulse width of a pulse generator. Pulse generator
pulse width variation has a serious impact on the hold time of a pulse flop.
The input trip points of Nand1 and Inv1 must be matched; otherwise, the pulse
width varies with the global clock edge rate variation. A longer pulse output
width will result in a longer hold-time requirement but offers a longer transpar-
ent time. If the logic cone feeding into the pulse flop is not properly balanced
in timing, the longer transparent period due to the wider pulse width can cause
a hold-time problem even when there is a maximum time path from the same
logic cone.
Let us consider the case where Nand1 has a higher trip point than the input
trip point of the inverter chain, starting with Inv1 in Figure 11.28. As the global
clock rises, Inv1 trips first and starts the delay chain going, while Nand1 has not
quite reacted to the global clock input. This in effect shortens the output pulse
width of the pulse generator because the inverter delay chain times out sooner
with respect to the rising edge of the pulse generator output. The delay after
Inv1 until Nand1 triggers will be the amount of shortening of the pulse generator
pulse width. As can be seen in Figure 11.27, the clock rise time change can alter
this delay, thus changing the pulse width. The clock rise time can change for
several reasons, and the change can affect the hold time of the chip and cause
catastrophic failure. The opposite condition, where the Nand1 trip point is lower than
that of Inv1, will increase the pulse width and hold time requirement of the pulse flops.
In cell-based designs where the pulse flop characterization assumes
that the trip points of Inv1 and Nand1 are matched, hold time failures can result
if the trip points are not in fact matched, as that changes the actual
hold time requirement of the flops.
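The dependence of pulse width on trip point mismatch and clock edge rate can be illustrated with a deliberately simplified model. It assumes an ideal linear clock ramp from 0 to Vdd and ignores the gate delay of Nand1 itself; the function names and all numbers are hypothetical.

```python
def crossing_time(trip_v, vdd_v, rise_time_s):
    """Time at which a linear 0-to-Vdd clock ramp reaches trip_v."""
    return rise_time_s * trip_v / vdd_v


def pulse_width(chain_delay_s, trip_inv1_v, trip_nand1_v, vdd_v, rise_time_s):
    """Pulse width = delay chain delay minus the lag of Nand1 behind Inv1.

    A positive lag (Nand1 trips later than Inv1) shortens the pulse;
    a negative lag (Nand1 trips earlier) lengthens it.
    """
    lag = (crossing_time(trip_nand1_v, vdd_v, rise_time_s)
           - crossing_time(trip_inv1_v, vdd_v, rise_time_s))
    return chain_delay_s - lag


# Matched trip points: the width does not depend on the clock rise time.
print(pulse_width(100e-12, 0.5, 0.5, 1.0, 50e-12))
print(pulse_width(100e-12, 0.5, 0.5, 1.0, 100e-12))
# Nand1 trip point higher than Inv1: the width now tracks the rise time.
print(pulse_width(100e-12, 0.4, 0.6, 1.0, 50e-12))
print(pulse_width(100e-12, 0.4, 0.6, 1.0, 100e-12))
```

Under these assumptions the mismatched generator loses 10 ps of pulse width at a 50 ps rise time and 20 ps at a 100 ps rise time, which is the edge-rate sensitivity the text warns about.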
Set the input trip point slightly below Vdd/2 (lower middle third) but not too
low; otherwise, ground bounce will be an issue. The reason for this is that the
edge placement error is lower at a point low on the clock rising edge. Since
the pulse generator only references the rising edge of the clock, this technique
ensures more accurate clock reference and lower latency from the clock edge.
Pulse Generator Output Waveform Peak The pulse width must be wide
enough to ensure that the pulse reaches Vdd under all load conditions that the
pulse generator must drive, over all practical corners. This is to make sure that
the pulse width is deterministic. If the pulse width reaches Vdd under all load
conditions, the pulse will always be discharged from the same voltage under the
same PVT conditions and will therefore be deterministic. This eliminates pulse
width variation beyond what is attributed to the PVT conditions. The other reason
for having the clock pulse reach Vdd is to make sure that the flops always see
the same drive level at their clock inputs, thereby avoiding varying setup and hold
times due to varying gate drive.
Pulse Generator Delay Tracking of Data Path Delay The delay chain formed
by Inv1 through Inv3 is by necessity constructed with transistors of minimum
size, to keep the power down. This is where we have to trade power consumption
for process tracking of the data delay. The devices must be large enough so that
the delay is not dominated by parasitics. The parasitics along the delay chain must
be minimized as you would on the data path that is optimized for speed. The delay
chain speedup ratio must match the data path speedup closely over the practical
corners to avoid running into a hold time violation. If the data path speeds up
more than the delay chain, especially for dynamic pulse flops, we could end up in
a situation when the input data to the dynamic flop change before the pulse resets.
The last element in the delay chain (Inv3) must have the same stack height
as the logic flop driven by the pulse generator. If the flop that received the clock
pulse from the pulse generator is not a simple flop but a dynamic logic flop, Inv3
in the delay chain must have the same stack height as the dynamic logic that is
preceding the flop (see Figure 11.29). This allows the delay chain to track the
logic delay over process corners. Figures 11.30 and 11.31 illustrate the need to
relax spacing rules as well as poly end-cap coverage to reduce device variation
due to processing distortion of drawn polygons.
Figure 11.29 Delay tracking technique for pulse generators. (Labels: global clock, Inv1, Inv2, Inv3, Inv4, Nand1, pulse output, dynamic logic with inputs A and B, flop; Inv3 mimics the flop logic.)
Figure 11.30 Poly flaring. (Labels: as drawn, end pullback, poly flaring encroaches diffusion, diffusion.)
Figure 11.31 Poor end-cap coverage for poly at diffusion corner. (Labels: poly, diffusion, diffusion corner flaring.)
11.3 CORNER MODELING METHODOLOGY FOR NANO-CMOS PROCESSES
SPICE modeling has become the most critical component for enabling designers
to determine necessary design margins to meet the stringent requirements of
modern integrated circuits. With the ever-increasing speed requirements, margins have
continued to decrease, forcing designers to rely more heavily on models for an
accurate reflection of the process, including its expected variation. The traditional
approach for model development has been to use a nominal case adjusted to a
foundry process control methodology and then to develop corner models that are
worst case for digital logic. The process variance has not scaled equivalently
with the critical dimension scaling, which has made this source of error more
pronounced, especially on the deep submicron processes. There is now a real
need for statistical models for a more accurate representation of the process.
Figure 11.32 shows a diagram of the various levels of process variation. Each
level in the process flow can add additional variation to the device performance.
Understanding the contribution at each stage is important for creating accurate
statistical models.
11.3.1 Need for Statistical Models
The process corner model approach creates unrealistic process combinations
and leads to overdesign, especially as design margins become smaller. This is
illustrated in Figure 11.33, a scatter plot of the NMOS and PMOS IDsat measurements
over numerous wafer lots. Here, fast–slow and slow–fast (FS and SF)
corners rarely occur. This makes sense from a process standpoint since PMOS
and NMOS devices are only partially correlated. For example, if we consider
the various parameters that can vary, such as oxide thickness, gate length, gate
width, channel doping, and halo implant, some of them (e.g., oxide thickness and
channel length) will vary similarly for PMOS and NMOS devices, while others
will not be correlated and will vary independently.
Figure 11.32 Various levels of process variations. (Labels: Fab1, Fab2; line to line, lot to lot, wafer to wafer, die to die, device to device; intradie, interdie.)
Figure 11.33 Process variation map for PMOS and NMOS devices. (Axes: PMOS IDsat (mA/mm) versus NMOS IDsat (mA/mm); TT, FF, SS, FS, and SF corners with ±1σ, ±2σ, and ±3σ contours.)
Additionally, the variance of
the process will have both localized and global components. Process corners do
not provide this partitioning of the variance, so it is impossible to determine the
effect of localized variation between devices based on corner models.
Identifying the worst-case corner for analog circuits becomes difficult. The
concept of fast/slow may not be applicable. For an operational amplifier, high
gain/low gain may make more sense, but which digital process corner corre-
sponds to the high-gain case for the amplifier may be difficult to say, since it is
dependent on the specifics of the amplifier architecture. Identifying what process
corner represents the worst-case corner becomes more difficult as subblocks are
combined to form more complex systems such as a data converter. The analog
circuit may end up being overdesigned if the analog circuit is simulated using
the digital process corners, especially given the already limited design space for
analog circuits. Overdesign of a circuit can result in increased complexity, larger
die size, and potentially, a missed market window and is therefore best avoided
if possible. If we consider the variation of several parameters that can vary for a
process, the combined variance can be expressed as
$$\sigma_{total} = \sqrt{\sigma_{tox}^2 + \sigma_L^2 + \sigma_W^2 + \sigma_{Np}^2 + \sigma_{Nn}^2 + \sigma_{\mu p}^2 + \sigma_{\mu n}^2 + \cdots} \gg 3\sigma$$
Combining the variation in this manner can result in significant overdesign of a
circuit if it must meet the performance requirements at these extreme cases.
The use of statistical modeling allows the designer to estimate the functional
yield of a given design before it has been fabricated. This information is crucial
for making trade-offs during the design cycle rather than postfabrication. The
designer can look at subblocks within a design to determine the contribution of
each of these components toward the overall system yield, allowing emphasis to
be placed on the most critical portions of the design. The designer will also be
able to make an assessment of device sizing effects on the functional yield.
11.3.2 Statistical Model Use
Statistical models are based on a first-principles approach to measuring the source
of variation and translating that variation into SPICE model parameter variation.
The first step is to identify the independent factors and capture their long-term
variation. An example of this is shown in Figure 11.34, which shows the capac-
itance equivalent thickness (CET) variation in oxide thickness over a period of
time. This information is translated into a histogram, allowing the mean and
standard deviation values to be extracted.
Figure 11.34 Oxide thickness variation over time for a given process. (Axes: CET tox (Å) versus time (arbitrary units), with a histogram of count versus Tox (Å).)
These values are then entered into a
model such that the independent model parameter is modeled by its nominal
value plus the standard deviation variable. Physical parameters that can be considered
include doping concentrations, oxide thickness, mobility, gate width, and
gate length. It is crucial that the parameters selected be applied correctly to the
SPICE models to ensure that their effects are simulated correctly. For example,
it is common practice simply to vary the threshold voltage of a device to look at
the process variation effects, but this does not capture back gate biasing correctly,
so erroneous results will be obtained. Part of what makes this task
difficult is that SPICE model parameters do not all have a direct physical meaning.
The next step is model correlation. Normally, a parameter such as threshold
voltage, VTH0, is set to a fixed value such as VTH0 = 0.4. This would
now become VTH0 = 0.4 + VTH_PVAR, where VTH_PVAR is defined to be
AGAUSS(M, σ, N), where M represents the mean value, σ the absolute variation, and N
the number of standard deviations represented by σ. Use of this approach would
not capture the threshold voltage dependency on oxide thickness, so it is better
to represent it as [36]
$$V_{th0} = V_{FB} + 2|\phi_F| + \frac{q N_S x_{t1} + q N_P (X_{dep} - x_{t1})}{C_{ox}}$$
where $C_{ox} = \varepsilon_{ox}/t_{ox}$, $t_{ox} = \bar{t}_{ox} + \sigma_{tox}$, $N_S = \bar{N}_S + \sigma_{NS}$, $N_P = \bar{N}_P + \sigma_{NP}$, and
$V_{FB} = \bar{V}_{FB} + \sigma_{VFB}$. The parameters are as follows: $N_S$ is the doping density
between 0 and $x_{t1}$, and $N_P$ is the doping density between $x_{t1}$ and the depletion
depth $X_{dep}$. All other terms have the standard meaning already defined. Using this
representation for the threshold voltage allows a multitude of process parameters
to be accounted for such as the flat-band voltage and channel doping. This also
captures the effects of the substrate biasing as well, making the overall simulation
more accurate. Once the appropriate parameters are obtained, it is possible to run
multiple simulations to obtain a distribution for parameters that can be measured
on wafers such as threshold voltage or IDsat. The real-world measurements can
be compared to the simulated distribution to validate the distribution generated
by the model.
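The flow just described, in which physical parameters are perturbed and the resulting threshold voltage distribution is collected, can be sketched outside a circuit simulator in plain Python. Everything numeric here is an assumption for illustration: the nominal values, the standard deviations, and the choice to hold xt1, Xdep, and φF fixed (in a real device Xdep itself depends on doping and bias). A production flow would use the simulator's own Monte Carlo facilities rather than this stand-alone calculation.

```python
import random
import statistics

Q = 1.602e-19              # electron charge (C)
EPS_OX = 3.9 * 8.854e-12   # permittivity of SiO2 (F/m)


def vth0(vfb, phi_f, ns, np_, xt1, xdep, tox):
    """Vth0 = VFB + 2|phiF| + (q*NS*xt1 + q*NP*(Xdep - xt1)) / Cox, in SI units."""
    cox = EPS_OX / tox
    return vfb + 2 * abs(phi_f) + (Q * ns * xt1 + Q * np_ * (xdep - xt1)) / cox


def sample_vth0(n_runs, seed=0):
    """Draw n_runs Vth0 samples with Gaussian spreads on tox, NS, NP, and VFB."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_runs):
        tox = rng.gauss(2.0e-9, 0.05e-9)   # hypothetical mean and sigma (m)
        ns = rng.gauss(5e23, 2e22)         # hypothetical (m^-3)
        np_ = rng.gauss(1e24, 4e22)        # hypothetical (m^-3)
        vfb = rng.gauss(-0.9, 0.01)        # hypothetical (V)
        samples.append(vth0(vfb, 0.45, ns, np_, 20e-9, 40e-9, tox))
    return samples


samples = sample_vth0(1000)
print(f"mean = {statistics.mean(samples):.3f} V, "
      f"sigma = {statistics.stdev(samples) * 1e3:.1f} mV")
```

The simulated distribution is what would then be compared with wafer measurements of threshold voltage to validate the model.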
The standard deviation of each of the parameters is typically not the same
for both device types. Similarly, there is a significant dependency on the device
size as well. This size dependency is greater for the channel length, especially
for very small channel lengths. Figure 11.35 shows the localized difference in
threshold voltage between two identical NMOS devices placed side by side to
provide the maximum degree of matching, with varying size for a deep submicron
process. These data do not include device displacement that will add further to
the variation. Localized variation may not be too important for digital logic since
it tends to average out, especially for deep levels of logic, but it becomes crucial
for analog design. This localized variation can be used to determine the optimum
device size for critical components such as a differential pair.
Consideration of both the local (intradie) and global (interdie) variation
represents a reasonable model for the variation.
Figure 11.35 Threshold variation as a function of device size. (Axes: dVth (mV) versus 1/(WL)^0.5 (mm−1).)
The process variation can be represented by [34]
$$\sigma^2(P) = \frac{A_P^2}{WL} + S_P^2 D^2$$
where σ(P ) is the standard deviation of the process parameters, P . The
device channel width and length are represented by W and L. The displace-
ment between devices is represented by D, and the parameters AP and SP are
process-dependent constants that must be determined by measurements. The first
term represents the localized variation, and the second term represents the global
variation that is dependent on the physical displacement between devices. In some
cases this model may not provide the necessary insight into the process variation [35].
For this reason, it may be best to partition the variance into more components
to allow greater analysis of the various places that variation can be introduced and
the overall impact. One may go to the level of detail shown in Figure 11.32,
where a variance component is assigned for each level. This approach will allow
much more insight into the product yield, but obtaining meaningful information
on the additional variation at each level can become difficult.
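The two-term mismatch model above is straightforward to evaluate numerically. The sketch below is a minimal illustration; the function name, the units, and the constants (A_P = 3 mV·µm, S_P = 0.004 mV/µm) are assumed values, since the real constants must be measured for each process.

```python
import math


def sigma_p(a_p, s_p, w, l, d=0.0):
    """sigma(P) from sigma^2(P) = A_P^2 / (W*L) + S_P^2 * D^2.

    Units are the caller's responsibility, e.g. a_p in mV*um,
    s_p in mV/um, and w, l, d in um, giving sigma in mV.
    """
    variance = a_p ** 2 / (w * l) + (s_p ** 2) * (d ** 2)
    return math.sqrt(variance)


# Hypothetical constants for a threshold voltage mismatch estimate.
print(sigma_p(3.0, 0.004, 1.0, 1.0))            # adjacent 1 um x 1 um devices
print(sigma_p(3.0, 0.004, 4.0, 1.0))            # four times the area halves the local term
print(sigma_p(3.0, 0.004, 1.0, 1.0, d=1000.0))  # devices 1 mm apart
```

Quadrupling the gate area halves the area-dependent term, which is why sizing up critical matched devices such as a differential pair is effective.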
This approach is applied to a phase-locked-loop charge pump to estimate the
degree of current mismatch that can be expected. The results of these simulations
are shown in Figure 11.36. Here it is assumed that the design can handle ±6%
current mismatch; 15 of the 500 simulated die fall outside that range, giving a 97%
yield. If this yield is deemed adequate, no further design effort is required. If a
higher yield is necessary, the circuit can be redesigned. This redesign may require
an entirely new charge pump architecture, or simply resizing critical devices to
decrease the variability. Figure 11.37 shows how the threshold voltage variation
decreases when the device size is increased. The y-axis shows the threshold
voltage shift, while the x-axis shows the normalized device size (area) when
normalized to a minimum-sized device for a 100-nm process. It is possible to
reduce the overall system variation by sizing up critical devices selectively.
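The yield figure quoted for the charge pump follows from counting the simulation runs that stay inside the acceptable mismatch window. A minimal sketch, with a hypothetical function name and synthetic data standing in for real simulation results:

```python
def functional_yield(mismatch_percent, limit_percent=6.0):
    """Fraction of runs whose absolute mismatch is within the limit."""
    passing = sum(1 for m in mismatch_percent if abs(m) <= limit_percent)
    return passing / len(mismatch_percent)


# Synthetic stand-in for 500 Monte Carlo runs: 485 pass, 15 fall outside +/-6%.
runs = [0.0] * 485 + [8.0] * 10 + [-7.5] * 5
print(f"yield = {functional_yield(runs):.0%}")
```

Rerunning the same count after resizing critical devices gives a direct measure of how much yield the extra area buys.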
Figure 11.36 Charge pump circuit current mismatch induced by localized and global effects on threshold voltage variation. (Axes: current mismatch (%) versus simulation run; the acceptable mismatch range and the yield loss outside it are marked.)
Figure 11.37 Threshold voltage variation as a function of device size. (Axes: dVth (mV) versus normalized device size; the maximum threshold variation is marked.)
11.4 NEW FEATURES OF THE BSIM4 MODEL
The implementation of BSIM4 models has allowed a significant improvement
in simulation accuracy for the deep-submicron processes. BSIM4 models incor-
porate several important features previously missing from the BSIM3 models,
which include modeling of the halo or pocket implant, gate-induced drain leak-
age (GIDL), gate direct tunneling, and trench isolation stress effects. Trench
isolation stress effects are discussed at length in Chapter 4.
11.4.1 Halo/Pocket Implant
The halo/pocket implant is used to reduce the threshold voltage roll-off for very
short-channel devices, but this implant results in significant drain-induced threshold
shift (DITS) for longer-channel devices. The halo/pocket implant increases the gds value in long-channel
devices, which is undesirable, especially for analog applications, one of
the primary places that longer-channel devices are used. Figure 11.38(a) shows
the location of the halo/pocket implant, and Figure 11.38(b) shows the resulting
DITS effect for a 100-nm process. This output impedance degradation is not
modeled completely in the BSIM3 version because the DITS does not consider
the effect of the halo/pocket implant. Modeling of the halo/pocket implant has
been achieved by no longer assuming a uniform substrate doping. A limitation
still occurs because the DITS output resistance model does not include the body
bias effect.
11.4.2 Gate-Induced Drain Leakage and Gate Direct Tunneling
The various components of off-state leakage are shown in Figure 11.39 along
with a relative indication of the influence for several process generations. The gate
leakage is projected to become a more significant factor at the 90-nm technology
node and beyond, but source–drain leakage remains the primary issue.
Figure 11.38 (a) Halo/pocket implant used on deep submicron processes. (b) Resulting simulation of DITS for a 100-nm process. (Axes for (b): Vtlin − Vtsat (volts) versus gate length (µm).)
Figure 11.39 Transistor off-state leakage components and the relative scaling with process. (Labels: ISDleak, IGate, IGIDL, IJunction; gate leakage, junction leakage, GIDL, and S-D leakage at 130 nm, 90 nm, and 65 nm.)
Figure 11.40 Normalized total gate leakage as a function of device length and width. (Axes: normalized gate leakage versus device width (mm), with channel length increasing between curves.)
BSIM4
models allow the gate leakage to be modeled, but at the cost of additional simulation
time, because the gate leakage must be evaluated at each gate bias point since
it depends on the potential across the gate. Figure 11.40 shows the total
normalized gate leakage current as a function of device width for various gate lengths
ranging from 0.2 to 15 µm. Figure 11.41 shows the GIDL effect for a thin oxide
device on a 100-nm process. The GIDL current is in the nanoampere range. A
weak dependency on the bulk bias can also be observed.
11.4.3 Modeling Challenges
Although BSIM4 represents a significant improvement over BSIM3 models, it
still does not account for all factors that can have a pronounced effect on device
performance. Many of these effects relate to how the device is laid out and the
physical location of adjacent devices: (1) dogbone devices to realize narrow-
width devices, (2) well proximity effects, and (3) shallow trench isolation stress
effects (these effects can be modeled postlayout). A suggested approach to use
is to avoid layouts that aggravate these effects wherever possible since they
are difficult to model.
Figure 11.41 Simulation of the gate-induced drain leakage over (a) a wide gate voltage range and (b) a zoomed area to show the bulk bias influence. (Axes: Ids (A) versus Vgate (V), for Vbs = 0 V and Vbs = −0.5 V.)
This approach can lead to a serious constraint with the
physical implementations, increasing the overall die size. A second approach is
to develop macro models that allow these effects to be modeled. These models
can be generated for the most critical circuits within a design, such as an SRAM
cell to ensure that the highest level of accuracy is obtained. These macro models
should be parameterized to allow maximum flexibility. Correlation between the
model and early test chip results is required to ensure that the models are accurate.
11.4.4 Model-Specific Issues
BSIM4 models use nonphysical parameters to achieve high accuracy for
short/narrow devices. The use of nonphysical parameters makes the model
parameter extraction procedure much more complicated because of the
correlation between short- and long-channel parameters. Insufficiently modeled
physical effects such as doping-dependent mobility models for the halo/pocket
implant technologies are resulting in some discrepancy between the modeled
device and the physical device. The reverse short-channel effect (RSCE) needs
to be modeled as well to further improve model accuracy. With each progression
of BSIM model comes an increase in the number of parameters, giving rise to an
increase in the simulation time and memory requirements. It is crucial to balance
the number of parameters with the need to have reasonable simulation times.
11.4.5 Model Summary
Modeling of halo/pocket implanted devices has been improved significantly with
BSIM4. The much-needed gate direct tunneling model required for design at
90 nm and below is also available. The parameter extraction approach has become
much more complicated, and the number of parameters has increased
significantly. Macro models can be used to model some of the layout-specific
issues, but they must be correlated with actual silicon measurements to
confirm their accuracy. Quite a few more effects must still be
incorporated into the model, but this must be done without significantly
increasing the complexity or simulation run time.
11.5 SUMMARY
The principles presented in this chapter can be applied to many other circuit
and layout types to minimize the impact of variation on their functionality as
well as manufacturability. As we scale the technology well into the nano-CMOS
regime, dealing with variation will be part and parcel of every design
methodology, including ASIC design. Some designs are more sensitive to variation
and require more care during the design stage to anticipate possible pitfalls, so
that we can design around them or take special precautions to keep variation from
adversely affecting circuit functionality and manufacturability. Designers must
learn to create variation-insensitive circuits if they are to have high-yielding
products that also meet the design targets. The conventional treatment of variation
has evolved from digital corner methodology to the incorporation of statistical
variation of fundamental physical parameters at both the intra- and interdie level.
In Chapter 10 we dwelt more on the design for manufacturability aspects of the
design, which in most cases will also help reduce the impact of variability.
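The split between interdie and intradie statistical variation mentioned above can
be sketched with a small Monte Carlo experiment in Python. This is a minimal
illustration under stated assumptions, not the chapter's methodology: both
components are assumed Gaussian and independent, and the function name
sample_vth, the nominal threshold voltage, and the sigma values are hypothetical.

```python
import random
import statistics


def sample_vth(n_dies, n_devices, vth_nominal, sigma_inter, sigma_intra, seed=0):
    """Return a list of dies, each a list of per-device threshold voltages (V)."""
    rng = random.Random(seed)
    dies = []
    for _ in range(n_dies):
        # Interdie component: one shift shared by every device on the die.
        die_shift = rng.gauss(0.0, sigma_inter)
        # Intradie component: an independent draw for each device.
        dies.append([vth_nominal + die_shift + rng.gauss(0.0, sigma_intra)
                     for _ in range(n_devices)])
    return dies


# Hypothetical numbers chosen only for illustration.
dies = sample_vth(n_dies=200, n_devices=50, vth_nominal=0.30,
                  sigma_inter=0.020, sigma_intra=0.010)

all_vth = [v for die in dies for v in die]
within_die_sigma = statistics.mean([statistics.pstdev(die) for die in dies])
print(f"total sigma      = {statistics.pstdev(all_vth):.4f} V")
print(f"within-die sigma = {within_die_sigma:.4f} V")
```

Because the interdie shift is common to all devices on a die, it widens the
overall spread but cancels when two devices on the same die are compared, which
is why the within-die sigma comes out smaller than the total.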
REFERENCES
[1] International Technology Roadmap for Semiconductors,
[2] K. Bernstein, Design, process, and environmental contributors to CMOS delay vari-
ation, tutorial, IEEE International Solid-State Circuits Conference, Feb. 2003.
[3] S. Borkar et al., Parameter variations and impact on circuits and microarchitecture,
IEEE Design Automation Conference, pp. 338–342, 2003.
386 DESIGN FOR VARIABILITY
[4] Berkeley Predictive Technology Models, ∼ptm.
[5] Y. Cao et al., New paradigm of predictive MOSFET and interconnect modeling for
early circuit design, Proceedings of the IEEE Custom Integrated Circuits Conference,
pp. 201–204, June 2000.
[6] Y. Cao et al., Design sensitivities to variability: extrapolations and assessments in
nanometer VLSI, IEEE International ASIC/SoC Conference, pp. 411–415, Sept.
2002.
[7] S. R. Nassif, Design for variability in DSM technologies, IEEE International Sym-
posium on Quality Electronic Design, pp. 451–454, 2000.
[8] C. Visweswariah, Death, taxes and failing chips, IEEE Design Automation Confer-
ence, pp. 343–347, 2003.
[9] K. A. Bowman, S. G. Duvall, and J. D. Meindl, Impact of die-to-die and within-die
parameter fluctuations on the maximum clock frequency distribution, IEEE Interna-
tional Solid-State Circuits Conference, pp. 278–279, 2001.
[10] M. Eisele, J. Berthold, D. Schmitt-Landsiedel, and R. Mahnkopf, The impact of
intra-die device parameter variations on path delays and on the design for yield
of low voltage digital circuits, IEEE Trans. VLSI Syst., Vol. 5, No. 4, pp. 360–368,
Dec. 1997.
[11] D. Burnett, K. Erington, C. Subramanian, and K. Baker, Implications of fundamen-
tal threshold voltage variations for high-density SRAM and logic circuits, IEEE
Symposium on VLSI Technology, pp. 15–16, 1994.
[12] Y. Cao et al., Yield optimization with energy-delay constraints in low-power digital
circuits, IEEE Conference on Electron Devices and Solid-State Circuits, Hong Kong,
Dec. 2003.
[13] S. Mukhopadhyay and K. Roy, Modeling and estimation of total leakage current
in nano-scaled CMOS devices considering the effect of parameter variation, IEEE
International Symposium on Low Power Electronics and Design, pp. 172–175, 2003.
[14] A. Srivastava, R. Bai, D. Blaauw, and D. Sylvester, Modeling and analysis of leak-
age power considering within-die process variations, IEEE International Symposium
on Low Power Electronics and Design, pp. 64–67, 2002.
[15] H. Q. Dao, K. Nowka, and V. G. Oklobdzija, Analysis of clocked timing elements
for dynamic voltage scaling effects over process parameter variation, IEEE Interna-
tional Symposium on Low Power Electronics and Design, pp. 56–59, 2001.
[16] S. Lin and C. K. Wong, Process-variation-tolerant clock skew minimization, Inter-
national Conference on Computer-Aided Design, 1994.
[17] B. Gieseke et al., A 600 MHz superscalar RISC microprocessor with out-of-order
execution, IEEE International Solid-State Circuits Conference, pp. 176–177, Feb.
1997.
[18] H. Ando et al., A 1.3 GHz fifth generation SPARC64 microprocessor, IEEE Inter-
national Solid-State Circuits Conference, Feb. 2003.
[19] M. Bohr, Interconnect scaling: the real limiter to high performance ULSI, Proceed-
ings of the IEEE International Electron Devices Meeting, pp. 241–244, Dec. 1995.
[20] K. Bernstein et al., High Speed CMOS Design Styles, Kluwer Academic, Norwell,
MA, pp. 41–45, 1998.
[21] A. Kahng and M. Sarrafzadeh, Modern physical design: part V, tutorial, Interna-
tional Conference on Computer-Aided Design, Nov. 1999.
REFERENCES 387
[22] D. Bailey and B. Benschneider, Clocking design and analysis for a 600-MHz alpha
microprocessor, IEEE J. Solid-State Circuits, Vol. 33, No. 11, Nov. 1998.
[23] C. Bittlestone, A. Hill, V. Singhal, and N. V. Arvind, Architecting ASIC libraries
and flows in nanometer era, Design Automation Conference, June 2003.
[24] K. Osada et al., Universal-Vdd 0.65–2.0 V 32 kB cache using voltage-adapted
timing-generation scheme and a lithographical-symmetric cell, IEEE International
Solid-State Circuits Conference, pp. 168–169, Feb. 2001.
[25] K. Bernstein, Design, process, and environmental contributors to CMOS delay vari-
ation, SCCS near Limit Scaling Workshop, 2003.
[26] A. Asenov et al., Increase in the random dopant induced threshold fluctuations and
lowering in sub-100 nm MOSFETs due to quantum effects: a 3-D density-gradient
simulation study, IEEE Trans. Electron Devices, Vol. 48, No. 4, Apr. 2001.
[27] P. Larsson, Measurements and analysis of PLL jitter caused by digital switching
noise, IEEE J. Solid-State Circuits, Vol. 36, No. 7, July 2001.
[28] K. Osada et al., Universal-Vdd 0.65–2.0-V 32-kB cache using a voltage-adapted
timing-generation scheme and a lithographically symmetrical cell, IEEE J. Solid-
State Circuits, Vol. 36, No. 11, Nov. 2001.
[29] M. Yamaoka, K. Osada, and K. Ishibashi, 0.4-V logic library friendly SRAM array
using rectangular-diffusion cell and delta-boosted-array-voltage scheme, IEEE Sym-
posium on VLSI Circuits, 2002.
[30] D. Harris and M. A. Horowitz, Skew-tolerant domino circuits, IEEE J. Solid-State
Circuits, Vol. 32, No. 11, Nov. 1997.
[31] G. A. Ruiz, Evaluation of three 32-bit CMOS adders in DCVS logic for self-timed
circuits, IEEE J. Solid-State Circuits, Vol. 33, No. 4, Apr. 1998.
[32] L. G. Heller and W. R. Griffin, Cascode voltage switch logic: a differential CMOS
logic family, IEEE International Solid-State Circuits Conference, pp. 16–17, 1984.
[33] K. Okada, Statistical modeling of device characteristics with systematic variability,
IEICE Trans. Fundam., Vol. E84-A, No. 2, Feb. 2001.
[34] M. J. M. Pelgrom, C. J. Duinmaijer, and A. P. G. Welbers, Matching properties of
MOS transistors, IEEE J. Solid State Circuits, Vol. 24, No. 5, pp. 1433–1440, Oct.
1989.
[35] C. Michael and M. Ismail, Statistical modeling of device mismatch for analog MOS
integrated circuits, IEEE J. Solid State Circuits, Vol. 27, No. 2, pp. 154–166, Feb.
1992.
[36] W. Zhang and Z. Yang, A new threshold voltage model for deep-submicron
MOSFETs with nonuniform substrate dopings, Microelectron. Reliab., Vol. 38, pp.
1465–1469, 1998.
INDEX
8B/10B encoding, 226
Aberrations, 79, 80, 81, 82, 86, 87
ACLV, 94, 98
Alexander phase detector, 227
Astigmatism, 80, 81
Asynchronous design, 323
Back end of line, 58–66
chemical mechanical planarization (CMP),
6, 10, 63, 79, 109, 359
copper resistivity, 62
FSG, 10
interconnect, dishing, 7
interconnect, erosion, 7
low-κ dielectric, 8, 10
pattern density, 350
wire density, 350
Back-side connection, 160
Bandgap reference, 146, 154
Bit-cell, 352
1T1C, 241, 244
3T1C, 241
8f², 242–243, 247
design(s), 352, 352–360
layout, 354–360
misalignment, 355–358
Body bias
adaptive, 311
VBB, 247–248
Bragg’s condition, 74
BSIM3 models, 135
BSIM4
halo implant, 381
models, 138, 381
model specific issues, 384
pocket implant, 381
Bulk silicon, 161
Capacitor, 142, 143, 144
decoupling, 162, 163, 164, 165, 166,
228–231, 348, 368
metal, 367
metal comb, 144
metal-insulator-metal (MIM), 144
storage, 242, 245
scaling, 245
stacked, 245–246
Ta2O5, 246
trench, 245–246
Carrier mobility, 139, 140
Ceqv, 369
Circuit delay variability, 344
Clock data recovery (CDR), 159
Nano-CMOS Circuit and Physical Design, by Ban P. Wong, Anurag Mittal, Yu Cao, and Greg Starr
ISBN 0-471-46610-7 Copyright 2005 John Wiley & Sons, Inc.
Clock distribution strategies, 347
H-tree, 348
layout- clock buffer, 349
shielding, 349
Clock skew, 11
COG, 106
Common mode, 224, 225, 226
feedback, 224
level, 224
voltage, 225, 226
Copper wire, 61
low-κ dielectrics, 64
Critical dimension (CD), 6, 17, 79, 83–100,
109–110, 118–119, 137, 147, 332–333,
340–341
Current mirror, 146, 150–151, 225
Data converter, 147–148, 159, 180
analog-to-digital converter (ADC), 180,
227
sigma-delta converter, 147
Data retention voltage, 319
Deep n-well, 161
Delay chain, 374–375
Delay locked loop (DLL), 167
Delay variation
pulse flop, 373
trip point, 373
Depth of focus (DOF), 83, 104, 113
Design for manufacturability (DFM), 331,
342
analog, 339
Design rule check (DRC), 136
Differential pair, 152
Differential signaling, 292
Diffusion, dogbone-shaped, 351
Diffusion, flaring, 336, 341, 351
Dynamic voltage scaling, 311
Electrostatic discharge (ESD), 157–158,
172–173, 176–177, 180–186, 188–189,
195, 200, 211–212, 220, 227
breakdown, 172, 195, 200
charged device model (CDM), 173, 176,
212
human body model, 173, 176, 180,
185–186, 195, 198, 211
implantation, 177
low-C, 180, 181–186, 188, 189
machine model (MM), 173, 176,
185
pin-to-pin, 173
power-rail, 173
silicide block, 177, 180
Epitaxial, 161
Equalization, 237–238
Equivalent oxide thickness (EOT), 134
FinFET, 6, 25, 320
Focus, 79, 81, 82, 83
Folded-bit-line architecture, 243
FOM, 3
Forbidden zones (pitches), 109, 340
Front end of line, 25, 41
carrier mobility, 42
CET, 14
dopant fluctuation, 15
drain-induced threshold voltage shift (DITS),
18–19, 141, 367, 382
gate-induced drain leakage (GIDL), 1,
17–20, 135, 248, 382
overlap capacitance, 353
parasitic capacitance, 52
poly depletion, 18
proximity effects, 17, 18, 341
rapid thermal processing, 34
RSD, 3
short channel effects, 41
DIBL, 13, 367
RSC, 18, 367
velocity saturation, 344
STI, 13, 340
stress, 13, 17–18, 341
strain engineering (Strained Si), 6, 14, 33
Vth, 15
Gate dielectric
alternative dielectrics, 29
equivalent thickness, 27, 41
quantum effects, 43
scaling, 26, 29
Gate-driven design, 176, 177
Gate leakage current, 135, 141. See also
Tunneling
direct tunneling leakage, 49
gate direct tunneling, 18, 382
Gate-grounded NMOS, 178–180, 185, 191
Guard ring, 159, 160
I/O standards
accelerated graphics port (AGP), 221
current mode logic (CML), 221, 225–226,
238
emitter-coupled logic (ECL), 221
Gunning transceiver logic (GTL), 221
high-speed transceiver logic (HSTL), 221
HyperTransport, 221
low-voltage differential signal (LVDS), 221,
223
low-voltage positive referenced
emitter-coupled logic (LVPECL), 221
low-voltage CMOS (LVCMOS), 221
low-voltage transistor-transistor logic
(LVTTL), 221
positive referenced emitter-coupled logic
(PECL), 221
stub series terminated logic (SSTL), 221,
223
Illumination, 75, 78–79, 82, 87, 93–94,
108
annular, 75, 93, 102, 104, 108, 112
conventional, 75, 93–94
dipole, 75, 93–94, 108
quadrupole, 75, 93, 108
Image fidelity, 82
Imaging performance, 75–76
Imaging theory, 73
Impedance matching, 234
Inductor, 144–145
Input stage, 152
Interconnect
capacitance, 265
circuit representation, 260
driver sizing, 272, 285
frequency dependent RL, 269
inductance, 261, 267
power consumption, 304
resistance, 264
κ-Factor, 74, 76–78, 85, 87, 90
Layout
bad practices, 363
common centroid, 364
good practices, 365
Manhattan, 93, 108
poly jumper, 365
process interaction, 354, 364
suboptimal, 332
Leakage suppression schemes, 323
Lens, 79–80, 82, 86, 121, 123
LER, 15–16
Level shift, 148
Low-noise amplifier, 185
Low-power DRAM design, 308, 319
Low-power SRAM design, 305, 316
Low-κ imaging, 76, 78, 82–88, 91, 94,
107–108, 110–111, 118–119
Mask error enhancement factor (MEEF),
84–86, 119
Masks, 103. See also Resolution enhancement
techniques
alternating (PSM), 103–104, 106–107,
114–115, 119
phase conflict, 116
hard phase-shift masks, 103
Monte Carlo, 86
Moore’s law, 21, 77
MOSFET gate
direct tunneling leakage, 49
leakage suppression schemes, 323
metal electrode, 48
polysilicon depletion, 45
Multilevel pulse amplitude modulation,
226–227
Multiple supply and threshold voltages, 302,
314
Nitride capping, 6
Numerical aperture, 5, 73–74, 77, 84–85, 87,
90–91, 121–123
Outer diameter (OD), 140, 156
Output stage, 153–154
class AB, 153
Parametric variation, 343
Parasitics, 155
interconnect, 155
layout extracted netlist, 156
resistor capacitor extraction (RCE), 155
Phase locked loop (PLL), 10, 143, 148–149,
159, 168, 340, 366
Phase noise, 146
Photolithography, 73
direct write electron beam, 126
EUV, 5, 124, 125, 126
immersion lithography, 5, 122–123
particle beam, 126
Pitch, 83
Poly flaring, 351
Poly orientation, 20
Polysilicon depletion, 16, 45
Power busing, 166
Power consumption, 346
Power integrity, 20
Preemphasis, 235, 236, 237
Process sensitivities, 82
Process variation, 78–79, 82, 377
CD, 348
die-to-die, 344
random, 345
systematic, 345
within-die, 345
Proximity effects, 17–18
poly, 18, 367, 369
STI, 18
transistor, 358
well, 18, 341, 367, 369
PSRR, 367
Pulse generator, 374
Radio frequency (RF), 157, 159
RC/RLC timing, 274, 278
Reflectivity, 78, 79
Reliability
MOSFET reliability
hot carrier, HCI, 3, 57
negative bias temperature instability
(NBTI), 15, 57, 135, 142, 332
time-dependent dielectric breakdown, 56
TDDB, 8, 249
Repeater insertion, 288
Resist, 78
Resistor, 142
Resolution enhancement techniques, 1, 5, 73,
91, 107, 111, 113, 117–119, 121, 331
optical proximity correction (OPC), 12, 16,
18, 73, 89, 91, 94–95, 97–98, 109–110,
111, 113, 120, 331, 338, 340–341, 359
rules-based (RBOPC), 98, 99
hammer head, 96, 111, 359
model-based (MOPC), 98–101, 103, 111,
354, 360–361
overcorrection, 360
undercorrection, 360
phase shift, 12, 81, 91, 338
asymmetric, 81
Levenson phase shift, 103
symmetric, 81
subresolution assist features (SRAF), 73, 91,
101–102, 110, 112, 120, 340–341, 360
Scaling, 59
array transistor, 247
capacitor (DRAM storage), 245
sense amplifier, 249
Self-timed delay margin, 372
Sense amplifier, 243–244, 249, 251, 253
Shallow trench isolation (STI), 135, 137–140,
156–157
Shot noise, 141
Signal integrity analysis, 256
capacitive coupling noise, 276
inductive coupling noise, 280
line-to-line coupling, 11
noise-aware timing, 281
noise-constrained routing, 284
Silicon controlled rectifier (SCR), 175, 178,
192–212, 227
double-triggered SCR (DTSCR), 207–208
dynamic-holding voltage SCR (DHVSCR),
211–212
grounded-gate triggered SCR (GGSCR),
203–205, 210
high-current NMOS-triggered SCR
(HINTSCR), 210
high-holding-current SCR (HHI-SCR),
210
low-voltage triggering SCR (LVTSCR), 194,
202–203, 209, 210
native-NMOS triggered (NANSCR),
209–210, 212
NMOS-triggered low-voltage SCR
(NTLSCR), 203
n-type substrate-triggered SCR (N STSCR),
204, 206–207
PMOS-triggered low-voltage SCR
(PTLSCR), 202
PMOS-triggered SCR (PTSCR), 202,
p-type substrate-triggered SCR (P STSCR),
204, 206–207
stacked NMOS-triggered SCR (SNTSCR),
192–199, 202
substrate-triggered SCR (STSCR), 211
SOI, 6
SPICE modeling, 19, 376
challenges, 383
corner methodology, 376
statistical methodology, 19, 376, 378
Stack effect, 300
Stacked diodes, 175
Stacked I/O, 223
Substrate triggered design, 176
Subwavelength gap, 4–5, 77, 331
Supply noise, 146
immunity, 146
Termination, 220, 232, 233, 234
Threshold voltage, 146, 150
low threshold, 147
Topography, 79
Trim mask, 105, 106
Tunneling, 141
edge direct tunneling (EDT), 141
Fowler–Nordheim tunneling, 141
gate-to-channel tunneling, 141. See also
Gate leakage current
Variation
contact resistance, 366
design-related, 361
device-related, 362
diffusion, dogbone-shaped, 366
electrical stress-related, 362
interdie, 379
intradie, 379
process-related, 362
self-timed delay, 370
Vertical access transistor, 250
Voltage controlled oscillator (VCO), 138,
146–148, 155–156
Vsignal, 245
Wavelength, 45, 73–74, 77, 83, 86, 121,
123–124
Wire spread routes, 339
Zernike polynomials, 80