Nano CMOS Circuit and Physical Design

Main contents
Chapter 1 NANO-CMOS SCALING PROBLEMS AND IMPLICATIONS
Chapter 2 CMOS DEVICE AND PROCESS TECHNOLOGY
Chapter 3 THEORY AND PRACTICALITIES OF SUBWAVELENGTH OPTICAL LITHOGRAPHY
Chapter 4 MIXED-SIGNAL CIRCUIT DESIGN
Chapter 5 ELECTROSTATIC DISCHARGE PROTECTION DESIGN
Chapter 6 INPUT/OUTPUT DESIGN
Chapter 7 DRAM
Chapter 8 SIGNAL INTEGRITY PROBLEMS IN ON-CHIP INTERCONNECTS
Chapter 9 ULTRALOW POWER CIRCUIT DESIGN
Chapter 10 DESIGN FOR MANUFACTURABILITY
Chapter 11 DESIGN FOR VARIABILITY



Figure 11.21 Temperature effect on Ids of sub-100-nm transistors.

Figure 11.22 Drain-induced threshold voltage shift. (Axes: Vth,lin − Vth,sat in volts versus gate length in µm.)

Package and system modeling of the supply impedance is now very important, especially for high-performance chip designs. L(di/dt) is the most significant voltage drop in such designs, and the supply impedance must be designed to keep the L(di/dt) plus IR drop within the design budget, usually 10% of the supply voltage for high-performance processors. This requires on-chip decoupling capacitors, package capacitors, and capacitors on the system board. At a minimum, the decoupling capacitance on the chip should be 10 times the equivalent switching capacitance (Ceqv) of the chip, where

$$C_{eqv} = \frac{\text{chip power}}{V_{dd}^{2} \times \text{frequency}}$$

Figure 11.23 (a) Well proximity effects. (b) Poly proximity effects. (Labels: scattered ions, ion implant, photoresist, STI, n-well, p-well, poly gates.)

The chip power is determined using the worst-case vector, which means that the Ceqv value will be larger and can better maintain the power to the chip during the peak power demand period. With a well-decoupled supply, performance degradation caused by supply voltage droop during peak power demand is kept to a minimum. Package and system board modeling is a very important part of the design in order to meet the supply impedance goal of a high-performance system and is beyond the scope of this book.

In mixed-signal designs in an epitaxial process, sharing Vss but separating the Vdd supply will result in the lowest noise coupling from the digital domain [12]. This also provides an opportunity for using a higher-voltage power supply than in the digital circuits, to provide the necessary headroom for some designs.
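The decoupling guideline given earlier in this section (on-chip decoupling of at least 10 times Ceqv) can be worked through numerically. The following is a minimal sketch; the function names and the power, voltage, and frequency figures are hypothetical and chosen only for illustration.

```python
def equivalent_switching_capacitance(chip_power_w, vdd_v, frequency_hz):
    """Ceqv = chip power / (Vdd^2 * frequency), returned in farads."""
    return chip_power_w / (vdd_v ** 2 * frequency_hz)


def minimum_on_chip_decap(chip_power_w, vdd_v, frequency_hz, ratio=10.0):
    """Minimum on-chip decoupling capacitance: ratio * Ceqv (ratio of 10 per the text)."""
    return ratio * equivalent_switching_capacitance(chip_power_w, vdd_v, frequency_hz)


# Hypothetical worst-case-vector figures (assumptions, not taken from the text).
ceqv = equivalent_switching_capacitance(chip_power_w=100.0, vdd_v=1.0, frequency_hz=2e9)
decap = minimum_on_chip_decap(chip_power_w=100.0, vdd_v=1.0, frequency_hz=2e9)
print(f"Ceqv = {ceqv * 1e9:.1f} nF, minimum on-chip decap = {decap * 1e9:.1f} nF")
```

With these assumed numbers, Ceqv comes to about 50 nF, so the guideline asks for roughly 500 nF of on-chip decoupling. Using the worst-case vector for the chip power makes the estimate conservative.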
Sharing Vss simplifies the electrostatic discharge (ESD) protection as well (see Chapter 5 for a full discussion on ESD protection in the nano-CMOS regime).

11.2.4 Digital Circuit Strategies to Deal with Variations

Digital circuits are usually more tolerant to process variation; however, some digital circuits, including self-timed circuits and matched delay circuits, can be extremely sensitive to process variation. Self-timing is used primarily in embedded memories such as cache memories. It was used most commonly during the period when clock frequency was low. To reduce the access time of memories, self-timing techniques were used to generate edges to clock the sense amplifiers (SAs), so that memory data were available earlier in the clock cycle. This enabled one-cycle access, including logical operation on the memory data, for better performance. As clock frequency scales, the access time of the embedded SRAM has come within the clock cycle time, so many more edges have become available to clock the SAs. Therefore, there is now less compulsion for self-timing to generate edges. The only other need for self-timing is to save power in cases where the SAs are not clocked until an address changes, while the clocked design requires clock gating to reduce clock power. It is still a lot easier and more robust to gate the SA clock than to self-time it. Many schemes are designed to mitigate the impact of variation on design robustness if one must self-time. We discuss next a self-timed scheme used in SRAM.

Self-Timing Strategies

Traditional self-timed memory relied on a single SRAM cell to drive the dummy bitline, which is then converted to a full CMOS level and fanned out to drive the SA clock line [28]. As can be seen in Figure 11.24, a single-cell self-timing scheme is very sensitive to process variation: cell drive variation translates directly into higher self-timing delay variation.
To avoid failures due to the higher self-timing path delay variation, more margin is needed, at the expense of performance.

Figure 11.24 Effect of number of cells on self-timed delay variation. (Axes: delay variation versus number of cells.)

As shown in Figure 11.24, lower self-timed path delay variation can be achieved if more than one cell is used [13]. The use of several cells averages out cell current variations. A multicell self-timed scheme is illustrated in Figure 11.25. To minimize cell drive variation it is important to have another column of dummy cells at the edge of the array next to the self-timed dummy column. In the memory array, metal density is very consistent, making it easier to control interlayer dielectric (ILD) variation. Due to the regularity of the metal lines, resistance variation due to chemical–mechanical planarization (CMP) is kept to a minimum, provided that the fabricator optimizes CMP for the metal density found in the memory array. Most fabricators understand the need to reduce resistance variation in a memory array and will optimize the process around the memory array. Even so, there will still be some resistance variation due to barrier metal thickness and wire width variation.

Figure 11.25 Multicell self-timed scheme. (Labels: dummy WL, regular WL, dummy BL, bit, bit bar, dummy cells, array cells.)

Self-Timed Margins

Figure 11.26 illustrates a typical race condition in self-timed designs. In this illustration, the delay in Out1 must be less than the delay in Out2; otherwise, functional failure would result. Due to process, voltage, and temperature (PVT) variations and layout differences, Delay2 may become shorter than Delay1 on silicon, due to the fact that some local effects are not fully or correctly modeled or anticipated during the design phase.

Figure 11.26 Margining self-timed paths. (Labels: common signal point, Delay1 to Out1, Delay2 to Out2.)
When this happens in a self-timed design, the result is a functional failure: the circuit will not work at any frequency, including very low frequency, and will require a design redo to restore even basic functionality. This is a very serious and costly design failure. To safeguard against such a situation, we add margins to the simulation model to cover for the unanticipated effects, so as to reduce the probability of such a functional failure. As mentioned earlier, the speed of Delay2 may not match that of Delay1, due either to some unanticipated effect or to a circuit that is not fully optimized. The following analysis translates the margin into a physically meaningful parameter that can be used to verify the margin of the self-timed circuit. The self-timed circuit in Figure 11.26 at the verge of failure can be represented as

$$\text{Delay2} \times (1 - M) = \text{Delay1} \times (1 + M)$$

where M is the self-timed margin. Simplifying, we obtain

$$M \times (\text{Delay1} + \text{Delay2}) = \text{Delay2} - \text{Delay1}$$

Hence,

$$M = \frac{\text{Delay2} - \text{Delay1}}{\text{Delay1} + \text{Delay2}}$$

Typically, M is set to 0.25 for prelayout and 0.15 for postlayout extracted simulations over all practical corners. The use of statistical models is highly encouraged for more realistic corner coverage. Further details on statistical modeling are given in Section 11.3.

Regardless of the self-timing margin, every self-timed path must have metal-programmable options to increase the margin to at least 30% in all practical corners. As mentioned earlier, self-timed race failure is catastrophic for a chip; the addition of metal programming options allows a quick fix. The metal options must be designed to effect a self-timing margin change in as little as one layer and no more than two layers. This is important, since mask cost is on the rise, especially for nano-CMOS process nodes.
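The self-timed margin M = (Delay2 − Delay1) / (Delay1 + Delay2) defined above can be checked against the prelayout (0.25) and postlayout (0.15) targets from the text. The following is a minimal sketch; the function names and the delay values are hypothetical.

```python
PRELAYOUT_MARGIN = 0.25   # target from the text for prelayout simulation
POSTLAYOUT_MARGIN = 0.15  # target from the text for postlayout extracted simulation


def self_timed_margin(delay1, delay2):
    """M = (Delay2 - Delay1) / (Delay1 + Delay2); both delays in the same unit."""
    return (delay2 - delay1) / (delay1 + delay2)


def meets_margin(delay1, delay2, required_margin):
    """True if the race between the two paths has at least the required margin."""
    return self_timed_margin(delay1, delay2) >= required_margin


# Hypothetical simulated delays in picoseconds (assumptions, for illustration only).
delay1_ps, delay2_ps = 100.0, 150.0
m = self_timed_margin(delay1_ps, delay2_ps)
print(f"M = {m:.2f}")
print("meets postlayout target:", meets_margin(delay1_ps, delay2_ps, POSTLAYOUT_MARGIN))
print("meets prelayout target:", meets_margin(delay1_ps, delay2_ps, PRELAYOUT_MARGIN))
```

For these assumed delays M is 0.2: Delay2 could shrink by 20% while Delay1 grows by 20% before the two paths meet at the verge of failure. That satisfies the postlayout target but not the prelayout one.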
If possible, design the programming change at as high a metal level as possible to allow for a quick fabrication turnaround time for the fix in the event that a self-timing margin change is necessary.

Delay Variation Due to Slow Nodes

Slow nodes manifest themselves as high-fanout nodes, long unrepeated lines, and signals through pass gates and cascading pass gates. Pass gates present themselves as large resistors to the signal, just like long unrepeated lines. When more than two unbuffered pass gates are in a signal's path, the result is a really slow node that must be dealt with. Slow nodes could also be weakly driven nodes, as in the case of signals through cascading pass gates and long, unrepeated signal lines. The weakly driven nodes are more susceptible to noise coupling into the far-end node where the receiver resides.

Figure 11.27 Trip point versus delay variation. (Labels: trip point variation; large delay variation on the slow edge; small delay variation on the fast edge.)

There is another hazard that affects all slow nodes, including high-fanout nodes. As shown in Figure 11.27, variation in the input trip point of the receiver will translate into a larger input delay variation due to the gentle slope of the input signal on the slow nodes. Maintaining a sufficiently fast input slew rate enables the design to better tolerate P-to-N process skew that affects the gate input threshold or trip point. In some circuits, such as an arithmetic block, there will be pass gates in the data path if pass gate adders are used. In some cases there could be several pass gates in series in the data path unless the designers add buffers between the cascading full adders. This adds delay in the critical path. There are ways to mitigate this by using differential cascode voltage switch (DCVS) logic instead [31][32].
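The relationship between trip point variation and delay variation on a slow node can be illustrated with a simple estimate. The following is a minimal sketch that assumes a linear input ramp, so that the delay variation is the trip point variation divided by the slew rate; the function name and the numbers are hypothetical.

```python
def delay_variation_ns(trip_point_variation_v, slew_rate_v_per_ns):
    """Delay spread caused by a trip point spread on a linear ramp: dt = dV / (dV/dt)."""
    if slew_rate_v_per_ns <= 0:
        raise ValueError("slew rate must be positive")
    return trip_point_variation_v / slew_rate_v_per_ns


# Hypothetical values: the same 50 mV trip point spread on a fast and a slow node.
trip_spread_v = 0.05
fast_node_ns = delay_variation_ns(trip_spread_v, slew_rate_v_per_ns=1.0)
slow_node_ns = delay_variation_ns(trip_spread_v, slew_rate_v_per_ns=0.1)
print(f"fast node: {fast_node_ns * 1e3:.0f} ps, slow node: {slow_node_ns * 1e3:.0f} ps")
```

Under this linear-ramp assumption, a node with one-tenth the slew rate sees ten times the delay variation for the same trip point spread, which is the effect sketched in Figure 11.27.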
Pulse Flop Clock Generator Design Strategies

Match Trip Points

Pulse flop operation and design are not covered in this book; refer to other circuit design texts for a detailed discussion. A full understanding of pulse flop operation is needed to appreciate the following discussion on process variation issues that affect the pulse generator and the operation of the pulse flop. Figure 11.28 shows a typical pulse generator for pulse flops. Inv1 through Inv3 form a delay chain that defines the pulse width of the pulse generator.

Figure 11.28 Typical pulse flop pulse generator. (Labels: global clock, Inv1, Inv2, Inv3, Inv4, Nand1, pulse output.)

Pulse generator pulse width variation has a serious impact on the hold time of a pulse flop. The input trip points of Nand1 and Inv1 must be matched; otherwise, the pulse width varies with the global clock edge rate variation. A longer output pulse width will result in a longer hold-time requirement but offers a longer transparent time. If the logic cone feeding into the pulse flop is not properly balanced in timing, the longer transparent period due to the wider pulse width can cause a hold-time problem even when there is a maximum time path from the same logic cone.

Let us consider the case where Nand1 has a higher trip point than the input trip point of the inverter chain, starting with Inv1 in Figure 11.28. As the global clock rises, Inv1 trips first and starts the delay chain going, while Nand1 has not quite reacted to the global clock input. This in effect shortens the output pulse width of the pulse generator because the inverter delay chain times out sooner with respect to the rising edge of the pulse generator output. The delay after Inv1 trips until Nand1 triggers is the amount by which the pulse generator pulse width is shortened. As can be seen in Figure 11.27, a change in the clock rise time can alter this delay, thus changing the pulse width.
The clock rise time can change for several reasons, and the change can affect the hold time of the chip and cause catastrophic failure. The opposite condition, where the Nand1 trip point is lower than that of Inv1, will increase the pulse width and the hold time requirement of the pulse flops. In cell-based designs where the pulse flop characterization condition assumes that the trip points of Inv1 and Nand1 are matched, hold time failures can result if the trip points are not matched, as that changes the actual hold time requirement of the flops.

Set the input trip point slightly below Vdd/2 (lower middle third) but not too low; otherwise, ground bounce will be an issue. The reason for this is that the edge placement error is lower at a point low on the clock rising edge. Since the pulse generator only references the rising edge of the clock, this technique ensures a more accurate clock reference and lower latency from the clock edge.

Pulse Generator Output Waveform Peak

The pulse width must be wide enough to ensure that the pulse reaches Vdd under all load conditions that the pulse generator must drive, over all practical corners. This is to make sure that the pulse width is deterministic. If the pulse reaches Vdd under all load conditions, it will always be discharged from the same voltage under the same PVT conditions and will therefore be deterministic. This eliminates pulse width variation beyond what is attributed to the PVT conditions. The other reason for having the clock pulse reach Vdd is to make sure that the flops always see the same drive level at their clock inputs, thereby avoiding varying setup and hold times due to varying gate drive.

Pulse Generator Delay Tracking of Data Path Delay

The delay chain formed by Inv1 through Inv3 is by necessity constructed with transistors of minimum size, to keep the power down.
This is where we have to trade power consumption for process tracking of the data delay. The devices must be large enough so that the delay is not dominated by parasitics. The parasitics along the delay chain must be minimized as you would on a data path that is optimized for speed. The delay chain speedup ratio must match the data path speedup closely over the practical corners to avoid running into a hold time violation. If the data path speeds up more than the delay chain, especially for dynamic pulse flops, we could end up in a situation where the input data to the dynamic flop change before the pulse resets. The last element in the delay chain (Inv3) must have the same stack height as the logic flop driven by the pulse generator. If the flop that receives the clock pulse from the pulse generator is not a simple flop but a dynamic logic flop, Inv3 in the delay chain must have the same stack height as the dynamic logic that precedes the flop (see Figure 11.29). This allows the delay chain to track the logic delay over process corners.

Figures 11.30 and 11.31 illustrate the need to relax spacing rules as well as poly end-cap coverage to reduce device variation due to processing distortion of drawn polygons.

Figure 11.29 Delay tracking technique for pulse generators. (Labels: global clock, Inv1, Inv2, Inv3, Inv4, Nand1, pulse output; Inv3 mimics the flop's dynamic logic with inputs A and B.)

Figure 11.30 Poly flaring. (Labels: as drawn, end pullback, poly flaring encroaches diffusion, diffusion.)

Figure 11.31 Poor end-cap coverage for poly at diffusion corner. (Labels: poly, diffusion, diffusion corner flaring.)

11.3 CORNER MODELING METHODOLOGY FOR NANO-CMOS PROCESSES

SPICE modeling has become the most critical component for enabling designers to determine the design margins necessary to meet the stringent requirements of modern integrated circuits.
With the ever-increasing speed requirements, margins have continued to decrease, forcing designers to rely more heavily on models for an accurate reflection of the process, including its expected variation. The traditional approach for model development has been to use a nominal case adjusted to a foundry process control methodology and then to develop corner models that are worst case for digital logic. The process variance has not scaled equivalently with the critical dimension scaling, which has made this source of error more pronounced, especially on the deep submicron processes. There is now a real need for statistical models for a more accurate representation of the process. Figure 11.32 shows a diagram of the various levels of process variation. Each level in the process flow can add variation to the device performance. Understanding the contribution at each stage is important for creating accurate statistical models.

Figure 11.32 Various levels of process variations. (Levels: device to device and die to die are intradie; wafer to wafer, lot to lot, line to line, and fab to fab are interdie.)

11.3.1 Need for Statistical Models

The process corner model approach creates unrealistic process combinations and leads to overdesign, especially as design margins become smaller. This is illustrated in Figure 11.33, a scatter plot of the NMOS and PMOS IDsat measurements over numerous wafer lots. Here, fast–slow and slow–fast (FS and SF) corners rarely occur. This makes sense from a process standpoint since PMOS and NMOS devices are only partially correlated. For example, if we consider the various parameters that can vary, such as oxide thickness, gate length, gate width, channel doping, and halo implant, some of them (e.g., oxide thickness and channel length) will vary similarly for PMOS and NMOS devices, while others will not be correlated and will vary independently. Additionally, the variance of the process will have both localized and global components. Process corners do not provide this partitioning of the variance, so it is impossible to determine the effect of localized variation between devices based on corner models.

Figure 11.33 Process variation map for PMOS and NMOS devices. (Axes: PMOS IDsat versus NMOS IDsat, both in mA/mm; the TT, FF, SS, FS, and SF corners and the ±1σ, ±2σ, and ±3σ contours are marked.)

Identifying the worst-case corner for analog circuits becomes difficult. The concept of fast/slow may not be applicable. For an operational amplifier, high gain/low gain may make more sense, but which digital process corner corresponds to the high-gain case for the amplifier may be difficult to say, since it depends on the specifics of the amplifier architecture. Identifying which process corner represents the worst case becomes more difficult as subblocks are combined to form more complex systems such as a data converter. The analog circuit may end up being overdesigned if it is simulated using the digital process corners, especially given the already limited design space for analog circuits. Overdesign of a circuit can result in increased complexity, larger die size, and potentially a missed market window, and is therefore best avoided if possible.

If we consider the variation of several parameters that can vary for a process, the combined variance can be expressed as

$$\sigma_{total} = \sqrt{\sigma_{tox}^{2} + \sigma_{L}^{2} + \sigma_{W}^{2} + \sigma_{Np}^{2} + \sigma_{Nn}^{2} + \sigma_{\mu p}^{2} + \sigma_{\mu n}^{2} + \cdots} \gg 3\sigma$$

Combining the variation in this manner can result in significant overdesign of a circuit if it must meet the performance requirements at these extreme cases. The use of statistical modeling allows the designer to estimate the functional yield of a given design before it has been fabricated.
This information is crucial for making trade-offs during the design cycle rather than postfabrication. The designer can look at subblocks within a design to determine the contribution of each of these components toward the overall system yield, allowing emphasis to be placed on the most critical portions of the design. The designer will also be able to assess the effect of device sizing on the functional yield.

11.3.2 Statistical Model Use

Statistical models are based on a first-principles approach to measuring the source of variation and translating that variation into SPICE model parameter variation. The first step is to identify the independent factors and capture their long-term variation. An example of this is shown in Figure 11.34, which shows the variation in the capacitance equivalent thickness (CET) of the oxide over a period of time. This information is translated into a histogram, allowing the mean and standard deviation values to be extracted. These values are then entered into a model such that the independent model parameter is modeled by its nominal value plus the standard deviation variable.

Figure 11.34 Oxide thickness variation over time for a given process. (Panels: CET tox in Å versus time in arbitrary units; histogram of count versus tox in Å.)

Physical parameters that can be considered include doping concentrations, oxide thickness, mobility, gate width, and gate length. It is crucial that the parameters selected be applied correctly to the SPICE models to ensure that their effects are simulated correctly. For example, it is common practice simply to vary the threshold voltage of a device to look at the process variation effects, but this does not capture back gate biasing correctly, so erroneous results will be obtained.
This is part of what makes the task so difficult, since the SPICE model parameters are not entirely physical.

The next step is model correlation. Normally, a parameter such as the threshold voltage, VTH0, is set to a fixed value such as VTH0 = 0.4. This would now become VTH0 = 0.4 + VTH_PVAR, where VTH_PVAR is defined to be AGAUSS(M, σ, N), where M represents the mean value, σ the absolute variation, and N the number of standard deviations represented by σ. Use of this approach would not capture the threshold voltage dependency on oxide thickness, so it is better to represent it as [36]

$$V_{th0} = V_{FB} + 2|\phi_F| + \frac{q N_S x_{t1} + q N_P (X_{dep} - x_{t1})}{C_{ox}}$$

where $C_{ox} = \varepsilon_{ox}/t_{ox}$, $t_{ox} = \bar{t}_{ox} + \sigma_{tox}$, $N_S = \bar{N}_S + \sigma_{NS}$, $N_P = \bar{N}_P + \sigma_{NP}$, and $V_{FB} = \bar{V}_{FB} + \sigma_{VFB}$; an overbar denotes the nominal value. The parameters are as follows: NS is the doping density between 0 and xt1, and NP is the doping density between xt1 and the depletion depth Xdep. All other terms have the standard meaning already defined. Using this representation for the threshold voltage allows a multitude of process parameters, such as the flat-band voltage and channel doping, to be accounted for. It also captures the effects of substrate biasing, making the overall simulation more accurate.

Once the appropriate parameters are obtained, it is possible to run multiple simulations to obtain a distribution for parameters that can be measured on wafers, such as threshold voltage or IDsat. The real-world measurements can be compared to the simulated distribution to validate the distribution generated by the model. The standard deviation of each of the parameters is typically not the same for both device types. Similarly, there is a significant dependency on the device size as well. This size dependency is greater for the channel length, especially for very small channel lengths.
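The idea of varying the physical parameters and letting the threshold voltage follow from them, rather than perturbing VTH0 directly, can be sketched as a small Monte Carlo run outside of SPICE. This is an illustration only: every nominal value and standard deviation below is an assumption chosen for readability, Xdep is held fixed for simplicity even though it depends on doping in a real device, and the function and variable names are hypothetical.

```python
import random
import statistics

Q = 1.602e-19             # electron charge, C
EPS_OX = 3.9 * 8.854e-12  # permittivity of SiO2, F/m


def vth0(vfb, phi_f, ns, np_, xt1, xdep, tox):
    """Vth0 = VFB + 2|phiF| + (q*NS*xt1 + q*NP*(Xdep - xt1)) / Cox, with Cox = eps_ox / tox."""
    cox = EPS_OX / tox
    return vfb + 2 * abs(phi_f) + (Q * ns * xt1 + Q * np_ * (xdep - xt1)) / cox


# Assumed nominal values in SI units (doping in m^-3, lengths in m, voltages in V).
NOMINAL = dict(vfb=-0.6, phi_f=0.45, ns=5e23, np_=2e23, xt1=10e-9, xdep=30e-9, tox=2e-9)
# Assumed one-sigma variations of the independent physical parameters.
SIGMA = dict(vfb=0.005, ns=1.5e22, np_=6e21, tox=0.05e-9)


def sample_vth0(rng):
    """Draw each varied parameter as nominal plus a Gaussian deviation, then evaluate Vth0."""
    params = dict(NOMINAL)
    for name, sigma in SIGMA.items():
        params[name] = rng.gauss(NOMINAL[name], sigma)
    return vth0(**params)


rng = random.Random(1)  # fixed seed so the run is repeatable
samples = [sample_vth0(rng) for _ in range(2000)]
nominal_vth0 = vth0(**NOMINAL)
print(f"nominal Vth0 = {nominal_vth0:.4f} V")
print(f"sampled mean = {statistics.mean(samples):.4f} V, "
      f"sigma = {statistics.stdev(samples) * 1e3:.2f} mV")
```

The resulting distribution plays the role of the simulated distribution that the text says should be compared against wafer measurements. Because the oxide thickness enters through Cox, its variation is reflected in the threshold voltage automatically.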
Figure 11.35 shows the localized difference in threshold voltage between two identical NMOS devices placed side by side to provide the maximum degree of matching, with varying size, for a deep submicron process. These data do not include device displacement, which will add further to the variation. Localized variation may not be too important for digital logic since it tends to average out, especially for deep levels of logic, but it becomes crucial for analog design. This localized variation can be used to determine the optimum device size for critical components such as a differential pair.

Figure 11.35 Threshold variation as a function of device size. (Axes: dVth in mV versus 1/(WL)^0.5 in mm^−1.)

Consideration of both the local (intradie) and global (interdie) variation represents a reasonable model for the variation. The process variation can be represented by [34]

$$\sigma^{2}(P) = \frac{A_P^{2}}{WL} + S_P^{2} D^{2}$$

where σ(P) is the standard deviation of the process parameter P. The device channel width and length are represented by W and L. The displacement between devices is represented by D, and the parameters AP and SP are process-dependent constants that must be determined by measurements. The first term represents the localized variation, and the second term represents the global variation that is dependent on the physical displacement between devices. In some cases this model may not provide the necessary insight into the process variation [35]. For this reason, it may be best to split the variance into more components to allow greater analysis of the various places where variation can be introduced and of the overall impact. One may go to the level of detail shown in Figure 11.32, where a variance component is assigned for each level. This approach will allow much more insight into the product yield, but obtaining meaningful information on the additional variation at each level can become difficult.
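The two-term mismatch model above, σ²(P) = AP²/(WL) + SP²D², can be evaluated directly to see how device area and spacing trade off. The following is a minimal sketch; the function name and the values of AP and SP are hypothetical, since the text notes that these constants must be determined by measurement.

```python
import math


def mismatch_sigma(a_p, s_p, width, length, distance):
    """sigma(P) = sqrt(A_P^2 / (W * L) + S_P^2 * D^2).

    With A_P in mV*um, S_P in mV/um, and W, L, D in um, the result is in mV.
    """
    return math.sqrt(a_p ** 2 / (width * length) + (s_p * distance) ** 2)


# Assumed process constants, for illustration only.
A_VTH = 3.0   # mV*um
S_VTH = 0.01  # mV/um

for w, l, d in [(1.0, 1.0, 0.0), (2.0, 2.0, 0.0), (1.0, 1.0, 100.0)]:
    sigma = mismatch_sigma(A_VTH, S_VTH, w, l, d)
    print(f"W={w} um, L={l} um, D={d} um -> sigma(Vth) = {sigma:.2f} mV")
```

With D = 0 only the local term remains, so quadrupling the gate area halves σ. This is the lever used when sizing up critical devices such as a differential pair.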
This approach is applied to a phase-locked-loop charge pump to estimate the degree of current mismatch that can be expected. The results of these simulations are shown in Figure 11.36. Here it is assumed that the design can handle ±6% mismatch of the current, which leaves 15 of the 500 simulated die outside that range, or a 97% yield. If this yield is deemed adequate, no further design effort is required. If a higher yield is necessary, the circuit can be redesigned. This redesign may require an entirely new charge pump architecture, or simply resizing critical devices to decrease the variability. Figure 11.37 shows how the threshold voltage variation decreases when the device size is increased. The y-axis shows the threshold voltage shift, while the x-axis shows the device size (area) normalized to a minimum-sized device for a 100-nm process. It is possible to reduce the overall system variation by sizing up critical devices selectively.

Figure 11.36 Charge pump circuit current mismatch induced by localized and global effects on threshold voltage variation. (Axes: current mismatch in % versus simulation run; the acceptable mismatch range and the yield loss are marked.)

Figure 11.37 Threshold voltage variation as a function of device size. (Axes: dVth in mV versus normalized device size; the maximum threshold variation is marked.)

11.4 NEW FEATURES OF THE BSIM4 MODEL

The implementation of BSIM4 models has allowed a significant improvement in simulation accuracy for the deep-submicron processes. BSIM4 models incorporate several important features previously missing from the BSIM3 models, which include modeling of the halo or pocket implant, gate-induced drain leakage (GIDL), gate direct tunneling, and trench isolation stress effects. Trench isolation stress effects are discussed at length in Chapter 4.
11.4.1 Halo/Pocket Implant

The halo/pocket implant is used to reduce the threshold voltage roll-off for very short channel devices, but this implant results in a significant drain-induced threshold shift (DITS) for longer-channel devices. The halo/pocket implant increases the gds value in long-channel devices, which is undesirable, especially for analog applications, one of the primary places where longer-channel devices are used. Figure 11.38(a) shows the location of the halo/pocket implant, and Figure 11.38(b) shows the resulting DITS effect for a 100-nm process. This output impedance degradation is not modeled completely in the BSIM3 version because the DITS does not consider the effect of the halo/pocket implant. Modeling of the halo/pocket implant has been achieved by no longer assuming a uniform substrate doping. A limitation still occurs because the DITS output resistance model does not include the body bias effect.

Figure 11.38 (a) Halo/pocket implant used on deep submicron processes. (b) Resulting simulation of DITS for a 100-nm process. (Axes in (b): Vtlin − Vtsat in volts versus gate length in µm.)

11.4.2 Gate-Induced Drain Leakage and Gate Direct Tunneling

The various components of off-state leakage are shown in Figure 11.39 along with a relative indication of their influence for several process generations. The gate leakage is projected to become a more significant factor at the 90-nm technology node and beyond, but source–drain leakage remains the primary issue. BSIM4 models allow the gate leakage to be modeled, but at a cost of additional simulation time, since the gate leakage must be evaluated at each gate bias point because it is dependent on the potential across the gate. Figure 11.40 shows the total normalized gate leakage current as a function of device width for various gate lengths ranging from 0.2 to 15 µm. Figure 11.41 shows the GIDL effect for a thin oxide device on a 100-nm process. The GIDL current is in the nanoampere range. A weak dependency on the bulk bias can also be observed.

Figure 11.39 Transistor off-state leakage components and the relative scaling with process. (Components: source–drain leakage ISDleak, gate leakage IGate, GIDL current IGIDL, junction leakage IJunction; nodes: 130 nm, 90 nm, 65 nm.)

Figure 11.40 Normalized total gate leakage as a function of device length and width. (Axes: normalized gate leakage versus device width in mm, for increasing channel length.)

Figure 11.41 Simulation of the gate-induced drain leakage over (a) a wide gate voltage range and (b) a zoomed area to show the bulk bias influence. (Axes: Ids in A versus Vgate in V; curves for Vbs = 0 V and Vbs = −0.5 V.)

11.4.3 Modeling Challenges

Although BSIM4 represents a significant improvement over BSIM3 models, it still does not account for all factors that can have a pronounced effect on device performance. Many of these effects relate to how the device is laid out and the physical location of adjacent devices: (1) dogbone devices to realize narrow-width devices, (2) well proximity effects, and (3) shallow trench isolation stress effects (these effects can be modeled postlayout). One suggested approach is to avoid layouts that aggravate these effects wherever possible, since they are difficult to model. This approach can place a serious constraint on the physical implementation, increasing the overall die size. A second approach is to develop macro models that allow these effects to be modeled.
These models can be generated for the most critical circuits within a design, such as an SRAM cell, to ensure that the highest level of accuracy is obtained. These macro models should be parameterized to allow maximum flexibility. Correlation between the model and early test chip results is required to ensure that the models are accurate.

11.4.4 Model-Specific Issues

BSIM4 models use nonphysical parameters to have high accuracy for short/narrow devices. The use of nonphysical parameters makes the model parameter extraction procedure much more complicated because of the correlation between short- and long-channel parameters. Insufficiently modeled physical effects such as doping dependent mobility models for the halo/pocket implant technologies are resulting in some discrepancy between the modeled device and the physical device. The reverse short-channel effect (RSCE) needs to be modeled as well to further improve model accuracy. With each progression of BSIM model comes an increase in the number of parameters, giving rise to an increase in the simulation time and memory requirements. It is crucial to balance the number of parameters with the need to have reasonable simulation times.

11.4.5 Model Summary

Modeling of halo/pocket implanted devices has been improved significantly with BSIM4. The much needed gate direct tunneling model required for design on 90 nm and below is also available. The parameter extraction approach has become much more complicated, and the number of parameters has increased significantly. Macro models can be used to allow modeling of some of the layout specific issues, but they must be correlated with actual silicon measurements to confirm their accuracy. There are still quite a few more effects that must be incorporated into the model, but this must be done such that it does not significantly affect the complexity or simulation run time.
11.5 SUMMARY

The principles presented in this chapter can be applied to many other circuit and layout types to minimize the impact of variation on their functionality as well as manufacturability. As we scale the technology well into the nano-CMOS regime, dealing with variation will be part and parcel of every design methodology, including ASIC design. Some designs are more sensitive to variation and require more care during the design stage to anticipate possible pitfalls, so that we can design around them or take special precautions to keep variation from adversely affecting circuit functionality and manufacturability. Designers must learn to create variation-insensitive circuits if they are to have high-yielding products that also meet the design target. The conventional treatment of variation has evolved from the digital corner methodology to the incorporation of statistical variation of fundamental physical parameters at both the intra- and interdie levels. In Chapter 10 we dwelt more on the design-for-manufacturability aspects of the design, which in most cases will also help reduce the impact of variability.

REFERENCES

[1] International Technology Roadmap for Semiconductors,
[2] K. Bernstein, Design, process, and environmental contributors to CMOS delay variation, tutorial, IEEE International Solid-State Circuits Conference, Feb. 2003.
[3] S. Borkar et al., Parameter variations and impact on circuits and microarchitecture, IEEE Design Automation Conference, pp. 338–342, 2003.
[4] Berkeley Predictive Technology Models, ∼ptm.
[5] Y. Cao et al., New paradigm of predictive MOSFET and interconnect modeling for early circuit design, Proceedings of the IEEE Custom Integrated Circuits Conference, pp. 201–204, June 2000.
[6] Y. Cao et al., Design sensitivities to variability: extrapolations and assessments in nanometer VLSI, IEEE International ASIC/SoC Conference, pp. 411–415, Sept. 2002.
[7] S. R. Nassif, Design for variability in DSM technologies, IEEE International Symposium on Quality Electronic Design, pp. 451–454, 2000.
[8] C. Visweswariah, Death, taxes and failing chips, IEEE Design Automation Conference, pp. 343–347, 2003.
[9] K. A. Bowman, S. G. Duvall, and J. D. Meindl, Impact of die-to-die and within-die parameter fluctuations on the maximum clock frequency distribution, IEEE International Solid-State Circuits Conference, pp. 278–279, 2001.
[10] M. Eisele, J. Berthold, D. Schmitt-Landsiedel, and R. Mahnkopf, The impact of intra-die device parameter variations on path delays and on the design for yield of low voltage digital circuits, IEEE Trans. VLSI Syst., Vol. 5, No. 4, pp. 360–368, Dec. 1997.
[11] D. Burnett, K. Erington, C. Subramanian, and K. Baker, Implications of fundamental threshold voltage variations for high-density SRAM and logic circuits, IEEE Symposium on VLSI Technology, pp. 15–16, 1994.
[12] Y. Cao et al., Yield optimization with energy-delay constraints in low-power digital circuits, IEEE Conference on Electron Devices and Solid-State Circuits, Hong Kong, Dec. 2003.
[13] S. Mukhopadhyay and K. Roy, Modeling and estimation of total leakage current in nano-scaled CMOS devices considering the effect of parameter variation, IEEE International Symposium on Low Power Electronics and Design, pp. 172–175, 2003.
[14] A. Srivastava, R. Bai, D. Blaauw, and D. Sylvester, Modeling and analysis of leakage power considering within-die process variations, IEEE International Symposium on Low Power Electronics and Design, pp. 64–67, 2002.
[15] H. Q. Dao, K. Nowka, and V. G. Oklobdzija, Analysis of clocked timing elements for dynamic voltage scaling effects over process parameter variation, IEEE International Symposium on Low Power Electronics and Design, pp. 56–59, 2001.
[16] S. Lin and C. K. Wong, Process-variation-tolerant clock skew minimization, International Conference on Computer-Aided Design, 1994.
[17] B. Gieseke et al., A 600 MHz superscalar RISC microprocessor with out-of-order execution, IEEE International Solid-State Circuits Conference, pp. 176–177, Feb. 1997.
[18] H. Ando et al., A 1.3 GHz fifth generation SPARC64 microprocessor, IEEE International Solid-State Circuits Conference, Feb. 2003.
[19] M. Bohr, Interconnect scaling: the real limiter to high performance ULSI, Proceedings of the IEEE International Electron Devices Meeting, pp. 241–244, Dec. 1995.
[20] K. Bernstein et al., High Speed CMOS Design Styles, Kluwer Academic, Norwell, MA, pp. 41–45, 1998.
[21] A. Kahng and M. Sarrafzadeh, Modern physical design: part V, tutorial, International Conference on Computer-Aided Design, Nov. 1999.
[22] D. Bailey and B. Benschneider, Clocking design and analysis for a 600-MHz alpha microprocessor, IEEE J. Solid-State Circuits, Vol. 33, No. 11, Nov. 1998.
[23] C. Bittlestone, A. Hill, V. Singhal, and N. V. Arvind, Architecting ASIC libraries and flows in nanometer era, Design Automation Conference, June 2003.
[24] K. Osada et al., Universal-Vdd 0.65–2.0 V 32 kB cache using voltage-adapted timing-generation scheme and a lithographical-symmetric cell, IEEE International Solid-State Circuits Conference, pp. 168–169, Feb. 2001.
[25] K. Bernstein, Design, process, and environmental contributors to CMOS delay variation, SCCS near Limit Scaling Workshop, 2003.
[26] A. Asenov et al., Increase in the random dopant induced threshold fluctuations and lowering in sub-100 nm MOSFETs due to quantum effects: a 3-D density-gradient simulation study, IEEE Trans. Electron Devices, Vol. 48, No. 4, Apr. 2001.
[27] P. Larsson, Measurements and analysis of PLL jitter caused by digital switching noise, IEEE J. Solid-State Circuits, Vol. 36, No. 7, July 2001.
[28] K. Osada et al., Universal-Vdd 0.65–2.0-V 32-kB cache using a voltage-adapted timing-generation scheme and a lithographically symmetrical cell, IEEE J. Solid-State Circuits, Vol. 36, No. 11, Nov. 2001.
[29] M. Yamaoka, K. Osada, and K. Ishibashi, 0.4-V logic library friendly SRAM array using rectangular-diffusion cell and delta-boosted-array-voltage scheme, IEEE Symposium on VLSI Circuits, 2002.
[30] D. Harris and M. A. Horowitz, Skew-tolerant domino circuits, IEEE J. Solid-State Circuits, Vol. 32, No. 11, Nov. 1997.
[31] G. A. Ruiz, Evaluation of three 32-bit CMOS adders in DCVS logic for self-timed circuits, IEEE J. Solid-State Circuits, Vol. 33, No. 4, Apr. 1998.
[32] L. G. Heller and W. R. Griffin, Cascode voltage switch logic: a differential CMOS logic family, IEEE International Solid-State Circuits Conference, pp. 16–17, 1984.
[33] K. Okada, Statistical modeling of device characteristics with systematic variability, IEICE Trans. Fundam., Vol. E84-A, No. 2, Feb. 2001.
[34] M. J. M. Pelgrom, C. J. Duinmaijer, and A. P. G. Welbers, Matching properties of MOS transistors, IEEE J. Solid-State Circuits, Vol. 24, No. 5, pp. 1433–1440, Oct. 1989.
[35] C. Michael and M. Ismail, Statistical modeling of device mismatch for analog MOS integrated circuits, IEEE J. Solid-State Circuits, Vol. 27, No. 2, pp. 154–166, Feb. 1992.
[36] W. Zhang and Z. Yang, A new threshold voltage model for deep-submicron MOSFETs with nonuniform substrate dopings, Microelectron. Reliab., Vol. 38, pp. 1465–1469, 1998.
Nano-CMOS Circuit and Physical Design, by Ban P. Wong, Anurag Mittal, Yu Cao, and Greg Starr. ISBN 0-471-46610-7. Copyright © 2005 John Wiley & Sons, Inc.
