Chapter 10: Pipelined and Parallel
Recursive and Adaptive Filters
Keshab K. Parhi
Chapter 10 2
Outline
• Introduction
• Pipelining in 1st-Order IIR Digital Filters
• Pipelining in Higher-Order IIR Digital Filters
• Parallel Processing for IIR Filters
• Combined Pipelining and Parallel Processing for IIR
Filters
Chapter 10 3
First-Order IIR Filter
• Consider the 1st-order linear time-invariant recursion (see Fig. 1)
      y(n+1) = a·y(n) + b·u(n)                                        (10.1)
• The iteration period of this filter is (T_m + T_a), where {T_m, T_a} represent the word-level multiplication time and addition time
Look-Ahead Computation
• In the look-ahead transformation, the linear recursion is first iterated a few times to create additional concurrency.
• By recasting this recursion, we can express y(n+2) as a function of y(n) to obtain the following expression (see Fig. 2(a))
      y(n+2) = a·[a·y(n) + b·u(n)] + b·u(n+1)                         (10.2)
• The iteration bound of this recursion is 2(T_m + T_a)/2 = (T_m + T_a), the same as that of the original version, because the amount of computation and the number of logical delays inside the recursive loop have both doubled
Chapter 10 4
• Another recursion equivalent to (10.2) is (10.3). Shown in Fig. 2(b), its iteration bound is (T_m + T_a)/2, a factor of 2 lower than before.
      y(n+2) = a^2·y(n) + a·b·u(n) + b·u(n+1)                         (10.3)
• Applying (M-1) steps of look-ahead to the iteration of (10.1), we can obtain an equivalent implementation described by (10.4) (see Fig. 3)
      y(n+M) = a^M·y(n) + Σ_{i=0}^{M-1} a^i·b·u(n+M-1-i)              (10.4)
  – Note: the loop delay is z^{-M} instead of z^{-1}, which means that the loop computation must be completed in M clock cycles (not 1 clock cycle). The iteration bound of this computation is (T_m + T_a)/M, which corresponds to a sample rate M times higher than that of the original filter
  – The terms {a·b, a^2·b, ···, a^{M-1}·b, a^M} in (10.4) can be pre-computed (referred to as pre-computation terms). The second term on the RHS of (10.4) is the look-ahead computation term (referred to as the look-ahead complexity); it is non-recursive and can be easily pipelined
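A small numerical check of (10.4) can be helpful. The sketch below (NumPy; the values a = 0.9, b = 0.5, M = 4 and the random input are arbitrary illustrative choices, not from the slides) verifies that the M-step look-ahead recursion produces the same states as the original recursion (10.1).
```python
import numpy as np

# Sketch: verify the M-step look-ahead recursion (10.4) against the
# original first-order recursion (10.1). Coefficients and input are
# arbitrary illustrative choices.
a, b, M = 0.9, 0.5, 4
rng = np.random.default_rng(0)
u = rng.standard_normal(64)

# Original recursion: y(n+1) = a*y(n) + b*u(n), with y(0) = 0
y = np.zeros(len(u) + 1)
for n in range(len(u)):
    y[n + 1] = a * y[n] + b * u[n]

# Look-ahead recursion: y(n+M) = a^M*y(n) + sum_{i=0}^{M-1} a^i*b*u(n+M-1-i)
y_la = np.zeros(len(u) + 1)
y_la[:M] = y[:M]                      # first M states taken from the original recursion
for n in range(len(u) - M + 1):
    y_la[n + M] = a**M * y_la[n] + sum(a**i * b * u[n + M - 1 - i] for i in range(M))

print(np.allclose(y, y_la))           # True: both recursions produce the same output
```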
Chapter 10 5
Fig. 1: First-order IIR filter of (10.1)
Fig. 2(a): Recursion (10.2), with both the loop computation and the loop delays doubled
Fig. 2(b): Equivalent recursion (10.3) with 2 delays inside the loop
Chapter 10 6
Fig. 3: M-stage Pipelinable 1st-Order IIR Filter (M loop delays; the non-recursive input section uses the pre-computed terms b, a·b, ···, a^{M-1}·b and the loop multiplier a^M)
• Look-ahead computation has allowed a single serial computation to be
transformed into M independent concurrent computations, and to
pipeline the feedback loop to achieve high speed filtering of a single
time series while maintaining full hardware utilization.
• Provided the multiplier and the adder can be conveniently pipelined,
the iteration bound can be achieved by retiming or cutset
transformation (see Chapter 4)
Chapter 10 7
Pipelining in 1st-Order IIR Digital Filters
– Look-ahead techniques add canceling poles and zeros with equal angular spacing at a distance from the origin that is the same as that of the original pole. The pipelined filters are always stable provided that the original filter is stable
– The pipelined realizations require a linear increase in complexity, but decomposition techniques can be used to obtain an implementation with a logarithmic increase in hardware with respect to the number of loop pipeline stages
– Example: Consider the 1st-order IIR filter transfer function
      H(z) = 1/(1 - a·z^{-1})                                         (10.5)
  • The output sample y(n) can be computed using the input sample u(n) and the past output sample y(n-1) as follows:
      y(n) = a·y(n-1) + u(n)                                          (10.6)
  • The sample rate of this recursive filter is limited by the computation time of one multiply-add operation
Chapter 10 8
Pipelining in 1st-Order IIR Digital Filters (continued)
1. Look-Ahead Pipelining for 1st-Order IIR Filters
• Look-ahead pipelining adds canceling poles and zeros to the transfer function such that the coefficients of {z^{-1}, ···, z^{-(M-1)}} in the denominator of the transfer function are zero. Then the output sample y(n) can be computed using the inputs and the output sample y(n-M), so that there are M delay elements in the critical loop, which in turn can be used to pipeline the critical loop by M stages, and the sample rate can be increased by a factor of M
• Example: Consider the 1st-order filter H(z) = 1/(1 - a·z^{-1}), which has a pole at z = a (|a| ≤ 1). A 3-stage pipelined equivalent stable filter can be derived by adding poles and zeros at z = a·e^{±j2π/3}, and is given by
      H(z) = (1 + a·z^{-1} + a^2·z^{-2}) / (1 - a^3·z^{-3})
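As a sketch of the pole-zero cancellation (SciPy; a = 0.9 is an arbitrary stable pole value, not from the slides), the 3-stage pipelined transfer function above and the original 1/(1 - a·z^{-1}) have identical impulse responses:
```python
import numpy as np
from scipy.signal import lfilter

# Sketch: the 3-stage pipelined transfer function equals the original one
# after pole-zero cancellation (a = 0.9 is an arbitrary stable choice).
a = 0.9
num_orig, den_orig = [1.0], [1.0, -a]                 # 1 / (1 - a z^-1)
num_pipe = [1.0, a, a**2]                             # 1 + a z^-1 + a^2 z^-2
den_pipe = [1.0, 0.0, 0.0, -a**3]                     # 1 - a^3 z^-3

impulse = np.zeros(50); impulse[0] = 1.0
h_orig = lfilter(num_orig, den_orig, impulse)
h_pipe = lfilter(num_pipe, den_pipe, impulse)
print(np.allclose(h_orig, h_pipe))                    # True
```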
Chapter 10 9
Pipelining in 1st-Order IIR Digital Filters (continued)
2. Look-Ahead Pipelining with Power-of-2 Decomposition
• With power-of-2 decomposition, an M-stage (for power-of-2 M) pipelined implementation for a 1st-order IIR filter can be obtained by log_2(M) sets of transformations
• Example: Consider a 1st-order recursive filter transfer function described by H(z) = b·z^{-1}/(1 - a·z^{-1}). The equivalent pipelined transfer function can be described using the decomposition technique as follows
      H(z) = b·z^{-1}·Π_{i=0}^{log_2(M)-1} (1 + a^{2^i}·z^{-2^i}) / (1 - a^M·z^{-M})      (10.7)
  – This pipelined implementation is derived by adding (M-1) poles and zeros at identical locations.
  – The original transfer function has a single pole at z = a (see Fig. 4(a)).
Chapter 10 10
– The pipelined transfer function has poles at the following locations (see Fig. 4(b) for M=8):
      {a, a·e^{j2π/M}, ···, a·e^{j2π(M-1)/M}}
– The decomposition of the canceling zeros is shown in Fig. 4(c). The i-th stage of the decomposed non-recursive portion implements 2^i zeros located at:
      z = a·exp(j·2π(2n+1)/2^{i+1}),   n = 0, 1, ···, 2^i - 1        (10.8)
– The i-th stage of the decomposed non-recursive portion requires a single pipelined multiplication operation, independent of the stage number i
– The multiplication complexity of the pipelined implementation is (log_2(M) + 2)
– The finite-precision pipelined filters suffer from inexact pole-zero cancellation, which leads to magnitude and phase errors. These errors can be reduced by increasing the wordlength (see p. 323)
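A short sketch of the decomposition (10.7)-(10.8) for M = 8 (NumPy; the pole value a = 0.95 is an arbitrary choice): the log_2(M) = 3 cascaded sections multiply out to the (M-1)-zero canceling polynomial, so the overall denominator becomes 1 - a^M·z^{-M}.
```python
import numpy as np

# Sketch of the power-of-2 decomposition (10.7) for M = 8 and an arbitrary
# pole a = 0.95: the log2(M) = 3 cascaded sections (1 + a^{2^i} z^{-2^i})
# multiply out so that (1 - a z^-1) * product = 1 - a^M z^-M.
a, M = 0.95, 8
num = np.array([1.0])
for i in range(int(np.log2(M))):
    stage = np.zeros(2**i + 1)
    stage[0], stage[-1] = 1.0, a**(2**i)              # 1 + a^{2^i} z^{-2^i}
    num = np.convolve(num, stage)

den = np.convolve([1.0, -a], num)                     # (1 - a z^-1) * numerator
expected = np.zeros(M + 1); expected[0], expected[-1] = 1.0, -a**M
print(np.allclose(den, expected))                     # True: denominator is 1 - a^M z^-M
```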
Chapter 10 11
Fig. 4: (a) Pole representation of a 1st-order recursive filter; (b) pole/zero representation of a 1st-order IIR with 8 loop stages; (c) decomposition based on the pipelined IIR for M = 8
Chapter 10 12
Pipelining in 1st-Order IIR Digital Filters (continued)
3. Look-Ahead Pipelining with General Decomposition
• The idea of decomposition can be extended to any arbitrary number of loop pipelining stages M. If M = M_1·M_2···M_p, then the non-recursive stages implement (M_1 - 1), M_1·(M_2 - 1), ···, M_1·M_2···M_{p-1}·(M_p - 1) zeros, respectively, totaling (M-1) zeros
• Example (Example 10.3.3, p.325): Consider the 1st-order IIR filter H(z) = 1/(1 - a·z^{-1})
  – A 12-stage pipelined decomposed implementation is given by
      H(z) = Σ_{i=0}^{11} a^i·z^{-i} / (1 - a^{12}·z^{-12})
           = (1 + a·z^{-1})·(1 + a^2·z^{-2} + a^4·z^{-4})·(1 + a^6·z^{-6}) / (1 - a^{12}·z^{-12})
  – This implementation is based on a 2×3×2 decomposition (see Fig. 5)
Chapter 10 13
– The first section implements 1 zero at -a, the second section implements 4 zeros at {a·e^{±jπ/3}, a·e^{±j2π/3}}, and the third section implements 6 zeros at {±ja, a·e^{±jπ/6}, a·e^{±j5π/6}}
– Another decomposed implementation (2×2×3 decomposition) is given by
      H(z) = (1 + a·z^{-1})·(1 + a^2·z^{-2})·(1 + a^4·z^{-4} + a^8·z^{-8}) / (1 - a^{12}·z^{-12})
  • The first section implements 1 zero at -a, the second section implements 2 zeros at ±ja, and the third section implements 8 zeros at {a·e^{±jπ/6}, a·e^{±jπ/3}, a·e^{±j2π/3}, a·e^{±j5π/6}}
– The third decomposition (3×2×2 decomposition) is given by
      H(z) = (1 + a·z^{-1} + a^2·z^{-2})·(1 + a^3·z^{-3})·(1 + a^6·z^{-6}) / (1 - a^{12}·z^{-12})
  • The first section implements 2 zeros at a·e^{±j2π/3}, the second section implements 3 zeros at {-a, a·e^{±jπ/3}}, and the third section implements 6 zeros at {a·e^{±jπ/6}, ±ja, a·e^{±j5π/6}}
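The three factorizations of Example 10.3.3 can be checked numerically. The sketch below (NumPy; a = 0.9 is an arbitrary choice and poly() is a hypothetical helper) expands each decomposition and compares it with Σ_{i=0}^{11} a^i·z^{-i}.
```python
import numpy as np

# Sketch: the three decompositions of Example 10.3.3 (2x3x2, 2x2x3, 3x2x2)
# all expand to the same cancelling numerator sum_{i=0}^{11} a^i z^-i
# (a = 0.9 is an arbitrary choice).
a = 0.9
target = a ** np.arange(12)                          # 1 + a z^-1 + ... + a^11 z^-11

def poly(powers):
    """Polynomial 1 + sum a^p z^-p for the given delay powers p."""
    c = np.zeros(max(powers) + 1)
    c[0] = 1.0
    for p in powers:
        c[p] += a**p
    return c

dec_232 = np.convolve(np.convolve(poly([1]), poly([2, 4])), poly([6]))
dec_223 = np.convolve(np.convolve(poly([1]), poly([2])), poly([4, 8]))
dec_322 = np.convolve(np.convolve(poly([1, 2]), poly([3])), poly([6]))
for d in (dec_232, dec_223, dec_322):
    print(np.allclose(d, target))                    # True, True, True
```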
Chapter 10 14
Fig. 5(a): Pole-zero locations of a 12-stage pipelined 1st-order IIR
Fig. 5(b): Decomposition of the zeros of the pipelined filter for a 2×3×2 decomposition
Chapter 10 15
Pipelining in Higher-order IIR Digital Filters
• Higher-order IIR digital filters can be pipelined by using clustered
look-ahead or scattered look-ahead techniques. (For 1st-order IIR filters,
these two look-ahead techniques reduce to the same form)
– Clustered look-ahead: Pipelined realizations require a linear
complexity in the number of loop pipelined stages and are not
always guaranteed to be stable
– Scattered look-ahead: Can be used to derive stable pipelined IIR
filters
– Decomposition technique: Can also be used to obtain area-efficient implementations of higher-order scattered look-ahead IIR filters
– Constrained filter design techniques: Achieve pipelining without
pole-zero cancellation
Chapter 10 16
• The transfer function of an N-th order direct-form recursive filter is described by
      H(z) = Σ_{i=0}^{N} b_i·z^{-i} / (1 - Σ_{i=1}^{N} a_i·z^{-i})    (10.9)
• Equivalently, the output y(n) can be described in terms of the input sample u(n) and the past input/output samples as follows
      y(n) = Σ_{i=1}^{N} a_i·y(n-i) + Σ_{i=0}^{N} b_i·u(n-i) = Σ_{i=1}^{N} a_i·y(n-i) + z(n)      (10.10)
  – The sample rate of this IIR filter realization is limited by the throughput of 1 multiplication and 1 addition, since the critical loop contains a single delay element
Chapter 10 17
Pipelining in Higher-order IIR Digital Filters (cont’d)
1. Clustered Look-Ahead Pipelining
• The basic idea of clustered look-ahead:
  – Add canceling poles and zeros to the filter transfer function such that the coefficients of {z^{-1}, ···, z^{-(M-1)}} in the denominator of the transfer function are 0, and the output sample y(n) can be described in terms of the cluster of N past outputs {y(n-M), ···, y(n-M-N+1)}
  – Hence the critical loop of this implementation contains M delay elements and a single multiplication. Therefore, this loop can be pipelined by M stages, and the sample rate can be increased by a factor of M. This is referred to as M-stage clustered look-ahead pipelining
Chapter 10 18
• Example (Example 10.4.1, p.327): Consider the all-pole 2nd-order IIR filter with poles at {1/2, 3/4}. The transfer function of this filter is
      H(z) = 1 / (1 - (5/4)·z^{-1} + (3/8)·z^{-2})                    (10.11)
  – A 2-stage pipelined equivalent IIR filter can be obtained by eliminating the z^{-1} term in the denominator (i.e., multiplying both the numerator and denominator by (1 + (5/4)·z^{-1})). The transformed transfer function is given by:
      H(z) = (1 + (5/4)·z^{-1}) / [(1 - (5/4)·z^{-1} + (3/8)·z^{-2})·(1 + (5/4)·z^{-1})]
           = (1 + (5/4)·z^{-1}) / (1 - (19/16)·z^{-2} + (15/32)·z^{-3})      (10.12)
  – From the transfer function, we can see that the coefficient of z^{-1} in the denominator is zero. Hence, the critical loop of this filter contains 2 delay elements and can be pipelined by 2 stages
Chapter 10 19
  – Similarly, a 3-stage pipelined realization can be derived by eliminating the {z^{-1}, z^{-2}} terms in the denominator of (10.11), which can be done by multiplying both numerator and denominator by (1 + (5/4)·z^{-1} + (19/16)·z^{-2}). The new transfer function is given by:
      H(z) = (1 + (5/4)·z^{-1} + (19/16)·z^{-2}) / (1 - (65/64)·z^{-3} + (57/128)·z^{-4})      (10.13)
• Computation complexity: The numerator (non-recursive portion) of this pipelined filter needs (N+M) multiplications, and the denominator (recursive portion) needs N multiplications. Thus, the total complexity of this pipelined implementation is (N+N+M) multiplications
• Stability: The canceling poles and zeros are utilized for pipelining IIR filters. However, when the additional poles lie outside the unit circle, the filter becomes unstable. Note that the filters in (10.12) and (10.13) are unstable.
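The derivation of (10.12) and (10.13) is just polynomial multiplication, as in the sketch below (NumPy); the pole radii confirm the instability noted above.
```python
import numpy as np

# Sketch: derive the clustered look-ahead denominators (10.12) and (10.13)
# from the original denominator 1 - 5/4 z^-1 + 3/8 z^-2 by multiplying in
# the cancelling terms, then check stability via the pole radii.
den = np.array([1.0, -5/4, 3/8])                     # poles at 1/2 and 3/4

den2 = np.convolve(den, [1.0, 5/4])                  # 2-stage: kill the z^-1 term
den3 = np.convolve(den, [1.0, 5/4, 19/16])           # 3-stage: kill z^-1 and z^-2
print(den2)                                          # [1, 0, -19/16, 15/32]
print(den3)                                          # [1, 0, 0, -65/64, 57/128]

for d in (den2, den3):
    print(np.max(np.abs(np.roots(d))))               # > 1 in both cases: unstable
```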
Chapter 10 20
2. Stable Clustered Look-Ahead Filter Design
• If the desired pipeline delay M does not produce a stable filter, M should be increased until a stable pipelined filter is obtained. To obtain the optimal pipelining level M, numerical search methods are generally used
• Example (Example 10.4.3, p.330): Consider a 5-level (M=5) pipelined implementation of the following 2nd-order transfer function
      H(z) = 1 / (1 - 1.5336·z^{-1} + 0.6889·z^{-2})                  (10.14)
  – By the stability analysis, it is shown that M=5 does not meet the stability condition. Thus M is increased to M=6 to obtain the following stable pipelined filter:
      H(z) = (1 + 1.5336·z^{-1} + 1.6630·z^{-2} + 1.4939·z^{-3} + 1.1454·z^{-4} + 0.7275·z^{-5})
             / (1 - 0.3265·z^{-6} + 0.5011·z^{-7})                    (10.15)
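A sketch of this search (NumPy; clustered_lookahead() is a hypothetical helper name, restricted here to a 2nd-order all-pole filter, and it assumes the standard construction in which a monic C(z) zeroes the z^{-1} ... z^{-(M-1)} denominator terms) reproduces the result that M = 5 is unstable while M = 6 is stable for (10.14):
```python
import numpy as np

# Sketch: derive the M-stage clustered look-ahead pipelined filter for the
# 2nd-order all-pole denominator of (10.14) and check its stability.
def clustered_lookahead(den, M):
    d1, d2 = den[1], den[2]
    c = np.zeros(M)                      # C(z): monic, degree M-1
    c[0] = 1.0
    for k in range(1, M):                # zero the z^-k coefficient of den*C
        c[k] = -(d1 * c[k - 1] + (d2 * c[k - 2] if k >= 2 else 0.0))
    return c, np.convolve(den, c)        # numerator C(z), pipelined denominator

den = [1.0, -1.5336, 0.6889]
for M in (5, 6):
    num, den_p = clustered_lookahead(den, M)
    stable = np.max(np.abs(np.roots(den_p))) < 1.0
    print(M, stable)                     # M=5: False, M=6: True
```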
Chapter 10 21
3. Scattered Look-Ahead Pipelining
• Scattered look-ahead pipelining: The denominator of the transfer function in (10.9) is transformed in a way that it contains the N terms {z^{-M}, z^{-2M}, ···, z^{-NM}}. Equivalently, the state y(n) is computed in terms of N past scattered states y(n-M), y(n-2M), ···, and y(n-NM)
• In scattered look-ahead, for each pole in the original filter, we introduce (M-1) canceling poles and zeros with equal angular spacing at a distance from the origin the same as that of the original pole.
  – Example: if the original filter has a pole at z = p, we add (M-1) poles and zeros at z = p·exp(j·2πk/M), k = 1, 2, ···, M-1, to derive a pipelined realization with M loop pipeline stages
• Assume that the denominator of the transfer function can be factorized as follows:
      D(z) = Π_{i=1}^{N} (1 - p_i·z^{-1})                             (10.16)
Chapter 10 22
  – Then, the pipelining process using the scattered look-ahead approach can be described by
      H(z) = N(z)/D(z)
           = N(z)·Π_{i=1}^{N} Π_{k=1}^{M-1} (1 - p_i·e^{j2πk/M}·z^{-1})
             / Π_{i=1}^{N} Π_{k=0}^{M-1} (1 - p_i·e^{j2πk/M}·z^{-1})
           = N'(z) / D'(z^M)                                          (10.17)
• Example (Example 10.4.5, p.332): Consider the 2nd-order filter with complex conjugate poles at z = r·e^{±jθ}. The filter transfer function is given by
      H(z) = 1 / (1 - 2r·cosθ·z^{-1} + r^2·z^{-2})
  – We can pipeline this filter by 3 stages by introducing 4 additional poles and zeros at {z = r·e^{±j(θ+2π/3)}, z = r·e^{±j(θ-2π/3)}} if θ ≠ 2π/3
Chapter 10 23
  – (cont’d) The equivalent pipelined filter is then given by
      H(z) = [1 + 2r·cosθ·z^{-1} + (1 + 2·cos2θ)·r^2·z^{-2} + 2r^3·cosθ·z^{-3} + r^4·z^{-4}]
             / [1 - 2r^3·cos3θ·z^{-3} + r^6·z^{-6}]
  – When θ = 2π/3, only 1 additional pole and zero at z = r is required for 3-stage pipelining, since r·e^{±j(θ-2π/3)} = r and r·e^{±j(θ+2π/3)} = r·e^{∓jθ}. The equivalent pipelined filter is then given by
      H(z) = (1 - r·z^{-1}) / [(1 + r·z^{-1} + r^2·z^{-2})·(1 - r·z^{-1})] = (1 - r·z^{-1}) / (1 - r^3·z^{-3})
• Example (Example 10.4.6, p.332): Consider the 2nd-order filter with real poles at {z = r_1, z = r_2}. The transfer function is given by
      H(z) = 1 / (1 - (r_1 + r_2)·z^{-1} + r_1·r_2·z^{-2})
Chapter 10 24
  – (cont’d) A 3-stage pipelined realization is derived by adding poles (and zeros) at {z = r_1·e^{±j2π/3}, z = r_2·e^{±j2π/3}}. The pipelined realization is given by
      H(z) = [1 + (r_1+r_2)·z^{-1} + (r_1^2 + r_1·r_2 + r_2^2)·z^{-2} + r_1·r_2·(r_1+r_2)·z^{-3} + r_1^2·r_2^2·z^{-4}]
             / [1 - (r_1^3 + r_2^3)·z^{-3} + r_1^3·r_2^3·z^{-6}]
  – The pole-zero locations of a 3-stage pipelined 2nd-order filter with poles at z = 1/2 and z = 3/4 are shown in Fig. 6
• CONCLUSIONS:
  – If the original filter is stable, then the scattered look-ahead approach leads to stable pipelined filters, because the distance of the additional poles from the origin is the same as that of the original poles
  – Multiplication Complexity: (NM+1) for the non-recursive portion in (10.17), and N for the recursive portion. The total pipelined filter multiplication complexity is (NM+N+1)
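A sketch of (10.17) in code (NumPy; the helper name scattered_lookahead() is hypothetical) builds the pipelined filter of Example 10.4.6 directly from the rotated poles; the printed coefficients match the realization above.
```python
import numpy as np

# Sketch of (10.17): build the M-stage scattered look-ahead filter by adding,
# for every pole p, the M-1 cancelling poles/zeros p*exp(j*2*pi*k/M).
# Checked here on Example 10.4.6 (real poles r1 = 1/2, r2 = 3/4, M = 3).
def scattered_lookahead(poles, M):
    num, den = np.array([1.0]), np.array([1.0])
    for p in poles:
        for k in range(M):
            factor = np.array([1.0, -p * np.exp(2j * np.pi * k / M)])  # (1 - p_k z^-1)
            den = np.convolve(den, factor)
            if k > 0:                                # k = 0 is the original pole
                num = np.convolve(num, factor)
    return num.real, den.real

num, den = scattered_lookahead([0.5, 0.75], 3)
print(np.round(num, 4))   # numerator: the added cancelling zeros
print(np.round(den, 4))   # denominator: only z^0, z^-3, z^-6 terms survive
```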
Chapter 10 25
Fig. 6: Pole-zero representation of a 3-stage pipelined equivalent stable filter derived using the scattered look-ahead approach (original poles at z = 1/2 and z = 3/4)
  – (cont’d) The multiplication complexity is linear with respect to M, and is much greater than that of clustered look-ahead
  – Also, the latch complexity is quadratic in M, because each multiplier is pipelined by M stages
Chapter 10 26
4. Scattered Look-Ahead Pipelining with Power-of-2
Decomposition
• This kind of decomposition technique will lead to a logarithmic increase in multiplication complexity (hardware) with respect to the level of pipelining
• Let the transfer function of a recursive digital filter be
      H(z) = Σ_{i=0}^{N} b_i·z^{-i} / (1 - Σ_{i=1}^{N} a_i·z^{-i}) = N(z)/D(z)      (10.18)
  – A 2-stage pipelined implementation can be obtained by multiplying the numerator and denominator by (1 - Σ_{i=1}^{N} (-1)^i·a_i·z^{-i}). The equivalent 2-stage pipelined implementation is described by
      H(z) = [Σ_{i=0}^{N} b_i·z^{-i}]·[1 - Σ_{i=1}^{N} (-1)^i·a_i·z^{-i}]
             / {[1 - Σ_{i=1}^{N} a_i·z^{-i}]·[1 - Σ_{i=1}^{N} (-1)^i·a_i·z^{-i}]}
           = N'(z)/D'(z), where D'(z) contains only even powers of z^{-1}      (10.19)
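The transformation (10.19) can be iterated, as in the sketch below (NumPy; the denominator of (10.11) is reused as an arbitrary 2nd-order example and one_stage() is a hypothetical helper): after log_2(M) = 2 stages only the z^0, z^{-4} and z^{-8} denominator coefficients remain.
```python
import numpy as np

# Sketch of the power-of-2 decomposition (10.18)-(10.19): each stage multiplies
# numerator and denominator by the denominator with its "odd" coefficients
# negated, doubling the spacing of the surviving denominator terms.
def one_stage(num, den, step):
    # Negate coefficients at odd multiples of `step` (den is assumed to be a
    # polynomial in z^{-step}).
    sign = np.where((np.arange(len(den)) // step) % 2 == 1, -1.0, 1.0)
    den_alt = den * sign
    return np.convolve(num, den_alt), np.convolve(den, den_alt)

num, den = np.array([1.0]), np.array([1.0, -5/4, 3/8])   # denominator of (10.11)
step = 1
for _ in range(2):                                       # log2(4) = 2 transformations
    num, den = one_stage(num, den, step)
    step *= 2
print(np.round(den, 6))   # only the z^0, z^-4 and z^-8 coefficients survive
```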
Chapter 10 27
  – (cont’d) Similarly, subsequent transformations can be applied to obtain 4-, 8-, and 16-stage pipelined implementations, respectively
  – Thus, to obtain an M-stage pipelined implementation (for power-of-2 M), log_2(M) sets of such transformations need to be applied in total. By applying the remaining (log_2(M) - 1) sets of such transformations to (10.19), an equivalent transfer function (with M pipelining stages inside the recursive loop) can be derived, which requires a complexity of (N·log_2(M) + 2N + 1) multiplications, a logarithmic complexity with respect to M.
  – Note: the number of delays (or latches) does not grow logarithmically: the total number of delays (or latches) is approximately NM·(log_2(M) + 1), i.e., about NM delays in the non-recursive portion and N·M·log_2(M) delays for pipelining each of the N·log_2(M) multipliers by M stages
  – In the decomposed realization, the 1st stage implements an N-th order non-recursive section, and the subsequent stages respectively implement 2N-, 4N-, ···, (NM/2)-order non-recursive sections
Chapter 10 28
• Example (Example 10.4.7, p.334): Consider a 2nd-order recursive filter described by
      H(z) = Y(z)/U(z) = (b_0 + b_1·z^{-1} + b_2·z^{-2}) / (1 - 2r·cosθ·z^{-1} + r^2·z^{-2})
  – The poles of the system are located at z = r·e^{±jθ} (see Fig. 7(a)).
  – The pipelined filter requires (2·log_2(M) + 5) multiplications and is described by
      H(z) = (b_0 + b_1·z^{-1} + b_2·z^{-2})·Π_{i=0}^{log_2(M)-1} (1 + 2·r^{2^i}·cos(2^i·θ)·z^{-2^i} + r^{2^{i+1}}·z^{-2^{i+1}})
             / (1 - 2·r^M·cos(Mθ)·z^{-M} + r^{2M}·z^{-2M}),      where θ ≠ 2π/M
  – The 2M poles of the transformed transfer function (shown in Fig. 7(b)) are located at
      z = r·e^{±j(θ + 2πi/M)},   i = 0, 1, 2, ···, M-1
Chapter 10 29
Fig. 7: (also see Fig.10.11, p.336) Pole-zero representation for Example 10.4.7; the decomposition of the zeros is shown in (c)
Chapter 10 30
Parallel Processing in IIR Filters
• Parallel processing can also be used in the design of IIR filters
• We first discuss parallel processing for a simple 1st-order IIR filter, and then higher-order filters
• Example (Example 10.5.1, p.339): Consider the transfer function of a 1st-order IIR filter given by
      H(z) = z^{-1} / (1 - a·z^{-1})
  – where |a| ≤ 1 for stability. This filter has only 1 pole, located at z = a. The corresponding input-output relation can be written as y(n+1) = a·y(n) + u(n)
  – Consider the design of a 4-parallel architecture (L=4) for the foregoing filter. Note that in the parallel system, each delay element is referred to as a block delay, where the clock period of the block system is 4 times the sample period. Therefore, the loop update equation should update y(n+4) using inputs and y(n).
Chapter 10 31
  – By iterating the recursion (or by applying the look-ahead technique), we get
      y(n+4) = a^4·y(n) + a^3·u(n) + a^2·u(n+1) + a·u(n+2) + u(n+3)
  – Substituting n = 4k:
      y(4k+4) = a^4·y(4k) + a^3·u(4k) + a^2·u(4k+1) + a·u(4k+2) + u(4k+3)      (10.20)
  – The corresponding architecture is shown in Fig. 8.
Fig. 8: (also see Fig.10.14, p.340)
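A quick check of the block update (10.20) against the serial recursion (a sketch in NumPy; a = 0.9 and the random input are arbitrary choices):
```python
import numpy as np

# Sketch: check the 4-parallel loop update (10.20) against the serial
# recursion y(n+1) = a*y(n) + u(n).
a, L = 0.9, 4
rng = np.random.default_rng(1)
u = rng.standard_normal(40)            # length is a multiple of L

y = np.zeros(len(u) + 1)               # serial reference
for n in range(len(u)):
    y[n + 1] = a * y[n] + u[n]

y_blk = np.zeros(len(u) + 1)           # block update on the decimated state y(4k)
for k in range(len(u) // L):
    n = L * k
    y_blk[n + 4] = (a**4 * y_blk[n] + a**3 * u[n] + a**2 * u[n + 1]
                    + a * u[n + 2] + u[n + 3])

print(np.allclose(y[::L], y_blk[::L])) # True: the states y(4k) agree
```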
Chapter 10 32
  – The pole of the original filter is at z = a, whereas the pole of the parallel system is at z = a^4, which is much closer to the origin since |a| ≤ 1
  – An important implication of this pole movement is the improved robustness of the system to round-off noise
  – A straightforward block-processing structure for L=4, obtained by substituting n = 4k+4, 4k+5, 4k+6 and 4k+7 in (10.20), is shown in Fig. 9.
  – Hardware complexity of this architecture: L^2 multiply-add operations (because L multiply-add operations are required for each output and there are L outputs in total)
  – The square increase in hardware complexity can be reduced by exploiting the concurrency in the computation (the decomposition property in the scattered look-ahead mode cannot be exploited in the block processing mode, because one hardware delay element represents L sample delays)
Chapter 10 33
Fig. 9: (also Fig.10.15, p.341) A 4-parallel 1st-order recursive filter
Chapter 10 34
• Trick:
  – Instead of computing y(4k+1), y(4k+2) & y(4k+3) independently, we can use y(4k) to compute y(4k+1), use y(4k+1) to compute y(4k+2), and use y(4k+2) to compute y(4k+3), at the expense of an increase in the system latency; this leads to a significant reduction in hardware complexity.
  – This method is referred to as incremental block processing; y(4k+1), y(4k+2) and y(4k+3) are computed incrementally.
• Example (Example 10.5.2, p.341): Consider the same 1st-order filter as in the last example. To derive its 4-parallel filter structure with the minimum hardware complexity, instead of simply repeating the hardware 4 times as in Fig. 9, the incremental computation technique can be used to reduce hardware complexity
  – First, design the circuit for computing y(4k) (same as Fig. 8)
  – Then, derive y(4k+1) from y(4k), y(4k+2) from y(4k+1), and y(4k+3) from y(4k+2) by using
      y(4k+1) = a·y(4k) + u(4k)
      y(4k+2) = a·y(4k+1) + u(4k+1)
      y(4k+3) = a·y(4k+2) + u(4k+2)
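A sketch of incremental block processing for Example 10.5.2 (NumPy; a = 0.9 and the input are arbitrary choices):
```python
import numpy as np

# Sketch of incremental block processing (Example 10.5.2): only y(4k+4) is
# computed through the pipelined loop; y(4k+1), y(4k+2), y(4k+3) are then
# obtained serially from it, reducing the multiplier count from L^2 to 2L-1.
a, L = 0.9, 4
rng = np.random.default_rng(2)
u = rng.standard_normal(40)

y_ref = np.zeros(len(u) + 1)
for n in range(len(u)):
    y_ref[n + 1] = a * y_ref[n] + u[n]

y = np.zeros(len(u) + 1)
for k in range(len(u) // L):
    n = L * k
    # loop update (Fig. 8)
    y[n + 4] = a**4 * y[n] + a**3 * u[n] + a**2 * u[n + 1] + a * u[n + 2] + u[n + 3]
    # incremental outputs derived from y(4k)
    y[n + 1] = a * y[n] + u[n]
    y[n + 2] = a * y[n + 1] + u[n + 1]
    y[n + 3] = a * y[n + 2] + u[n + 2]

print(np.allclose(y, y_ref))           # True
```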
Chapter 10 35
  – The complete architecture is shown in Fig. 10
  – The hardware complexity is reduced from L^2 to (2L-1), at the expense of an increase in the computation time for y(4k+1), y(4k+2) and y(4k+3)
Fig.10: (also see Fig.10.16, p.342) Incremental block filter structure with L=4
Chapter 10 36
• Example (Example 10.5.3, p.342): Consider a 2nd-order IIR filter described by the transfer function (10.21). Its pole-zero locations are shown in Fig. 11. Derive a 3-parallel IIR filter where in every clock cycle 3 inputs are processed and 3 outputs are generated
      H(z) = (1 + z^{-1})^2 / (1 - (5/4)·z^{-1} + (3/8)·z^{-2})       (10.21)
  – Since the filter order is 2, 2 outputs need to be updated independently and the 3rd output can be computed incrementally outside the feedback loop using the 2 updated outputs. Assume that y(3k) and y(3k+1) are computed using loop update operations and y(3k+2) is computed incrementally. From the transfer function, we have:
      y(n) = (5/4)·y(n-1) - (3/8)·y(n-2) + f(n),  where f(n) = u(n) + 2·u(n-1) + u(n-2)      (10.22)
  – The loop update process for the 3-parallel system is shown in Fig. 12, where y(3k+3) and y(3k+4) are computed using y(3k) and y(3k+1)
Chapter 10 37
Fig. 11: Pole-zero plot for the transfer function (10.21) (poles at z = 1/2 and z = 3/4, double zero at z = -1)
Fig. 12: Loop update for block size = 3
Chapter 10 38
  – The computation of y(3k+3) using y(3k) & y(3k+1) can be carried out if y(n+3) can be computed using y(n) & y(n+1). Similarly, y(3k+4) can be computed using y(3k) & y(3k+1) if y(n+4) can be expressed in terms of y(n) & y(n+1) (see Fig. 13). These state update operations correspond to clustered look-ahead operations for the M=2 and M=3 cases. The 2-stage and 3-stage clustered look-ahead equations are derived as:
      y(n) = (5/4)·y(n-1) - (3/8)·y(n-2) + f(n)
      y(n) = (19/16)·y(n-2) - (15/32)·y(n-3) + (5/4)·f(n-1) + f(n)
      y(n) = (65/64)·y(n-3) - (57/128)·y(n-4) + (19/16)·f(n-2) + (5/4)·f(n-1) + f(n)      (10.23)
Chapter 10 39
  – Substituting n = 3k+3 and n = 3k+4 into (10.23), we have the following 2 loop update equations:
      y(3k+3) = (19/16)·y(3k+1) - (15/32)·y(3k) + (5/4)·f(3k+2) + f(3k+3)
      y(3k+4) = (65/64)·y(3k+1) - (57/128)·y(3k) + (19/16)·f(3k+2) + (5/4)·f(3k+3) + f(3k+4)      (10.24)
  – The output y(3k+2) can be obtained incrementally as follows:
      y(3k+2) = (5/4)·y(3k+1) - (3/8)·y(3k) + f(3k+2)
  – The block structure is shown in Fig. 14
Fig. 13: Relationship of the recursive outputs y(n), y(n+1), ···, y(n+4)
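The loop-update equations (10.24) and the incremental output can be checked against a direct simulation of (10.21)-(10.22), as in the sketch below (NumPy; the input is an arbitrary random sequence, and the two initial states are seeded from the reference simulation purely for the comparison):
```python
import numpy as np

# Sketch: check the 3-parallel loop-update equations (10.24) and the
# incremental output against a direct simulation of (10.21)/(10.22).
rng = np.random.default_rng(3)
u = rng.standard_normal(60)
f = np.convolve(u, [1.0, 2.0, 1.0])[:len(u)]          # f(n) = u(n) + 2u(n-1) + u(n-2)

y_ref = np.zeros(len(u))
for n in range(len(u)):
    y_ref[n] = 5/4 * (y_ref[n-1] if n >= 1 else 0) - 3/8 * (y_ref[n-2] if n >= 2 else 0) + f[n]

y = np.zeros(len(u))
y[0], y[1] = y_ref[0], y_ref[1]                       # initial states y(0), y(1)
for k in range(len(u) // 3 - 1):
    n = 3 * k
    y[n + 3] = 19/16 * y[n + 1] - 15/32 * y[n] + 5/4 * f[n + 2] + f[n + 3]
    y[n + 4] = 65/64 * y[n + 1] - 57/128 * y[n] + 19/16 * f[n + 2] + 5/4 * f[n + 3] + f[n + 4]
    y[n + 2] = 5/4 * y[n + 1] - 3/8 * y[n] + f[n + 2]  # incremental output
print(np.allclose(y[:-1], y_ref[:-1]))                # True (last sample not produced here)
```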
Chapter 10 40
Fig. 14: Block structure of the 2nd-order IIR filter (L=3)
(also see Fig.10.20, p.344)
Chapter 10 41
• Comments
  – The original sequential system has 2 poles at {1/2, 3/4}. Now consider the pole locations of the new parallel system. Rewrite the 2 state update equations in matrix form: Y(3k+3) = A·Y(3k) + F, i.e.
      [y(3k+3)]   [ -15/32   19/16 ]   [y(3k)  ]   [f_1]
      [y(3k+4)] = [ -57/128  65/64 ] · [y(3k+1)] + [f_2]              (10.25)
  – The eigenvalues of the system matrix A are (1/2)^3 and (3/4)^3, which are the poles of the new parallel system. Thus, the parallel system is more stable. Note: the parallel system has the same number of poles as the original system
  – For a 2nd-order IIR filter (N=2), there are in total 3L + [(L-2)+(L-1)] + 4 + 2(L-2) = 7L-3 multiplications (the numerator part: 3L; the overhead of loop update: (L-2)+(L-1); the loop multiplications: 4; the incremental computation: 2(L-2)). The multiplication complexity is a linear function of the block size L. This multiplication complexity can be further reduced by using fast parallel filter structures and substructure sharing for the incrementally-computed outputs
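A two-line check of this eigenvalue claim (NumPy), using the state matrix A of (10.25):
```python
import numpy as np

# Sketch: eigenvalues of the 3-parallel state-update matrix A in (10.25)
# are the cubes of the original poles 1/2 and 3/4.
A = np.array([[-15/32, 19/16],
              [-57/128, 65/64]])
print(np.sort(np.linalg.eigvals(A)))   # [0.125, 0.421875] = [(1/2)^3, (3/4)^3]
```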
Chapter 10 42
Combined Pipelining and Parallel Processing
For IIR Filters
• Pipelining and parallel processing can also be combined for IIR filters
to achieve a speedup in sample rate by a factor L×M, where L denotes
the levels of block processing and M denotes stages of pipelining, or to
achieve power reduction at the same speed
• Example (Example 10.6.1, p.345) Consider the 1st-order IIR with transfer
function (10.26). Derive the filter structure with 4-level pipelining and
3-level block processing (i.e., M=4, L=3)
  – Because the filter order is 1, only 1 loop update operation is required. The other two outputs can be computed incrementally.
      H(z) = 1 / (1 - a·z^{-1})                                       (10.26)
Chapter 10 43
– Since pipelining level M=4, the loop must contain 4 delay elements
(shown in Fig.15). Since the block size L=3, each delay element
represents a block delay (corresponds to 3 sample delays). Therefore,
y(3k+12) needs to be expressed in terms of y(3k) and inputs (see Fig. 15).
Fig.15: Loop update for the pipelined block system (also see Fig.10.21, p. 346): the loop contains 4 block delays and the multiplier a^{12}, updating y(3k+12) from y(3k)
Chapter 10 44
  – Substituting n = 3k+12 into y(n) = a·y(n-1) + u(n) = ··· = a^{12}·y(n-12) + a^{11}·u(n-11) + ··· + u(n), we get:
      y(3k+12) = a^{12}·y(3k) + a^{11}·u(3k+1) + ··· + u(3k+12)
               = a^{12}·y(3k) + a^6·f_2(3k+6) + a^3·f_1(3k+9) + f_1(3k+12)
    where
      f_1(3k+12) = a^2·u(3k+10) + a·u(3k+11) + u(3k+12)
      f_2(3k+12) = a^3·f_1(3k+9) + f_1(3k+12)
  – Finally, we have:
      y(3k+12) = a^{12}·y(3k) + a^6·f_2(3k+6) + f_2(3k+12)
      y(3k+1)  = a·y(3k) + u(3k+1)
      y(3k+2)  = a·y(3k+1) + u(3k+2)                                  (10.27)
  – The parallel-pipelined filter structure is shown in Fig. 16
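A numerical sketch of the loop update in (10.27) (NumPy; a = 0.9 and the input are arbitrary choices, and the first four block states are seeded from the serial recursion only to start the comparison):
```python
import numpy as np

# Sketch: check the pipelined (M=4) and 3-parallel (L=3) loop update (10.27)
# against the serial recursion y(n) = a*y(n-1) + u(n).
a = 0.9
rng = np.random.default_rng(4)
u = rng.standard_normal(90)

y_ref = np.zeros(len(u))
for n in range(len(u)):
    y_ref[n] = a * (y_ref[n - 1] if n >= 1 else 0.0) + u[n]

f1 = np.convolve(u, [1.0, a, a**2])[:len(u)]   # f1(n) = u(n) + a*u(n-1) + a^2*u(n-2)
f2 = f1.copy()
f2[3:] += a**3 * f1[:-3]                       # f2(n) = f1(n) + a^3*f1(n-3)

y = np.zeros(len(u))
y[:12] = y_ref[:12]                            # the first 4 block states seed the loop
for k in range(len(u) // 3 - 4):
    n = 3 * k
    y[n + 12] = a**12 * y[n] + a**6 * f2[n + 6] + f2[n + 12]   # loop update
print(np.allclose(y[::3], y_ref[::3]))         # True: the states y(3k) agree
```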
Chapter 10 45
Fig. 16: The filter structure of the pipelined block system
with L=3 & M=4 (also see Fig.10.22, p.347)
Chapter 10 46
• Comments
  – The parallel-pipelined filter has 4 poles: {a^3, -a^3, j·a^3, -j·a^3}. Since the pipelining level is 4 and the filter order is 1, there are in total 4 poles in the new system, which are separated by the same angular distance. Since the block size is 3, the distance of the poles from the origin is a^3.
  – Note: The decomposition method is used here in the pipelining phase.
  – The multiplication complexity (assuming the pipelining level M to be a power of 2) can be calculated as (10.28), which is linear with respect to L and logarithmic with respect to M:
      (L-1) + 1 + log_2(M) + (L-1) = 2L - 1 + log_2(M)                (10.28)
• Example (Example 10.6.2, p. 347): Consider the 2nd-order filter in Example 10.5.3 again; design a pipelined block system for L=3 and M=2
      y(n) = (5/4)·y(n-1) - (3/8)·y(n-2) + f(n),  where f(n) = u(n) + 2·u(n-1) + u(n-2)      (10.29)
Chapter 10 47
  – A method similar to clustered look-ahead can be used to update y(3k+6) and y(3k+7) using y(3k) and y(3k+1). Then, by index substitution, the final system of equations can be derived.
  – Suppose the system update matrix is A. Since the poles of the original system are {1/2, 3/4}, the eigenvalues of A can be verified to be {(1/2)^6, (3/4)^6}
  – The poles of the new parallel-pipelined second-order filter are the square roots of the eigenvalues of A, i.e., {(1/2)^3, -(1/2)^3, (3/4)^3, -(3/4)^3}
• Comments: In general, the systematic approach below can be used to compute the pole locations of the new parallel-pipelined system:
  1. Write the loop update equations using LM-level look-ahead, where M and L denote the level of pipelining and parallel processing, respectively.
  2. Write the state-space representation of the parallel-pipelined filter, where the state matrix A has dimension N×N and N is the filter order.
  3. Compute the eigenvalues λ_i, 1 ≤ i ≤ N, of the matrix A.
  4. The NM poles of the new parallel-pipelined system correspond to the M-th roots of the eigenvalues of A, i.e., λ_i^{1/M}.
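A small sketch of this procedure (NumPy) applied to Example 10.6.2, where the eigenvalues of A are {(1/2)^6, (3/4)^6} and M = 2:
```python
import numpy as np

# Sketch of the systematic procedure above: the poles of the parallel-pipelined
# system are the M-th roots of the eigenvalues of the state matrix A obtained
# from LM-level look-ahead. Values below are those of Example 10.6.2 (L=3, M=2).
M = 2
eigenvalues = [(1/2)**6, (3/4)**6]           # eigenvalues of A (original poles ** LM)
poles = []
for lam in eigenvalues:
    for n in range(M):                       # the M complex M-th roots of lam
        poles.append(abs(lam)**(1/M) * np.exp(1j*(np.angle(lam) + 2*np.pi*n)/M))
print(np.round(np.real(poles), 6))           # [0.125, -0.125, 0.421875, -0.421875]
```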